Show HN: XPress Compress v1.0 – A config-less compression algorithm

kstenerud · on March 1, 2019

"What makes xPress stand out from the crowd is that no configuration data gets embedded into an .xpr archive. So it doesn't matter what version, OS, or architecture, you used to compress your file."

I don't understand this. For any archive format in modern use, it doesn't matter what version, OS, or architecture was used to compress the file. I can decompress a zip or lha file created on an Amiga 500 with no problem, 30 years after the fact. Is there something I'm missing?

zelon88 · on March 1, 2019

Sorry, the wording on the repo could probably be revised.

The zip header is a required part of the archive, and without it the rest of the data could be lost forever. .xpr archives have no header and (currently) no offsets. The compression settings can be inferred from the data inside the archive itself, making headers that describe the data unnecessary.

And while it's true that you can decompress old archives on new hardware, you can't always decompress a new archive with outdated software. Even with up-to-date software, it's sometimes possible to create an archive with 7z on Linux that WinRAR on Windows will not be able to open.

ComputerGuru · on March 1, 2019

I don't get it - this says it's new, but MS has had an LZ77-based compression algorithm named Xpress since forever, easily found by googling. Is this really just a very bad name clash?

https://docs.microsoft.com/en-us/openspecs/windows_protocols...

zelon88 · on March 1, 2019

This is just an unfortunate coincidence. I should have been more diligent. This started as a learning experiment that I honestly never thought would make it this far.

dataflow · on March 1, 2019

Yeah I assumed it was too. It didn't make sense otherwise.

gliptic · on March 1, 2019

"The xPress algorithm is very similar to LZW, but currently not as efficient."

LZW is not a good compression algorithm in this day and age. Why was this chosen?

zelon88 · on March 1, 2019

Honestly the xPress algo came before the comparison to LZW. There was a comment by someone else that made me notice the similarities.

I made this to learn about compression. Partly to see if it was possible and partly to learn more about what makes compression work. I wasn't specifically trying to "beat" anything currently on the market. Just learn and see if there's any potential here.

aaaaaaaaaaab · on March 1, 2019

>Decompression requires nothing special configuration-wise. The dictLength is inferred during decompression. This means any config settings are decompressible without knowing anything about how the file was compressed.

I don't understand this. Can you show an example where an XPress compressed file is better in terms of portability than a ZIP file?

zelon88 · on March 1, 2019

Some zip clients may write archives using invalid settings that render the file unrecoverable by other utilities.

For example, check out this cheat sheet... (http://kb.winzip.com/kb/entry/313/)

None of that data goes into an .xpr archive, so there's no ambiguity between different clients about how to compress or extract a file. There is only one way to decompress ANY .xpr archive: search the file for instances of the dictIndex and replace them with the corresponding data. When you run out of matches, your file is rebuilt.
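
To make the replace-until-no-matches idea concrete, here is a minimal Python sketch. It is not the actual xPress code: the function name `rebuild`, the `#1`-style index tokens, and the `max_passes` guard are all assumptions made for illustration. This thread does not describe how the dictionary is recovered from an archive, so the sketch simply takes it as an argument.

```python
def rebuild(payload, dictionary, max_passes=1000):
    """Toy dictionary-substitution decoder (hypothetical, not the real .xpr format).

    payload:    bytes containing index tokens mixed with literal data
    dictionary: mapping of index token (bytes) -> original data (bytes)
    """
    for _ in range(max_passes):
        changed = False
        for index, data in dictionary.items():
            if index in payload:
                # Swap every occurrence of this index for the data it stands for.
                payload = payload.replace(index, data)
                changed = True
        if not changed:
            # No index token matched during a full pass: the file is rebuilt.
            return payload
    # Guard against entries that expand into each other forever.
    raise ValueError("dictionary entries appear to reference each other in a cycle")


# Hypothetical example: two index tokens standing in for repeated data.
restored = rebuild(b"#1#2, #1again", {b"#1": b"hello ", b"#2": b"world"})
```

Note that a scheme like this only round-trips if the index tokens can never occur in the original data, which is an assumption the sketch does not check.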

benj111 · on March 1, 2019

Look up option parsing libraries (not sure what the state of the art is in Python at the moment).

Options shouldn't be order-dependent like that.

Have you got any comparisons to other algorithms?

zelon88 · on March 1, 2019

I cringe every time I look at that code block.

I eventually want to put that code into a loop. That will probably come at the same time as relative path support.

Just know that I'm not proud of argument handling in its current form and improvements are on my radar.

benj111 · on March 1, 2019

Yes I thought it was an interesting code block :)

Good on you for releasing it though.

I would seriously look at using a library rather than rolling your own. Sorting out all the edge cases is a pain, and a library can automate help messages, GNU-style long options, and so on. Unless of course you want to delve into the weeds of option parsing.

WorldMaker · on March 1, 2019

argparse has been in the Python Standard Library since Python 3.2: https://docs.python.org/3/library/argparse.html
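
For a rough idea of what that could look like, here is a small sketch using only documented argparse calls. The program name, the `c`/`x` modes, and the `--dict-length` and `--verbose` options are hypothetical; they are not taken from the actual xPress command line.

```python
import argparse


def build_parser():
    # All option names below are illustrative assumptions, not xPress's real CLI.
    parser = argparse.ArgumentParser(
        prog="xpress",
        description="Compress or extract a file (illustrative sketch).",
    )
    parser.add_argument("mode", choices=["c", "x"], help="c = compress, x = extract")
    parser.add_argument("input", help="path of the file to read")
    parser.add_argument("output", help="path of the file to write")
    parser.add_argument("-d", "--dict-length", type=int, default=8,
                        help="dictionary entry length (hypothetical setting)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress information")
    return parser


# Passing an explicit list instead of reading sys.argv keeps the example testable.
# Flags can appear before, between, or after the positional arguments.
args_a = build_parser().parse_args(["--verbose", "c", "in.txt", "out.xpr", "--dict-length", "16"])
args_b = build_parser().parse_args(["c", "-d", "16", "in.txt", "out.xpr", "-v"])
```

Both calls produce the same parsed values, and `--help` output comes for free, which covers the order-dependence and help-message points raised above.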