On Mon, Oct 17, 2022 at 12:51:25AM +0000, brian m. carlson wrote:

> > but if I instead do "seq 10000", then the files differ. I didn't dig
> > into the actual binary to see the source of the change. It might be
> > something we can tweak (e.g., if it's how a header is represented, or if
> > we can change the zlib parameters to find the same compressions).
>
> I will say that trying to make two compression implementations produce
> identical output is likely futile because it's almost always the case
> that there are multiple identical ways to encode the same data. Most
> implementations are going to prefer improving size over consistency, so
> there's little incentive to copy the same algorithm across
> implementations. I believe even GNU gzip has changed its output in the
> past as better optimizations were implemented.
>
> I mean, don't let me stop you from trying to tweak things to see if you
> can make it work, but in general I think it's likely that some
> divergence is going to occur between implementations no matter what.

Yeah, I definitely don't think it's something we ought to be promising,
or put a lot of work into. But if there's low-hanging fruit to reduce
immediate pain in practice, it seems worth considering.

-Peff
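A minimal Python sketch illustrating brian's point that the same data has more than one valid compressed encoding (an editorial illustration, not code from this thread). It compresses a `seq 10000`-style input, generated in Python as a stand-in, with zlib at levels 1 and 9. Both streams decompress to identical bytes, yet the compressed bytes differ; even the two-byte zlib header differs, since it records the compression level. `git archive` itself is not involved here.

```python
import zlib

# Stand-in for the output of `seq 10000`: the numbers 1..10000, one per line.
data = "".join(f"{i}\n" for i in range(1, 10001)).encode("ascii")

fast = zlib.compress(data, 1)  # level 1: favor speed
best = zlib.compress(data, 9)  # level 9: favor size

# Both streams decode to exactly the same input bytes...
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data

# ...but the compressed bytes are not identical. The FLEVEL bits in the
# second header byte reflect the level, and the deflate data itself may
# differ too because the two levels search for matches differently.
print("headers:", fast[:2].hex(), best[:2].hex())
print("sizes:", len(fast), len(best))
print("identical:", fast == best)
```

Pinning one implementation's level and settings can make its own output reproducible, but it says nothing about what a different compressor will emit for the same input.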