Re: clean/smudge filters for pdf files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Leo Razoumov venit, vidit, dixit 24.10.2008 03:40:
> On 10/23/08, Pierre Habouzit <madcoder@xxxxxxxxxx> wrote:
>> On Thu, Oct 23, 2008 at 07:44:39PM +0000, Leo Razoumov wrote:
>>  > I am trying to improve storage efficiency for PDF files in a git repo.
>>  > Following earlier discussions in this list I am trying to set up
>>  > proper clean/smudge filters. What follows is my current setup
>>  >
>>  > # in ~/.gitconfig
>>  > [filter "pdf"]
>>  >       clean  = "pdftk - output - uncompress"
>>  >       smudge = "pdftk - output - compress"
>>  >
>>  > # in .gitattributes
>>  > *.pdf filter=pdf
>>  >
>>  > Unfortunately, it seems as though that pdftk uncompress followed by
>>  > pdftk compress do not leave the file invariant. I tried several
>>  > uncompress+compress iterations and the file still keep changing (the
>>  > size though stays the same).
>>  > Is there any other alternative way to store PDF files in git repo more
>>  > efficiently?
>>  > Any alternative to pdftk on Linux?
>>
>>
>> actually it uses some kind of zlib algorithm so that's pretty normal you
>>  don't have the same result with a packer. Maybe one could write a tool
>>  like pristine-tar for that purpose.
>>
> 
> With zlib you get the same deterministic result as long as you use the
> same zlib packer and unpacker. With pdftk compress/uncompress seem not
> to form a bijection pair. This issue was briefly discussed on this
> list back in April 2008 but no resolution emerged.

For a different file format I use the pair "gzip -c, gunzip -c" without
any problems, so zlib is not a problem. I do see the effect that
checkouts on different machines may have different compressed files
(same gzip version), but this is a non-issue.

Your experience with pdftk confirms mine. It shuffles things around
becauses it parses the files into objects and then writes them out again
in possibly different order. This is no problem for pdf because it uses
"pointers" (it's a bijection up to reordering), but it's a weird design,
and complicates things for us.

I'm still looking for something viable, I'll let list know when I've
found something...

Michael
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux