Re: [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.)

On Tue, 11 Jul 2006, Linus Torvalds wrote:
> 
>  - four low bits: CM (compression method):
> 
>         "This identifies the compression method used in the file. CM = 8
>          denotes the "deflate" compression method with a window size up
>          to 32K.  This is the method used by gzip and PNG (see
>          references [1] and [2] in Chapter 3, below, for the reference
>          documents).  CM = 15 is reserved.  It might be used in a future
>          version of this specification to indicate the presence of an
>          extra field before the compressed data."
> 
>  - four high bits are CINFO: 
> 
>         "For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
>          size, minus eight (CINFO=7 indicates a 32K window size). Values
>          of CINFO above 7 are not allowed in this version of the
>          specification.  CINFO is not defined in this specification for
>          CM not equal to 8."
> 
> so 0x78 means "deflate with 32kB window size", but I don't see anything 
> guaranteeing that we might not see something else for an object that 
> cannot be compressed, for example.
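
To make the quoted field layout concrete, here is a minimal sketch that splits the first byte of a zlib stream into CM and CINFO and derives the window size. The struct and function names are made up for illustration; they are not part of zlib or git.

```c
/* Hypothetical helper types, not from zlib or git. */
struct cmf_fields {
    unsigned cm;               /* low four bits: compression method */
    unsigned cinfo;            /* high four bits: log2(window size) - 8 */
    unsigned long window_size; /* in bytes, 0 if not defined by the spec */
};

static struct cmf_fields decode_cmf(unsigned char cmf)
{
    struct cmf_fields f;

    f.cm = cmf & 0x0f;
    f.cinfo = cmf >> 4;
    /* CINFO is only defined for CM = 8, and only for values up to 7. */
    if (f.cm == 8 && f.cinfo <= 7)
        f.window_size = 1UL << (f.cinfo + 8);
    else
        f.window_size = 0;
    return f;
}
```

For 0x78 this gives CM = 8, CINFO = 7 and a 32768-byte window, matching the "deflate with 32kB window size" reading above.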

Ahh. Looking at the zlib sources, I see

    /* Write the zlib header */
    if (s->status == INIT_STATE) {

        uInt header = (Z_DEFLATED + ((s->w_bits-8)<<4)) << 8;
        uInt level_flags = (s->level-1) >> 1;
     
        if (level_flags > 3) level_flags = 3;
        header |= (level_flags << 6);
        if (s->strstart != 0) header |= PRESET_DICT;
        header += 31 - (header % 31);

        s->status = BUSY_STATE;
        putShortMSB(s, header);

(which is that first 16-bit word, MSB first). So we'll always have 
Z_DEFLATED (8) there in the low four bits, but the high nybble will be 
"s->w_bits-8" where w_bits comes from windowBits, and I think we can 
depend on it being 15:

    "The windowBits parameter is the base two logarithm of the window size
   (the size of the history buffer).  It should be in the range 8..15 for this
   version of the library. Larger values of this parameter result in better
   compression at the expense of memory usage. The default value is 15 if
   deflateInit is used instead."

so since we use deflateInit(), we know windowBits will be 15, i.e. a 32kB 
window, and the high nybble will be 15-8 = 7.

So I guess we _can_ depend on the first byte being 0x78 for our use.
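
As a sanity check on that conclusion, here is a small standalone sketch that redoes the header arithmetic from the zlib excerpt without linking against zlib. The function names and the two constants are stand-ins chosen for illustration (the values mirror Z_DEFLATED and the preset-dictionary flag), and the level is assumed to be in 1..9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CM_DEFLATE  8     /* assumed to equal zlib's Z_DEFLATED */
#define PRESET_DICT 0x20  /* preset-dictionary flag in the second byte */

/* Hypothetical helper: rebuild the 16-bit header as the excerpt does. */
static uint16_t zlib_header(int window_bits, int level, int has_dict)
{
    unsigned header, level_flags;

    assert(window_bits >= 8 && window_bits <= 15);
    assert(level >= 1 && level <= 9);

    header = (CM_DEFLATE + ((unsigned)(window_bits - 8) << 4)) << 8;
    level_flags = (unsigned)(level - 1) >> 1;
    if (level_flags > 3)
        level_flags = 3;
    header |= level_flags << 6;
    if (has_dict)
        header |= PRESET_DICT;
    header += 31 - (header % 31); /* round up to a multiple of 31 */
    return (uint16_t)header;
}

/* Hypothetical check: does this buffer start like a 32kB-window stream? */
static int starts_with_zlib_32k(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != 0x78)
        return 0;
    return (((unsigned)buf[0] << 8) | buf[1]) % 31 == 0;
}
```

With window_bits = 15 the high byte comes out as 0x78 for every level in that range; only the second byte changes with the level and dictionary flag. A smaller window_bits changes the first byte, which is why the argument rests on deflateInit() always using 15.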

Goodie.

		Linus