RE: zsmalloc/lzo compressibility vs entropy

> From: Dan Magenheimer
> Sent: Wednesday, March 27, 2013 3:42 PM
> To: Seth Jennings; Konrad Wilk; Minchan Kim; Bob Liu; Robert Jennings; Nitin Gupta; Wanpeng Li; Andrew
> Morton; Mel Gorman
> Cc: linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: zsmalloc/lzo compressibility vs entropy
> 
> This might be obvious to those of you who are better
> mathematicians than I, but I ran some experiments
> to confirm the relationship between entropy and compressibility
> and thought I should report the results to the list.

A few new observations worth mentioning:

Since Seth mentioned long ago that the text of Moby Dick
resulted in poor (but not horribly poor) compression, I thought
I'd look at some ASCII data.

I used the first sentence of the Gettysburg Address (91 characters)
and repeated it to fill a page.  Interestingly, LZO apparently
discovered the repetition... the page compressed to 118 bytes
even though the result had 15618 one-bits (fairly high entropy).
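The experiment is easy to reproduce in a few lines. A sketch follows; since LZO isn't in the Python standard library, zlib stands in for it here (both are LZ77-family matchers, so the qualitative result -- a repeated string collapsing to a tiny output regardless of its one-bit count -- carries over, though the exact byte counts will differ from LZO's). The sentence below is illustrative, not the exact 91-character string used in my test:

```python
import zlib

PAGE_SIZE = 4096

def fill_page(text: bytes, size: int = PAGE_SIZE) -> bytes:
    """Repeat `text` until it fills one page, truncated at the page boundary."""
    reps = size // len(text) + 1
    return (text * reps)[:size]

def one_bits(buf: bytes) -> int:
    """Count set bits in the buffer -- the crude entropy proxy used here."""
    return sum(bin(b).count("1") for b in buf)

# Illustrative sentence (not the exact text used in the experiment).
sentence = (b"Four score and seven years ago our fathers "
            b"brought forth on this continent a new nation")

page = fill_page(sentence)
compressed = zlib.compress(page)
print(f"one-bits: {one_bits(page)}, compressed size: {len(compressed)}")
```

The one-bit count lands in the same high range as the real data, yet the compressed size is tiny: the matcher encodes one copy of the sentence plus back-references.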

I used the full Gettysburg Address (1459 characters), again
repeated to fill a page.  LZO compressed this to 1070 bytes.
(14568 one-bits.)

To fill a page with text, I added part of the Declaration of
Independence.  No repeating text now.  This only compressed
to 2754 bytes (which, I assume, is close to Seth's observations
on Moby Dick).  14819 one-bits.

Last (for swap), to see if random ASCII would compress better
than binary, I masked off the MSB in each byte of a random
page.  The mean zsize was 4116 bytes (larger than a page)
with a stddev of 51.  The one-bit mean was 14336 (7/16 of a page).
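That 7/16 figure falls straight out of the construction: after masking, each byte has 7 random bits, each set with probability 1/2, so the expected count is 4096 * 7 / 2 = 14336 out of 32768 bits. A sketch of the page construction and the count:

```python
import os

PAGE_SIZE = 4096

def one_bits(buf: bytes) -> int:
    """Count set bits in the buffer."""
    return sum(bin(b).count("1") for b in buf)

# Random page with the MSB of every byte cleared: 7 random bits per byte.
page = bytes(b & 0x7F for b in os.urandom(PAGE_SIZE))

# Each surviving bit is set with probability 1/2, so the expected count
# is 4096 * 7 / 2 = 14336 -- 7/16 of the page's 32768 bits.
print(one_bits(page))
```

And the expansion to 4116 bytes makes sense too: LZO is a pure dictionary matcher with no entropy-coding stage, so a page of random 7-bit bytes offers it nothing to match, and the incompressible-literal framing adds overhead. (A coder with a Huffman stage could still squeeze out the unused eighth bit; LZO can't.)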

On a completely different track, I thought it would be relevant
to look at the difference between frontswap (anonymous) page
zsize distribution and cleancache (file) page zsize distribution.

Running kernbench, zsize mean was 1974 (stddev 895).

For a different benchmark, I did:

# find / | grep3

where grep3 is a simple bash script that does three separate
greps on the first argument.  Since this fills the page cache
and causes reclaiming, and reclaims are captured by cleancache
and fed to zcache, this data page stream approximates random
pages on the disk.

This "benchmark" generated a zsize mean of 2265 with stddev 1008.
Also of note: Only a fraction of a percent of cleancache pages
are zero-filled, so Wanpeng's zcache patch to handle zero-filled
pages more efficiently is very good for frontswap pages but may
have little benefit for cleancache pages.
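For reference, the zero-filled-page optimization amounts to a cheap check before the page ever reaches the compressor; a minimal sketch (the function name here is mine, not Wanpeng's):

```python
PAGE_SIZE = 4096

def page_is_zero_filled(page: bytes) -> bool:
    """True if every byte is zero, so the page can be recorded as a
    flag in metadata instead of being compressed and stored."""
    return page.count(0) == len(page)

print(page_is_zero_filled(bytes(PAGE_SIZE)))
```

A zero page then costs a bit of bookkeeping rather than a compression pass plus an allocation -- a big win only if such pages are common, which is why the frontswap/cleancache split above matters.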

Bottom line conclusions:  (1) Entropy is probably less of a factor
in LZO-compressibility than data repetition. (2) Cleancache
data pages may have a very different zsize distribution than
frontswap data pages, anecdotally skewed to much higher zsize.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .