On Wed, Apr 20, 2011 at 04:09, Nitin Gupta <ngupta@xxxxxxxxxx> wrote:
> On 04/19/2011 08:01 PM, Zeev Tarantov wrote:
>> On Tue, Apr 19, 2011 at 14:31, Nitin Gupta <ngupta@xxxxxxxxxx> wrote:
>>> I'm in the process of writing a simple test for all these algorithms:
>>>
>>> http://code.google.com/p/compcache/source/browse/sub-projects/fstats/fstats.c
>>>
>>> This compresses the input file page-by-page and dumps compressed sizes
>>> and time taken to compress. With this data, I hope to come up with an
>>> adaptive scheme which uses different compressors for different pages
>>> such that overall compression ratio stays good while not hogging CPU
>>> like when zlib is used alone.
>>
>> I have extended the block compressor tester I've written for Dan
>> Magenheimer
>> (http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015127.html)
>> to show this data.
>> It compresses a file one page at a time, computes a simple histogram of
>> compressed sizes, keeps track of elapsed CPU time, and writes out the
>> compressed blocks (and an index) so the original file can be restored
>> (to prevent cheating, basically).
>> Code: https://github.com/zeevt/csnappy/blob/master/block_compressor.c
>> Results: https://github.com/zeevt/csnappy/blob/master/block_compressor_benchmark.txt
>>
>> Because people don't click links, results inlined:
>
> fstats now does all this and gnuplot does the histogram :)
> Anyways, I don't have any issues with links and don't have any problems
> copy-pasting your stuff here if needed for context. Still, when posting
> your patches, it would be better to keep some of these performance
> numbers in the patch description.

The fstats in Mercurial is still the original version: it has no snappy
support, and zlib reallocates memory inside the timed inner loop.

Anyway, are the benchmarks I posted enough? Should I test more kinds of
data? What kind of data do people actually want to compress in RAM?
Would it be more interesting if the tester had a mode that accepts a pid
instead of a path and compresses the read-only mapped pages listed in
/proc/<pid>/maps?

>> What do you think of the submitted patch as-is?
>
> csnappy as a compile time option is definitely welcome.
>
> Manual runtime switching is not really needed -- instead of more knobs
> for this, it would be better to have some simple scheme to automatically
> switch between available compressors.

Would you please (test and) ack this, then?
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015126.html
Or does it need changes?

Re: the adaptive compression method.
What kind of simple scheme did you have in mind? I will gladly prototype
it. (A purely illustrative sketch of one possibility is in the P.S.
below.)

If you're serious about resorting to zlib in some cases, maybe it should
be tuned for in-memory compression instead of archiving:
The streaming interface slows it down.
It computes an adler32 checksum that zram doesn't need.
The defaults are not tuned for 4KB input (max_lazy_match, good_match,
nice_match, max_chain_length).
(A rough sketch of the tuning I mean is also in the P.S. below.)

> Thanks,
> Nitin

Thank you,
-Z.T.
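
P.S. Since I offered to prototype the automatic switching, here is the
sort of thing I would start from, purely as an illustration. It is
untested, compress_fast()/compress_tight() are hypothetical wrappers
(csnappy and zlib in practice), and the 75% threshold is an arbitrary
placeholder to be replaced by real measurements:

#include <stddef.h>

#define PAGE_SIZE 4096
/* Arbitrary placeholder: "good enough" = the fast path saved at least 25%. */
#define FAST_GOOD_ENOUGH ((PAGE_SIZE * 3) / 4)

/*
 * Hypothetical wrappers around the two compressors; both write to dst
 * and return the compressed size.
 */
size_t compress_fast(const void *src, void *dst);
size_t compress_tight(const void *src, void *dst);

/*
 * Try the cheap compressor first and only spend extra CPU on pages it
 * compresses poorly.  *used_tight records which output ended up in dst.
 */
static size_t compress_page_adaptive(const void *src, void *dst,
                                     int *used_tight)
{
        size_t len = compress_fast(src, dst);

        if (len <= FAST_GOOD_ENOUGH) {
                *used_tight = 0;
                return len;
        }

        *used_tight = 1;
        return compress_tight(src, dst);
}

The interesting question is what the threshold and the per-page CPU
budget should be, which is exactly what the histograms from the tester
should help answer.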
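
P.S. Re: tuning zlib for 4KB in-memory input, this is roughly what I
mean, in userspace terms (the regular zlib API, not the kernel's
zlib_deflate). Untested, and the numbers passed to deflateTune() are
placeholders I would still have to benchmark, not measured values:

#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 4096

/*
 * Compress one page with raw deflate: negative windowBits means no zlib
 * header and no adler32 trailer.  On entry *dst_len holds the output
 * buffer size (deflateBound() gives a safe upper bound); on return it
 * holds the compressed size.
 */
static int compress_page_zlib(const void *src, void *dst,
                              unsigned int *dst_len)
{
        z_stream strm;
        int ret;

        memset(&strm, 0, sizeof(strm));
        /* windowBits = -12: raw deflate, a 4KB window covers one page. */
        ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                           -12, 8, Z_DEFAULT_STRATEGY);
        if (ret != Z_OK)
                return ret;

        /* good_length, max_lazy, nice_length, max_chain: placeholders. */
        deflateTune(&strm, 8, 16, 32, 128);

        /* Hand the whole page over in one call instead of streaming it. */
        strm.next_in = (Bytef *)src;
        strm.avail_in = PAGE_SIZE;
        strm.next_out = (Bytef *)dst;
        strm.avail_out = *dst_len;
        ret = deflate(&strm, Z_FINISH);

        *dst_len = strm.total_out;
        deflateEnd(&strm);
        return ret == Z_STREAM_END ? Z_OK : Z_BUF_ERROR;
}

In the tester I would of course call deflateInit2() once and
deflateReset() per page, so the allocation stays out of the timed loop.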