I have submitted a kernel port of Google's Snappy compression library:
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015122.html
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015126.html

It is significantly (4x) faster than the LZO code currently in the kernel.

LZO 2.05 was recently released. I assume the kernel will upgrade once a kernel port of it is ready. That port is not completely trivial because of its use of unaligned memory access, endianness-dependent code and bitops. LZO 2.05 adopted the following optimizations, which were first introduced in Snappy:
1. 64-bit memory access
2. unaligned multi-byte memory access
3. 32-bit multiplication in the hash function
4. ctz or clz of an XOR to determine match length
5. when compressing, skipping one additional byte of input between match attempts for every 32 incompressible bytes of input seen

Updated benchmark results from Google's Snappy test suite:

testdata/alice29.txt :
  ZLIB:    [b 1M] bytes 152089 -> 54404 35.8% comp 9.8 MB/s uncomp 138.0 MB/s
  LZO204:  [b 1M] bytes 152089 -> 82691 54.4% comp 64.6 MB/s uncomp 206.3 MB/s
  LZO205:  [b 1M] bytes 152089 -> 87825 57.7% comp 175.4 MB/s uncomp 240.0 MB/s
  CSNAPPY: [b 1M] bytes 152089 -> 90965 59.8% comp 173.7 MB/s uncomp 409.6 MB/s
  SNAPPY:  [b 4M] bytes 152089 -> 90965 59.8% comp 174.9 MB/s uncomp 401.6 MB/s
testdata/asyoulik.txt :
  ZLIB:    [b 1M] bytes 125179 -> 48897 39.1% comp 9.0 MB/s uncomp 131.0 MB/s
  LZO204:  [b 1M] bytes 125179 -> 73217 58.5% comp 59.6 MB/s uncomp 202.1 MB/s
  LZO205:  [b 1M] bytes 125179 -> 77041 61.5% comp 164.4 MB/s uncomp 237.4 MB/s
  CSNAPPY: [b 1M] bytes 125179 -> 80207 64.1% comp 163.6 MB/s uncomp 387.7 MB/s
  SNAPPY:  [b 4M] bytes 125179 -> 80207 64.1% comp 164.6 MB/s uncomp 378.9 MB/s
testdata/cp.html :
  ZLIB:    [b 1M] bytes 24603 -> 7961 32.4% comp 23.0 MB/s uncomp 142.0 MB/s
  LZO204:  [b 1M] bytes 24603 -> 11621 47.2% comp 66.8 MB/s uncomp 300.0 MB/s
  LZO205:  [b 1M] bytes 24603 -> 11909 48.4% comp 218.1 MB/s uncomp 336.9 MB/s
  CSNAPPY: [b 1M] bytes 24603 -> 11838 48.1% comp 228.9 MB/s uncomp 548.1 MB/s
  SNAPPY:  [b 4M] bytes 24603 -> 11838 48.1% comp 227.6 MB/s uncomp 523.3 MB/s
testdata/fields.c :
  ZLIB:    [b 1M] bytes 11150 -> 3122 28.0% comp 25.2 MB/s uncomp 147.5 MB/s
  LZO204:  [b 1M] bytes 11150 -> 4663 41.8% comp 86.2 MB/s uncomp 304.5 MB/s
  LZO205:  [b 1M] bytes 11150 -> 4711 42.3% comp 253.3 MB/s uncomp 346.1 MB/s
  CSNAPPY: [b 1M] bytes 11150 -> 4728 42.4% comp 251.7 MB/s uncomp 536.5 MB/s
  SNAPPY:  [b 4M] bytes 11150 -> 4728 42.4% comp 249.6 MB/s uncomp 515.2 MB/s
testdata/geo.protodata :
  ZLIB:    [b 1M] bytes 118588 -> 15131 12.8% comp 43.2 MB/s uncomp 310.1 MB/s
  LZO204:  [b 1M] bytes 118588 -> 20026 16.9% comp 150.2 MB/s uncomp 639.7 MB/s
  LZO205:  [b 1M] bytes 118588 -> 23965 20.2% comp 487.6 MB/s uncomp 705.7 MB/s
  CSNAPPY: [b 1M] bytes 118588 -> 27459 23.2% comp 469.0 MB/s uncomp 985.8 MB/s
  SNAPPY:  [b 4M] bytes 118588 -> 27459 23.2% comp 466.1 MB/s uncomp 954.6 MB/s
testdata/grammar.lsp :
  ZLIB:    [b 1M] bytes 3721 -> 1222 32.8% comp 24.0 MB/s uncomp 109.3 MB/s
  LZO204:  [b 1M] bytes 3721 -> 1781 47.9% comp 79.2 MB/s uncomp 360.8 MB/s
  LZO205:  [b 1M] bytes 3721 -> 1811 48.7% comp 232.3 MB/s uncomp 442.2 MB/s
  CSNAPPY: [b 1M] bytes 3721 -> 1800 48.4% comp 257.6 MB/s uncomp 612.8 MB/s
  SNAPPY:  [b 4M] bytes 3721 -> 1800 48.4% comp 250.1 MB/s uncomp 570.9 MB/s
testdata/house.jpg :
  ZLIB:    [b 1M] bytes 126958 -> 126513 99.6% comp 19.0 MB/s uncomp 231.8 MB/s
  LZO204:  [b 1M] bytes 126958 -> 127173 100.2% comp 23.5 MB/s uncomp 1635.4 MB/s
  LZO205:  [b 1M] bytes 126958 -> 127303 100.3% comp 1051.1 MB/s uncomp 3762.4 MB/s
  CSNAPPY: [b 1M] bytes 126958 -> 126803 99.9% comp 2365.1 MB/s uncomp 8190.2 MB/s
  SNAPPY:  [b 4M] bytes 126958 -> 126803 99.9% comp 2326.8 MB/s uncomp 8402.5 MB/s
testdata/html :
  ZLIB:    [b 1M] bytes 102400 -> 13699 13.4% comp 35.6 MB/s uncomp 273.4 MB/s
  LZO204:  [b 1M] bytes 102400 -> 21027 20.5% comp 135.7 MB/s uncomp 494.3 MB/s
  LZO205:  [b 1M] bytes 102400 -> 22547 22.0% comp 421.6 MB/s uncomp 557.5 MB/s
  CSNAPPY: [b 1M] bytes 102400 -> 24140 23.6% comp 425.8 MB/s uncomp 873.0 MB/s
  SNAPPY:  [b 4M] bytes 102400 -> 24140 23.6% comp 422.9 MB/s uncomp 845.4 MB/s
testdata/html_x_4 :
  ZLIB:    [b 1M] bytes 409600 -> 53367 13.0% comp 32.1 MB/s uncomp 277.7 MB/s
  LZO204:  [b 1M] bytes 409600 -> 82980 20.3% comp 143.3 MB/s uncomp 487.0 MB/s
  LZO205:  [b 1M] bytes 409600 -> 89475 21.8% comp 428.2 MB/s uncomp 556.1 MB/s
  CSNAPPY: [b 1M] bytes 409600 -> 96472 23.6% comp 423.4 MB/s uncomp 870.8 MB/s
  SNAPPY:  [b 4M] bytes 409600 -> 96472 23.6% comp 418.3 MB/s uncomp 830.5 MB/s
testdata/kennedy.xls :
  ZLIB:    [b 1M] bytes 1029744 -> 203992 19.8% comp 15.8 MB/s uncomp 230.0 MB/s
  LZO204:  [b 1M] bytes 1029744 -> 357315 34.7% comp 159.1 MB/s uncomp 624.6 MB/s
  LZO205:  [b 1M] bytes 1029744 -> 362984 35.2% comp 413.2 MB/s uncomp 736.1 MB/s
  CSNAPPY: [b 1M] bytes 1029744 -> 425735 41.3% comp 354.9 MB/s uncomp 564.4 MB/s
  SNAPPY:  [b 4M] bytes 1029744 -> 425735 41.3% comp 350.0 MB/s uncomp 513.0 MB/s
testdata/kppkn.gtb :
  ZLIB:    [b 1M] bytes 184320 -> 38751 21.0% comp 7.2 MB/s uncomp 180.9 MB/s
  LZO204:  [b 1M] bytes 184320 -> 71671 38.9% comp 98.6 MB/s uncomp 274.8 MB/s
  LZO205:  [b 1M] bytes 184320 -> 71445 38.8% comp 295.0 MB/s uncomp 321.9 MB/s
  CSNAPPY: [b 1M] bytes 184320 -> 70535 38.3% comp 271.8 MB/s uncomp 483.8 MB/s
  SNAPPY:  [b 4M] bytes 184320 -> 70535 38.3% comp 273.9 MB/s uncomp 464.5 MB/s
testdata/lcet10.txt :
  ZLIB:    [b 1M] bytes 426754 -> 144904 34.0% comp 10.0 MB/s uncomp 142.8 MB/s
  LZO204:  [b 1M] bytes 426754 -> 221290 51.9% comp 67.3 MB/s uncomp 212.3 MB/s
  LZO205:  [b 1M] bytes 426754 -> 236699 55.5% comp 182.2 MB/s uncomp 248.3 MB/s
  CSNAPPY: [b 1M] bytes 426754 -> 243710 57.1% comp 181.7 MB/s uncomp 437.4 MB/s
  SNAPPY:  [b 4M] bytes 426754 -> 243710 57.1% comp 183.0 MB/s uncomp 428.3 MB/s
testdata/mapreduce-osdi-1.pdf :
  ZLIB:    [b 1M] bytes 94330 -> 74928 79.4% comp 22.4 MB/s uncomp 177.9 MB/s
  LZO204:  [b 1M] bytes 94330 -> 76999 81.6% comp 29.0 MB/s uncomp 938.7 MB/s
  LZO205:  [b 1M] bytes 94330 -> 94704 100.4% comp 1057.4 MB/s uncomp 3974.6 MB/s
  CSNAPPY: [b 1M] bytes 94330 -> 77477 82.1% comp 833.6 MB/s uncomp 2115.4 MB/s
  SNAPPY:  [b 4M] bytes 94330 -> 77477 82.1% comp 832.2 MB/s uncomp 1997.5 MB/s
testdata/plrabn12.txt :
  ZLIB:    [b 1M] bytes 481861 -> 195261 40.5% comp 7.5 MB/s uncomp 130.1 MB/s
  LZO204:  [b 1M] bytes 481861 -> 294610 61.1% comp 59.1 MB/s uncomp 192.3 MB/s
  LZO205:  [b 1M] bytes 481861 -> 314012 65.2% comp 155.7 MB/s uncomp 229.7 MB/s
  CSNAPPY: [b 1M] bytes 481861 -> 329339 68.3% comp 153.4 MB/s uncomp 363.5 MB/s
  SNAPPY:  [b 4M] bytes 481861 -> 329339 68.3% comp 154.5 MB/s uncomp 354.9 MB/s
testdata/ptt5 :
  ZLIB:    [b 1M] bytes 513216 -> 56465 11.0% comp 25.8 MB/s uncomp 269.0 MB/s
  LZO204:  [b 1M] bytes 513216 -> 86232 16.8% comp 139.7 MB/s uncomp 590.6 MB/s
  LZO205:  [b 1M] bytes 513216 -> 87278 17.0% comp 551.6 MB/s uncomp 667.6 MB/s
  CSNAPPY: [b 1M] bytes 513216 -> 93455 18.2% comp 555.0 MB/s uncomp 845.6 MB/s
  SNAPPY:  [b 4M] bytes 513216 -> 93455 18.2% comp 553.1 MB/s uncomp 795.0 MB/s
testdata/sum :
  ZLIB:    [b 1M] bytes 38240 -> 12990 34.0% comp 13.9 MB/s uncomp 144.6 MB/s
  LZO204:  [b 1M] bytes 38240 -> 17686 46.2% comp 67.1 MB/s uncomp 311.0 MB/s
  LZO205:  [b 1M] bytes 38240 -> 18086 47.3% comp 230.6 MB/s uncomp 373.5 MB/s
  CSNAPPY: [b 1M] bytes 38240 -> 19837 51.9% comp 228.7 MB/s uncomp 513.1 MB/s
  SNAPPY:  [b 4M] bytes 38240 -> 19837 51.9% comp 226.7 MB/s uncomp 479.2 MB/s
testdata/urls.10K :
  ZLIB:    [b 1M] bytes 702087 -> 222613 31.7% comp 18.2 MB/s uncomp 160.0 MB/s
  LZO204:  [b 1M] bytes 702087 -> 309320 44.1% comp 64.5 MB/s uncomp 309.2 MB/s
  LZO205:  [b 1M] bytes 702087 -> 345814 49.3% comp 226.3 MB/s uncomp 376.5 MB/s
  CSNAPPY: [b 1M] bytes 702087 -> 357267 50.9% comp 240.1 MB/s uncomp 645.5 MB/s
  SNAPPY:  [b 4M] bytes 702087 -> 357267 50.9% comp 239.3 MB/s uncomp 598.7 MB/s
testdata/xargs.1 :
  ZLIB:    [b 1M] bytes 4227 -> 1736 41.1% comp 23.2 MB/s uncomp 104.0 MB/s
  LZO204:  [b 1M] bytes 4227 -> 2450 58.0% comp 65.2 MB/s uncomp 333.1 MB/s
  LZO205:  [b 1M] bytes 4227 -> 2468 58.4% comp 192.3 MB/s uncomp 392.1 MB/s
  CSNAPPY: [b 1M] bytes 4227 -> 2509 59.4% comp 215.9 MB/s uncomp 499.1 MB/s
  SNAPPY:  [b 4M] bytes 4227 -> 2509 59.4% comp 208.7 MB/s uncomp 477.0 MB/s

These results show that Snappy decompresses roughly 50% faster than LZO 2.05 on most files, while compression speed is about the same. LZO lost some of its compression-ratio advantage: 2.05 sits roughly halfway between 2.04 and Snappy.

Results from my block compressor, which works on 4KB at a time (simulating zram), on some large files from my /usr directory:

compressing: /usr/lib64/chromium-browser/chrome
  compressor: SNAPPY #pages: 10392 > 100%: 341 > 50%: 8445 <= 50%: 1606 0.174652181 seconds ratio: 27932299 * 100 / 42562848 = 65 %
  compressor: LZO    #pages: 10392 > 100%: 495 > 50%: 8080 <= 50%: 1817 0.220447504 seconds ratio: 27150908 * 100 / 42562848 = 63 %
  compressor: ZLIB   #pages: 10392 > 100%: 0 > 50%: 5800 <= 50%: 4592 2.395360610 seconds ratio: 20904235 * 100 / 42562848 = 49 %
compressing: /usr/lib64/qt4/libQtWebKit.so.4.7.2
  compressor: SNAPPY #pages: 5342 > 100%: 219 > 50%: 3405 <= 50%: 1718 0.080079531 seconds ratio: 13290800 * 100 / 21877760 = 60 %
  compressor: LZO    #pages: 5342 > 100%: 272 > 50%: 3281 <= 50%: 1789 0.100200702 seconds ratio: 12737811 * 100 / 21877760 = 58 %
  compressor: ZLIB   #pages: 5342 > 100%: 142 > 50%: 2464 <= 50%: 2736 1.147235809 seconds ratio: 9903402 * 100 / 21877760 = 45 %
compressing: /usr/lib64/llvm/libLLVM-2.9.so
  compressor: SNAPPY #pages: 3472 > 100%: 44 > 50%: 2384 <= 50%: 1044 0.055121943 seconds ratio: 8493554 * 100 / 14219992 = 59 %
  compressor: LZO    #pages: 3472 > 100%: 53 > 50%: 2355 <= 50%: 1064 0.068662186 seconds ratio: 8213334 * 100 / 14219992 = 57 %
  compressor: ZLIB   #pages: 3472 > 100%: 12 > 50%: 1728 <= 50%: 1732 0.766150075 seconds ratio: 6221694 * 100 / 14219992 = 43 %
compressing: /usr/lib64/xulrunner-2.0/libxul.so
  compressor: SNAPPY #pages: 7187 > 100%: 229 > 50%: 4432 <= 50%: 2526 0.108149693 seconds ratio: 17455680 * 100 / 29433888 = 59 %
  compressor: LZO    #pages: 7187 > 100%: 253 > 50%: 4287 <= 50%: 2647 0.135244136 seconds ratio: 16596460 * 100 / 29433888 = 56 %
  compressor: ZLIB   #pages: 7187 > 100%: 1 > 50%: 3021 <= 50%: 4165 1.610910737 seconds ratio: 12248775 * 100 / 29433888 = 41 %
compressing: /usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.1-pre9999/cc1
  compressor: SNAPPY #pages: 3608 > 100%: 68 > 50%: 2168 <= 50%: 1372 0.056032193 seconds ratio: 7728033 * 100 / 14775120 = 52 %
  compressor: LZO    #pages: 3608 > 100%: 72 > 50%: 1975 <= 50%: 1561 0.069680830 seconds ratio: 7384676 * 100 / 14775120 = 49 %
  compressor: ZLIB   #pages: 3608 > 100%: 2 > 50%: 306 <= 50%: 3300 0.789069806 seconds ratio: 5493265 * 100 / 14775120 = 37 %
compressing: /usr/lib64/libnvidia-glcore.so.270.41.03
  compressor: SNAPPY #pages: 6710 > 100%: 74 > 50%: 2614 <= 50%: 4022 0.084111724 seconds ratio: 12860385 * 100 / 27481328 = 46 %
  compressor: LZO    #pages: 6710 > 100%: 89 > 50%: 2436 <= 50%: 4185 0.103006618 seconds ratio: 12051888 * 100 / 27481328 = 43 %
  compressor: ZLIB   #pages: 6710 > 100%: 1 > 50%: 1633 <= 50%: 5076 1.216785009 seconds ratio: 8641291 * 100 / 27481328 = 31 %
compressing: /usr/lib64/gcc/x86_64-pc-linux-gnu/4.6.1-pre9999/libgcj.so.12.0.0
  compressor: SNAPPY #pages: 15133 > 100%: 190 > 50%: 5105 <= 50%: 9838 0.193854352 seconds ratio: 27131163 * 100 / 61982968 = 43 %
  compressor: LZO    #pages: 15133 > 100%: 201 > 50%: 4323 <= 50%: 10609 0.235593989 seconds ratio: 24944283 * 100 / 61982968 = 40 %
  compressor: ZLIB   #pages: 15133 > 100%: 63 > 50%: 317 <= 50%: 14753 2.943011502 seconds ratio: 18266667 * 100 / 61982968 = 29 %
compressing: /usr/lib64/libwireshark.so.0.0.1
  compressor: SNAPPY #pages: 11341 > 100%: 64 > 50%: 2982 <= 50%: 8295 0.130238274 seconds ratio: 19576418 * 100 / 46449592 = 42 %
  compressor: LZO    #pages: 11341 > 100%: 86 > 50%: 2565 <= 50%: 8690 0.157854033 seconds ratio: 17765477 * 100 / 46449592 = 38 %
  compressor: ZLIB   #pages: 11341 > 100%: 1 > 50%: 1219 <= 50%: 10121 2.020140289 seconds ratio: 12565102 * 100 / 46449592 = 27 %
compressing: /usr/share/icons/oxygen/icon-theme.cache
  compressor: SNAPPY #pages: 43411 > 100%: 0 > 50%: 7777 <= 50%: 35634 0.441581102 seconds ratio: 60247441 * 100 / 177810480 = 33 %
  compressor: LZO    #pages: 43411 > 100%: 31 > 50%: 7801 <= 50%: 35579 0.547072992 seconds ratio: 59064132 * 100 / 177810480 = 33 %
  compressor: ZLIB   #pages: 43411 > 100%: 0 > 50%: 2464 <= 50%: 40947 6.256616084 seconds ratio: 42305375 * 100 / 177810480 = 23 %

This shows that Snappy, in the configuration used for zram (4KB at a time, 8KB of working memory), is about 20% faster than LZO 2.05, with compression ratios only about 3 percentage points worse.

It seems LZO now has a state-of-the-art LZ implementation, optimized for the currently popular platform. Benchmarks of both Snappy and the new LZO code on other architectures are welcome.

In light of these developments, I agree that switching to Snappy is not worth the potential trouble, even though it is faster and has been tested in kernel space on ppc32 and ARM (in QEMU).

I would like to thank Nitin Gupta for his ack:
http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/015546.html

If there is interest in merging Snappy, I am more than willing to continue working on it and to address any issues anyone raises (currently I am aware of none).

Zram still needs an entropy coder faster than zlib. Maybe something can be done.

-Z.T.
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/devel
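Items 1 to 4 of the optimization list (64-bit and unaligned loads, the multiplicative hash, ctz/clz of an XOR) can be illustrated with a short C sketch. This is not the code of LZO, Snappy or the kernel port. The names load32, load64, hash4 and match_length, the 12-bit table size HASH_BITS, and the multiplier 0x1e35a7bd (borrowed from Snappy's hash function) are all assumptions for illustration.

The pieces work as follows:
- Unaligned loads go through memcpy, which compilers typically lower to a single load on x86-64.
- The hash multiplies four input bytes by a 32-bit constant and keeps the top bits.
- The match length comes from the first differing byte in the XOR of two 64-bit words. That byte is found with ctz on little-endian machines and clz on big-endian ones, using the GCC/Clang builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed table size: 2^12 entries (illustrative, not taken from any real port). */
#define HASH_BITS 12

/* Unaligned loads via memcpy: safe on strict-alignment CPUs, a single load on x86-64. */
static inline uint32_t load32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline uint64_t load64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Multiplicative hash of 4 bytes: 32-bit multiply (wraps mod 2^32), keep the top bits.
 * 0x1e35a7bd is the constant Snappy's hash uses; here it is just an assumption. */
static inline uint32_t hash4(const uint8_t *p) {
    return (load32(p) * 0x1e35a7bdu) >> (32 - HASH_BITS);
}

/* Length of the common prefix of a (earlier data) and b (current input), stopping at b_end.
 * Compares 8 bytes per step; the first set bit of the XOR marks the first differing byte. */
static size_t match_length(const uint8_t *a, const uint8_t *b, const uint8_t *b_end) {
    assert(a < b); /* a lags b, so reading a + n + 8 never passes b_end */
    size_t n = 0;
    while (b + n + 8 <= b_end) {
        uint64_t x = load64(a + n) ^ load64(b + n);
        if (x != 0) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
            /* Big-endian: the first byte in memory is the most significant one. */
            return n + (size_t)(__builtin_clzll(x) >> 3);
#else
            /* Little-endian: the first byte in memory is the least significant one. */
            return n + (size_t)(__builtin_ctzll(x) >> 3);
#endif
        }
        n += 8;
    }
    /* Tail: fewer than 8 bytes left, compare byte by byte. */
    while (b + n < b_end && a[n] == b[n])
        n++;
    return n;
}
```

For example, comparing "abcdefghijklmnop" with "abcdefghijkXmnop" takes two 64-bit XORs and yields a match length of 11. The ctz/clz choice must track the machine's byte order, which is one concrete reason a kernel port of this kind of code is not completely trivial.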
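Item 5 of the optimization list amounts to a search step that grows with the length of the current run of unmatched input. The sketch below assumes the step is 1 + (unmatched bytes so far) / 32, which is one reading of that description. The names next_probe and probes_in_literal_run are hypothetical. A real compressor would also reset the run start after every match it emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position of the next match lookup after a failed lookup at ip,
 * where literal_start is where the current run of unmatched bytes began.
 * The step grows by one byte for every 32 unmatched bytes seen so far. */
static size_t next_probe(size_t ip, size_t literal_start) {
    assert(ip >= literal_start);
    return ip + 1 + ((ip - literal_start) >> 5);
}

/* Count the hash lookups made on len bytes of input that never produce a match. */
static size_t probes_in_literal_run(size_t len) {
    size_t probes = 0;
    for (size_t ip = 0; ip < len; ip = next_probe(ip, 0))
        probes++;
    return probes;
}
```

On a 64-byte stretch with no matches, this sketch performs 48 lookups instead of 64, and the savings grow with the length of the stretch. That plausibly accounts for much of why LZO 2.05 and Snappy process already-compressed inputs such as house.jpg so much faster than LZO 2.04.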
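The per-file summaries printed by the 4KB block compressor can be reproduced with bookkeeping along these lines. This is a hypothetical sketch, not the author's benchmark tool. It rests on three assumptions:
- A partial last page counts as a whole page. This matches the printed page counts, e.g. 42562848 bytes giving 10392 pages.
- Each page's bucket compares its compressed size with its own input size.
- The ratio uses integer division. This matches the printed figures, e.g. 27932299 * 100 / 42562848 = 65.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZRAM_PAGE 4096u /* assumed page size, matching "4KB at a time" */

/* Hypothetical per-file statistics, mirroring the columns printed above. */
struct page_stats {
    size_t pages;
    size_t over_100;    /* compressed output larger than the input page */
    size_t over_50;     /* more than 50% and up to 100% of the input */
    size_t le_50;       /* at most 50% of the input */
    uint64_t in_bytes;
    uint64_t out_bytes;
};

/* Number of pages in a file; a partial last page counts as a whole page. */
static size_t pages_for_file(uint64_t file_size) {
    return (size_t)((file_size + ZRAM_PAGE - 1) / ZRAM_PAGE);
}

/* Record one page: in_len is the input size, out_len the compressed size. */
static void account_page(struct page_stats *s, size_t in_len, size_t out_len) {
    assert(in_len > 0 && in_len <= ZRAM_PAGE);
    s->pages++;
    s->in_bytes += in_len;
    s->out_bytes += out_len;
    if (out_len > in_len)
        s->over_100++;
    else if (2 * out_len > in_len)
        s->over_50++;
    else
        s->le_50++;
}

/* Integer percentage, as in "ratio: out * 100 / in = N %". */
static unsigned ratio_percent(uint64_t out_bytes, uint64_t in_bytes) {
    assert(in_bytes > 0);
    return (unsigned)(out_bytes * 100 / in_bytes);
}
```

Because the percentage is truncated rather than rounded, two compressors can print the same ratio (as SNAPPY and LZO do for icon-theme.cache) while their byte totals still differ.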