On Tue, May 13, 2014 at 10:58 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>
> Note that there was a patch a week or so ago fixing a problem in RBD when
> running ZFS on top of RBD (IIRC it had to do with length 0 bios or
> something?). Was this ZFS on RBD or ZFS underneath the ceph-osd daemons?
>

Sorry if I was unclear on this. ZFS was running underneath the ceph-osds.
Today we switched to ext4 instead.

More specifically, we were interested in the compression features of ZFS,
since people claim it can increase throughput, lower latency, and reduce
disk usage with no significant drawbacks. This is because LZO (and other
algorithms of the same class, like Google Snappy) aborts immediately if the
data doesn't look trivially compressible, which makes it comparable in
speed to memcpy in that case.

It would be interesting to look at experimentally supporting this in the
OSDs themselves. It could have huge theoretical benefits when storing disk
images and large binary files, which tend to contain a lot of uninitialized
regions full of zeroes.

Let's take a super biased example, a really fresh MySQL InnoDB database
with WordPress installed:

> $ ent ibdata1
> Entropy = 0.950773 bits per byte.
>
> Optimum compression would reduce the size
> of this 18874368 byte file by 88 percent.
>
> Chi square distribution for 18874368 samples is 3847986113.90, and randomly
> would exceed this value 0.01 percent of the times.
>
> Arithmetic mean value of data bytes is 16.0581 (127.5 = random).
> Monte Carlo value for Pi is 3.832257589 (error 21.98 percent).
> Serial correlation coefficient is 0.972634 (totally uncorrelated = 0.0).

---------

As you can see, according to ent this reference InnoDB data file is very
compressible.
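
For anyone who wants to reproduce the two headline numbers without ent, here
is a minimal Python sketch. It assumes the entropy figure is the Shannon
entropy of the byte-value histogram, and that the "optimum compression"
percentage is simply 100 * (8 - H) / 8, which matches the output above:
100 * (8 - 0.950773) / 8 is roughly 88.1. The function names and the
synthetic sample are made up for illustration; they are not taken from ent.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum -p * log2(p) over every byte value that actually occurs.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def optimum_reduction_percent(data: bytes) -> float:
    """Size reduction implied by the entropy: 100 * (8 - H) / 8."""
    return 100.0 * (8.0 - entropy_bits_per_byte(data)) / 8.0


if __name__ == "__main__":
    # Hypothetical stand-in for a freshly created data file:
    # mostly zero-filled pages plus a small amount of varied data.
    sample = bytes(60 * 1024) + bytes(range(256)) * 16
    h = entropy_bits_per_byte(sample)
    print(f"Entropy = {h:.6f} bits per byte")
    print(f"Estimated reduction = {optimum_reduction_percent(sample):.1f} percent")
```

Note that this is only a per-byte frequency estimate. It knows nothing about
repeated sequences, so a real compressor can beat the "optimum" figure on
data with long runs of identical bytes.
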

Standard LZO on a tmpfs:

> $ time lzop ibdata1
> 0.01user 0.00system 0:00.04elapsed 35%CPU (0avgtext+0avgdata 1160maxresident)k
> 0inputs+0outputs (0major+372minor)pagefaults 0swaps
> $ ls -l
> 18874368 May 13 23:20 ibdata1
> 536126 May 13 23:20 ibdata1.lzo

The compressed file is about 2.8% of the original size. The compression
time is hard to even measure for 19 MB of data: it is so fast that the
signal disappears in the noise.

This reply got a bit long, but I wanted to share some thoughts I had.

Thank you for your time,
Hannes