Looping everyone else in here on some simple write performance tests on top of btrfs. 4 MB writes, journal on SSD.

Seekwatcher movies at http://nhm.ceph.com/movies/btrfs-dd-experiment/

Four tests here. Raw disk bw is around 135 MB/sec.

> >> a: cluster
> >>  - flusher on
> >>  - no vm tuning
> >>  - initially fast, then crazy slow, then ...
> >>  - commit/sync every ~5s

Super fast at first, then a long lull where we're waiting for the btrfs snap to complete, but disk tput is only ~15 MB/sec (from iostat). In the movie, though, it looks nice and sequential.. maybe the ios are too small or badly aligned/timed or something? Need to look at the raw blktrace data to see where the seeks are.

This may be a problem in commit_transaction (fs/btrfs/transaction.c) doing the flushing inefficiently?

> >> b: much much later
> >>  - consistent 40 MB/sec

20 minutes later, same run... things settle into 40 MB/sec. Can clearly see the funny gaps Yehuda was talking about.

I suspect the seekiness there is from appending to the pg logs? Maybe we can verify that somehow? Hopefully we can convince btrfs we don't care about fragmentation of those files (they're basically never read) and write them more efficiently.

> >> c:
> >>  - no ceph cluster, just a for loop:
> >>    for f in `seq 1 10000` ; do dd if=/dev/zero of=$f bs=4M count=1 ; attr -s foo -V bar $f ; echo asdf >> foo.txt ; attr -s foo -V $f foo.txt ; done
> >>    and
> >>    while true ; do sync ; sleep 5 ; done
> >>  - 20 MB dirty_bytes limit (so that VM does writeout quickly)
> >>  -> 60 MB/sec (initially fast, though, more like 120 MB/sec)

Weird that we don't see the seeks here.. maybe they're just closer together than in b? Also, here I have one "log" (foo.txt) vs dozens of them (one per pg) in b.

This clearly points to improvements to be had in btrfs (it's not all our fault). Should repeat this test on XFS.

> >> d:
> >>  - filestore flusher = false
> >>  - 20mb dirty_bytes
> >>  -> 55 MB/sec, slowly dropping

This is letting the VM do the writeout. Presumably those dots are the pg logs again, but written out less politely. Wonder why...

c and d aren't *too* far off, which suggests this is mostly a btrfs allocation issue. But a vs b tells me there is also something unfortunate going on with how we throttle the incoming io and manage the flushing. :/

Let's get a simple workload reproducer together here (either workloadgen or a simple script) that we can harass linux-btrfs@vger with? (A rough sketch is appended below.)

sage
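
A minimal sketch of such a standalone reproducer, modeled on test (c) above: 4 MB object writes plus xattr updates plus small appends to a shared "log" file, with a periodic sync in the background. The mount point, object count, and sync interval are placeholders; it assumes attr(1) is installed and needs root to set /proc/sys/vm/dirty_bytes.

    #!/bin/bash
    # Standalone workload sketch based on test (c): dd + xattr + log append,
    # with a periodic sync loop. Run against a dedicated btrfs (or XFS) mount.
    set -e

    DIR=${1:-/mnt/btrfs-test}   # mount point of the fs under test (placeholder)
    COUNT=${2:-10000}           # number of 4 MB objects to write
    SYNC_INTERVAL=${3:-5}       # seconds between syncs, mirroring the ~5s commit

    cd "$DIR"

    # Small dirty_bytes limit so the VM does writeout quickly, as in tests (c)/(d).
    echo $((20 * 1024 * 1024)) > /proc/sys/vm/dirty_bytes

    # Periodic sync in the background (the "while true ; do sync ; sleep 5" loop).
    ( while true ; do sync ; sleep "$SYNC_INTERVAL" ; done ) &
    SYNC_PID=$!
    trap 'kill $SYNC_PID 2>/dev/null || true' EXIT

    for f in $(seq 1 "$COUNT") ; do
        dd if=/dev/zero of="$f" bs=4M count=1 2>/dev/null   # one 4 MB "object" write
        attr -s foo -V bar "$f"                             # xattr on the object
        echo asdf >> foo.txt                                # small append to a shared log
        attr -s foo -V "$f" foo.txt                         # xattr update on the log
    done

Run it as root against the test mount (e.g. ./reproducer.sh /mnt/btrfs-test 10000 5) and watch with iostat, blktrace, or seekwatcher as above.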