On Mon, Dec 24, 2012 at 01:20:34AM -0700, Thomas Fjellstrom wrote:
> On Fri Dec 21, 2012, Thomas Fjellstrom wrote:
> > I'm setting up a little home NAS here, and I've been thinking about using
> > bcache to speed up the random access bits on the "big" raid6 array (7x2TB).
> >
> > How does one get started using bcache (custom patched kernel?), and what is
> > the recommended setup for use with mdraid? I remember reading ages ago that
> > it was recommended that each component device was attached directly to the
> > cache, and then mdraid put on top, but a quick google suggests putting the
> > cache on top of the raid instead.
> >
> > Also, is it possible to add a cache to an existing volume yet? I have a
> > smaller array (7x1TB) that I wouldn't mind adding the cache layer to.
>
> I just tried a basic setup with the cache on top of the raid6. I ran a quick
> iozone test with the default Debian sid (3.2.35) kernel, the bcache (3.2.28)
> kernel without bcache enabled, and with bcache enabled (see below).
>
> Here's a little information:
>
> System info:
> Intel S1200KP motherboard
> Intel Core i3 2120 CPU
> 16GB DDR3 1333 ECC
> IBM M1015 in IT mode
> 7 x 2TB Seagate Barracuda HDDs
> 1 x 240GB Samsung 470 SSD
>
> Kernel: fresh git checkout of the bcache repo, 3.2.28
>
> Raid info:
> /dev/md0:
>         Version : 1.2
>   Creation Time : Sat Dec 22 03:38:05 2012
>      Raid Level : raid6
>      Array Size : 9766914560 (9314.46 GiB 10001.32 GB)
>   Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
>    Raid Devices : 7
>   Total Devices : 7
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Dec 24 00:22:28 2012
>           State : clean
>  Active Devices : 7
> Working Devices : 7
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : mrbig:0  (local to host mrbig)
>            UUID : 547c30d1:3af4b2ec:14712d0b:88e4337a
>          Events : 10591
>
>     Number   Major   Minor   RaidDevice State
>        0       8        0        0      active sync   /dev/sda
>        1       8       16        1      active sync   /dev/sdb
>        2       8       32        2      active sync   /dev/sdc
>        3       8       48        3      active sync   /dev/sdd
>        4       8       80        4      active sync   /dev/sdf
>        5       8       96        5      active sync   /dev/sdg
>        6       8      112        6      active sync   /dev/sdh
>
> Fs info:
> root@mrbig:~/build/bcache-tools# xfs_info /dev/bcache0
> meta-data=/dev/bcache0           isize=256    agcount=10, agsize=268435328 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=2441728638, imaxpct=5
>          =                       sunit=128    swidth=640 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> iozone -a -s 32G -r 8M
>
>                                                              random   random     bkwd   record   stride
>         KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
> w/o cache (debian kernel 3.2.35-1):
>   33554432    8192   212507   210382   630327   630852   372807   161710   388319  4922757   617347   210642   217122   717279   716150
> w/ cache (bcache git kernel 3.2.28):
>   33554432    8192   248376   231717   268560   269966   123718   132210   148030  4888983   152240   230099   238223   276254   282441
> w/o cache (bcache git kernel 3.2.28):
>   33554432    8192   277607   259159   709837   702192   399889   151629   399779  4846688   655210   251297   245953   783930   778595
>
> Note: I disabled the cache before the last test, unregistered the device and
> "stop"ed the cache. I also changed the config slightly for the bcache kernel:
> I started out with the Debian config and then switched the preemption option
> to server, which may be the reason for the performance difference between the
> two non-cached tests.
>
> I probably messed up the setup somehow. If anyone has some tips or
> suggestions I'd appreciate some input.

So you probably didn't put bcache in writeback mode, which would explain the
write numbers being slightly worse.

Something I noticed myself with bcache on top of a raid6 is that in writeback
mode sequential write throughput was significantly worse - the SSD doesn't
have as much write bandwidth as the raid6, and bcache's writeback has no
knowledge of the stripe layout. This is something I'd like to fix, if I ever
get time. Normal operation (i.e. with mostly random writes) was vastly
improved, though.

I'm not sure why your read numbers are worse, though - I haven't used iozone
myself, so I'm not sure exactly what it's doing. It'd be useful to know what
iozone's reads look like - how many are in flight at a time, how big they
are, etc.

I suppose it'd be informative to have a benchmark where bcache is enabled but
all the reads are cache misses, and bcache isn't writing any of the cache
misses to the cache. I think I'd need to add another cache mode for that,
though (call it "readaround", I suppose).

I wouldn't worry _too_ much about iozone's numbers - I suspect whatever it's
doing differently to get such bad read numbers isn't terribly representative.
I'd benchmark whatever you're actually using the server for, if you can.
Still, it'd be good to know what's going on here; there's certainly something
that ought to be fixed.

Oh, one thing that comes to mind - there's an issue with pure read workloads
in the current stable branch, where inserting data from a cache miss will
fail to update the index if the btree node is full (but after the data has
been written to the cache). This shows up in benchmarks because they tend to
test reads and writes separately, but it's not an issue in any real-world
workload I know of, because any amount of write traffic keeps it from showing
up - the btree nodes will split when necessary on writes. I have a fix for
this in the dev branch, and I think it's stable, but the dev branch needs
more testing.
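
In case it helps with the writeback point above, here's a minimal sketch of
checking and switching the cache mode through sysfs. This assumes the cached
device shows up as bcache0 and uses the cache_mode/state attribute names from
the git tree; exact paths and available modes may differ between versions, so
check what's actually present under /sys/block/bcache0/bcache/ first:

    # show the cache modes; the active one is printed in brackets
    cat /sys/block/bcache0/bcache/cache_mode

    # switch the cached device to writeback (the default is writethrough)
    echo writeback > /sys/block/bcache0/bcache/cache_mode

    # sanity check that the backing device is actually attached to a cache
    cat /sys/block/bcache0/bcache/state

The mode takes effect immediately, but keep the caveat above in mind: with
writeback enabled, sequential write throughput can drop to whatever the SSD
can sustain, so it's the random write numbers you'd expect to improve.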