On Thu, Jan 23, 2014 at 06:24:39AM -0600, Stan Hoeppner wrote:
> > In case you don't believe me, I just switched my drives from the PMP to
> > directly connected to the motherboard and a Marvell card, and my rebuild
> > speed changed from 19MB/s to 99MB/s.
> > (I made no other setting changes, but I did try your changes without
> > saving them before and after the PMP change and will report below)
>
> Why would you assume I wouldn't believe you?

You seemed incredulous that PMPs could make things so slow :)

> > Thanks for that one.
> > It made no speed difference on the PMP or without, but it can't hurt to
> > do anyway.
>
> If you're not writing it won't.  The problem here is that you're
> apparently using a non-destructive resync as a performance benchmark.
> Don't do that.  It's representative of nothing but read-only resync speed.

Let me think about this: the resync is done at array build time.  If all
the drives are indeed full of 0's, there is nothing to write.  Given that,
I think you're right.

> Increasing stripe_cache_size above the default as I suggested will
> ALWAYS increase write speed, often by a factor of 2-3x or more on modern
> hardware.  It should speed up destructive resyncs considerably, as well
> as normal write IO.  Once your array has settled down after the inits
> and resyncs and what not, run some parallel FIO write tests with the
> default of 256 and then with 2048.  You can try 4096 as well, but with 5
> rusty drives 4096 will probably cause a slight tailing off of
> throughput.  2048 should be your sweet spot.  You can also just time a
> few large parallel file copies.  You'll be amazed at the gains.

Will do, thanks.

> The reason is simply that the default of 256 was selected some ~10 years
> ago when disks were much slower.  Increasing this default has been a
> topic of much discussion recently, because bumping it up increases
> throughput for everyone, substantially, even with 3 disk RAID5 arrays.

Great to hear that the default may hopefully be increased for all.

> > As you did point out, the array will be faster when I use it because the
> > encryption will be sharded over my CPUs, but rebuilding is going to
> > create 5 encryption threads, whereas if md5 is first and encryption is
> > on top, rebuilds do not involve any encryption on the CPU.
> >
> > So it depends what's more important.
>
> Yep.  If you post what CPU you're using I can probably give you a good
> idea if one core is sufficient for dmcrypt.

Oh, I did forget to post that.
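Before pasting it, I figure I can also measure this directly rather than
guess from the specs.  If I understand correctly, recent cryptsetup has a
built-in single-thread benchmark mode, and /proc/cpuinfo shows whether the
chip has AES-NI at all.  Just a sketch, I haven't run it on that box yet:

  # rough idea of what one core can encrypt (per-cipher throughput)
  cryptsetup benchmark
  # no "aes" in the flags means no AES-NI hardware acceleration
  grep -wo aes /proc/cpuinfo | sort -u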
That server is a low power-ish dual core with 4 HT units:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz
stepping : 7
microcode : 0x28
cpu MHz : 2500.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 5150.14
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

> I'll also reiterate that encrypting a 16TB array device is silly when
> you can simply carve off an LV for files that need to be encrypted, and
> run dmcrypt only against that LV.  You can always expand an LV.  This is
> a huge performance win for all other files, such as your media
> collections, which don't need to be encrypted.

I use btrfs for LV management, so it's easier to encrypt the entire pool.
I also encrypt any data on any drive at this point, kind of like washing
my hands.  I'm not saying it's the right thing to do for everyone, but
it's my personal choice.  I've seen too many drives end up on ebay with
data still on them, and I don't want to have to worry about that later, or
about erasing my own drives before sending them back under warranty,
especially in cases where maybe I can't erase them but the manufacturer
can still read them.  You get the idea...

I've used LVM for too many years (15, was it?) and I'm happy to switch
away now :)  (I know thin snapshots were recently added, but basically
I've never been super happy with LVM performance, and LVM snapshots have
been abysmal if you keep them long term.)

Also, this is off topic here, but I like the fact that I can compute
snapshot diffs with btrfs and use them for super fast backups of changed
blocks, instead of a very slow rsync that has to scan millions of inodes
(which is what I've been doing so far).
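In case it's useful context for the archives: the snapshot-diff backup
amounts to something like the commands below.  This is only a sketch with
a made-up destination path, and it assumes a kernel and btrfs-progs recent
enough to have send/receive:

  # one time: seed the backup from a read-only snapshot
  btrfs subvolume snapshot -r /mnt/btrfs_pool1 /mnt/btrfs_pool1/snap_base
  btrfs send /mnt/btrfs_pool1/snap_base | btrfs receive /mnt/backup

  # afterwards: only send the blocks that changed since the last snapshot
  btrfs subvolume snapshot -r /mnt/btrfs_pool1 /mnt/btrfs_pool1/snap_new
  btrfs send -p /mnt/btrfs_pool1/snap_base /mnt/btrfs_pool1/snap_new \
    | btrfs receive /mnt/backup

No per-inode scanning involved, which is the whole point versus rsync.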
> >> Question #2:
> >> In order to copy data from a working system, I connected the drives via
> >> an external enclosure which uses a SATA PMP.  As a result, things are slow:
> >>
> >> md5 : active raid5 dm-7[5] dm-6[3] dm-5[2] dm-4[1] dm-2[0]
> >>       15627526144 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [UUUU_]
> >>       [>....................]  recovery =  0.9% (35709052/3906881536) finish=3406.6min speed=18939K/sec
> >>       bitmap: 0/30 pages [0KB], 65536KB chunk
> >>
> >> 2.5 days for an init or rebuild is going to be painful.
>
> With stripe_cache_size=2048 this should drop from 2.5 days to less than
> a day.

It didn't, since it was PMP limited, but I made that change for the other
reasons you suggested.

> > Still curious on this: if the drives are brand new, is it safe to assume
> > they're full of 0's and tell mdadm to skip the re-init?
> > (parity of X x 0 = 0)
>
> No, for a few reasons:
>
> 1. Because not all bits are always 0 out of the factory.
> 2. Bad sectors may exist and need to be discovered/remapped.
> 3. With the increased stripe_cache_size, and if your CPU turns out to
>    be fast enough for dmcrypt in front of md, resync speed won't be as
>    much of an issue, eliminating your motivation for skipping the init.

All fair points, thanks for explaining.

For now, I put dmcrypt on top of md5, and I get 100MB/s raw block write
speed (actually just writing a big file in btrfs and going through all
the layers), even though it's only using one CPU thread for encryption
instead of 2 or more if each disk were encrypted under the md5 layer.

Since 100MB/s was also the resync speed I was getting without encryption
involved, it looks like a single CPU thread can keep up with the raw IO
of the array, so I guess I'll leave things that way.

As another test:

gargamel:/mnt/btrfs_pool1# dd if=/dev/md5 of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 9.78191 s, 110 MB/s

So it looks like 100-110MB/s is the read and write speed limit of that
array.  The drives are rated for 150MB/s each, so I'm not too sure which
limit I'm hitting, but 100MB/s is fast enough for my intended use.

Thanks for your answers again,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
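P.S.: for the archives, the final layering boils down to roughly the
commands below.  This is only a sketch; the mapper name and label are
placeholders, and the stripe_cache_size bump doesn't survive a reboot
unless it goes into a boot script:

  # dm-crypt over the whole md array, btrfs on top of the crypt device
  cryptsetup luksFormat /dev/md5
  cryptsetup luksOpen /dev/md5 md5crypt
  mkfs.btrfs -L btrfs_pool1 /dev/mapper/md5crypt
  mount /dev/mapper/md5crypt /mnt/btrfs_pool1

  # per Stan's suggestion, raise the stripe cache from the default of 256
  echo 2048 > /sys/block/md5/md/stripe_cache_size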