On Sat, Dec 21, 2013 at 05:18:42AM -0600, Stan Hoeppner wrote:
> I renamed the subject as your question doesn't really apply to XFS, or
> the OP, but to md-RAID.
> 
> On 12/20/2013 4:43 PM, Arkadiusz Miśkiewicz wrote:
> 
> > I wonder why the kernel is giving defaults that everyone repeatedly
> > recommends changing/increasing? Has anyone tried to report that as a
> > bug for the stripe_cache_size case?
> 
> The answer is balancing default md-RAID5/6 write performance against
> kernel RAM consumption, with more weight given to the latter. The
> formula:
> 
>   (4096 * stripe_cache_size) * num_drives = RAM consumed for stripe cache
> 
> High stripe_cache_size values will cause the kernel to eat non-trivial
> amounts of RAM for the stripe cache buffer. This table demonstrates the
> effect today for typical RAID5/6 disk counts.
> 
> stripe_cache_size   drives   RAM consumed
>               256        4           4 MB
>                          8           8 MB
>                         16          16 MB
>               512        4           8 MB
>                          8          16 MB
>                         16          32 MB
>              1024        4          16 MB
>                          8          32 MB
>                         16          64 MB
>              2048        4          32 MB
>                          8          64 MB
>                         16         128 MB
>              4096        4          64 MB
>                          8         128 MB
>                         16         256 MB
> 
> The powers that be, Linus in particular, are not fond of default
> settings that create a lot of kernel memory structures. The default
> md-RAID5/6 stripe_cache_size of 256 yields 1 MB consumed per member
> device.
> 
> With SSDs becoming mainstream, and becoming ever faster, at some point
> the md-RAID5/6 architecture will have to be redesigned because of the
> memory footprint required for performance. Currently the required size
> of the stripe cache appears to be directly proportional to the
> aggregate write throughput of the RAID devices. Thus the optimal value
> will vary greatly from one system to another, depending on the
> throughput of the drives.
> 
> For example, I assisted a user with 5x Intel SSDs back in January, and
> his system required 4096, or 80 MB of RAM for stripe cache, to reach
> the maximum write throughput of the devices. This yielded 600 MB/s, or
> 60% greater throughput than 2048, or 40 MB of RAM for cache. In his
> case the 75 MB of RAM above the default was well worth it, as the
> machine was an iSCSI target server with 8 GB of RAM.
> 
> In the previous case with the 5x rust RAID6, the 2048 value seemed
> optimal (though not yet verified), requiring 40 MB less RAM than the
> 5x Intel SSDs. For a 3-drive modern rust RAID5, the default of 256, or
> 3 MB, is close to optimal but maybe a little low. Consider that 256
> has been the default for a very long time, and was selected back when
> average drive throughput was much, much lower, as in 50 MB/s or less,
> SSDs hadn't yet been invented, and system memories were much smaller.
> 
> Due to the massive difference in throughput between rust and SSD, any
> meaningful change in the default really requires new code to sniff out
> what type of devices constitute the array, if that's even possible,
> and it probably isn't, and to set a lowish default accordingly. Again,
> SSDs didn't exist when md-RAID was coded, nor when this default was
> set, and that throws a big monkey wrench into the spokes.

Hi Stan,

nice analytical report, as usual...

My dumb suggestion would be to simply use udev to set up the drives.
Everything (stripe_cache, read_ahead, stcerr, etc.) can be configured,
I suppose, by udev rules.

bye,

-- 
piergiorgio
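
For reference, the runtime knob under discussion is the array's sysfs
attribute, which can be read and raised on the fly; md0 and the 2048
value below are only placeholders, to be chosen per the numbers in
Stan's table rather than taken as a recommendation from this thread:

  cat /sys/block/md0/md/stripe_cache_size       # current value
  echo 2048 > /sys/block/md0/md/stripe_cache_size

A udev rule along the lines piergiorgio suggests might look roughly like
the sketch below; the rule file name and the value are again only
illustrative assumptions:

  # /etc/udev/rules.d/60-md-stripe-cache.rules  (example file name)
  SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", TEST=="md/stripe_cache_size", ATTR{md/stripe_cache_size}="2048"

The TEST match guards against partitions and md personalities that don't
expose the attribute; read_ahead could presumably be handled the same
way with an ATTR{queue/read_ahead_kb} assignment in the same rule.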