On Wed, Dec 21 2016, Jure Erznožnik wrote:

> Mr Brown,
>
> Let me begin with: please give me your paypal address or something so
> that I can at least buy you a beer or something

Thanks :-)

My first inclination is to say "no thanks" as I am very adequately
compensated by SUSE, and it is part of my role at SUSE to ensure the
upstream kernel remains healthy. Encouraging a healthy community is part
of that (and I often learn something while helping people fix things).

But my second inclination is to recognize that gratitude is an important
part of human interactions, and that a community is strong when gratitude
is appropriately given and received. It is not my place to direct others
how they should show gratitude.

So I'll tell you my paypal address is neil@xxxxxxxxxx and that I'm more
likely to enjoy hot chocolate than beer, but I'll also emphasize that
there is no expectation attached to this information. :-)

>
>
> Your analysis and discovery that iSCSI is the origin of writes got me
> thinking: how can he see that on md0 device if that device has two
> more layers (bcache + LVM) before iSCSI even comes into play. Maybe
> the system propagates the origin down the block devices or something,
> totally not relevant here. So I embarked on a journey of total data
> destruction by disabling one layer at a time. I started by simply
> detaching bcache as that was the first thing on the list - and was
> non-destructive to boot :)
>
> I have found the culprit:
> It is bcache that does the one second writes. I have yet to find the
> exact parameters that influence this behaviour, but the output of
> writeback_rate_debug is EXTREMELY clear: it's writing a bit of data
> each second, reducing the dirty cache by that tiny amount. This is
> what causes the write "amplification" resulting in clicks long after a
> write has been done - because bcache only writes tiny amounts each
> second instead of flushing the entire cache at once when the time
> comes.

Now that we have an understanding of what is happening, I can recommend
that you increase /sys/block/md0/md/safe_mode_delay. It is measured in
seconds. If you make it larger than the period of the bcache writes, it
should stop the 'ticking' you mentioned.

NeilBrown
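
A minimal sketch of the adjustment described above, in Python. The
one-second write period is taken from the quoted report; the one-second
margin, the function names, and the sysfs_root parameter (there only so
the code can be tried against a scratch directory) are assumptions made
for illustration. Writing the real attribute needs root.

```python
from pathlib import Path


def choose_safe_mode_delay(write_period_s: float, margin_s: float = 1.0) -> float:
    """Return a delay in seconds strictly larger than the periodic write interval.

    margin_s is an illustrative assumption, not a recommended value.
    """
    if write_period_s <= 0 or margin_s <= 0:
        raise ValueError("write_period_s and margin_s must be positive")
    return write_period_s + margin_s


def set_safe_mode_delay(md_name: str, delay_s: float,
                        sysfs_root: Path = Path("/sys")) -> Path:
    """Write delay_s (seconds) to <sysfs_root>/block/<md_name>/md/safe_mode_delay."""
    attr = sysfs_root / "block" / md_name / "md" / "safe_mode_delay"
    # The attribute is read and written as a plain decimal number of seconds.
    attr.write_text(f"{delay_s:.3f}\n")
    return attr


# Example, run as root on the affected machine:
#   set_safe_mode_delay("md0", choose_safe_mode_delay(1.0))
```

A value set this way does not survive a reboot or a re-assembly of the
array, so it would have to be reapplied, for instance from a boot script.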