May I ask this another way:

echo 65536 > /sys/block/bcache0/bcache/writeback_rate

This causes bcache to flush dirty data pretty much instantly, thereby
eliminating the issue. After that, writeback_rate reverts to 512.

Other than creating a cron job, is there any way I could set a minimum
writeback_rate value of 65536?

Thanks,
Jure
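One way to get that effect without cron is a systemd timer that simply
re-applies the value after it reverts. The sketch below is only an
illustration of that idea, not a bcache facility: the unit names are
made up, and it assumes the bcache0 device from the command above.

# /etc/systemd/system/bcache-writeback-rate.service (hypothetical name)
[Unit]
Description=Re-apply a high bcache writeback_rate

[Service]
Type=oneshot
# Path taken from the command above; adjust for other bcache devices.
ExecStart=/bin/sh -c 'echo 65536 > /sys/block/bcache0/bcache/writeback_rate'

# /etc/systemd/system/bcache-writeback-rate.timer (hypothetical name)
[Unit]
Description=Periodically re-apply bcache writeback_rate

[Timer]
OnBootSec=1min
OnUnitActiveSec=1min

[Install]
WantedBy=timers.target

Enabled with "systemctl enable --now bcache-writeback-rate.timer", this
rewrites the sysfs value once a minute, so the rate never stays at 512
for long even though the controller keeps resetting it.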
On Thu, Dec 22, 2016 at 8:25 AM, Jure Erznožnik <jure.erznoznik@xxxxxxxxx> wrote:
>>> We could have better hysteresis though, so we're not doing that
>>> slow steady trickle of writes.
>
> There is nothing between /dev/md0 and /dev/bcache0; the entire array
> is cached, no partitions. LVM is set up on top of bcache, and iostat
> shows the "first" traffic at /dev/md0. While the trickle is going on,
> there's no traffic on the bcache device or the LVM partitions. I have
> now modified sequential_cutoff to ensure that everything is cached
> (though an 800K write was cached even before).
>
> I have documented some logs in the original post here:
> http://unix.stackexchange.com/questions/329477/what-is-grinding-my-hdds-and-how-do-i-stop-it
>
> If this is not the source of the tiny writes to the array, can you
> suggest where else I could start looking?
>
> Thanks,
> Jure
>
> On Wed, Dec 21, 2016 at 11:22 PM, Kent Overstreet
> <kent.overstreet@xxxxxxxxx> wrote:
>> On Wed, Dec 21, 2016 at 02:36:02PM +0100, Jure Erznožnik wrote:
>>> Hello,
>>>
>>> I apologise if this is something known, but my searching across the
>>> internet has revealed no answer for my issue, so I am attempting to
>>> find one here.
>>>
>>> uname -a: Linux htpc 4.8.0-32-generic #34-Ubuntu SMP Tue Dec 13
>>> 14:30:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>> bcache-tools version: 1.0.8-2 (as provided in the Ubuntu yakkety
>>> apt repository)
>>>
>>> I have placed bcache in writeback mode over an mdadm array, followed
>>> by LVM and the actual volumes that are then used by various
>>> services. The problem I'm experiencing is that after every write I
>>> make into the array, bcache makes small periodic writes to the
>>> backing device, a few KB every second (less than 20 KB/s).
>>>
>>> All bcache parameters are at their defaults; here are the
>>> writeback-relevant ones:
>>> writeback_delay=30
>>> writeback_percent=10
>>> writeback_rate=512 (soon reverts to 512 even if changed)
>>> writeback_rate_d_term=30
>>> writeback_rate_p_term_inverse=6000
>>> writeback_rate_update_seconds=5
>>> writeback_running=5
>>>
>>> I don't see how writeback would be running every second, unless
>>> that's implied by writeback_rate. Increasing that to a large value
>>> temporarily causes the cache to flush much faster, thus reducing the
>>> number of disk "clicks". It reverts to 512 again as soon as
>>> dirty_data goes below the large value.
>>>
>>> Looking at writeback_rate_debug when the one-second flushes start, I
>>> can see that a few kilobytes are being flushed each second. Values
>>> of the writeback_rate_debug "dirty" field during one such session:
>>> 880k, 784k, 624k, 524k, 460k, 408k, 300k, 160k, 128k (128k remains
>>> and doesn't get flushed).
>>>
>>> I'm not sure what size one block is, but I configured the cache
>>> device with a 4KB block size, so here's what I expected to happen:
>>> 30 seconds after the 880k write to disk, writeback should trigger
>>> and write up to 512*4KB = 2MB of data to the disk. Since the write
>>> was only 880k, it would be written in one go. Instead I got at least
>>> 8 writes, each with only a few kilobytes.
>>>
>>> I have three questions about this:
>>> 1. What am I missing? Why does the data get flushed so slowly? These
>>> flushes can take hours for larger writes, causing the disks to
>>> constantly work at only kilobytes per second.
>>
>> It's because when writeback_percent is nonzero, we try to keep some
>> amount of dirty data in the cache: the assumption is that recent
>> writes are more likely to either be overwritten, or to have new data
>> written that's contiguous or nearly contiguous, so we'll do less work
>> if we delay writeback.
>>
>> We could have better hysteresis though, so we're not doing that slow
>> steady trickle of writes.
>>
>>> 2. I'd like bcache to flush the dirty data (entirely) ASAP after the
>>> writeback_delay. How can I tell it to do that?
>>
>> Set writeback_percent to 0.
>>
>> The downside though is that scanning for dirty data when there's very
>> little dirty data is expensive, and we have to block foreground
>> writes while we're scanning - so doing that will adversely affect
>> performance.
>>
>>> 3. Is it possible to configure it such that the flushing would only
>>> take place if the backing device wasn't under heavy read use at the
>>> time? I don't mind dirty data residing on the SSD if that allows for
>>> faster overall operation.
>>
>> Unfortunately, we don't have anything like that implemented.
>>
>> That would be a really nice feature, but it'd be difficult to get
>> right, since it requires knowing the future (if we issue this write,
>> will it end up blocking a read? To answer that, we'd have to know
>> whether a read is going to come in before the write completes). We
>> can guess - we can estimate how much read traffic is going to come in
>> over the next few seconds based on how much read traffic we've seen
>> recently, on the assumption that read traffic is bursty - on
>> timescales long enough to be useful - and not completely random.
>> However, this would mean adding yet another feedback control loop to
>> writeback - such things are tricky to get right, and adding another
>> would make the overall behaviour of writeback even more complicated
>> and difficult to understand and debug.
>>
>> Ideally, we'd be able to just issue writeback writes with an
>> appropriate IO priority and the IO scheduler would do the right
>> thing - it just wouldn't issue writeback writes if there was a
>> higher-priority read to be issued (that is, any foreground read).
>>
>> Unfortunately, this doesn't work in practice because of the writeback
>> caching that disk drives do: the (kernel-side) IO scheduler has no
>> ability to schedule writes, because writes just go into the disk's
>> write cache and the disk itself schedules them later (and the disk
>> has no knowledge of IO priorities).
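Tying the answers together, a one-shot flush helper could combine the
writeback_percent=0 advice with the rate bump from the top of the
thread. The script below is only a sketch: the name flush-bcache.sh is
made up, it assumes the bcache0 device from the thread, and the "0.0k"
test is an assumption about how this kernel formats the dirty_data
file.

#!/bin/sh
# flush-bcache.sh - hypothetical helper based on the knobs discussed
# in this thread; assumes the device is bcache0.
DEV=/sys/block/bcache0/bcache

# Remember the current setting so steady-state behaviour can be
# restored afterwards.
old_percent=$(cat "$DEV/writeback_percent")

# Per the advice above, writeback_percent=0 tells bcache not to retain
# dirty data; a large writeback_rate makes the drain fast.
echo 0     > "$DEV/writeback_percent"
echo 65536 > "$DEV/writeback_rate"

# Poll until the dirty counter reads zero. The "0.0k" string is an
# assumption about how dirty_data is formatted and may need adjusting.
while [ "$(cat "$DEV/dirty_data")" != "0.0k" ]; do
    sleep 1
done

# Restore the previous writeback_percent.
echo "$old_percent" > "$DEV/writeback_percent"

Run on demand (or from a timer during idle hours), this avoids leaving
writeback_percent at 0 permanently, which, as noted above, would make
the dirty-data scans hurt foreground write performance.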