On Wed, Sep 20, 2017 at 3:13 AM Coly Li <i@xxxxxxx> wrote:
>
> So my question is, do you observe 8-120 times more real writeback
> throughput in period of every 10 seconds ?

In real-world workloads I end up with no keys of length less than 4096.
Do you? I would like the minimum writeback rate to be a realistic value
that may actually be attained. Right now, in practice, bcache writes
**much faster**.

Previously, you were concerned that setting the target for number of
writes would result in increased backing disk workload--- the new values
are based on measurements showing that in all cases this code writes back
slower and loads the disk less than the old code, so it should mitigate
your concern.

> Again, I'd like to see exact data here, because this patch is about
> performance tuning.

OK. Run iostat 1 on any workload. See if you ever see a write rate of
less than 4k/sec to the backing disk. I've run a variety of workloads and
never have. I can send you very long iostat 1 logs if it would help ;)

When the system is writing much more than the control system is asking
for, the control system is effectively disengaged. This patch increases
the range of control authority by allowing the interval between writes
to be up to 2.5x longer.

> I just see the minimum rate increases from 1 to 8 sectors, and minimum
> delay increase from 1 to 2.5 seconds. I don't see an exact problem this
> patch wants to solve. Is it to write back dirty data faster, or save
> more power from cached device ?

One thing that's really bad right now is that when the disk is idle, it
repeatedly undergoes load/unload cycles (very long seeks). The disks I
have-- Seagate ST6000VN0041 Ironwolf NAS 6TB-- seek off the active
portion of the platter after they've been idle for 750ms. So the disks
are making a very loud clunk at one-second intervals during idle
writeback, which is not good for the drives. When writing back faster,
they are quieter. This will at least do it 2.5x less often.
In a subsequent patch, there are additional things I'd like to do-- like
being willing to do no write after a wakeup if we are very far ahead.
That is, to allow the effective interval to be much larger than 2.5
seconds. This would potentially allow spindown on laptops, etc. But this
change is still worthwhile on its own.

Another thing that would be helpful would be to issue more than 1 write
at a time, so that queue depth doesn't always equal 1. Queue depth=4 has
about 40-50% more throughput--- that is, it completes the 4 IOPs in about
2.5x the time of one-- when writes are clustered to the same region of
the disk/short-stroked. However this has a potential cost in latency, so
it needs to be carefully traded off.

> [snipped some]
>
> I post a proposal patch to linux-bcache@xxxxxxxxxxxxxxx, (see
> https://www.spinics.net/lists/linux-bcache/msg04837.html), which sets
> the writeback rate to maximum value if there is no front end I/O request
> for a while (normally it might means more idle time in future). Then the
> dirty data on cache can be cleaned up ASAP. And if there is front end
> request coming, set the writeback rate to minimum value, and let the PD
> (now it is PI) controller to adjust its value. By this method, if there
> is no front end I/O, dirty data can be cleaned much faster and hard disk
> also has much more chance to spin down.

While this could clear out dirty data sooner, I'm not sure this is always
a good idea. For intermittent, write-heavy workloads it's good, but it
substantially increases the chance that occasional random reads have to
wait a long time-- which is usually the opposite of what an IO scheduler
tries to achieve.

>
> Thanks.
>
> --
> Coly Li

Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html