On 5/27/22 11:05 AM, colyli wrote: > ? 2022-05-27 23:49?Jens Axboe ??? >> On 5/27/22 9:28 AM, Coly Li wrote: >>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c >>> index d138a2d73240..c51671abe74e 100644 >>> --- a/drivers/md/bcache/writeback.c >>> +++ b/drivers/md/bcache/writeback.c >>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct work_struct *work) >>> struct cached_dev, >>> writeback_rate_update); >>> struct cache_set *c = dc->disk.c; >>> + bool contention = false; >>> >>> /* >>> * should check BCACHE_DEV_RATE_DW_RUNNING before calling >>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct work_struct *work) >>> * in maximum writeback rate number(s). >>> */ >>> if (!set_at_max_writeback_rate(c, dc)) { >>> - down_read(&dc->writeback_lock); >>> - __update_writeback_rate(dc); >>> - update_gc_after_writeback(c); >>> - up_read(&dc->writeback_lock); >>> + /* >>> + * When contention happens on dc->writeback_lock with >>> + * the writeback thread, this kwork may be blocked for >>> + * very long time if there are too many dirty data to >>> + * writeback, and kerne message will complain a (bogus) >>> + * software lockup kernel message. To avoid potential >>> + * starving, if down_read_trylock() fails, writeback >>> + * rate updating will be skipped for dc->retry_max times >>> + * at most while delay this worker a bit longer time. >>> + * If dc->retry_max times are tried and the trylock >>> + * still fails, then call down_read() to wait for >>> + * dc->writeback_lock. >>> + */ >>> + if (!down_read_trylock((&dc->writeback_lock))) { >>> + contention = true; >>> + dc->retry_nr++; >>> + if (dc->retry_nr > dc->retry_max) >>> + down_read(&dc->writeback_lock); >>> + } >>> + >>> + if (!contention || dc->retry_nr > dc->retry_max) { >>> + __update_writeback_rate(dc); >>> + update_gc_after_writeback(c); >>> + up_read(&dc->writeback_lock); >>> + dc->retry_nr = 0; >>> + } >>> } >>> } >> > > Hi Jens, > > Thanks for looking into this :-) > >> This is really not very pretty. First of all, why bother with storing a >> max retry value in there? Doesn't seem like it'd ever be different per > > It is because the probability of the lock contention on > dc->writeback_lock depends on the I/O speed backing device. From my > observation during the tests, for fast backing device with larger > cache device, its writeback thread may work harder to flush more dirty > data to backing device, the lock contention happens more and longer, > so the writeback rate update kworker has to wait longer time before > acquires dc->writeback_lock. So its dc->retry_max should be larger > then slow backing device. > > Therefore I'd like to have a tunable per-backing-device retry_max. And > the syses interface will be added when users/customers want it. The > use case is from SAP HANA users, I have report that they observe the > soft lockup warning for dc->writeback_lock contention and worry about > whether data is corrupted (indeed, of course not). The initial patch has 5 as the default, and there are no sysfs knobs. If you ever need a sysfs knob, by all means make it a variable and make it configurable too. But don't do it upfront where the '5' suitabled named would do the job. >> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c >> index 9ee0005874cd..cbc01372c7a1 100644 >> --- a/drivers/md/bcache/writeback.c >> +++ b/drivers/md/bcache/writeback.c >> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct >> work_struct *work) >> return; >> } >> >> - if (atomic_read(&dc->has_dirty) && dc->writeback_percent) { >> + if (atomic_read(&dc->has_dirty) && dc->writeback_percent && >> + !set_at_max_writeback_rate(c, dc)) { >> /* >> * If the whole cache set is idle, set_at_max_writeback_rate() >> * will set writeback rate to a max number. Then it is >> * unncessary to update writeback rate for an idle cache set >> * in maximum writeback rate number(s). >> */ >> - if (!set_at_max_writeback_rate(c, dc)) { > > The reason I didn't place '!set_at_max_writeback_rate' with other items in > previous if() was for the above code comment. If I moved it to previous > if() without other items, I was not comfortable to place the code comments > neither before or after the if() check. So I used a separated if() check for > '!set_at_max_writeback_rate'. > > From your change, it seems placing the code comments behind is fine (or > better), can I understand in this way? I try to learn and follow your way > to handle such code comments situation. Just put it higher up if you want, it doesn't really matter, or leave it where it is. >> __update_writeback_rate(dc); >> update_gc_after_writeback(c); >> up_read(&dc->writeback_lock); >> - } >> + } while (0); > > Aha, this is cool! I never though of using do{}while(0) and break in > such a genius way! Sure I will use this, thanks for the hint :-) > > After you reply my defense of dc->retry_max, and the question of code > comments location, I will update and test the patch again, and > re-sbumit to you. > > Thanks for your constructive suggestion, especially the do{}while(0) > part! I would do something similar to my change and drop the 'dc' addition for the max retries as it, by definition, can only be one value right now. For all I know, you'll never need to change it again, and then you're just wasting memory and making the code harder to read by putting it in dc instead of just having this define. -- Jens Axboe