> On Fri 08-11-24 11:19:49, Jim Zhao wrote:
> > > On Wed 23-10-24 18:00:32, Jim Zhao wrote:
> > > > With the strictlimit flag, wb_thresh acts as a hard limit in
> > > > balance_dirty_pages() and wb_position_ratio(). When device write
> > > > operations are inactive, wb_thresh can drop to 0, causing writes to
> > > > be blocked. The issue occasionally occurs in fuse fs, particularly
> > > > with network backends, the write thread is blocked frequently during
> > > > a period. To address it, this patch raises the minimum wb_thresh to a
> > > > controllable level, similar to the non-strictlimit case.
> > > >
> > > > Signed-off-by: Jim Zhao <jimzhao.ai@xxxxxxxxx>
> > >
> > > ...
> > >
> > > > +	/*
> > > > +	 * With strictlimit flag, the wb_thresh is treated as
> > > > +	 * a hard limit in balance_dirty_pages() and wb_position_ratio().
> > > > +	 * It's possible that wb_thresh is close to zero, not because
> > > > +	 * the device is slow, but because it has been inactive.
> > > > +	 * To prevent occasional writes from being blocked, we raise wb_thresh.
> > > > +	 */
> > > > +	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> > > > +		unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
> > > > +		u64 wb_scale_thresh = 0;
> > > > +
> > > > +		if (limit > dtc->dirty)
> > > > +			wb_scale_thresh = (limit - dtc->dirty) / 100;
> > > > +		wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
> > > > +	}
> > >
> > > What you propose makes sense in principle although I'd say this is mostly a
> > > userspace setup issue - with strictlimit enabled, you're kind of expected
> > > to set min_ratio exactly if you want to avoid these startup issues. But I
> > > tend to agree that we can provide a bit of a slack for a bdi without
> > > min_ratio configured to ramp up.
> > >
> > > But I'd rather pick the logic like:
> > >
> > > 	/*
> > > 	 * If bdi does not have min_ratio configured and it was inactive,
> > > 	 * bump its min_ratio to 0.1% to provide it some room to ramp up.
> > > 	 */
> > > 	if (!wb_min_ratio && !numerator)
> > > 		wb_min_ratio = min(BDI_RATIO_SCALE / 10, wb_max_ratio / 2);
> > >
> > > That would seem like a bit more systematic way than the formula you propose
> > > above...
> >
> > Thanks for the advice.
> > Here's the explanation of the formula:
> > 1. when writes are small and intermittent, wb_thresh can approach 0, not
> > just 0, making the numerator value difficult to verify.
>
> I see, ok.
>
> > 2. The ramp-up margin, whether 0.1% or another value, needs
> > consideration.
> > I based this on the logic of wb_position_ratio in the non-strictlimit
> > scenario: wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8); It seems
> > provides more room and ensures ramping up within a controllable range.
>
> I see, thanks for explanation. So I was thinking how to make the code more
> consistent instead of adding another special constant and workaround. What
> I'd suggest is:
>
> 1) There's already code that's supposed to handle ramping up with
> strictlimit in wb_update_dirty_ratelimit():
>
> 	/*
> 	 * For strictlimit case, calculations above were based on wb counters
> 	 * and limits (starting from pos_ratio = wb_position_ratio() and up to
> 	 * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
> 	 * Hence, to calculate "step" properly, we have to use wb_dirty as
> 	 * "dirty" and wb_setpoint as "setpoint".
> 	 *
> 	 * We rampup dirty_ratelimit forcibly if wb_dirty is low because
> 	 * it's possible that wb_thresh is close to zero due to inactivity
> 	 * of backing device.
> 	 */
> 	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> 		dirty = dtc->wb_dirty;
> 		if (dtc->wb_dirty < 8)
> 			setpoint = dtc->wb_dirty + 1;
> 		else
> 			setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
> 	}
>
> Now I agree that increasing wb_thresh directly is more understandable and
> transparent so I'd just drop this special case.

Yes, I agree.

> 2) I'd just handle all the bumping of wb_thresh in a single place instead
> of having it spread over multiple places. So __wb_calc_thresh() could have
> a code like:
>
> 	wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE);
> 	wb_thresh *= numerator;
> 	wb_thresh = div64_ul(wb_thresh, denominator);
>
> 	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
>
> 	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
> 	limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh);
> 	/*
> 	 * It's very possible that wb_thresh is close to 0 not because the
> 	 * device is slow, but that it has remained inactive for long time.
> 	 * Honour such devices a reasonable good (hopefully IO efficient)
> 	 * threshold, so that the occasional writes won't be blocked and active
> 	 * writes can rampup the threshold quickly.
> 	 */
> 	if (limit > dtc->dirty)
> 		wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
> 	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
> 		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
>
> and we can drop the bumping from wb_position_ratio(). This way we have the
> wb_thresh bumping in a single logical place. Since we still limit wb_thresh
> with max_ratio, untrusted bdis for which max_ratio should be configured
> (otherwise they can grow amount of dirty pages up to global threshold anyway)
> are still under control.
>
> If we really wanted, we could introduce a different bumping in case of
> strictlimit, but at this point I don't think it is warranted so I'd leave
> that as an option if someone comes with a situation where this bumping
> proves to be too aggressive.

Thank you, this is very helpful. I have two concerns:

1. In the current non-strictlimit logic, wb_thresh is only bumped within
wb_position_ratio() for calculating pos_ratio, and this bump isn’t
restricted by max_ratio. I’m unsure whether moving this adjustment to
__wb_calc_thresh() would affect existing behavior. Would it be possible to
keep the current logic for the non-strictlimit case?

2. Regarding the formula:
wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);

Consider a case:
With 100 fuse devices (with high max_ratio) experiencing high writeback
delays, the pages being written back are accounted in NR_WRITEBACK_TEMP,
not dtc->dirty. As a result, the bumped wb_thresh may remain high. While
individual devices are under control, the total could exceed expectations.

Although lowering the max_ratio can avoid this issue, how about reducing
the bumped wb_thresh?

The formula in my patch:
wb_scale_thresh = (limit - dtc->dirty) / 100;
The intention is to use the default fuse max_ratio (1%) as the multiplier.

Thanks
Jim Zhao
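
For illustration only, the arithmetic behind concern 2 can be sketched as a
small userspace model. This is not kernel code: the type pages_t and the
helpers bumped_wb_thresh() and total_bumped() are hypothetical names made up
for this example, and the model assumes every device is idle (wb_thresh near
0) and that dtc->dirty stays constant because in-flight fuse pages sit in
NR_WRITEBACK_TEMP.

```c
#include <assert.h>

/*
 * Userspace model of the wb_thresh bump discussed above; NOT kernel code.
 * All names are hypothetical and exist only for this illustration.
 * Every quantity is counted in pages.
 */
typedef unsigned long long pages_t;

/*
 * Raise wb_thresh to at least (limit - dirty) / divisor, then cap the
 * result at wb_max_thresh (the ceiling derived from max_ratio).
 * divisor = 8 models the non-strictlimit formula, divisor = 100 models
 * the formula from the patch.
 */
static pages_t bumped_wb_thresh(pages_t wb_thresh, pages_t limit,
				pages_t dirty, pages_t divisor,
				pages_t wb_max_thresh)
{
	pages_t bump = 0;

	assert(divisor != 0);
	if (limit > dirty)
		bump = (limit - dirty) / divisor;
	if (bump > wb_thresh)
		wb_thresh = bump;
	if (wb_thresh > wb_max_thresh)
		wb_thresh = wb_max_thresh;
	return wb_thresh;
}

/*
 * Sum of the bumped thresholds over ndev identical devices. This is only
 * a rough upper bound on what the devices may dirty in total, under the
 * assumption that "dirty" does not grow while pages are in flight.
 */
static pages_t total_bumped(unsigned int ndev, pages_t wb_thresh,
			    pages_t limit, pages_t dirty, pages_t divisor,
			    pages_t wb_max_thresh)
{
	return (pages_t)ndev *
	       bumped_wb_thresh(wb_thresh, limit, dirty, divisor,
				wb_max_thresh);
}
```

With limit = 1000000, dirty = 200000, wb_max_thresh = 500000 and 100 idle
devices, the /8 bump gives 100000 pages per device, 10000000 in total (ten
times the limit), while the /100 bump gives 8000 per device, 800000 in total.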