Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:

> On 07/17/2012 11:11 PM, Miklos Szeredi wrote:
>>
>> Okay, maybe I'm blind but if this is true, then how is
>> balance_dirty_pages() supposed to ensure that the per-bdi limit is not
>> exceeded?
>
> The balance_dirty_pages logic is _very_ roughly the following:
>
> Let this_bdi be a bdi the current task is writing to
> Let D be the total amount of dirty and writeback memory (and writeback_tmp after this patch)
> Let L be the limit of dirty memory (L = ram_size * ratio)
> Let d be the amount of dirty and writeback on this_bdi
> And let l be the limit of dirty memory on this_bdi
>
> With that the balancer logic looks like
>
> while (1) {
> 	if (D < L)
> 		return;
>
> 	start_background_writeback(this_bdi);
>
> 	if (d < l)
> 		return;
>
> 	timeout = get_sleep_timeout(d, l, D, L);
> 	schedule_timeout(timeout);
> }
>
> The d and l are calculated out of the D and L using this_bdi and
> global IO completions proportions (with more complexity, but still).
>
> Thus, since we throttle tasks looking at d and l only we cannot affect
> all the bdis in the system by live-locking a single one of them.
>
> Accounting for writeback_tmp is required since the D should become
> high when there are lots of pages in-flight in FUSE. Otherwise, the
> balance_dirty_pages will not limit the task writing on a fuse mount.

Okay, that makes sense, and it's certainly an improvement over the
current situation.

What I'm worried about is that with the above algorithm a filesystem's
"d" can grow as high as "L" if only that filesystem is dirtying memory.
If that filesystem is very slow or broken and other filesystems start
dirtying data then they are left with only a fraction of the original
limit.  They won't deadlock, but performance will be affected.

So ideally I'd like to see stricter per-bdi limit enforcement for
fuse (the per-bdi limit is just 1% of "L" by default on fuse).

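To put rough numbers on that concern, here is a small userspace sketch
in C.  It is not kernel code: the function names are made up for
illustration, "l" is taken as a fixed input rather than derived from
completion proportions, and the "strict" variant is only one possible
reading of stricter per-bdi enforcement.

```c
#include <assert.h>

/* Toy model only: all names here are hypothetical, none are kernel APIs. */

/* Loop as quoted above: a task sleeps only when D >= L and d >= l. */
int throttled_as_quoted(unsigned long D, unsigned long L,
			unsigned long d, unsigned long l)
{
	if (D < L)
		return 0;	/* under the global limit: return at once */
	if (d < l)
		return 0;	/* background writeback started, but no sleep */
	return 1;		/* would compute a timeout and sleep */
}

/* Assumed stricter variant: the per-bdi check applies even when D < L. */
int throttled_strict(unsigned long D, unsigned long L,
		     unsigned long d, unsigned long l)
{
	return D >= L || d >= l;
}

/* Dirty memory left for all other bdis before the global limit is hit. */
unsigned long room_for_others(unsigned long L, unsigned long d)
{
	assert(d <= L);
	return L - d;
}
```

With L = 1000 pages and l = 10 (1% of L), a lone fuse writer at
d = D = 900 is not throttled by the quoted logic, which leaves only 100
pages for every other filesystem.  The stricter variant would have
started throttling that writer at d = 10.
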
Thanks,
Miklos