On Fri, 25 Sep 2009 10:09:52 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:

> On Thu, 24 Sep 2009 14:33:15 -0700
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > > Test5 (Fairness for async writes, Buffered Write Vs Buffered Write)
> > > ===================================================================
> > > Fairness for async writes is tricky, and the biggest reason is that async
> > > writes are cached in higher layers (the page cache) as well as possibly in
> > > the file system layer (btrfs, xfs, etc.), and are not necessarily
> > > dispatched to lower layers in a proportional manner.
> > >
> > > For example, consider two dd threads reading /dev/zero as the input file
> > > and writing huge files. Very soon we will cross vm_dirty_ratio, and the dd
> > > threads will be forced to write out some pages to disk before more pages
> > > can be dirtied. But the dirty pages picked are not necessarily those of
> > > the same thread. Writeback can very well pick the inode of the
> > > lower-priority dd thread and write it out. So effectively the higher-weight
> > > dd is writing out the lower-weight dd's pages, and we don't see service
> > > differentiation.
> > >
> > > IOW, the core problem with buffered write fairness is that the higher
> > > weight thread does not throw enough IO traffic at the IO controller to
> > > keep its queue continuously backlogged. In my testing, there are many 0.2
> > > to 0.8 second intervals where the higher weight queue is empty, and during
> > > those intervals the lower weight queue gets a lot of work done, giving the
> > > impression that there was no service differentiation.
> > >
> > > In summary, from the IO controller's point of view, async write support is
> > > there. But because the page cache has not been designed so that a higher
> > > prio/weight writer can do more writeout than a lower prio/weight writer,
> > > getting service differentiation is hard; it is visible in some cases and
> > > not in others.
> >
> > Here's where it all falls to pieces.
> >
> > For async writeback we just don't care about IO priorities. Because
> > from the point of view of the userspace task, the write was async! It
> > occurred at memory bandwidth speed.
> >
> > It's only when the kernel's dirty memory thresholds start to get
> > exceeded that we start to care about prioritisation. And at that time,
> > all dirty memory (within a memcg?) is equal - a high-ioprio dirty page
> > consumes just as much memory as a low-ioprio dirty page.
> >
> > So when balance_dirty_pages() hits, what do we want to do?
> >
> > I suppose that all we can do is to block low-ioprio processes more
> > aggressively at the VFS layer, to reduce the rate at which they're
> > dirtying memory so as to give high-ioprio processes more of the disk
> > bandwidth.
> >
> > But you've gone and implemented all of this stuff at the io-controller
> > level and not at the VFS level so you're, umm, screwed.
> >

I think I must support dirty-ratio in the memcg layer, but not yet.

Or... I'll add a buffered-write cgroup to track buffered writeback, and add
a control knob, bufferred_write.nr_dirty_thresh, to limit the number of dirty
pages generated by a cgroup. Because memcg records only the owner of a page,
not who dirtied it, this may be better.

Maybe I can reuse page_cgroup and Ryo's blockio cgroup code.
But I'm not sure how I should treat I/O generated by kswapd.

Thanks,
-Kame

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel