On Thu, Sep 18, 2008 at 05:18:50PM +0200, Andrea Righi wrote: > Vivek Goyal wrote: > > On Thu, Sep 18, 2008 at 04:37:41PM +0200, Andrea Righi wrote: > >> Vivek Goyal wrote: > >>> On Thu, Sep 18, 2008 at 09:04:18PM +0900, Ryo Tsuruta wrote: > >>>> Hi All, > >>>> > >>>> I have got excellent results of dm-ioband, that controls the disk I/O > >>>> bandwidth even when it accepts delayed write requests. > >>>> > >>>> In this time, I ran some benchmarks with a high-end storage. The > >>>> reason was to avoid a performance bottleneck due to mechanical factors > >>>> such as seek time. > >>>> > >>>> You can see the details of the benchmarks at: > >>>> http://people.valinux.co.jp/~ryov/dm-ioband/hps/ > >>>> > >>> Hi Ryo, > >>> > >>> I had a query about dm-ioband patches. IIUC, dm-ioband patches will break > >>> the notion of process priority in CFQ because now dm-ioband device will > >>> hold the bio and issue these to lower layers later based on which bio's > >>> become ready. Hence actual bio submitting context might be different and > >>> because cfq derives the io_context from current task, it will be broken. > >>> > >>> To mitigate that problem, we probably need to implement Fernando's > >>> suggestion of putting io_context pointer in bio. > >>> > >>> Have you already done something to solve this issue? > >>> > >>> Secondly, why do we have to create an additional dm-ioband device for > >>> every device we want to control using rules. This looks little odd > >>> atleast to me. Can't we keep it in line with rest of the controllers > >>> where task grouping takes place using cgroup and rules are specified in > >>> cgroup itself (The way Andrea Righi does for io-throttling patches)? > >>> > >>> To avoid creation of stacking another device (dm-ioband) on top of every > >>> device we want to subject to rules, I was thinking of maintaining an > >>> rb-tree per request queue. Requests will first go into this rb-tree upon > >>> __make_request() and then will filter down to elevator associated with the > >>> queue (if there is one). This will provide us the control of releasing > >>> bio's to elevaor based on policies (proportional weight, max bandwidth > >>> etc) and no need of stacking additional block device. > >>> > >>> I am working on some experimental proof of concept patches. It will take > >>> some time though. > >>> > >>> I was thinking of following. > >>> > >>> - Adopt the Andrea Righi's style of specifying rules for devices and > >>> group the tasks using cgroups. > >>> > >>> - To begin with, adopt dm-ioband's approach of proportional bandwidth > >>> controller. It makes sense to me limit the bandwidth usage only in > >>> case of contention. If there is really a need to limit max bandwidth, > >>> then probably we can do something to implement additional rules or > >>> implement some policy switcher where user can decide what kind of > >>> policies need to be implemented. > >>> > >>> - Get rid of dm-ioband and instead buffer requests on an rb-tree on every > >>> request queue which is controlled by some kind of cgroup rules. > >>> > >>> It would be good to discuss above approach now whether it makes sense or > >>> not. I think it is kind of fusion of io-throttling and dm-ioband patches > >>> with additional idea of doing io-control just above elevator on the request > >>> queue using an rb-tree. > >> Thanks Vivek. All sounds reasonable to me and I think this is be the right way > >> to proceed. > >> > >> I'll try to design and implement your rb-tree per request-queue idea into my > >> io-throttle controller, maybe we can reuse it also for a more generic solution. > >> Feel free to send me your experimental proof of concept if you want, even if > >> it's not yet complete, I can review it, test and contribute. > > > > Currently I have taken code from bio-cgroup to implement cgroups and to > > provide functionality to associate a bio to a cgroup. I need this to be > > able to queue the bio's at right node in the rb-tree and then also to be > > able to take a decision when is the right time to release few requests. > > > > Right now in crude implementation, I am working on making system boot. > > Once patches are at least in little bit working shape, I will send it to you > > to have a look. > > > > Thanks > > Vivek > > I wonder... wouldn't be simpler to just use the memory controller > to retrieve this information starting from struct page? > > I mean, following this path (in short, obviously using the appropriate > interfaces for locking and referencing the different objects): > > cgrp = page->page_cgroup->mem_cgroup->css.cgroup > Andrea, Ok, you are first retrieving cgroup associated page owner and then retrieving repsective iothrottle state using that cgroup, (cgroup_to_iothrottle). I have yet to dive deeper into cgroup data structures but does it work if iothrottle and memory controller are mounted on separate hierarchies? bio-cgroup guys are also doing similar thing in the sense retrieving relevant pointer through page and page_cgroup and use that to reach bio_cgroup strucutre. The difference is that they don't retrieve first css object of mem_cgroup instead they directly store the pointer of bio_cgroup in page_cgroup (When page is being charged in memory controller). While page is being charged, determine the bio_cgroup, associated with the task and store this info in page->page_cgroup->bio_cgroup. static inline struct bio_cgroup *bio_cgroup_from_task(struct task_struct *p) { return container_of(task_subsys_state(p, bio_cgroup_subsys_id), struct bio_cgroup, css); } At any later point, one can look at bio and reach respective bio_cgroup by. bio->page->page_cgroup->bio_cgroup. Looks like now we are getting rid of page_cgroup pointer in "struct page" and we shall have to change the implementation accordingly. Thanks Vivek -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel