On Tue, Jul 21, 2009 at 7:01 AM, Vivek Goyal<vgoyal@xxxxxxxxxx> wrote: > On Mon, Jul 20, 2009 at 10:55:31PM -0700, Nauman Rafique wrote: >> On Mon, Jul 20, 2009 at 10:37 PM, Gui >> Jianfeng<guijianfeng@xxxxxxxxxxxxxx> wrote: >> > Vivek Goyal wrote: >> >> o Currently a request queue has got fixed number of request descriptors for >> >> sync and async requests. Once the request descriptors are consumed, new >> >> processes are put to sleep and they effectively become serialized. Because >> >> sync and async queues are separate, async requests don't impact sync ones >> >> but if one is looking for fairness between async requests, that is not >> >> achievable if request queue descriptors become bottleneck. >> >> >> >> o Make request descriptor's per io group so that if there is lots of IO >> >> going on in one cgroup, it does not impact the IO of other group. >> >> >> >> o This is just one relatively simple way of doing things. This patch will >> >> probably change after the feedback. Folks have raised concerns that in >> >> hierchical setup, child's request descriptors should be capped by parent's >> >> request descriptors. May be we need to have per cgroup per device files >> >> in cgroups where one can specify the upper limit of request descriptors >> >> and whenever a cgroup is created one needs to assign request descritor >> >> limit making sure total sum of child's request descriptor is not more than >> >> of parent. >> >> >> >> I guess something like memory controller. Anyway, that would be the next >> >> step. For the time being, we have implemented something simpler as follows. >> >> >> >> o This patch implements the per cgroup request descriptors. request pool per >> >> queue is still common but every group will have its own wait list and its >> >> own count of request descriptors allocated to that group for sync and async >> >> queues. So effectively request_list becomes per io group property and not a >> >> global request queue feature. >> >> >> >> o Currently one can define q->nr_requests to limit request descriptors >> >> allocated for the queue. Now there is another tunable q->nr_group_requests >> >> which controls the requests descriptr limit per group. q->nr_requests >> >> supercedes q->nr_group_requests to make sure if there are lots of groups >> >> present, we don't end up allocating too many request descriptors on the >> >> queue. >> >> >> > >> > Hi Vivek, >> > >> > In order to prevent q->nr_requests from becoming the bottle-neck of allocating >> > requests, whether we can update nr_requests accordingly when allocating or removing >> > a cgroup? >> >> Vivek, >> I agree with Gui here. In fact, it does not make much sense to keep >> the nr_requests limit if we already have per cgroup limit in place. >> This change also simplifies code quite a bit, as we can get rid of all >> that sleep_on_global logic. >> > > Hi Nauman, Gui, > > There were few reasons to keep a total limit on number of request > descriptors (q->nr_requests) apart from per group limit. > > - We have this notion of queue being congested or not depending on out of > q->nr_requests how many are currently being used. Writeback threads, > some filesystems and other places make use of this information to either > not to block or to avoid pushing too much of data on device if queue is > congested. > > With q->nr_requests removed, how do you define queue full and congested > semantics? We can still keep q->nr_requests around, but don't use that number to deny request descriptor allocation; only use it for defining queue full and congested semantics. > > - I think slee_on_global logic makes sense even without q->nr_requests. > Assume that a group allows request descriptor allocation but due to lack > of memory, allocation fails. Where do you make this process wait to > attempt next time? Making all such failed processes on gloabl list on > queue instead of per group list makes more sense to me for following > reasons. > > - If this is the first request allocation from the group and we > make the process sleep on group list, it will never be woken up > as no request from that group will complete. > > - If there are many processes who failed request descriptor > allocation, when some request completes, I think it is more > fair to wake these up in FIFO manner to try out allocation again > instead of waiting for request to complete from the group > process belongs to. The reason being that io controller did not > fail the request descriptor allocation. > > So even if you get rid of q->nr_requests, you still shall have to have > some logic of global wait list where failed allocations can wait. > > - It is backward compatible and there are less chances of higher layers > being broken due to this. > > > Gui, I think automatic updation of q->nr_requests is probably not a very > good thing. It is user defined tunable and user does not expect this to > change automatically. > > At this point of time I really can't think of simpler and cleaner way. > Ideas are welcome. > > Thanks > Vivek > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel