On Wed, Jun 29, 2011 at 09:29:55AM +0800, Vivek Goyal wrote:
> On Wed, Jun 29, 2011 at 09:04:55AM +0800, Shaohua Li wrote:
>
> [..]
> > > We idle on last queue on sync-noidle tree. So we idle on fsync queue as
> > > it is last queue on sync-noidle tree. That's how we provide protection
> > > to all sync-noidle queues against sync-idle queues. Instead of idling
> > > on individual queues we do idling in group and that is on service tree.
> > Ok. but this looks silly. We are idling in a noidle service tree or a
> > group (backed by the last queue of the tree or group) because we assume
> > the tree or group can dispatch a request soon. But if the think time of
> > the tree or group is big, the assumption isn't true. Doing idle here is
> > blind. I thought we can extend the think time check for both service
> > tree and group.
>
> We can implement the thinktime for noidle service tree and group idle as
> well. That's not a problem, though I am yet to be convinced that thinktime
> still makes sense for the group. I guess it will just mean that in the
> past you have done a bunch of IO with gaps between IOs of less than 8ms.
> If yes, then we expect you to do more IO in future. Frankly speaking, I am
> not too sure how the past IO pattern predicts the future IO pattern
> of the group.
>
> But anyway, the point is, even if we implement it, it will not solve
> the fsync issue at hand. The reason I explained in previous mail. We
> will be oscillating between high thinktime and low thinktime depending
> on whether we are idling or not. There is no correlation between think
> time of fsync thread and idling here.
>
> I think you are banking on the fact that after fsync, journaling thread
> IO can take more than 8ms hence delaying next IO to fsync thread, pushing
> its thinktime more than 8ms hence we will not idle on fsync thread at
> all. It is just one corner case and I think it is broken in multiple
> cases.
>
> - If filesystem barriers are disabled or backend storage has battery
>   backup then journal IO most likely will go in cache and barriers
>   will be ignored. In that case write will finish almost instantly
>   and we will get next IO from fsync thread very soon hence pushing
>   down thinktime of fsync thread which will enable idling and we will
>   be back to the problem we are trying to solve.
>
> - Fsync thread might be submitting string of IOs (say 10-12) before it
>   moves to journal thread to commit meta data. In that case we might
>   have lowered thinktime of fsync hence enable idle.
>
> So implementing think time for service tree/group might be a good idea
> in general but it will not solve this IO dependency issue across cgroups.
Ok, fair enough. I'll give it a try and check how things change with the
fsync workload.

Thanks,
Shaohua
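
For anyone following the think-time argument in this thread, here is a rough
userspace sketch of the kind of check being discussed: keep a decaying average
of the gap between one IO completing and the next one arriving, and allow
idling only while that average stays within the idle window. It is not the
CFQ code. The struct, the function names, the decay weights and the clamp are
all assumptions made for illustration; only the 8ms window comes from the
discussion above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed idle window in milliseconds (the "8ms" mentioned in the thread). */
#define SLICE_IDLE_MS 8UL

/* Hypothetical think-time state for a queue, service tree or group. */
struct think_time {
    unsigned long samples; /* decayed sample weight, scaled by 256 */
    unsigned long total;   /* decayed sum of gaps, scaled by 256 */
    unsigned long mean;    /* decayed mean gap in ms */
};

/* Fold one gap (ms between a completion and the next request) into the state. */
static void think_time_update(struct think_time *tt, unsigned long gap_ms)
{
    /* Clamp outliers so a single long pause cannot dominate the average. */
    unsigned long gap = gap_ms < 2 * SLICE_IDLE_MS ? gap_ms : 2 * SLICE_IDLE_MS;

    /* Each new sample carries 1/8 of the weight; older history decays by 7/8. */
    tt->samples = (7 * tt->samples + 256) / 8;
    tt->total = (7 * tt->total + 256 * gap) / 8;
    tt->mean = tt->total / tt->samples;
}

/* Idle only if the entity has historically come back within the idle window. */
static bool think_time_allows_idle(const struct think_time *tt)
{
    return tt->samples != 0 && tt->mean <= SLICE_IDLE_MS;
}

/* Reset the state, replay a sequence of gaps, and return the final mean. */
static unsigned long think_time_replay(struct think_time *tt,
                                       const unsigned long *gaps, size_t n)
{
    tt->samples = tt->total = tt->mean = 0;
    for (size_t i = 0; i < n; i++)
        think_time_update(tt, gaps[i]);
    return tt->mean;
}
```

Under these assumptions, replaying a long run of 1ms gaps (journal writes
landing in cache, or a string of back-to-back fsync IOs) leaves the mean below
the window, so idling stays enabled. A long run of 20ms gaps pushes the mean
above it and idling is switched off. The decision tracks whatever the recent
gaps happened to be, which is the oscillation described above.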