On 2011-04-12 18:58, hch@xxxxxxxxxxxxx wrote:
> On Tue, Apr 12, 2011 at 11:31:17PM +1000, Dave Chinner wrote:
>> I don't think so. e.g. in the XFS allocation path we do btree block
>> readahead, then go do the real work. The real work can end up with a
>> deeper stack before blocking on locks or completions unrelated to
>> the readahead, leading to schedule() being called and an unplug
>> being issued at that point. You might think it contrived, but if
>> you can't provide a guarantee that it can't happen then I have to
>> assume it will happen.
>
> In addition to the stack issue, which is a killer, this also has
> latency implications. Before we could submit a synchronous metadata
> read request inside readpage or writepage and kick it off to the disk
> immediately, while now it won't get submitted until we block the next
> time, i.e. have done some more work that could have been used for
> doing I/O in the background. With the kblockd offload not only have
> we spent more time but at the point where we finally kick it we
> also need another context switch. It seems like we really need to
> go through the filesystems and explicitly flush the plugging queue
> for such cases. In fact a bio flag marking things as synchronous
> metadata reads would help, but then again we need to clean up our
> existing bio flags first..

I think it would be a good idea to audit the SYNC cases and, if
feasible, let them retain the 'immediate kick off' logic. If not, have
some way to signal that at least. Essentially, allow some fine-grained
control of what goes into the plug and what does not.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
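
To make the "fine-grained control" idea concrete, here is a toy
userspace model of a plug that batches ordinary requests but lets
requests marked synchronous go out immediately. This is only a sketch
and not kernel code: the names Request, Plug, submit, flush and the
sync flag are made up for illustration and do not correspond to the
block layer API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    sector: int
    sync: bool = False  # hypothetical flag: the caller will wait on this I/O


@dataclass
class Plug:
    """Toy per-task plug: holds requests back until flushed."""
    pending: List[Request] = field(default_factory=list)
    # Stands in for the device queue; order here is dispatch order.
    dispatched: List[Request] = field(default_factory=list)

    def submit(self, req: Request) -> None:
        if req.sync:
            # Immediate kick-off: sync requests never sit in the plug.
            self.dispatched.append(req)
        else:
            # Everything else is batched until the plug is flushed.
            self.pending.append(req)

    def flush(self) -> None:
        # Models the unplug at schedule() time or an explicit flush.
        self.dispatched.extend(self.pending)
        self.pending.clear()


plug = Plug()
plug.submit(Request(sector=100))           # readahead, batched
plug.submit(Request(sector=8, sync=True))  # sync metadata read, sent at once
plug.submit(Request(sector=101))           # readahead, batched
order_before_flush = [r.sector for r in plug.dispatched]
plug.flush()
order_after_flush = [r.sector for r in plug.dispatched]
```

In this model the sync read reaches the queue before the task blocks,
while the readahead still gets batched. Whether a sync request should
also flush what is already plugged is left open here.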