On Wed, Oct 07, 2009 at 07:23:02PM +0800, Nick Piggin wrote:
> On Wed, Oct 07, 2009 at 06:47:11PM +0800, Wu Fengguang wrote:
> > On Wed, Oct 07, 2009 at 06:21:30PM +0800, Nick Piggin wrote:
> > > On Wed, Oct 07, 2009 at 11:17:06AM +0100, David Howells wrote:
> > > > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > > >
> > > > > Convert wbc.range_cyclic to new behavior: when past EOF, abort writeback
> > > > > of the inode, which instructs writeback_single_inode() to delay it for
> > > > > a while if necessary.
> > > > >
> > > > > It removes one inefficient .range_cyclic IO pattern when writeback_index
> > > > > wraps:
> > > > >         submit [10000-10100], (wrap), submit [0-100]
> > > > > In which the submitted pages may be consisted of two distant ranges.
> > > > >
> > > > > It also prevents submitting pointless IO for busy overwriters.
> > > > >
> > > > > CC: David Howells <dhowells@xxxxxxxxxx>
> > > > > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> > > >
> > > > Acked-by: David Howells <dhowells@xxxxxxxxxx>
> > >
> > > I don't see why. Then the inode is given less write bandwidth than
> > > those which don't wrap (or wrap on "nice" boundaries).
> >
> > The "return on wrapped" behavior itself only offers a natural seek
> > boundary to the upper layer. It's mainly the "whether to delay"
> > policy that will affect (overall) bandwidth.
> >
> > If we choose to not sleep, and to go on with other inodes and then
> > back to this inode, no bandwidth will be lost.
> >
> > If we have done work with other inodes (if any), and choose to sleep
> > for a while before restarting this inode, then we could lose bandwidth.
> > The plus side is, we possibly avoid submitting extra IO if this inode
> > is being busy overwritten. So it's a tradeoff.
> >
> > The behavior after this patchset is, to keep busy as long as we can
> > write any pages (in patch 38/45). So we still opt for bandwidth :)
>
> No I mean bandwidth fairness between inodes.

I guess it's the old semantics that have the bandwidth fairness problem :)

Imagine the write chunk size is 4MB, and inodes A and B have sizes of 6MB
and 8MB respectively. The old semantics give the write sequence

        4MB for A; 4MB for B; other inodes;
        4MB for A; 4MB for B; other inodes;
        4MB for A; 4MB for B; other inodes;

while the new sequence would be

        4MB for A; 4MB for B; other inodes;
        2MB for A; 4MB for B; other inodes;
        4MB for A; 4MB for B; other inodes;
        2MB for A; 4MB for B; other inodes;

On average, each page in A used to get more chances to be written than
each page in B. Now, with no-wrap, A's and B's pages have the same chance
of being written back.

Thanks,
Fengguang
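
The A/B arithmetic above can be checked with a small toy model. This is only
a sketch and not kernel code: it assumes every page of each inode is always
dirty (a busy overwriter), that each inode gets exactly one turn per round,
and that the chunk is no larger than the inode. The names simulate() and
passes_per_round() are made up for illustration.

```python
def simulate(size_mb, chunk_mb, rounds, wrap):
    """Return the MB written in each round for one inode (toy model).

    wrap=True  models the old range_cyclic behavior: always write a full
               chunk, continuing from offset 0 after passing EOF.
    wrap=False models the new behavior: stop at EOF and restart from
               offset 0 on the inode's next turn.
    """
    index = 0          # stand-in for writeback_index, in MB
    written = []
    for _ in range(rounds):
        if wrap:
            n = chunk_mb
        else:
            n = min(chunk_mb, size_mb - index)
        index = (index + n) % size_mb
        written.append(n)
    return written


def passes_per_round(size_mb, chunk_mb, rounds, wrap):
    """Average number of times each page is written per round."""
    total = sum(simulate(size_mb, chunk_mb, rounds, wrap))
    return total / (size_mb * rounds)


if __name__ == "__main__":
    CHUNK_MB = 4
    ROUNDS = 4
    for name, size in (("A", 6), ("B", 8)):
        old = passes_per_round(size, CHUNK_MB, ROUNDS, wrap=True)
        new = passes_per_round(size, CHUNK_MB, ROUNDS, wrap=False)
        print(f"inode {name} ({size}MB): old {old:.3f}, new {new:.3f} passes/round")
```

Under these assumptions, A under no-wrap writes 4, 2, 4, 2 MB, which averages
3MB per round over a 6MB file; that is the same half-pass per round that B
gets from 4MB over 8MB. With wrapping, A gets 4MB per round over 6MB, so its
pages are written more often than B's.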