On Mon, 2010-10-04 at 21:13 +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > Currently the size of the speculative preallocation during delayed > allocation is fixed by either the allocsize mount option of a > default size. We are seeing a lot of cases where we need to > recommend using the allocsize mount option to prevent fragmentation > when buffered writes land in the same AG. > > Rather than using a fixed preallocation size by default (up to 64k), > make it dynamic by exponentially increasing it on each subsequent > preallocation. This will result in the preallocation size increasing > as the file increases, so for streaming writes we are much more > likely to get large preallocations exactly when we need it to reduce > fragementation. It should also prevent the need for using the > allocsize mount option for most workloads involving concurrent > streaming writes. I have some comments, below. > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- . . . > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 2057614..b2e4782 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c . . . > @@ -427,11 +431,25 @@ xfs_iomap_eof_want_preallocate( > if ((imap[n].br_startblock != HOLESTARTBLOCK) && > (imap[n].br_startblock != DELAYSTARTBLOCK)) > return 0; > + > start_fsb += imap[n].br_blockcount; > count_fsb -= imap[n].br_blockcount; > + > + /* count delalloc blocks beyond EOF */ > + if (imap[n].br_startblock == DELAYSTARTBLOCK) > + found_delalloc += imap[n].br_blockcount; > } > } > - *prealloc = 1; At this point, count_fsb will be 0 (a necessary condition for loop termination, since count_fsb is unsigned). Since found_delalloc is initially 0 (and is also unsigned), we can safely assume that found_delalloc >= count_fsb. The only case in which found_delalloc <= count_fsb is if found_delalloc is also 0 (a case you cover separately, first, below). Furthermore, *prealloc was assigned the value 0 at the top. So I think this bottom section can be simplified to: if (!found_delalloc) *prealloc = 1; But if that's the case, then maybe the loop can simply return as soon as it finds a delayed allocation entry. In other words, the net effect of this is that you only tell the caller we want preallocation if *no* preallocated blocks beyond EOF exist. That may be fine, but it doesn't seem to match your understanding based on your code, so I just wanted to call your attention to it. > + if (!found_delalloc) { > + /* haven't got any prealloc, so need some */ > + *prealloc = 1; > + } else if (found_delalloc <= count_fsb) { > + /* almost run out of prealloc */ > + *prealloc = 1; > + } else { > + /* still lots of prealloc left */ > + *prealloc = 0; > + } > return 0; > } > > @@ -469,6 +487,7 @@ xfs_iomap_write_delay( > extsz = xfs_get_extsz_hint(ip); > offset_fsb = XFS_B_TO_FSBT(mp, offset); > > + This is hunk should be killed. It just adds an unwanted blank line. > error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count, > ioflag, imap, XFS_WRITE_IMAPS, &prealloc); > if (error) > @@ -476,9 +495,25 @@ xfs_iomap_write_delay( > > retry: > if (prealloc) { > + xfs_fileoff_t alloc_blocks = 0; > + /* > + * If we don't have a user specified preallocation size, dynamically > + * increase the preallocation size as we do more preallocation. > + * Cap the maximum size at a single extent. > + */ > + if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)) { I note that this is circumventing the special code in xfs_set_rw_sizes() that tries to set up a different (smaller) size in the event the "sync" (generic) mount option was supplied (indicated by XFS_MOUNT_SYNC). If that is a good thing, then I suggest the code in xfs_set_rw_sizes() go away. But it would be good to have the case for making that change stated. > + alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN, > + (ip->i_last_prealloc * 4)); > + } So this is the spot that begs the question of whether the the default I/O size mount option is needed any more. The net effect of your change (assuming no "allocsize" mount option is in effect) is: - Initially, ip->i_last_prealloc will be 0. Therefore the first time through, the preallocated blocks beyond the end will be based on m_writeio_blocks (either 16KB or 64KB, dependent on whether XFS_MOUNT_WSYNC was specified). - Thereafter, whenever more preallocated-at-EOF blocks are needed, the number allocated will be 4 times more than the last time (growing exponentially), limited by the maximum extent size. I guess the reason one might want the "allocsize" mount option now becomes the opposite of why one might have wanted it before. I.e., it would be used to *reduce* the size of the preallocated range beyond EOF, which I could envision might be reasonable in some circumstances. > + if (alloc_blocks == 0) > + alloc_blocks = mp->m_writeio_blocks; > + ip->i_last_prealloc = alloc_blocks; > + > aligned_offset = XFS_WRITEIO_ALIGN(mp, (offset + count - 1)); > ioalign = XFS_B_TO_FSBT(mp, aligned_offset); > - last_fsb = ioalign + mp->m_writeio_blocks; > + last_fsb = ioalign + alloc_blocks; > + printk("ino %lld, ioalign 0x%llx, alloc_blocks 0x%llx\n", > + ip->i_ino, ioalign, alloc_blocks); Kill the debug printk() call. > } else { > last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count))); > } _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs