[XFS updates] XFS development tree branch, for-next, updated. v3.4-rc2-53-g1c3b227

xfs@xxxxxxxxxxx · Wed, 9 May 2012 17:36:43 -0500

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, for-next has been updated
  1c3b227 xfs: make XBF_MAPPED the default behaviour
  ea832f2 xfs: flush outstanding buffers on log mount failure
  72240a7 xfs: Properly exclude IO type flags from buffer flags
  5e57a1a xfs: clean up xfs_bit.h includes
  3dae55d xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
  29ca42b xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
  98eaacc xfs: move xfs_fsb_to_db to xfs_bmap.h
  425dcd6 xfs: clean up busy extent naming
  e459df5 xfs: move busy extent handling to it's own file
  98cab1d xfs: move xfsagino_t to xfs_types.h
  b020dc6 xfs: use iolock on XFS_IOC_ALLOCSP calls
  bab5041 xfs: kill XBF_DONTBLOCK
  3c19da8 xfs: kill xfs_read_buf()
  2f5a6c9 xfs: kill XBF_LOCK
  b0f292d xfs: kill xfs_buf_btoc
  22b58e2 xfs: use blocks for storing the desired IO size
  8d6e476 xfs: use blocks for counting length of buffers
  6473c07 xfs: kill b_file_offset
  57295b4 xfs: clean up buffer get/read call API
  c6dde1f xfs: use kmem_zone_zalloc for buffers
  e5c8aaf xfs: fix incorrect b_offset initialisation
  0889cf5 xfs: check for buffer errors before waiting
  d247348 xfs: fix buffer lookup race on allocation failure
  04016b6 xfs: Use preallocation for inodes with extsz hints
  512839c xfs: limit specualtive delalloc to maxioffset
  e32b8bc xfs: don't assert on delalloc regions beyond EOF
  70452be xfs: prevent needless mount warning causing test failures
  976b494 xfs: punch new delalloc blocks out of failed writes inside EOF.
      from  df9e825962c70e77098587fce8c9fe8a71367425 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 1c3b2277f286be34c9faaf47a48b86c4580c3690
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:07 2012 +1000

    xfs: make XBF_MAPPED the default behaviour

    Rather than specifying XBF_MAPPED for almost all buffers, introduce
    XBF_UNMAPPED for the couple of users that use unmapped buffers.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit ea832f2ede3acf387c472d324f62a45a01c0f218
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:06 2012 +1000

    xfs: flush outstanding buffers on log mount failure

    When we fail to mount the log in xfs_mountfs(), we tear down all the
    infrastructure we have already allocated. However, the process of
    mounting the log may have progressed to the point of reading,
    caching and modifying buffers in memory. Hence before we can free
    all the infrastructure, we have to flush and remove all the buffers
    from memory.

    Problem first reported by Eric Sandeen, later a different incarnation
    was reported by Ben Myers.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 72240a7b0f8748e94c3664aa3b5f77d28fa30d4d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:05 2012 +1000

    xfs: Properly exclude IO type flags from buffer flags

    Recent event tracing during a debugging session showed that flags
    that define the IO type for a buffer are leaking into the flags on
    the buffer incorrectly. Fix the flag exclusion mask in
    xfs_buf_alloc() to avoid problems that may be caused by such
    leakage.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 5e57a1ab68267e22861358587e62094bc08683be
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:04 2012 +1000

    xfs: clean up xfs_bit.h includes

    With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 3dae55deaa08e4645df61d063239e6abfffddf9e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:03 2012 +1000

    xfs: move xfs_do_force_shutdown() and kill xfs_rw.c

    xfs_do_force_shutdown now is the only thing in xfs_rw.c. There is no
    need to keep it in its own file anymore, so move it to xfs_fsops.c
    next to xfs_fs_goingdown() and kill xfs_rw.c.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 29ca42b7a6925754bc6c8889fe776afac4751ee3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:02 2012 +1000

    xfs: move xfs_get_extsz_hint() and kill xfs_rw.h

    The only thing left in xfs_rw.h is a function prototype for an inode
    function.  Move that to xfs_inode.h, and kill xfs_rw.h.

    Also move the function implementing the prototype from xfs_rw.c to
    xfs_inode.c so we only have one function left in xfs_rw.c

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 98eaacc0b0f63d4761872326302712e804625357
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:59:01 2012 +1000

    xfs: move xfs_fsb_to_db to xfs_bmap.h

    This is the only remaining useful function in xfs_rw.h, so move it
    to a header file responsible for block mapping functions that the
    callers already include. Soon we can get rid of xfs_rw.h.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 425dcd6c2289c48f13795ed1c49ca813d687f9de
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Apr 29 10:41:10 2012 +0000

    xfs: clean up busy extent naming

    Now that the busy extent tracking has been moved out of the
    allocation files, clean up the namespace it uses to
    "xfs_extent_busy" rather than a mix of "xfs_busy" and
    "xfs_alloc_busy".

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit e459df5b5b93676ee80db0f3f43b963f31237dab
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Apr 29 10:39:43 2012 +0000

    xfs: move busy extent handling to it's own file

    To make it easier to handle userspace code merges, move all the busy
    extent handling out of the allocation code and into its own file.
    The userspace code does not need the busy extent code, so this
    simplifies the merging of the kernel code into the userspace
    xfsprogs library.

    Because the busy extent code has been almost completely rewritten
    over the past couple of years, also update the copyright on this new
    file to include the authors that made all those changes.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 98cab1d0a10c86a7adf51b3b0334cfba9f9c7c0b
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:58 2012 +1000

    xfs: move xfsagino_t to xfs_types.h

    Untangle the header file includes a bit by moving the definition of
    xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
    xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
    xfs_ag.h.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit b020dc692ae638cae00315190e8b4784f237da12
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:57 2012 +1000

    xfs: use iolock on XFS_IOC_ALLOCSP calls

    fsstress has a particularly effective way of stopping debug XFS
    kernels. We keep seeing assert failures due to finding delayed
    allocation extents where there should be none. This shows up when
    extracting extent maps and we are holding all the locks we should be
    to prevent races, so it makes no sense to see these errors.

    After checking that fsstress does not use mmap, it occurred to me
    that fsstress uses something that no sane application uses - the
    XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
    do allocation of blocks beyond EOF without using preallocation, and
    then call setattr to extend and zero the allocated blocks.

    The problem here is this is a buffered write, and hence the
    allocation is a delayed allocation. Unlike the buffered IO path, the
    allocation and zeroing are not serialised using the IOLOCK. Hence
    the ALLOCSP operation can race with operations holding the iolock to
    prevent buffered IO operations from occurring.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit bab5041a7ca4384bb3f240cb5726fb8163bf37f5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:56 2012 +1000

    xfs: kill XBF_DONTBLOCK

    Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
    This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
    avoid recursion through memory reclaim back into the filesystem.

    All the blocking get calls in growfs occur inside a transaction, even though
    they are not part of the transaction, so all allocation will be GFP_NOFS due to
    the task flag PF_TRANS being set. The blocking read calls occur during log
    recovery, so they will probably be unaffected by converting to GFP_NOFS
    allocations.

    Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 3c19da829298c4b9b90c79b3ca764b25b8d0f6e0
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:55 2012 +1000

    xfs: kill xfs_read_buf()

    xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
    outside a transaction context. The error handling is slightly different in that
    xfs_read_buf stales the errored buffer it gets back, but there is probably good
    reason for xfs_trans_read_buf() to do this as well.

    Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(),
    and convert all the callers of xfs_read_buf() to use the former function. We can
    then remove xfs_read_buf().

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 2f5a6c9211a9dbaa8ff7fcfba45100f4c788a443
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:54 2012 +1000

    xfs: kill XBF_LOCK

    Buffers are always returned locked from the lookup routines. Hence
    we don't need to tell the lookup routines to return locked buffers,
    or to try and lock them. Remove XBF_LOCK from all the callers and
    from internal buffer cache usage.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit b0f292dd35ee872cd72071f598a8816b05edb4b6
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:53 2012 +1000

    xfs: kill xfs_buf_btoc

    xfs_buf_btoc and friends are simple macros that do basic block
    to page index conversion and vice versa. These aren't widely used,
    and we use open coded masking and shifting everywhere else. Hence
    remove the macros and open code the work they do.

    Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
    incorrect - we are using pages directly and not the page cache, so
    use PAGE_{SIZE|MASK|SHIFT} instead.
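
As a rough userspace sketch of what open coding these conversions can look like: the SKETCH_* constants and helper names below are hypothetical stand-ins, not the kernel's macros, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_SHIFT/PAGE_SIZE/PAGE_MASK; assumes 4k pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Number of pages needed to hold `bytes`, rounding up. */
static size_t sketch_bytes_to_pages(size_t bytes)
{
	/* Guard against wrap-around in the round-up addition. */
	assert(bytes <= SIZE_MAX - (SKETCH_PAGE_SIZE - 1));
	return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Offset of a byte address within the page that contains it. */
static size_t sketch_page_offset(size_t addr)
{
	return addr & ~SKETCH_PAGE_MASK;
}
```

With these definitions, a 4097-byte range needs two pages, and byte address 0x16c00 sits 0xc00 bytes into its page.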

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 22b58e2413d7ce756a388ce6c20b697351032e70
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Mon Apr 23 15:58:52 2012 +1000

    xfs: use blocks for storing the desired IO size

    Now that we pass block counts everywhere, and index buffers by block
    number and length in units of blocks, convert the desired IO size
    into block counts rather than bytes. Convert the code to use block
    counts, and those that need byte counts get converted at the time of
    use.

    Rename the b_desired_count variable to something closer to its
    purpose - b_io_length - as it is only used to specify the length of
    an IO for a subset of the buffer.  The only time this is used is for
    log IO - both writing iclogs and during log recovery. In all other
    cases, the b_io_length matches b_length, and hence a lot of code
    confuses the two. e.g. the buf item code uses the io count
    exclusively when it should be using the buffer length. Fix these
    appropriately as they are found.

    Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
    around the desired IO length. They only serve to make the code
    shouty loud, don't actually add any real value, and are often used
    incorrectly.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 8d6e476d8472d3568bf7ca077dd04a1e79e5fcd4
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:51 2012 +1000

    xfs: use blocks for counting length of buffers

    Now that we pass block counts everywhere, and index buffers by block
    number, track the length of the buffer in units of blocks rather
    than bytes. Convert the code to use block counts, and those that
    need byte counts get converted at the time of use.
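
A minimal sketch of the idea, assuming 512-byte basic blocks: the buffer records its length in blocks and converts to bytes only where a byte count is needed. The struct and helper names are hypothetical and only loosely modelled on the kernel's conversion macros.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BBSHIFT 9                       /* 512-byte basic blocks */
#define SKETCH_BBSIZE  (1u << SKETCH_BBSHIFT)

/* Hypothetical buffer that tracks its address and length in basic blocks. */
struct sketch_buf {
	uint64_t bno;     /* disk address, in basic blocks */
	uint32_t length;  /* buffer length, in basic blocks */
};

/* Basic blocks to bytes. */
static uint64_t sketch_bbtob(uint64_t bbs)
{
	assert(bbs <= (UINT64_MAX >> SKETCH_BBSHIFT));
	return bbs << SKETCH_BBSHIFT;
}

/* Bytes to basic blocks, rounding up to a whole block. */
static uint64_t sketch_btobb(uint64_t bytes)
{
	return (bytes >> SKETCH_BBSHIFT) + ((bytes & (SKETCH_BBSIZE - 1)) != 0);
}

/* Convert to bytes only at the point of use. */
static uint64_t sketch_buf_bytes(const struct sketch_buf *bp)
{
	return sketch_bbtob(bp->length);
}
```

Callers that already hold a block count pass it straight through, and only code that genuinely needs bytes pays for a conversion.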

    Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
    around the buffer length. They only serve to make the code shouty
    loud and don't actually add any real value.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 6473c07e8f10300985ab31d7f3f802ea44f81f1a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:50 2012 +1000

    xfs: kill b_file_offset

    Seeing as we pass block numbers around everywhere in the buffer
    cache now, it makes no sense to index everything by byte offset.
    Replace all the byte offset indexing with block number based
    indexing, and replace all uses of the byte offset with direct
    conversion from the block index.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 57295b48971d743de29c628a07cf45684960a218
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:49 2012 +1000

    xfs: clean up buffer get/read call API

    The xfs_buf_get/read API is not consistent in the units it uses, and
    does not use appropriate or consistent units/types for the
    variables.

    Convert the API to use disk addresses and block counts for all
    buffer get and read calls. Use consistent naming for all the
    functions and their declarations, and convert the internal functions
    to use disk addresses and block counts to avoid need to convert them
    from one type to another and back again.

    Fix all the callers to use disk addresses and block counts. In many
    cases, this removes an additional conversion from the function call
    as the callers already have a block count.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit c6dde1ff76fc9ccca482d1b3fd3d789c3fc63ce5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:48 2012 +1000

    xfs: use kmem_zone_zalloc for buffers

    To replace the alloc/memset pair.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit e5c8aaf31382ced6d1864ee7366941b95f8a4427
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:47 2012 +1000

    xfs: fix incorrect b_offset initialisation

    Because we no longer use the page cache for buffering, there is no
    direct block number to page offset relationship anymore.
    xfs_buf_get_pages is still setting up b_offset as if there was some
    relationship, and that is leading to incorrectly setting up
    *uncached* buffers that don't overwrite b_offset once they've had
    pages allocated.

    For cached buffers, the first block of the buffer is always at offset
    zero into the allocated memory. This is true for sub-page sized
    buffers, as well as for multiple-page buffers.

    For uncached buffers, b_offset is only non-zero when we are
    associating specific memory to the buffers, and that is set
    correctly by the code setting up the buffer.

    Hence remove the setting of b_offset in xfs_buf_get_pages, because
    it is now always the wrong thing to do.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 0889cf58eafe204906fdd21c20b1e1bb3399d6cd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:46 2012 +1000

    xfs: check for buffer errors before waiting

    If we call xfs_buf_iowait() on a buffer that failed dispatch due to
    an IO error, it will wait forever for an IO that does not exist.
    This is handled in xfs_buf_read, but there is other code that calls
    xfs_buf_iowait directly that doesn't.

    Rather than make the call sites have to handle checking for dispatch
    errors and then checking for completion errors, make
    xfs_buf_iowait() check for dispatch errors on the buffer before
    waiting. This means we handle both dispatch and completion errors
    with one set of error handling at the caller sites.
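
The shape of that check can be sketched in userspace as follows. This is an illustration only: the struct, its fields and the stand-in for blocking on a completion are all hypothetical, and completion is modelled simply as clearing a flag.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical buffer state for the sketch. */
struct sketch_buf {
	int error;        /* set on dispatch failure or on IO completion */
	int io_in_flight; /* non-zero only if IO was actually submitted */
	int waits;        /* number of times the caller "slept" */
};

/* Stand-in for blocking until the IO completes. */
static void sketch_wait_for_completion(struct sketch_buf *bp)
{
	/* Waiting with no IO in flight models the hang described above. */
	assert(bp->io_in_flight);
	bp->waits++;
	bp->io_in_flight = 0;
}

/* Check for an existing error before waiting; return the final error. */
static int sketch_buf_iowait(struct sketch_buf *bp)
{
	if (!bp->error)
		sketch_wait_for_completion(bp);
	return bp->error;
}
```

A caller then needs only one error check after the wait, whether the failure happened at dispatch or at completion.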

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit d24734883ba6fdf872e2e3fc44c2310e4e013218
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:45 2012 +1000

    xfs: fix buffer lookup race on allocation failure

    When memory allocation fails to add the page array or the pages to
    a buffer during xfs_buf_get(), the buffer is left in the cache in a
    partially initialised state. There is enough state left for the next
    lookup on that buffer to find the buffer, and for the buffer to then
    be used without finishing the initialisation.  As a result, when an
    attempt to do IO on the buffer occurs, it fails with EIO because
    there are no pages attached to the buffer.

    We cannot remove the buffer from the cache immediately and free it,
    because there may already be a racing lookup that is blocked on the
    buffer lock. Hence the moment we unlock the buffer to then free it,
    the other user is woken and we have a use-after-free situation.

    To avoid this race condition altogether, allocate the pages for the
    buffer before we insert it into the cache.  This then means that we
    don't have an allocation failure case to deal with after the buffer
    is already present in the cache, and hence avoid the problem
    altogether.  In most cases we won't have racing inserts for the same
    buffer, and so won't increase the memory pressure that allocation
    before insertion may entail.
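
The ordering can be sketched with a toy single-slot cache. Everything here is hypothetical (names, the fail_alloc injection flag, the lack of locking); it shows only the "allocate fully, then insert" order and does not model the racing lookup itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer and single-slot cache, for illustration only. */
struct sketch_buf {
	void *pages;              /* backing memory for the buffer */
};

struct sketch_cache {
	struct sketch_buf *slot;  /* the only cached buffer, or NULL */
};

/*
 * Look up or create the cached buffer. Backing memory is allocated
 * before the buffer becomes visible in the cache, so a failed
 * allocation never leaves a half-initialised buffer behind.
 * `fail_alloc` injects an allocation failure for demonstration.
 */
static struct sketch_buf *
sketch_buf_get(struct sketch_cache *cache, size_t bytes, int fail_alloc)
{
	struct sketch_buf *bp;

	assert(bytes > 0);
	if (cache->slot)
		return cache->slot;

	bp = calloc(1, sizeof(*bp));
	if (!bp)
		return NULL;

	bp->pages = fail_alloc ? NULL : malloc(bytes);
	if (!bp->pages) {
		free(bp);          /* never inserted, so nobody else can see it */
		return NULL;
	}

	cache->slot = bp;          /* insert only once fully initialised */
	return bp;
}
```

Because the failure path runs before insertion, no other lookup can be blocked on the buffer when it is freed.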

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 04016b6f92a314cb0e9a239ddaece2d55be7d16c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Apr 23 15:58:44 2012 +1000

    xfs: Use preallocation for inodes with extsz hints

    xfstest 229 exposes a problem with buffered IO, delayed allocation
    and extent size hints. That is, when we do delayed allocation during
    buffered IO, we reserve space for the extent size hint alignment and
    allocate the physical space to align the extent, but we do not zero
    the regions of the extent that aren't written by the write(2)
    syscall. The result is that we expose stale data in unwritten
    regions of the extent size hints.

    There are two ways to fix this. The first is to detect that we are
    doing unaligned writes, check if there is already a mapping or data
    over the extent size hint range, and if not zero the page cache
    first before then doing the real write. This can be very expensive
    for large extent size hints, especially if the subsequent writes
    fill the entire extent size before the data is written to disk.

    The second, and simpler way, is simply to turn off delayed
    allocation when the extent size hint is set and use preallocation
    instead. This results in unwritten extents being laid down on disk
    and so only the written portions will be converted. This matches the
    behaviour for direct IO, and will also work for the real time
    device. The disadvantage of this approach is that for small extent
    size hints we can get file fragmentation, but in general extent size
    hints are fairly large (e.g. stripe width sized) so this isn't a big
    deal.

    Implement the second approach as it is simple and effective.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 512839c554d05cc6004ebc9b47f8a2f06e26b78c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Apr 29 22:43:19 2012 +1000

    xfs: limit specualtive delalloc to maxioffset

    Speculative delayed allocation beyond EOF near the maximum supported
    file offset can result in creating delalloc extents beyond
    mp->m_maxioffset (8EB). These can never be trimmed during
    xfs_free_eof_blocks() because they are beyond mp->m_maxioffset, and
    that results in assert failures in xfs_fs_destroy_inode() due to
    delalloc blocks still being present. xfstests 071 exposes this
    problem.

    Limit speculative delalloc to mp->m_maxioffset to avoid this
    problem.
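
A minimal sketch of such a limit, working in byte offsets. The function and parameter names are hypothetical, and the kernel works in filesystem blocks rather than bytes, so this only illustrates the clamping.

```c
#include <assert.h>
#include <stdint.h>

/*
 * End offset of a write plus its speculative preallocation, clamped so
 * that it never extends beyond max_offset. The comparisons are written
 * to avoid overflow when offset is close to the maximum.
 */
static uint64_t
sketch_delalloc_end(uint64_t offset, uint64_t count,
		    uint64_t prealloc, uint64_t max_offset)
{
	uint64_t end;

	assert(offset <= max_offset);
	if (count > max_offset - offset)
		return max_offset;
	end = offset + count;
	if (prealloc > max_offset - end)
		return max_offset;
	return end + prealloc;
}
```

Far from the limit the speculative range is unchanged; near it, the range is truncated at max_offset, so nothing is left beyond the point where it could later be trimmed.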

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit e32b8bcc3bcb81b36419f5c9e93c874b34f7370b
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Apr 29 21:16:17 2012 +1000

    xfs: don't assert on delalloc regions beyond EOF

    When we are doing speculative delayed allocation beyond EOF,
    conversion of the region allocated beyond EOF is dependent on the
    largest free space extent available. If the largest free extent is
    smaller than the delalloc range, then after allocation we leave
    a delalloc extent that starts beyond EOF. This extent cannot *ever*
    be converted by flushing data, and so will remain there until either
    the EOF moves into the extent or it is truncated away.

    Hence if xfs_getbmap() runs on such an inode and is asked to return
    extents beyond EOF, it will assert fail on this extent even though
    there is nothing xfs_getbmap() can do to convert it to a real
    extent. Hence we should simply report these delalloc extents rather
    than assert that there should be none.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 70452be22a8d2f3bfab854f7b21eeb92758988cc
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Apr 27 19:45:22 2012 +1000

    xfs: prevent needless mount warning causing test failures

    Often mounting small filesystems with small logs will emit a warning
    such as:

    XFS (vdb): Invalid block length (0x2000) for buffer

    during log recovery. This causes tests to randomly fail because this
    output causes the clean filesystem checks on test completion to
    think the filesystem is inconsistent.

    The cause of the error is simply that log recovery is asking for a
    buffer size that is larger than the log when zeroing the tail. This
    is because the buffer size is rounded up, and if the right head and
    tail conditions exist then the buffer size can be larger than the log.
    Limit the variable size xlog_get_bp() callers to requesting buffers
    smaller than the log.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 976b494b009919df75b2c19bec82b422a9efac5a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Apr 27 19:45:21 2012 +1000

    xfs: punch new delalloc blocks out of failed writes inside EOF.

    When a partial write inside EOF fails, it can leave delayed
    allocation blocks lying around because they don't get punched back
    out. This leads to assert failures like:

    XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 847

    when evicting inodes from the cache. This can be trivially triggered
    by xfstests 083, which takes between 5 and 15 executions on a 512
    byte block size filesystem to trip over this. Debugging shows a
    failed write due to ENOSPC calling xfs_vm_write_failed such as:

    [ 5012.329024] ino 0xa0026: vwf to 0x17000, sze 0x1c85ae

    and no action is taken on it. This leaves behind a delayed
    allocation extent that has no page covering it and no data in it:

    [ 5015.867162] ino 0xa0026: blks: 0x83 delay blocks 0x1, size 0x2538c0
    [ 5015.868293] ext 0: off 0x4a, fsb 0x50306, len 0x1
    [ 5015.869095] ext 1: off 0x4b, fsb 0x7899, len 0x6b
    [ 5015.869900] ext 2: off 0xb6, fsb 0xffffffffe0008, len 0x1
                                        ^^^^^^^^^^^^^^^
    [ 5015.871027] ext 3: off 0x36e, fsb 0x7a27, len 0xd
    [ 5015.872206] ext 4: off 0x4cf, fsb 0x7a1d, len 0xa

    So the delayed allocation extent is one block long at offset
    0x16c00. Tracing shows that a bigger write:

    xfs_file_buffered_write: size 0x1c85ae offset 0x959d count 0x1ca3f ioflags

    allocates the block, and then fails with ENOSPC trying to allocate
    the last block on the page, leading to a failed write with stale
    delalloc blocks on it.
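
The offsets quoted above can be checked with a little arithmetic. The sketch below uses the 512-byte block size from the report and assumes a 4096-byte page, which is consistent with eight blocks per page; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 512u   /* block size from the report above */
#define SKETCH_PAGE_SIZE  4096u  /* assumption: 4k pages, 8 blocks each */

/* File offset in filesystem blocks to file offset in bytes. */
static uint64_t sketch_fsb_to_bytes(uint64_t fsb_off)
{
	assert(fsb_off <= UINT64_MAX / SKETCH_BLOCK_SIZE);
	return fsb_off * SKETCH_BLOCK_SIZE;
}

/* Index of the block within the page that contains byte_off. */
static unsigned sketch_block_in_page(uint64_t byte_off)
{
	return (unsigned)((byte_off % SKETCH_PAGE_SIZE) / SKETCH_BLOCK_SIZE);
}
```

Extent 2 starts at block offset 0xb6, which is byte offset 0x16c00 and block index 6 of its page. The following block, at 0x16e00, is index 7, the last block on that page.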

    Because we've had an ENOSPC when trying to allocate 0x16e00, it
    means that we are never going to call ->write_end on the page and
    so the allocated new buffer will not get marked dirty or have the
    buffer_new state cleared. In other words, what the above write is
    supposed to end up with is this mapping for the page:

        +------+------+------+------+------+------+------+------+
          UMA    UMA    UMA    UMA    UMA    UMA    UND    FAIL

    where:  U = uptodate
            M = mapped
            N = new
            A = allocated
            D = delalloc
            FAIL = block we ENOSPC'd on.

    and the key point being the buffer_new() state for the newly
    allocated delayed allocation block. Except it doesn't - we're not
    marking buffers new correctly.

    That buffer_new() problem goes back to the xfs_iomap removal days,
    where xfs_iomap() used to return a "new" status for any map with
    newly allocated blocks, so that __xfs_get_blocks() could call
    set_buffer_new() on it. We still have the "new" variable and the
    check for it in the set_buffer_new() logic - except we never set it
    now!

    Hence that newly allocated delalloc block doesn't have the new flag
    set on it, so when the write fails we cannot tell which blocks we
    are supposed to punch out. Why do we need the buffer_new flag? Well,
    that's because we can have this case:

        +------+------+------+------+------+------+------+------+
          UMD    UMD    UMD    UMD    UMD    UMD    UND    FAIL

    where all the UMD buffers contain valid data from a previously
    successful write() system call. We only want to punch the UND buffer
    because that's the only one that we added in this write and it was
    only this write that failed.

    That implies that even the old buffer_new() logic was wrong -
    because it would result in all those UMD buffers on the page having
    set_buffer_new() called on them even though they aren't new. Hence
    we should only be calling set_buffer_new() for delalloc buffers that
    were allocated (i.e. were a hole before xfs_iomap_write_delay() was
    called).

    So, fix this set_buffer_new logic according to how we need it to
    work for handling failed writes correctly. Also, restore the new
    buffer logic handling for blocks allocated via
    xfs_iomap_write_direct(), because it should still set the buffer_new
    flag appropriately for newly allocated blocks, too.

    So, now we have the buffer_new() being set appropriately in
    __xfs_get_blocks(), we can detect the exact delalloc ranges that
    we allocated in a failed write, and hence can now do a walk of the
    buffers on a page to find them.

    Except, it's not that easy. When block_write_begin() fails, it
    unlocks and releases the page that we just had an error on, so we
    can't use that page to handle errors anymore. We have to get access
    to the page while it is still locked to walk the buffers. Hence we
    have to open code block_write_begin() in xfs_vm_write_begin() to be
    able to insert xfs_vm_write_failed() in the right place.

    With that, we can pass the page and write range to
    xfs_vm_write_failed() and walk the buffers on the page, looking for
    delalloc buffers that are either new or beyond EOF and punch them
    out. Handling buffers beyond EOF ensures we still handle the
    existing case that xfs_vm_write_failed() handles.
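
The selection rule in that walk can be sketched as a simple predicate over per-block state. The struct and function names are hypothetical stand-ins for the buffer head state, and the sketch only counts candidate blocks rather than punching anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block state, standing in for a buffer head. */
struct sketch_bh {
	uint64_t offset;  /* byte offset of the block in the file */
	bool delay;       /* block is a delayed allocation */
	bool is_new;      /* block was newly allocated by this write */
};

/* Punch only delalloc blocks that are new or lie at or beyond EOF. */
static bool sketch_should_punch(const struct sketch_bh *bh, uint64_t eof)
{
	if (!bh->delay)
		return false;
	return bh->is_new || bh->offset >= eof;
}

/* Walk the blocks of one page and count those that would be punched. */
static unsigned
sketch_count_punch(const struct sketch_bh *bhs, unsigned n, uint64_t eof)
{
	unsigned i, punched = 0;

	assert(bhs || n == 0);
	for (i = 0; i < n; i++)
		if (sketch_should_punch(&bhs[i], eof))
			punched++;
	return punched;
}
```

Under this rule an existing delalloc block inside EOF that holds data from an earlier write is left alone, while the block added by the failed write is selected.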

    Of special note is the truncate_pagecache() handling - that only
    should be done for pages outside EOF - pages within EOF can still
    contain valid, dirty data so we must not punch them out of the
    cache.

    That just leaves the xfs_vm_write_end() failure handling.
    The only failure case here is that we didn't copy the entire range,
    and generic_write_end() handles that by zeroing the region of the
    page that wasn't copied, so we don't have to punch out blocks within
    the file because they are guaranteed to contain zeros. Hence we only
    have to handle the existing "beyond EOF" case and don't need access
    to the buffers on the page. Hence it remains largely unchanged.

    Note that xfs_getbmap() can still trip over delalloc blocks beyond
    EOF that are left there by speculative delayed allocation. Hence
    this bug fix does not solve all known issues with bmap vs delalloc,
    but it does fix all the known accidental occurrences of the
    problem.

    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/Makefile            |    2 +-
 fs/xfs/xfs_ag.h            |   18 --
 fs/xfs/xfs_alloc.c         |  585 +-----------------------------------------
 fs/xfs/xfs_alloc.h         |   28 --
 fs/xfs/xfs_alloc_btree.c   |    9 +-
 fs/xfs/xfs_aops.c          |  178 +++++++++----
 fs/xfs/xfs_attr.c          |   25 +-
 fs/xfs/xfs_attr_leaf.c     |    3 +-
 fs/xfs/xfs_bmap.c          |   30 ++-
 fs/xfs/xfs_bmap.h          |    3 +
 fs/xfs/xfs_bmap_btree.c    |    1 -
 fs/xfs/xfs_btree.c         |    1 -
 fs/xfs/xfs_buf.c           |  252 +++++++++---------
 fs/xfs/xfs_buf.h           |   68 ++---
 fs/xfs/xfs_buf_item.c      |   16 +-
 fs/xfs/xfs_da_btree.c      |   17 +-
 fs/xfs/xfs_dfrag.c         |    2 -
 fs/xfs/xfs_dir2.c          |    1 -
 fs/xfs/xfs_dir2_block.c    |    1 -
 fs/xfs/xfs_dir2_data.c     |    1 -
 fs/xfs/xfs_dir2_leaf.c     |    1 -
 fs/xfs/xfs_dir2_node.c     |    1 -
 fs/xfs/xfs_dir2_sf.c       |    1 -
 fs/xfs/xfs_discard.c       |    6 +-
 fs/xfs/xfs_dquot.c         |    1 -
 fs/xfs/xfs_dquot_item.c    |    2 -
 fs/xfs/xfs_error.c         |    1 -
 fs/xfs/xfs_export.c        |    1 -
 fs/xfs/xfs_extent_busy.c   |  603 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_extent_busy.h   |   65 +++++
 fs/xfs/xfs_extfree_item.c  |    1 -
 fs/xfs/xfs_file.c          |    2 -
 fs/xfs/xfs_fsops.c         |   82 +++++-
 fs/xfs/xfs_ialloc.c        |   10 +-
 fs/xfs/xfs_ialloc.h        |    9 -
 fs/xfs/xfs_ialloc_btree.c  |    1 -
 fs/xfs/xfs_iget.c          |    1 -
 fs/xfs/xfs_inode.c         |   32 ++-
 fs/xfs/xfs_inode.h         |    2 +
 fs/xfs/xfs_inode_item.c    |    2 -
 fs/xfs/xfs_inum.h          |    5 -
 fs/xfs/xfs_ioctl.c         |    2 -
 fs/xfs/xfs_ioctl32.c       |    2 -
 fs/xfs/xfs_iomap.c         |   12 +-
 fs/xfs/xfs_iops.c          |    3 -
 fs/xfs/xfs_itable.c        |    1 -
 fs/xfs/xfs_log.c           |   16 +-
 fs/xfs/xfs_log_cil.c       |    9 +-
 fs/xfs/xfs_log_recover.c   |   54 ++--
 fs/xfs/xfs_message.c       |    1 -
 fs/xfs/xfs_mount.c         |   21 +-
 fs/xfs/xfs_qm.c            |    1 -
 fs/xfs/xfs_qm_bhv.c        |    2 -
 fs/xfs/xfs_qm_syscalls.c   |    1 -
 fs/xfs/xfs_quotaops.c      |    1 -
 fs/xfs/xfs_rename.c        |    1 -
 fs/xfs/xfs_rtalloc.c       |   10 +-
 fs/xfs/xfs_rw.c            |  156 ------------
 fs/xfs/xfs_rw.h            |   47 ----
 fs/xfs/xfs_super.c         |    1 -
 fs/xfs/xfs_sync.c          |    1 -
 fs/xfs/xfs_trace.c         |    2 -
 fs/xfs/xfs_trace.h         |   30 +--
 fs/xfs/xfs_trans.c         |    7 +-
 fs/xfs/xfs_trans_ail.c     |    1 -
 fs/xfs/xfs_trans_buf.c     |   40 +--
 fs/xfs/xfs_trans_dquot.c   |    2 -
 fs/xfs/xfs_trans_extfree.c |    1 -
 fs/xfs/xfs_trans_inode.c   |    2 -
 fs/xfs/xfs_types.h         |    5 +
 fs/xfs/xfs_utils.c         |    2 -
 fs/xfs/xfs_vnodeops.c      |   29 ++-
 72 files changed, 1234 insertions(+), 1299 deletions(-)
 create mode 100644 fs/xfs/xfs_extent_busy.c
 create mode 100644 fs/xfs/xfs_extent_busy.h
 delete mode 100644 fs/xfs/xfs_rw.c
 delete mode 100644 fs/xfs/xfs_rw.h

hooks/post-receive
-- 
XFS development tree