[XFS updates] XFS development tree branch, for-linus, updated. v3.7-rc1-78-gf9668a0

xfs@xxxxxxxxxxx · Tue, 11 Dec 2012 11:05:33 -0600

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, for-linus has been updated
  discards  d69043c42d8c6414fa28ad18d99973aa6c1c2e24 (commit)
  discards  3daed8bc3e49b9695ae931b9f472b5b90d1965b3 (commit)
  discards  42e2976f131d65555d5c1d6c3d47facc63577814 (commit)
  discards  6ce377afd1755eae5c93410ca9a1121dfead7b87 (commit)
  discards  03b1293edad462ad1ad62bcc5160c76758e450d5 (commit)
  discards  4b62acfe99e158fb7812982d1cf90a075710a92c (commit)
  discards  ca250b1b3d711936d7dae9e97871f2261347f82d (commit)
  discards  1e7acbb7bc1ae7c1c62fd1310b3176a820225056 (commit)
  discards  eaef854335ce09956e930fe4a193327417edc6c9 (commit)
  discards  1f3c785c3adb7d2b109ec7c8f10081d1294b03d3 (commit)
  discards  326c03555b914ff153ba5b40df87fd6e28e7e367 (commit)
  discards  408cc4e97a3ccd172d2d676e4b585badf439271b (commit)
  discards  7e9620f21d8c9e389fd6845487e07d5df898a2e4 (commit)
  f9668a0 xfs: fix sparse reported log CRC endian issue
  b870553 xfs: fix stray dquot unlock when reclaiming dquots
  437a255 xfs: fix direct IO nested transaction deadlock.
  ef9d873 xfs: byte range granularity for XFS_IOC_ZERO_RANGE
  7c4cebe xfs: inode allocation should use unmapped buffers.
  0e446be xfs: add CRC checks to the log
  bc02e86 xfs: add CRC infrastructure
  1813dd6 xfs: convert buffer verifiers to an ops structure.
  b0f539d xfs: connect up write verifiers to new buffers
  612cfbf xfs: add pre-write metadata buffer verifier callbacks
  cfb0285 xfs: add buffer pre-write callback
  da6958c xfs: Add verifiers to dir2 data readahead.
  d9392a4 xfs: add xfs_da_node verification
  ad14c33 xfs: factor and verify attr leaf reads
  e6f7667 xfs: factor dir2 leaf read
  e481357 xfs: factor out dir2 data block reading
  2025207 xfs: factor dir2 free block reading
  82025d7 xfs: verify dir2 block format buffers
  20f7e9f xfs: factor dir2 block read operations
  4bb20a8 xfs: add verifier callback to directory read code
  c631919 xfs: verify dquot blocks as they are read from disk
  3d3e6f6 xfs: verify btree blocks as they are read from disk
  af133e8 xfs: verify inode buffers as they are read from disk
  bb80c6d xfs: verify AGFL blocks as they are read from disk
  3702ce6 xfs: verify AGI blocks as they are read from disk
  5d5f527 xfs: verify AGF blocks as they are read from disk
  9802182 xfs: verify superblocks as they are read from disk
  eab4e63 xfs: uncached buffer reads need to return an error
  c3f8fc7 xfs: make buffer read verication an IO completion function
  fb59581 xfs: remove xfs_flushinval_pages
  4bc1ea6 xfs: remove xfs_flush_pages
  95eacf0 xfs: remove xfs_wait_on_pages()
  d6638ae xfs: reverse the check on XFS_IOC_ZERO_RANGE
  f5b8911 xfs: remove xfs_tosspages
  de49768 xfs: make growfs initialise the AGFL header
  fd23683 xfs: growfs: use uncached buffers for new headers
  b64f3a3 xfs: use btree block initialisation functions in growfs
  ee73259 xfs: add more attribute tree trace points.
  37eb17e xfs: drop buffer io reference when a bad bio is built
  7bf7f35 xfs: fix broken error handling in xfs_vm_writepage
  07428d7 xfs: fix attr tree double split corruption
  579b62f xfs: add background scanning to clear eofblocks inodes
  00ca79a xfs: add minimum file size filtering to eofblocks scan
  1b55604 xfs: support multiple inode id filtering in eofblocks scan
  3e3f9f5 xfs: add inode id filtering to eofblocks scan
  8ca149d xfs: add XFS_IOC_FREE_EOFBLOCKS ioctl
  41176a6 xfs: create function to scan and clear EOFBLOCKS inodes
  40165e2 xfs: make xfs_free_eofblocks() non-static, return EAGAIN on trylock failure
  72b53ef xfs: create helper to check whether to free eofblocks on inode
  a454f74 xfs: support a tag-based inode_ag_iterator
  27b5286 xfs: add EOFBLOCKS inode tagging/untagging
  69a58a4 xfs: report projid32bit feature in geometry call
  009507b xfs: fix reading of wrapped log data
  137fff0 xfs: fix buffer shudown reference count mismatch
  b6aff29 xfs: don't vmap inode cluster buffers during free
  4c05f9a xfs: invalidate allocbt blocks moved to the free list
  cd856db xfs: Update inode alloc comments
  531c3bd xfs: silence uninitialised f.file warning.
  1375cb6 xfs: growfs: don't read garbage for new secondary superblocks
  e04426b xfs: move allocation stack switch up to xfs_bmapi_allocate
  2455881 xfs: introduce XFS_BMAPI_STACK_SWITCH
  a004168 xfs: zero allocation_args on the kernel stack
  d35e88f xfs: only update the last_sync_lsn when a transaction completes
  33479e0 xfs: remove xfs_iget.c
  fa96aca xfs: move inode locking functions to xfs_inode.c
  6d8b79c xfs: rename xfs_sync.[ch] to xfs_icache.[ch]
  c75921a xfs: xfs_quiesce_attr() should quiesce the log like unmount
  c7eea6f xfs: move xfs_quiesce_attr() into xfs_super.c
  34061f5 xfs: xfs_sync_fsdata is redundant
  5889608 xfs: syncd workqueue is no more
  9aa0500 xfs: xfs_sync_data is redundant.
  cf2931d xfs: Bring some sanity to log unmounting
  f661f1e xfs: sync work is now only periodic log work
  7f7bebe xfs: don't run the sync work if the filesystem is read-only
  7e18530 xfs: rationalise xfs_mount_wq users
  33c7a2b xfs: xfs_syncd_stop must die
      from  d69043c42d8c6414fa28ad18d99973aa6c1c2e24 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit f9668a09e32ac6d2aa22f44cc310e430a8f4a40f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 28 13:01:03 2012 +1100

    xfs: fix sparse reported log CRC endian issue

    Not a bug as such, just warning noise from the xlog_cksum()
    returning a __be32 type when it should be returning a __le32 type.

    On Wed, Nov 28, 2012 at 08:30:59AM -0500, Christoph Hellwig wrote:
    > But why are we storing the crc field little endian while all other on
    > disk formats are big endian? (And yes I realize it might as well have
    > been me who did that back in the idea, but I still have no idea why)

    Because the CRC always returns the calculation in LE format, even on BE
    systems. So rather than always having to byte swap it everywhere and
    have all the force casts and annotations for sparse, it seems simpler to
    just make it a __le32 everywhere....

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
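
As an illustration of the reasoning above, here is a minimal userspace sketch (not the kernel code) of keeping a checksum in a fixed little-endian byte layout regardless of host byte order. `put_le32()` and `get_le32()` are hypothetical helpers that stand in for the kernel's `cpu_to_le32()` and `le32_to_cpu()`.

```c
#include <stdint.h>

/* Store a 32-bit value as little-endian bytes, independent of host order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into host order. */
static uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Because the byte layout is fixed by the helpers, the same code yields the same on-disk bytes on big-endian and little-endian hosts.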

commit b870553cdecb26d5291af09602352b763e323df2
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 28 13:01:02 2012 +1100

    xfs: fix stray dquot unlock when reclaiming dquots

    When we fail to get a dquot lock during reclaim, we jump to an error
    handler that unlocks the dquot. This is wrong as we didn't lock the
    dquot, and unlocking it means whoever is holding the lock has had
    it silently taken away, and hence it results in a lock imbalance.

    Found by inspection while modifying the code for the numa-lru
    patchset. This fixes a random hang I've been seeing on xfstest 232
    for the past several months.

    cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
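
The class of bug described above can be sketched in userspace as follows. This is not the XFS reclaim code: `struct dquot_like` and `try_reclaim()` are hypothetical, and a pthread mutex stands in for the dquot lock. The point is that the trylock failure path must jump past the unlock.

```c
#include <pthread.h>
#include <stdbool.h>

struct dquot_like {
    pthread_mutex_t lock;
    int             refs;   /* non-zero means still in use */
};

/* Try to reclaim an object; returns true only if it was reclaimed. */
static bool try_reclaim(struct dquot_like *dq)
{
    if (pthread_mutex_trylock(&dq->lock) != 0)
        goto out_busy;      /* lock NOT taken: must not unlock it */

    if (dq->refs != 0)
        goto out_unlock;    /* lock taken: release it before bailing out */

    /* ... the actual reclaim work would go here ... */
    pthread_mutex_unlock(&dq->lock);
    return true;

out_unlock:
    pthread_mutex_unlock(&dq->lock);
out_busy:
    return false;
}
```

Keeping two separate exit labels makes it explicit which error paths own the lock and which do not.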

commit 437a255aa23766666aec78af63be4c253faa8d57
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 28 13:01:00 2012 +1100

    xfs: fix direct IO nested transaction deadlock.

    The direct IO path can do a nested transaction reservation when
    writing past the EOF. The first transaction is the append
    transaction for setting the filesize at IO completion, but we can
    also need a transaction for allocation of blocks. If the log is low
    on space due to reservations and a small log, the append transaction
    can be granted after waiting for space as the only active transaction
    in the system. This then attempts a reservation for an allocation,
    which there isn't space in the log for, and the reservation sleeps.
    The result is that there is nothing left in the system to wake up
    all the processes waiting for log space to come free.

    The stack trace that shows this deadlock is relatively innocuous:

     xlog_grant_head_wait
     xlog_grant_head_check
     xfs_log_reserve
     xfs_trans_reserve
     xfs_iomap_write_direct
     __xfs_get_blocks
     xfs_get_blocks_direct
     do_blockdev_direct_IO
     __blockdev_direct_IO
     xfs_vm_direct_IO
     generic_file_direct_write
     xfs_file_dio_aio_writ
     xfs_file_aio_write
     do_sync_write
     vfs_write

    This was discovered on a filesystem with a log of only 10MB, and a
    log stripe unit of 256k which increased the base reservations by
    512k. Hence an allocation transaction requires 1.2MB of log space to
    be available instead of only 260k, and so greatly increased the
    chance that there wouldn't be enough log space available for the
    nested transaction to succeed. The key to reproducing it is this
    mkfs command:

    mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV

    The test case was 1000 fsstress processes running with random
    freezes and unfreezes every few seconds. Thanks to Eryu Guan
    (eguan@xxxxxxxxxx) for writing the test that found this on a system
    with a somewhat unique default configuration....

    cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Andrew Dahl <adahl@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit ef9d873344ff9f5084eacb9f3735982314dfda9e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Nov 29 15:26:33 2012 +1100

    xfs: byte range granularity for XFS_IOC_ZERO_RANGE

    XFS_IOC_ZERO_RANGE simply does not work properly for non page cache
    aligned ranges. Neither test 242 nor 290 exercises this correctly, so
    the behaviour is completely busted even though the tests pass.

    Fix it to support full byte range granularity as was originally
    intended for this ioctl.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
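
To make "byte range granularity" concrete, here is a hedged sketch of how an arbitrary byte range can be split into a partial leading page, a run of whole pages, and a partial trailing page. It is not the XFS implementation; `struct zero_plan`, `plan_zero_range()` and the 4096-byte page size are assumptions made for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

struct zero_plan {
    uint64_t head_start, head_len;   /* partial leading page  */
    uint64_t mid_start,  mid_len;    /* whole pages           */
    uint64_t tail_start, tail_len;   /* partial trailing page */
};

/* Split the byte range [start, start + len) at page boundaries. */
static struct zero_plan plan_zero_range(uint64_t start, uint64_t len)
{
    const uint64_t ps = SKETCH_PAGE_SIZE;
    uint64_t end = start + len;                           /* exclusive */
    uint64_t first_aligned = (start + ps - 1) / ps * ps;  /* round up   */
    uint64_t last_aligned = end / ps * ps;                /* round down */
    struct zero_plan p = {0};

    if (first_aligned > last_aligned) {
        /* The whole range sits inside a single page. */
        p.head_start = start;
        p.head_len = len;
        return p;
    }

    p.head_start = start;
    p.head_len = first_aligned - start;
    p.mid_start = first_aligned;
    p.mid_len = last_aligned - first_aligned;
    p.tail_start = last_aligned;
    p.tail_len = end - last_aligned;
    return p;
}
```

For example, `plan_zero_range(100, 10000)` yields a 3996-byte head, one whole page in the middle, and a 1908-byte tail. A test that only uses page-aligned ranges exercises the middle part alone.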

commit 7c4cebe8e02dd0b0e655605442bbe9268db9ed4f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 23 14:24:23 2012 +1100

    xfs: inode allocation should use unmapped buffers.

    Inode buffers do not need to be mapped as inodes are read or written
    directly from/to the pages underlying the buffer. This fixes a
    regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
    default behaviour").

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 0e446be44806240c779666591bb9e8cb0e86a50d
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Mon Nov 12 22:54:24 2012 +1100

    xfs: add CRC checks to the log

    Implement CRCs for the log buffers.  We re-use a field in
    struct xlog_rec_header that was used for a weak checksum of the
    log buffer payload in debug builds before.

    The new checksumming uses the crc32c checksum we will use elsewhere
    in XFS, and also protects the record header and additional cycle data.

    Due to this there are some interesting changes in xlog_sync, as we
    need to do the cycle wrapping for the split buffer case much earlier,
    as we would touch the buffer after generating the checksum otherwise.

    The CRC calculation is always enabled, even for non-CRC filesystems,
    as adding this CRC does not change the log format. On non-CRC
    filesystems, only issue an alert if a CRC mismatch is found and
    allow recovery to continue - this will act as an indicator that
    log recovery problems are a result of log corruption. On CRC enabled
    filesystems, however, log recovery will fail.

    Note that existing debug kernels will write a simple checksum value
    to the log, so the first time this is run on a filesystem that was
    last used on a debug kernel it will throw CRC mismatch warnings.
    These can be ignored.

    Initially based on a patch from Dave Chinner, then modified
    significantly by Christoph Hellwig.  Modified again by Dave Chinner
    to get to this version.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
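
The recovery policy described above can be summarised as a small decision function. This is only a sketch of the stated behaviour; `enum crc_action` and `log_crc_policy()` are hypothetical names and do not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum crc_action {
    CRC_OK,              /* checksums match                          */
    CRC_WARN_CONTINUE,   /* mismatch on a non-CRC fs: alert, go on   */
    CRC_FAIL_RECOVERY    /* mismatch on a CRC-enabled fs: stop       */
};

/* Decide what log recovery does with a record's checksum. */
static enum crc_action log_crc_policy(uint32_t stored, uint32_t computed,
                                      bool fs_has_crc)
{
    if (stored == computed)
        return CRC_OK;
    return fs_has_crc ? CRC_FAIL_RECOVERY : CRC_WARN_CONTINUE;
}
```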

commit bc02e8693d875c2a9b0037cfd37fe0b726d26403
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Nov 16 09:20:37 2012 +1100

    xfs: add CRC infrastructure

     - add a mount feature bit for CRC enabled filesystems
     - add some helpers for generating and verifying the CRCs
     - add a copy_uuid helper

    The checksumming helpers are loosely based on similar ones in sctp,
    all other bits come from Dave Chinner.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
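
A rough idea of what "helpers for generating and verifying the CRCs" can look like is sketched below in userspace C. It assumes the checksum covers the whole buffer with the 4-byte CRC field treated as zero and stored little-endian. The function names and the bitwise CRC-32C routine are illustrative stand-ins, not the helpers added by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Checksum buf as if the 4-byte CRC field at offset off were zero.
 * The caller must ensure off + 4 <= len.
 */
static uint32_t sketch_cksum(const uint8_t *buf, size_t len, size_t off)
{
    static const uint8_t zero[4] = {0};
    uint32_t crc = 0xFFFFFFFFu;

    crc = crc32c_update(crc, buf, off);
    crc = crc32c_update(crc, zero, sizeof(zero));
    crc = crc32c_update(crc, buf + off + 4, len - off - 4);
    return ~crc;
}

/* Compute the checksum and store it little-endian in the CRC field. */
static void sketch_update_cksum(uint8_t *buf, size_t len, size_t off)
{
    uint32_t crc = sketch_cksum(buf, len, off);

    buf[off]     = (uint8_t)(crc & 0xff);
    buf[off + 1] = (uint8_t)((crc >> 8) & 0xff);
    buf[off + 2] = (uint8_t)((crc >> 16) & 0xff);
    buf[off + 3] = (uint8_t)((crc >> 24) & 0xff);
}

/* Return 1 if the stored checksum matches the buffer contents. */
static int sketch_verify_cksum(const uint8_t *buf, size_t len, size_t off)
{
    uint32_t stored = (uint32_t)buf[off] |
                      ((uint32_t)buf[off + 1] << 8) |
                      ((uint32_t)buf[off + 2] << 16) |
                      ((uint32_t)buf[off + 3] << 24);

    return stored == sketch_cksum(buf, len, off);
}
```

Treating the CRC field as zero while checksumming lets the same routine serve both the write side (generate) and the read side (verify) without copying the buffer.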

commit 1813dd64057490e7a0678a885c4fe6d02f78bdc1
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 14 17:54:40 2012 +1100

    xfs: convert buffer verifiers to an ops structure.

    To separate the verifiers from iodone functions and associate read
    and write verifiers at the same time, introduce a buffer verifier
    operations structure to the xfs_buf.

    This avoids the need for assigning the write verifier, clearing the
    iodone function and re-running ioend processing in the read
    verifier, and gets rid of the nasty "b_pre_io" name for the write
    verifier function pointer. If we ever need to, it will also be
    easier to add further content specific callbacks to a buffer with an
    ops structure in place.

    We also avoid needing to export verifier functions, instead we
    can simply export the ops structures for those that are needed
    outside the function they are defined in.

    This patch also fixes a directory block readahead verifier issue
    it exposed.

    This patch also adds ops callbacks to the inode/alloc btree blocks
    initialised by growfs. These will need more work before they will
    work with CRCs.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
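
The shape of such an ops structure might look like the following userspace sketch. All names here (`struct sketch_buf`, `struct sketch_buf_ops`, `sketch_example_buf_ops`, the magic number and the error value) are hypothetical and chosen only to show the idea of pairing read and write verifiers and exporting the structure rather than the functions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAGIC        0x534b4348u  /* arbitrary magic for this sketch */
#define SKETCH_EFSCORRUPTED 117          /* stand-in error code */

struct sketch_buf;

/* Read and write verifiers travel together in one ops structure. */
struct sketch_buf_ops {
    void (*verify_read)(struct sketch_buf *bp);
    void (*verify_write)(struct sketch_buf *bp);
};

struct sketch_buf {
    const struct sketch_buf_ops *ops;
    int      error;
    uint32_t magic;
};

/* Shared check used by both callbacks. */
static void sketch_verify(struct sketch_buf *bp)
{
    if (bp->magic != SKETCH_MAGIC)
        bp->error = SKETCH_EFSCORRUPTED;
}

static void sketch_read_verify(struct sketch_buf *bp)  { sketch_verify(bp); }
static void sketch_write_verify(struct sketch_buf *bp) { sketch_verify(bp); }

/* Only the ops structure needs to be visible outside this file. */
const struct sketch_buf_ops sketch_example_buf_ops = {
    .verify_read  = sketch_read_verify,
    .verify_write = sketch_write_verify,
};

/* Write path: verify first and refuse to issue IO for a bad buffer. */
static int sketch_buf_write(struct sketch_buf *bp)
{
    if (bp->ops && bp->ops->verify_write)
        bp->ops->verify_write(bp);
    if (bp->error)
        return bp->error;   /* a real caller would shut the filesystem down */
    /* ... the IO would be submitted here ... */
    return 0;
}
```

Attaching one pointer to the buffer sets both verifiers at once, which is what removes the need for the read verifier to install the write verifier by hand.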

commit b0f539de9fcc543a3ffa40bc22bf51aca6ea6183
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 14 17:53:49 2012 +1100

    xfs: connect up write verifiers to new buffers

    Metadata buffers that are read from disk have write verifiers
    already attached to them, but newly allocated buffers do not. Add
    appropriate write verifiers to all new metadata buffers.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 612cfbfe174a89d565363fff7f3961a2dda5fb71
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 14 17:52:32 2012 +1100

    xfs: add pre-write metadata buffer verifier callbacks

    These verifiers are essentially the same code as the read verifiers,
    but do not require ioend processing. Hence factor the read verifier
    functions and add a new write verifier wrapper that is used as the
    callback.

    This is done as one large patch for all verifiers rather than one
    patch per verifier as the change is largely mechanical. This
    includes hooking up the write verifier via the read verifier
    function.

    Hooking up the write verifier for buffers obtained via
    xfs_trans_get_buf() will be done in a separate patch as that touches
    code in many different places rather than just the verifier
    functions.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit cfb02852226aa449fe27075caffe88726507668c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:19 2012 +1100

    xfs: add buffer pre-write callback

    Add a callback to the buffer write path to enable verification of
    the buffer and CRC calculation prior to issuing the write to the
    underlying storage.

    If the callback function detects some kind of failure or error
    condition, it must mark the buffer with an error so that the caller
    can take appropriate action. In the case of xfs_buf_ioapply(), a
    corrupt metadata buffer will trigger a shutdown of the filesystem,
    because something is clearly wrong and we can't allow corrupt
    metadata to be written to disk.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit da6958c873ecd846d71fafbfe0f6168bb9c2c99e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:18 2012 +1100

    xfs: Add verifiers to dir2 data readahead.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit d9392a4bb75503fc2adbb5237c3df940c6467eb2
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:17 2012 +1100

    xfs: add xfs_da_node verification

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit ad14c33ac862601c4c22755ed3b59f1906b134e5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:16 2012 +1100

    xfs: factor and verify attr leaf reads

    Some reads are not converted yet because it isn't obvious ahead of
    time what the format of the block is going to be. Need to determine
    how to tell if the first block in the tree is a node or leaf format
    block. That will be done in later patches.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit e6f7667c4eef42b6f5bc6cdeb31d0bab62fe5f79
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:15 2012 +1100

    xfs: factor dir2 leaf read

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit e4813572640e27d3a5cce3f06751a9f54f77aaa5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:14 2012 +1100

    xfs: factor out dir2 data block reading

    And add a verifier callback function while there.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 2025207ca6738a1217126ef14af9d104433f9824
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:13 2012 +1100

    xfs: factor dir2 free block reading

    Also factor out the updating of the free block when removing entries
    from leaf blocks, and add a verifier callback for reads.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 82025d7f79148fe66a1594a0ebe4ab38152cf9e6
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:12 2012 +1100

    xfs: verify dir2 block format buffers

    Add a dir2 block format read verifier. To fully verify every block
    when read, call xfs_dir2_data_check() on them. Change
    xfs_dir2_data_check() to do runtime checking, convert ASSERT()
    checks to XFS_WANT_CORRUPTED_RETURN(), which will trigger an ASSERT
    failure on debug kernels, but on production kernels will dump an
    error to dmesg and return EFSCORRUPTED to the caller.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
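
The conversion from ASSERT() to a runtime check can be pictured with the userspace sketch below. It models only the production-kernel behaviour described above (report and return an error); the macro, the error value, `struct sketch_dir_block` and its limits are hypothetical, not the kernel's definitions.

```c
#include <stdio.h>

#define SKETCH_EFSCORRUPTED 117   /* stand-in error code */

/* Report a failed consistency check and return an error to the caller. */
#define SKETCH_WANT_CORRUPTED_RETURN(expr)                          \
    do {                                                            \
        if (!(expr)) {                                              \
            fprintf(stderr, "corruption detected: %s (%s:%d)\n",    \
                    #expr, __FILE__, __LINE__);                     \
            return SKETCH_EFSCORRUPTED;                             \
        }                                                           \
    } while (0)

struct sketch_dir_block {
    unsigned count;   /* number of entries       */
    unsigned stale;   /* number of stale entries */
};

/* Runtime check that used to be a set of debug-only assertions. */
static int sketch_dir_block_check(const struct sketch_dir_block *b)
{
    SKETCH_WANT_CORRUPTED_RETURN(b->stale <= b->count);
    SKETCH_WANT_CORRUPTED_RETURN(b->count <= 1024);   /* hypothetical limit */
    return 0;
}
```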

commit 20f7e9f3726a27cccade65c28265eef8ca50eecb
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:11 2012 +1100

    xfs: factor dir2 block read operations

    In preparation for verifying dir2 block format buffers, factor
    the read operations out of the block operations (lookup, addname,
    getdents) and some of the additional logic to make it easier to
    understand and modify the code.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 4bb20a83a2a5ac4dcb62780c9950e47939956126
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:10 2012 +1100

    xfs: add verifier callback to directory read code

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit c6319198702350a2215a8c0cacd6cc4283728a1b
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 14 17:50:13 2012 +1100

    xfs: verify dquot blocks as they are read from disk

    Add a dquot buffer verify callback function and pass it into the
    buffer read functions. This checks all the dquots in a buffer, but
    cannot completely verify the dquot ids are correct. Also, the
    verifier itself cannot repair errors, so an additional function is
    added to repair bad dquots in the buffer if such an error is detected
    in a context where repair is allowed.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 3d3e6f64e22c94115d47de670611bcd3ecda3796
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:08 2012 +1100

    xfs: verify btree blocks as they are read from disk

    Add a btree block verify callback function and pass it into the
    buffer read functions. Because each different btree block type
    requires different verification, add a function to the ops structure
    that is called from the generic code.

    Also, propagate the verification callback functions through the
    readahead functions, and into the external bmap and bulkstat inode
    readahead code that uses the generic btree buffer read functions.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit af133e8606d32c2aed43870491ebbdc56feec8a8
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:07 2012 +1100

    xfs: verify inode buffers as they are read from disk

    Add an inode buffer verify callback function and pass it into the
    buffer read functions. Inodes are special in that the verbose checks
    will be done when reading the inode, but we still need to sanity
    check the buffer when that is first read. Always verify the magic
    numbers in all inodes in the buffer, rather than just on debug
    kernels.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit bb80c6d79a3b0f9b6c3236a4bec021c72615bfd1
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:06 2012 +1100

    xfs: verify AGFL blocks as they are read from disk

    Add an AGFL block verify callback function and pass it into the
    buffer read functions.

    While this commit adds verification code to the AGFL, it cannot be
    used reliably until the CRC format change comes along as mkfs does
    not initialise the full AGFL. Hence it can be full of garbage at the
    first mount and will fail verification right now. CRC enabled
    filesystems won't have this problem, so leave the code that has
    already been written ifdef'd out until the proper time.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 3702ce6ed71cd60451ab278088863456dcb0dd99
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:05 2012 +1100

    xfs: verify AGI blocks as they are read from disk

    Add an AGI block verify callback function and pass it into the
    buffer read functions. Remove the now redundant verification code
    that is currently in use.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 5d5f527d13369d0047d52b7ac4ddee4f8c0ad173
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 14 17:44:56 2012 +1100

    xfs: verify AGF blocks as they are read from disk

    Add an AGF block verify callback function and pass it into the
    buffer read functions. This replaces the existing verification that
    is done after the read completes.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 98021821a502db347bd9c7671beeee6e8ce07ea6
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:03 2012 +1100

    xfs: verify superblocks as they are read from disk

    Add a superblock verify callback function and pass it into the
    buffer read functions. Remove the now redundant verification code
    that is currently in use.

    Adding verification shows that secondary superblocks never have
    their "sb_inprogress" flag cleared by mkfs.xfs, so when validating
    the secondary superblocks during a grow operation we have to avoid
    checking this field. Even if we fix mkfs, we will still have to
    ignore this field for verification purposes unless a version of mkfs
    that does not have this bug was used.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit eab4e63368b4cfa597dbdac66d1a7a836a693b7d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:02 2012 +1100

    xfs: uncached buffer reads need to return an error

    With verification being done as an IO completion callback, different
    errors can be returned from a read. Uncached reads only return a
    buffer or NULL on failure, which means the verification error cannot
    be returned to the caller.

    Split the error handling for these reads into two - a failure to get
    a buffer will still return NULL, but a read error will return a
    referenced buffer with b_error set rather than NULL. The caller is
    responsible for checking the error state of the buffer returned.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
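
The calling convention described above, with three distinct outcomes, can be sketched as follows. This is a userspace model, not the XFS buffer API: `sketch_read_uncached()` and its two simulation flags are hypothetical, and `free()` stands in for dropping the buffer reference.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_buf {
    int b_error;   /* 0 on success, otherwise the read/verify error */
};

/*
 * Returns NULL if no buffer could be obtained, otherwise a buffer the
 * caller owns. A failed read still returns the buffer, with b_error set.
 */
static struct sketch_buf *sketch_read_uncached(int alloc_fails, int io_error)
{
    struct sketch_buf *bp;

    if (alloc_fails)
        return NULL;
    bp = calloc(1, sizeof(*bp));
    if (!bp)
        return NULL;
    bp->b_error = io_error;
    return bp;
}

/* Caller side: distinguish "no buffer" from "buffer with an error". */
static int sketch_caller(int alloc_fails, int io_error)
{
    struct sketch_buf *bp = sketch_read_uncached(alloc_fails, io_error);
    int error;

    if (!bp)
        return ENOMEM;
    error = bp->b_error;
    free(bp);        /* release the buffer on success and on error */
    return error;
}
```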

commit c3f8fc73ac97b76a12692088ef9cace9af8422c0
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:01 2012 +1100

    xfs: make buffer read verication an IO completion function

    Add a verifier function callback capability to the buffer read
    interfaces.  This will be used by the callers to supply a function
    that verifies the contents of the buffer when it is read from disk.
    This patch does not provide callback functions, but simply modifies
    the interfaces to allow them to be called.

    The reason for adding this to the read interfaces is that it is very
    difficult to tell from the outside if a buffer was just read from
    disk or whether we just pulled it out of cache. Supplying a callback
    allows the buffer cache to use its internal knowledge of the buffer
    to execute it only when the buffer is read from disk.

    It is intended that the verifier functions will mark the buffer with
    an EFSCORRUPTED error when verification fails. This allows the
    reading context to distinguish a verification error from an IO
    error, and potentially take further actions on the buffer (e.g.
    attempt repair) based on the error reported.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
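
The idea of passing a verifier into the read interface, so that it runs only when the buffer really comes from disk, can be sketched as below. The types and functions (`struct sketch_buf`, `sketch_verifier_t`, `sketch_buf_read()`, `sketch_count_verifier()`) are hypothetical and model the behaviour only.

```c
#include <stddef.h>

#define SKETCH_EFSCORRUPTED 117   /* stand-in error code */

struct sketch_buf {
    int cached;    /* non-zero once valid contents are in the cache */
    int b_error;
    int data;
};

typedef void (*sketch_verifier_t)(struct sketch_buf *bp);

static int verify_calls;   /* counts how often the verifier ran */

/* Example verifier: flag negative payloads as corrupt. */
static void sketch_count_verifier(struct sketch_buf *bp)
{
    verify_calls++;
    if (bp->data < 0)
        bp->b_error = SKETCH_EFSCORRUPTED;
}

/* Only the buffer cache knows whether this read actually hit the disk. */
static void sketch_buf_read(struct sketch_buf *bp, sketch_verifier_t verify)
{
    if (bp->cached)
        return;              /* cache hit: contents were verified earlier */

    /* ... disk IO would fill bp->data here ... */

    if (verify)
        verify(bp);          /* runs as part of IO completion */
    if (!bp->b_error)
        bp->cached = 1;
}
```

Reading the same buffer twice invokes the verifier once, which is the property the caller cannot easily arrange from outside the cache.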

commit fb59581404ab7ec5075299065c22cb211a9262a9
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:53:57 2012 +1100

    xfs: remove xfs_flushinval_pages

    It's just a simple wrapper around VFS functionality, and is actually
    buggy in that it doesn't remove mappings before invalidating the
    page cache. Remove it and replace it with the correct VFS
    functionality.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Andrew Dahl <adahl@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 4bc1ea6b8ddd4f2bd78944fbe5a1042ac14b1f5f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:53:56 2012 +1100

    xfs: remove xfs_flush_pages

    It is a complex wrapper around VFS functions, but there are VFS
    functions that provide exactly the same functionality. Call the VFS
    functions directly and remove the unnecessary indirection and
    complexity.

    We don't need to care about clearing the XFS_ITRUNCATED flag, as
    that is done during .writepages. Hence it is cleared by the VFS
    writeback path if there is anything to write back during the flush.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Andrew Dahl <adahl@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 95eacf0f71b7682a05b8242c49c68e8e4bb673e3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:53:55 2012 +1100

    xfs: remove xfs_wait_on_pages()

    It's just a simple wrapper around a VFS function that is only called
    by another function in xfs_fs_subr.c. Remove it and call the VFS
    function directly.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Andrew Dahl <adahl@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit d6638ae244f6323fcdf85e72eb4a5af6f6212893
Author: Andrew Dahl <adahl@xxxxxxx>
Date:   Wed Nov 14 12:52:26 2012 -0600

    xfs: reverse the check on XFS_IOC_ZERO_RANGE

    Reversing the check on XFS_IOC_ZERO_RANGE.

    Range should be zeroed if the start is less than or equal to the end.

    Signed-off-by: Andrew Dahl <adahl@xxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit f5b8911b67eb4f15d95d5e5324d376d4a49d56e8
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Nov 14 17:42:47 2012 +1100

    xfs: remove xfs_tosspages

    It's a buggy, unnecessary wrapper that is duplicating
    truncate_pagecache_range().

    When replacing the call in xfs_change_file_space(), also ensure that
    the length being allocated/freed is always positive before making
    any changes. These checks are done in the lower extent manipulation
    functions, too, but we need to do them before any page cache
    operations.

    Reported-by: Andrew Dahl <adahl@xxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Andrew Dahl <adahl@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit de497688daaabbab425a8a969528272ec1d962a6
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:54:00 2012 +1100

    xfs: make growfs initialise the AGFL header

    For verification purposes, AGFLs need to be initialised to a known
    set of values. For upcoming CRC changes, they are also headers that
    need to be initialised. Currently, growfs does neither for the AGFLs
    - it ignores them completely. Add initialisation of the AGFL to be
    full of invalid block numbers (NULLAGBLOCK) to put the
    infrastructure in place needed for CRC support.

    Includes a comment clarification from Jeff Liu.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Rich Johnston <rjohnston@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
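
A minimal sketch of the initialisation described above, assuming NULLAGBLOCK is the all-ones 32-bit "no block" sentinel. agfl_init_slots() and the flat array are hypothetical stand-ins for the real AGFL buffer handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: the "no block" sentinel is all ones. */
#define NULLAGBLOCK ((uint32_t)-1)

/* Hypothetical helper: mark every free list slot as holding no block. */
static void agfl_init_slots(uint32_t *slots, size_t nslots)
{
	for (size_t i = 0; i < nslots; i++)
		slots[i] = NULLAGBLOCK;
}
```

An all-ones value reads the same in either byte order, so this sketch needs no endian conversion. A verifier can then tell an initialised but unused slot from stale data.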

commit fd23683c3b1ab905cba61ea2981c156f4bf52845
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:53:59 2012 +1100

    xfs: growfs: use uncached buffers for new headers

    When writing the new AG headers to disk, we can't attach write
    verifiers because they depend on the struct xfs_perag attached to
    the buffer being fully initialised, and growfs can't fully
    initialise the perag structures until later in the process.

    The simplest way to avoid this problem is to use uncached buffers
    for writing the new headers. These buffers don't have the xfs_perag
    attached to them, so the write verifier can easily detect this and
    skip the checks that need the xfs_perag.

    This enables us to attach the appropriate buffer ops to the buffer
    and hence calculate CRCs on the way to disk. It also means that the
    buffer is torn down immediately, and so the first access to the AG
    headers will re-read the header from disk and perform full
    verification of the buffer. This way we also can catch corruptions
    due to problems that went undetected in growfs.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Rich Johnston <rjohnston@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
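
The verifier behaviour described above can be modelled as follows. This is a sketch under stated assumptions: struct buf_model, struct perag_model, MODEL_MAGIC and the sequence number check are all hypothetical, chosen only to show a check that needs the per-AG structure being skipped for uncached buffers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAGIC 0x1234u	/* arbitrary value for this sketch */

struct perag_model {
	uint32_t agno;		/* AG number the structure describes */
};

struct buf_model {
	struct perag_model *pag;	/* NULL for an uncached buffer */
	uint32_t magic;
	uint32_t seqno;
};

/* Hypothetical write verifier: returns true if the buffer may be written. */
static bool write_verify(const struct buf_model *bp)
{
	/* Checks that need only the buffer contents always run. */
	if (bp->magic != MODEL_MAGIC)
		return false;

	/* Uncached buffer: no per-AG structure, so skip dependent checks. */
	if (bp->pag == NULL)
		return true;

	return bp->seqno == bp->pag->agno;
}
```

The header still gets full verification later, because the uncached buffer is torn down and the first cached read goes through the complete set of checks.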

commit b64f3a390d3477517cbff7d613e551705540769b
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 13 16:40:27 2012 -0600

    xfs: use btree block initialisation functions in growfs

    Factor xfs_btree_init_block() to be independent of the btree cursor,
    and use the function to initialise btree blocks in the growfs code.
    This makes adding support for different format btree blocks simple.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Rich Johnston <rjohnston@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit ee73259b401317117e7f5d4834c270b10b12bc8e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:53:53 2012 +1100

    xfs: add more attribute tree trace points.

    Added when debugging recent attribute tree problems to more finely
    trace code execution through the maze of twisty passages that makes
    up the attr code.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 37eb17e604ac7398bbb133c82f281475d704fff7
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:09:46 2012 +1100

    xfs: drop buffer io reference when a bad bio is built

    Error handling in xfs_buf_ioapply_map() does not handle IO reference
    counts correctly. We increment the b_io_remaining count before
    building the bio, but then fail to decrement it in the failure case.
    This leads to the buffer never running IO completion and releasing
    the reference that the IO holds, so at unmount we can leak the
    buffer. This leak is captured by this assert failure during unmount:

    XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

    This is not a new bug - the b_io_remaining accounting has had this
    problem for a long, long time - it's just very hard to get a
    zero length bio being built by this code...

    Further, the buffer IO error can be overwritten on a multi-segment
    buffer by subsequent bio completions for partial sections of the
    buffer. Hence we should only set the buffer error status if the
    buffer is not already carrying an error status. This ensures that a
    partial IO error on a multi-segment buffer will not be lost. This
    part of the problem is a regression, however.

    cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
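
Both halves of this fix can be illustrated with a small userspace model. Everything here is hypothetical (struct buf_model and the helper names are not the kernel's); it only shows the reference being dropped on the failure path and the first error being preserved.

```c
#include <errno.h>
#include <stdbool.h>

struct buf_model {
	int io_remaining;	/* outstanding IO references */
	int error;		/* 0 means no error recorded */
	bool completed;		/* IO completion has run */
};

/* Record an error only if the buffer is not already carrying one. */
static void buf_set_error_once(struct buf_model *bp, int error)
{
	if (error && !bp->error)
		bp->error = error;
}

/* Drop one IO reference; the last drop runs completion. */
static void buf_io_put(struct buf_model *bp)
{
	if (--bp->io_remaining == 0)
		bp->completed = true;
}

/* Submit one segment; bio_ok == false models a bio that cannot be built. */
static void buf_submit_segment(struct buf_model *bp, bool bio_ok)
{
	bp->io_remaining++;
	if (!bio_ok) {
		buf_set_error_once(bp, -EIO);
		buf_io_put(bp);	/* the drop that was missing */
		return;
	}
	/* Otherwise the IO is in flight and completes via buf_segment_done(). */
}

/* Completion of one in-flight segment. */
static void buf_segment_done(struct buf_model *bp, int error)
{
	buf_set_error_once(bp, error);
	buf_io_put(bp);
}
```

In this model the submitter holds one reference of its own while it issues segments, so completion cannot run until submission has finished. Without the drop in the failure path the count never reaches zero and the buffer is leaked.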

commit 7bf7f352194252e6f05981d44fb8cb55668606cd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:09:45 2012 +1100

    xfs: fix broken error handling in xfs_vm_writepage

    When we shut down the filesystem, it might first be detected in
    writeback when we are allocating an inode size transaction. This
    happens after we have moved all the pages into the writeback state
    and unlocked them. Unfortunately, if we fail to set up the
    transaction we then abort writeback and try to invalidate the
    current page. This then triggers a BUG() in block_invalidatepage()
    because we are trying to invalidate an unlocked page.

    Fixing this is a bit of a chicken and egg problem - we can't
    allocate the transaction until we've clustered all the pages into
    the IO and we know the size of it (i.e. whether the last block of
    the IO is beyond the current EOF or not). However, we don't want to
    hold pages locked for long periods of time, especially while we lock
    other pages to cluster them into the write.

    To fix this, we need to make a clear delineation in writeback where
    errors can only be handled by IO completion processing. That is,
    once we have marked a page for writeback and unlocked it, we have to
    report errors via IO completion because we've already started the
    IO. We may not have submitted any IO, but we've changed the page
    state to indicate that it is under IO so we must now use the IO
    completion path to report errors.

    To do this, add an error field to xfs_submit_ioend() to pass it the
    error that occurred during the building of the ioend chain. When
    this is non-zero, mark each ioend with the error and call
    xfs_finish_ioend() directly rather than building bios. This will
    immediately push the ioends through completion processing with the
    error that has occurred.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
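
The submission rule described above might look like this in a simplified model. struct ioend_model and both functions are hypothetical stand-ins; the real code builds bios and runs completion through workqueues, which this sketch reduces to two flags.

```c
#include <errno.h>
#include <stddef.h>

struct ioend_model {
	struct ioend_model *next;
	int error;	/* error reported through completion */
	int finished;	/* completion processing has run */
	int bio_built;	/* IO was actually built and issued */
};

/* Stand-in for completion processing. */
static void finish_ioend(struct ioend_model *io)
{
	io->finished = 1;
}

/*
 * Walk the chain. If building the chain failed, stamp each ioend with
 * the error and complete it immediately instead of building IO.
 */
static void submit_ioend_chain(struct ioend_model *head, int fail)
{
	struct ioend_model *io, *next;

	for (io = head; io != NULL; io = next) {
		next = io->next;	/* completion may release io */
		if (fail) {
			io->error = fail;
			finish_ioend(io);
			continue;
		}
		io->bio_built = 1;
	}
}
```

The point of the design is that once pages are marked for writeback, every outcome, including failure, reaches them through the completion path rather than through page invalidation.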

commit 07428d7f0ca46087f7f1efa895322bb9dc1ac21d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Nov 12 22:09:44 2012 +1100

    xfs: fix attr tree double split corruption

    In certain circumstances, a double split of an attribute tree is
    needed to insert or replace an attribute. In rare situations, this
    can go wrong, leaving the attribute tree corrupted. In this case,
    the attr being replaced is the last attr in a leaf node, and the
    replacement is larger so doesn't fit in the same leaf node.

    Consider the initial condition of a node format attribute btree
    with two leaves at index 1 and 2. Call them L1 and L2. The leaf L1
    is completely full; there is not a single byte of free space in it.
    L2 is mostly empty. The attribute being replaced - call it X - is
    the last attribute in L1.

    The way an attribute replace is executed is that the replacement
    attribute - call it Y - is first inserted into the tree, but has an
    INCOMPLETE flag set on it so that list traversals ignore it. Once
    this transaction is committed, a second transaction is run to
    atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
    traversal will now find Y and skip X. Once that transaction is
    committed, attribute X is then removed.

    So, the initial condition is:

         +--------+     +--------+
         |   L1   |     |   L2   |
         | fwd: 2 |---->| fwd: 0 |
         | bwd: 0 |<----| bwd: 1 |
         | fsp: 0 |     | fsp: N |
         |--------|     |--------|
         | attr A |     | attr 1 |
         |--------|     |--------|
         | attr B |     | attr 2 |
         |--------|     |--------|
         ..........     ..........
         |--------|     |--------|
         | attr X |     | attr n |
         +--------+     +--------+

    So now we go to replace X, and see that L1:fsp = 0 - it is full so
    we can't insert Y in the same leaf. So we record the location of
    attribute X so we can track it for later use, then we split L1 into
    L1 and L3 and rebalance across the two leaves. We end with:

         +--------+     +--------+     +--------+
         |   L1   |     |   L3   |     |   L2   |
         | fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
         | bwd: 0 |<----| bwd: 1 |<----| bwd: 3 |
         | fsp: M |     | fsp: J |     | fsp: N |
         |--------|     |--------|     |--------|
         | attr A |     | attr X |     | attr 1 |
         |--------|     +--------+     |--------|
         | attr B |                    | attr 2 |
         |--------|                    |--------|
         ..........                    ..........
         |--------|                    |--------|
         | attr W |                    | attr n |
         +--------+                    +--------+

    And we track that the original attribute is now at L3:0.

    We then try to insert Y into L1 again, and find that there isn't
    enough room because the new attribute is larger than the old one.
    Hence we have to split again to make room for Y. We end up with
    this:

         +--------+     +--------+     +--------+     +--------+
         |   L1   |     |   L4   |     |   L3   |     |   L2   |
         | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
         | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
         | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
         |--------|     |--------|     |--------|     |--------|
         | attr A |     | attr Y |     | attr X |     | attr 1 |
         |--------|     + INCOMP +     +--------+     |--------|
         | attr B |     +--------+                    | attr 2 |
         |--------|                                   |--------|
         ..........                                   ..........
         |--------|                                   |--------|
         | attr W |                                   | attr n |
         +--------+                                   +--------+

    And now we have the new (incomplete) attribute @ L4:0, and the
    original attribute at L3:0. At this point, the first transaction is
    committed, and we move to the flipping of the flags.

    This is where we are supposed to end up with this:

         +--------+     +--------+     +--------+     +--------+
         |   L1   |     |   L4   |     |   L3   |     |   L2   |
         | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
         | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
         | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
         |--------|     |--------|     |--------|     |--------|
         | attr A |     | attr Y |     | attr X |     | attr 1 |
         |--------|     +--------+     + INCOMP +     |--------|
         | attr B |                    +--------+     | attr 2 |
         |--------|                                   |--------|
         ..........                                   ..........
         |--------|                                   |--------|
         | attr W |                                   | attr n |
         +--------+                                   +--------+

    But that doesn't happen properly - the attribute tracking indexes
    are not pointing to the right locations. What we end up with is both
    the old attribute to be removed pointing at L4:0 and the new
    attribute at L4:1.  On a debug kernel, this assert fails like so:

    XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725

    because the new attribute location does not exist. On a production
    kernel, this goes unnoticed and the code proceeds ahead merrily and
    removes L4 because it thinks that is the block that is no longer
    needed. This leaves the hash index node pointing to entries
    L1, L4 and L2, but only blocks L1, L3 and L2 exist. Further, the
    leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free
    space, and so everything is busted. This corruption is caused by the
    removal of the old attribute triggering a join - it joins everything
    correctly but then frees the wrong block.

    xfs_repair will report something like:

    bad sibling back pointer for block 4 in attribute fork for inode 131
    problem with attribute contents in inode 131
    would clear attr fork
    bad nblocks 8 for inode 131, would reset to 3
    bad anextents 4 for inode 131, would reset to 0

    The problem lies in the assignment of the old/new blocks for
    tracking purposes when the double leaf split occurs. The first split
    tries to place the new attribute inside the current leaf (i.e.
    "inleaf == true") and moves the old attribute (X) to the new block.
    This sets up the old block/index to L1:X, and newly allocated
    block to L3:0. It then moves attr X to the new block and tries to
    insert attr Y at the old index. That fails, so it splits again.

    With the second split, the rebalance ends up placing the new attr in
    the second new block - L4:0 - and this is where the code goes wrong.
    What it does is set both the new and old block index to the
    second new block. Hence it inserts attr Y at the right place (L4:0)
    but overwrites the current location of the attr to replace that is
    held in the new block index (currently L3:0). It overwrites it with
    L4:1 - the index we later assert fail on.

    Hopefully this table will show this in a format that is a bit easier
    to understand:

    Split          old attr index          new attr index
                   vanilla   patched       vanilla   patched
    before 1st     L1:26     L1:26         N/A       N/A
    after 1st      L3:0      L3:0          L1:26     L1:26
    after 2nd      L4:0      L3:0          L4:1      L4:0
                   ^^^^                    ^^^^
                   wrong                   wrong

    The fix is surprisingly simple, for all this analysis - just stop
    the rebalance on the out-of-leaf case from overwriting the new attr
    index - it's already correct for the double split case.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
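
The replace protocol that this fix protects (insert Y as INCOMPLETE, flip the flags, then remove X) can be sketched with a flat table in place of the btree. All names and the flag value are hypothetical; the sketch shows only why the flip must be applied to the correct two entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ATTR_INCOMPLETE 0x1u	/* hypothetical flag value for this sketch */

struct attr_model {
	const char *name;
	const char *value;
	unsigned int flags;
	bool in_use;
};

/* Lookups ignore unused slots and entries still marked INCOMPLETE. */
static const char *attr_lookup(const struct attr_model *tbl, size_t n,
			       const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].in_use || (tbl[i].flags & ATTR_INCOMPLETE))
			continue;
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].value;
	}
	return NULL;
}

/* The flag flip: the new entry becomes visible and the old one hidden. */
static void attr_flip(struct attr_model *outgoing, struct attr_model *incoming)
{
	incoming->flags &= ~ATTR_INCOMPLETE;
	outgoing->flags |= ATTR_INCOMPLETE;
}
```

If the tracked locations passed to the flip are wrong, as in the double split case, the wrong entry is hidden and later removed, which is the corruption described above.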

commit 579b62faa5fb16ffeeb88cda5e2c4e95730881af
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:47 2012 -0500

    xfs: add background scanning to clear eofblocks inodes

    Create a new mount workqueue and delayed_work to enable background
    scanning and freeing of eofblocks inodes. The scanner kicks in once
    speculative preallocation occurs and stops requeueing itself when
    no eofblocks inodes exist.

    The scan interval is based on the new
    'speculative_prealloc_lifetime' tunable (defaults to 5m). The
    background scanner performs unfiltered, best effort scans (which
    skip inodes under lock contention or with a dirty cache mapping).

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 00ca79a04bef1a1b30ef8afd992d905b6d986caf
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Wed Nov 7 12:21:14 2012 -0500

    xfs: add minimum file size filtering to eofblocks scan

    Support minimum file size filtering in the eofblocks scan. The
    caller must set the XFS_EOF_FLAGS_MINFILESIZE flags bit and minimum
    file size value in bytes.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 1b5560488d1ab7c932f6f99385b41116838c3486
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:45 2012 -0500

    xfs: support multiple inode id filtering in eofblocks scan

    Enhance the eofblocks scan code to filter based on multiply specified
    inode id values. When multiple inode id values are specified, only
    inodes that match all id values are selected.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
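
The "match all id values" rule, together with the minimum file size filter, can be sketched as a single predicate. The flag names and values, the choice of uid/gid/project id as the id types, and the struct layouts are assumptions made for illustration; they are not copied from the XFS headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real XFS_EOF_FLAGS_* values are not shown. */
#define EOFB_UID		(1u << 0)
#define EOFB_GID		(1u << 1)
#define EOFB_PRID		(1u << 2)
#define EOFB_MINFILESIZE	(1u << 3)

struct eofb_filter {
	uint32_t flags;
	uint32_t uid, gid, prid;
	uint64_t min_file_size;	/* bytes */
};

struct inode_model {
	uint32_t uid, gid, prid;
	uint64_t size;		/* bytes */
};

/* An inode is selected only if it passes every filter that is enabled. */
static bool eofb_match(const struct inode_model *ip,
		       const struct eofb_filter *f)
{
	if ((f->flags & EOFB_UID) && ip->uid != f->uid)
		return false;
	if ((f->flags & EOFB_GID) && ip->gid != f->gid)
		return false;
	if ((f->flags & EOFB_PRID) && ip->prid != f->prid)
		return false;
	if ((f->flags & EOFB_MINFILESIZE) && ip->size < f->min_file_size)
		return false;
	return true;
}
```

With no flags set the predicate accepts every inode, which corresponds to an unfiltered scan.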

commit 3e3f9f5863548e870edfcc72e7617ac8ddcad44a
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Wed Nov 7 12:21:13 2012 -0500

    xfs: add inode id filtering to eofblocks scan

    Support inode ID filtering in the eofblocks scan. The caller must
    set the associated XFS_EOF_FLAGS_*ID bit and ID field.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 8ca149de80478441352a8622ea15fae7de703ced
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Wed Nov 7 12:21:12 2012 -0500

    xfs: add XFS_IOC_FREE_EOFBLOCKS ioctl

    The XFS_IOC_FREE_EOFBLOCKS ioctl allows users to invoke an EOFBLOCKS
    scan. The xfs_eofblocks structure is defined to support the command
    parameters (scan mode).

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 41176a68e3f710630feace536d0277a092e206b5
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:42 2012 -0500

    xfs: create function to scan and clear EOFBLOCKS inodes

    xfs_inodes_free_eofblocks() implements scanning functionality for
    EOFBLOCKS inodes. It uses the AG iterator to walk the tagged inodes
    and free post-EOF blocks via the xfs_inode_free_eofblocks() execute
    function. The scan can be invoked in best-effort mode or wait
    (force) mode.

    A best-effort scan (default) handles all inodes that do not have a
    dirty cache and we successfully acquire the io lock via trylock. In
    wait mode, we continue to cycle through an AG until all inodes are
    handled.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
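
The difference between the two scan modes can be modelled in userspace as below. This is a sketch, not the kernel iterator: busy_passes is a hypothetical counter standing in for "the trylock fails this many more times", and the array stands in for the tagged inodes of one AG.

```c
#include <stdbool.h>
#include <stddef.h>

struct inode_model {
	int busy_passes;	/* trylock fails this many more times */
	bool dirty;		/* has a dirty cache mapping */
	bool freed;		/* post-EOF blocks already freed */
};

/* Returns the number of inodes handled by this call. */
static int eofblocks_scan(struct inode_model *inodes, size_t n, bool wait)
{
	int handled = 0;
	bool again;

	do {
		again = false;
		for (size_t i = 0; i < n; i++) {
			struct inode_model *ip = &inodes[i];

			if (ip->freed)
				continue;
			/* Best effort: skip inodes with a dirty cache. */
			if (!wait && ip->dirty)
				continue;
			/* Trylock failed: skip now, maybe retry later. */
			if (ip->busy_passes > 0) {
				ip->busy_passes--;
				again = true;
				continue;
			}
			ip->freed = true;
			handled++;
		}
	} while (wait && again);	/* wait mode cycles until done */

	return handled;
}
```

A best-effort call makes exactly one pass and leaves contended or dirty inodes for a later scan, while a wait-mode call keeps cycling until nothing reports a retry.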

commit 40165e27617e2a98bf8588001d2f2872fae2fee2
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:41 2012 -0500

    xfs: make xfs_free_eofblocks() non-static, return EAGAIN on trylock failure

    Turn xfs_free_eofblocks() into a non-static function, return EAGAIN to
    indicate trylock failure and make sure this error is not propagated in
    xfs_release().

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 72b53efa4a6125a4c334871c58268c430605819a
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:40 2012 -0500

    xfs: create helper to check whether to free eofblocks on inode

    This check is used in multiple places to determine whether we
    should check for (and potentially free) post EOF blocks on an
    inode. Add a helper to consolidate the check.

    Note that when we remove an inode from the cache (xfs_inactive()),
    we are required to trim post-EOF blocks even if the inode is marked
    preallocated or append-only to maintain correct space accounting.
    The 'force' parameter to xfs_can_free_eofblocks() specifies whether
    we should ignore the prealloc/append-only status of the inode.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
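
The role of the 'force' parameter can be sketched as follows. Only the prealloc/append-only part of the check is modelled, since that is the part described above; struct inode_model is hypothetical and any other conditions in the real helper are left out.

```c
#include <stdbool.h>

struct inode_model {
	bool prealloc;		/* inode marked preallocated */
	bool append_only;	/* inode marked append-only */
};

/*
 * Simplified model of the helper: post-EOF blocks on preallocated or
 * append-only inodes are left alone unless the caller forces the trim.
 */
static bool can_free_eofblocks(const struct inode_model *ip, bool force)
{
	if ((ip->prealloc || ip->append_only) && !force)
		return false;
	return true;
}
```

In this model a caller that is removing the inode from the cache would pass force = true, while other callers would pass false.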

commit a454f7428ffa03c8e1321124d9074101b7290be6
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:39 2012 -0500

    xfs: support a tag-based inode_ag_iterator

    Genericize xfs_inode_ag_walk() to support an optional radix tree tag
    and args argument for the execute function. Create a new wrapper
    called xfs_inode_ag_iterator_tag() that performs a tag based walk
    of perag's and inodes.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 27b52867925e3aaed090063c1c58a7537e6373f3
Author: Brian Foster <bfoster@xxxxxxxxxx>
Date:   Tue Nov 6 09:50:38 2012 -0500

    xfs: add EOFBLOCKS inode tagging/untagging

    Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with
    speculatively preallocated blocks beyond EOF. An inode is tagged
    when speculative preallocation occurs and untagged either via
    truncate down or when post-EOF blocks are freed via release or
    reclaim.

    The tag management is intentionally not aggressive to prefer
    simplicity over the complexity of handling all the corner cases
    under which post-EOF blocks could be freed (i.e., forward
    truncation, fallocate, write error conditions, etc.). This means
    that a tagged inode may or may not have post-EOF blocks after a
    period of time. The tag is eventually cleared when the inode is
    released or reclaimed.

    Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 69a58a43f74eb2cb23d9bce2524dae33c289a40f
Author: Eric Sandeen <sandeen@xxxxxxxxxx>
Date:   Tue Oct 9 14:11:45 2012 -0500

    xfs: report projid32bit feature in geometry call

    When xfs gained the projid32bit feature, it was never added to
    the FSGEOMETRY ioctl feature flags, so it's not queryable without
    this patch.

    Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
    Reviewed-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 009507b052fa391618eccf9e8c9f484407fd9018
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 2 11:38:44 2012 +1100

    xfs: fix reading of wrapped log data

    Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
    3.0-rc1 introduced a regression when recovering log buffers that
    wrapped around the end of log. The second part of the log buffer at
    the start of the physical log was being read into the header buffer
    rather than the data buffer, and hence recovery was seeing garbage
    in the data buffer when it got to the region of the log buffer that
    was incorrectly read.

    Cc: <stable@xxxxxxxxxxxxxxx> # 3.0.x, 3.2.x, 3.4.x 3.6.x
    Reported-by: Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
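
Reading a record that wraps around the end of a circular log takes two copies, and the second copy has to land in the data buffer directly after the first part. A minimal sketch, assuming a byte array stands in for the physical log and that start < log_size and len <= log_size (log_read_wrapped() is a hypothetical name):

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes starting at offset start of a circular log into data.
 * Caller guarantees start < log_size and len <= log_size.
 */
static void log_read_wrapped(const unsigned char *phys_log, size_t log_size,
			     size_t start, size_t len, unsigned char *data)
{
	size_t first = log_size - start;	/* bytes before the wrap */

	if (first > len)
		first = len;

	/* Part one: from start up to the physical end of the log. */
	memcpy(data, phys_log + start, first);

	/* Part two: from the physical start, appended to the same buffer. */
	memcpy(data + first, phys_log, len - first);
}
```

The regression described above corresponds to the second copy going to a different destination, which leaves the tail of the data buffer holding garbage.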

commit 137fff09b7924507871f8e6294dfe57b7a880332
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Fri Nov 2 14:23:12 2012 +1100

    xfs: fix buffer shudown reference count mismatch

    When we shut down the filesystem, we have to unpin and free all the
    buffers currently active in the CIL. To do this we unpin and remove
    them in one operation as a result of a failed iclogbuf write. For
    buffers, we do this removal via a simulated IO completion after
    marking the buffer stale.

    At the time we do this, we have two references to the buffer - the
    active LRU reference and the buf log item.  The LRU reference is
    removed by marking the buffer stale, and the active CIL reference is
    removed by the xfs_buf_iodone() callback that is run by
    xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
    callback).

    However, ioend processing requires one more reference - that of the
    IO that it is completing. We don't have this reference, so we free
    the buffer prematurely and use it after it is freed. For buffers
    marked with XBF_ASYNC, this leads to assert failures in
    xfs_buf_rele() on debug kernels because the b_hold count is zero.

    Fix this by making sure we take the necessary IO reference before
    starting IO completion processing on the stale buffer, and set the
    XBF_ASYNC flag to ensure that IO completion processing removes all
    the active references from the buffer to ensure it is fully torn
    down.

    Cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
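
The reference count mismatch can be replayed with a toy hold counter. The struct and function names are hypothetical and the three releases are a simplification of the real call chain; the sketch only shows why one more reference is needed before simulated IO completion runs.

```c
#include <stdbool.h>

struct buf_model {
	int hold;		/* active references */
	bool freed;		/* last reference was dropped */
	bool use_after_free;	/* a release ran on a freed buffer */
};

static void buf_hold(struct buf_model *bp)
{
	bp->hold++;
}

static void buf_rele(struct buf_model *bp)
{
	if (bp->freed) {
		bp->use_after_free = true;
		return;
	}
	if (--bp->hold == 0)
		bp->freed = true;
}

/* Model of the shutdown unpin path, with or without the extra IO hold. */
static void shutdown_unpin(struct buf_model *bp, bool take_io_ref)
{
	if (take_io_ref)
		buf_hold(bp);	/* reference owned by the simulated IO */
	buf_rele(bp);		/* marking the buffer stale drops the LRU ref */
	buf_rele(bp);		/* iodone callback drops the log item ref */
	buf_rele(bp);		/* IO completion drops the IO reference */
}
```

Starting from two references, the unfixed path releases three times and touches a freed buffer; with the extra hold the count reaches zero exactly once.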

commit b6aff29f3af7437635ec3d66af9115bb17ba561f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 2 11:38:42 2012 +1100

    xfs: don't vmap inode cluster buffers during free

    Inode buffers do not need to be mapped as inodes are read or written
    directly from/to the pages underlying the buffer. This fixes a
    regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
    default behaviour").

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 4c05f9ad4d168098b7ce3ffa7098283f94811ed6
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 2 11:38:41 2012 +1100

    xfs: invalidate allocbt blocks moved to the free list

    When we free a block from the alloc btree, we move it to the
    freelist held in the AGFL and mark it busy in the busy extent tree.
    This typically happens when we merge btree blocks.

    Once the transaction is committed and checkpointed, the block can
    remain on the free list for an indefinite amount of time.  Now, this
    isn't the end of the world at this point - if the free list is
    shortened, the buffer is invalidated in the transaction that moves
    it back to free space. If the buffer is allocated as metadata from
    the free list, then all the modifications get logged, and we have
    no issues, either. And if it gets allocated as userdata direct from
    the freelist, it gets invalidated and so will never get written.

    However, during the time it sits on the free list, pressure on the
    log can cause the AIL to be pushed and the buffer that covers the
    block gets pushed for write. IOWs, we end up writing a freed
    metadata block to disk. Again, this isn't the end of the world
    because we know from the above we are only writing to free space.

    The problem, however, is for validation callbacks. If the block was
    an old btree root block, then the level of the block is going to be
    higher than the current tree root, and so will fail validation.
    There may be other inconsistencies in the block as well, and
    currently we don't care because the block is in free space. Shutting
    down the filesystem because a freed block doesn't pass write
    validation, OTOH, is rather unfriendly.

    So, make sure we always invalidate buffers as they move from the
    free space trees to the free list so that we guarantee they never
    get written to disk while on the free list.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit cd856db69c88db438215244571957d812bdc6813
Author: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
Date:   Sat Oct 20 11:08:19 2012 -0300

    xfs: Update inode alloc comments

    I found some out of date comments while studying the inode allocation
    code, so I believe it's worth updating them.

    This rewrites the comment about the "call_again" variable, which is
    no longer used. Instead, callers of xfs_ialloc() decide whether it
    needs to be called again based only on whether ialloc_context is
    NULL or not.

    I also made some small changes to another comment that I thought were
    pertinent to the current behaviour of these functions, and fixed the
    alignment of both comments.

    Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 531c3bdc8662e1a83f8ec80dc3346b1284877c0a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Oct 25 17:22:30 2012 +1100

    xfs: silence uninitialised f.file warning.

    Uninitialised variable build warning introduced by 2903ff0 ("switch
    simple cases of fget_light to fdget"). gcc is not smart enough to
    work out that the variable is not used uninitialised, and the commit
    removed the initialisation at declaration that the old variable had.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 1375cb65e87b327a8dd4f920c3e3d837fb40e9c2
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Oct 9 14:50:52 2012 +1100

    xfs: growfs: don't read garbage for new secondary superblocks

    When updating new secondary superblocks in a growfs operation, the
    superblock buffer is read from the newly grown region of the
    underlying device. This is not guaranteed to be zero, so violates
    the underlying assumption that the unused parts of superblocks are
    zero filled. Get a new buffer for these secondary superblocks to
    ensure that the unused regions are zero filled correctly.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit e04426b9202bccd4cfcbc70b2fa2aeca1c86d8f5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Oct 5 11:06:59 2012 +1000

    xfs: move allocation stack switch up to xfs_bmapi_allocate

    Switching stacks at xfs_alloc_vextent() can cause deadlocks when we
    run out of worker threads on the allocation workqueue. This can
    occur because xfs_bmap_btalloc can make multiple calls to
    xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
    return with the AGF locked in the current allocation transaction.

    If we then need to make another allocation, and all the allocation
    worker contexts are exhausted because they are blocked waiting for
    the AGF lock, the holder of the AGF cannot get its
    xfs_alloc_vextent() work completed to release the AGF.  Hence
    allocation effectively deadlocks.

    To avoid this, move the stack switch one layer up to
    xfs_bmapi_allocate() so that all of the allocation attempts in a
    single switched stack transaction occur in a single worker context.
    This avoids the problem of an allocation being blocked waiting for
    a worker thread whilst holding the AGF.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 2455881c0b52f87be539c4c7deab1afff4d8a560
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Oct 5 11:06:58 2012 +1000

    xfs: introduce XFS_BMAPI_STACK_SWITCH

    Certain allocation paths through xfs_bmapi_write() are in situations
    where we have limited stack available. These are almost always in
    the buffered IO writeback path when converting delayed allocation
    extents to real extents.

    The current stack switch occurs for userdata allocations, which
    means we also do stack switches for preallocation, direct IO and
    unwritten extent conversion, even though these call chains have never
    been implicated in a stack overrun.

    Hence, let's target just the single stack overrun offender for stack
    switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
    the caller can pass xfs_bmapi_write() to indicate it should switch
    stacks if it needs to do allocation.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit a00416844b8f4b0106344bdfd90fe45a854b1d05
Author: Mark Tinguely <tinguely@xxxxxxx>
Date:   Thu Sep 20 13:16:45 2012 -0500

    xfs: zero allocation_args on the kernel stack

    Zero the kernel stack space that makes up the xfs_alloc_arg structures.

    Signed-off-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit d35e88faa3b0fc2cea35c3b2dca358b5cd09b45f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:12 2012 +1100

    xfs: only update the last_sync_lsn when a transaction completes

    The log write code stamps each iclog with the current tail LSN in
    the iclog header so that recovery knows where to find the tail of
    the log once it has found the head. Normally this is taken from the
    first item on the AIL - the log item that corresponds to the oldest
    active item in the log.

    The problem is that when the AIL is empty, the tail lsn is derived
    from the l_last_sync_lsn, which is the LSN of the last iclog to
    be written to the log. In most cases this doesn't happen, because
    the AIL is rarely empty on an active filesystem. However, when it
    does, it opens up an interesting case when the transaction being
    committed to the iclog spans multiple iclogs.

    That is, the first iclog is stamped with the l_last_sync_lsn, and IO
    is issued. Then the next iclog is set up, the changes are copied
    into the iclog (which takes some time), and then the l_last_sync_lsn
    is stamped into the header and IO is issued. This is still the same
    transaction, so the tail lsn of both iclogs must be the same for log
    recovery to find the entire transaction to be able to replay it.

    The problem arises in that the iclog buffer IO completion updates
    the l_last_sync_lsn with its own LSN. Therefore, if the first iclog
    completes its IO before the second iclog is filled and has the tail
    lsn stamped in it, the second iclog will have the LSN of the first
    iclog stamped into its tail lsn field. If the system fails at this
    point, log recovery will not see a complete transaction, so the
    transaction will not be replayed.

    The fix is simple - the l_last_sync_lsn is updated when an iclog
    buffer IO completes, and this is incorrect. The l_last_sync_lsn
    should be updated when a transaction is completed by an iclog buffer
    IO. That is, only iclog buffers that have transaction commit
    callbacks attached to them should update the l_last_sync_lsn. This
    means that the last_sync_lsn will only move forward when a commit
    record is written, not in the middle of a large transaction that is
    rolling through multiple iclog buffers.
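
    The rule above can be modelled in a few lines. This is a userspace
    sketch under stated assumptions, not the patch itself: the structs,
    field names and both completion helpers are hypothetical, and LSNs
    are treated as plain increasing integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified log and iclog state for illustration. */
struct sketch_log {
    uint64_t last_sync_lsn;
};

struct sketch_iclog {
    uint64_t lsn;                  /* LSN of this iclog */
    bool     has_commit_callbacks; /* a transaction commits in this iclog */
};

/* Old behaviour: every iclog IO completion moves last_sync_lsn. */
static void old_iclog_io_done(struct sketch_log *log,
                              const struct sketch_iclog *ic)
{
    log->last_sync_lsn = ic->lsn;
}

/* Fixed behaviour: only iclogs that complete a transaction move it. */
static void new_iclog_io_done(struct sketch_log *log,
                              const struct sketch_iclog *ic)
{
    if (!ic->has_commit_callbacks)
        return;
    /* Assumed invariant of this model: the value never moves backwards. */
    assert(ic->lsn >= log->last_sync_lsn);
    log->last_sync_lsn = ic->lsn;
}
```

    With a transaction spanning two iclogs, completing the first one
    (no commit callbacks) leaves last_sync_lsn untouched in the fixed
    version, so the second iclog is stamped with the same tail value.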

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 33479e0542df066fb0b47df18780e93bfe6e0dc5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:11 2012 +1100

    xfs: remove xfs_iget.c

    The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
    along with the other inode cache functions. This removes all functionality from
    xfs_iget.c, so the file can simply be removed.

    This move results in various functions now only having the scope of a single
    file (e.g. xfs_inode_free()), so clean up all the definitions and exported
    prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit fa96acadf1eb712fca6d59922ad93787c87e44ec
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:10 2012 +1100

    xfs: move inode locking functions to xfs_inode.c

    xfs_ilock() and friends really aren't related to the inode cache in
    any way, so move them to xfs_inode.c with all the other inode
    related functionality.

    While doing this move, move the xfs_ilock() tracepoints to *before*
    the lock is taken so that when a hang on a lock occurs we have
    events to indicate which process and what inode we were trying to
    lock when the hang occurred. This is much better than the current
    silence we get on a hang...

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 6d8b79cfca39399ef9115fb65dde85993455c9a3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:09 2012 +1100

    xfs: rename xfs_sync.[ch] to xfs_icache.[ch]

    xfs_sync.c now only contains inode reclaim functions and inode cache
    iteration functions. It is not related to sync operations anymore.
    Rename to xfs_icache.c to reflect its contents and prepare for
    consolidation with the other inode cache file that exists
    (xfs_iget.c).

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit c75921a72a7c4bb73a5e09a697a672722e5543f1
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:08 2012 +1100

    xfs: xfs_quiesce_attr() should quiesce the log like unmount

    xfs_quiesce_attr() is supposed to leave the log empty with an
    unmount record written. Right now it does not wait for the AIL to be
    emptied before writing the unmount record, nor does it wait for
    metadata IO completion. Fix it to use the same method and
    code as xfs_log_unmount().

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit c7eea6f7adca4501d2c2db7f0f7c9dc88efac95e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:07 2012 +1100

    xfs: move xfs_quiesce_attr() into xfs_super.c

    Both callers of xfs_quiesce_attr() are in xfs_super.c, and there's
    nothing really sync-specific about this functionality so it doesn't
    really matter where it lives. Move it to be next to its callers, so
    all the remount/sync_fs code is in the one place.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 34061f5c420561dd42addd252811a1fa4b0ac69b
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:06 2012 +1100

    xfs: xfs_sync_fsdata is redundant

    Why do we need to write the superblock to disk once we've written
    all the data?  We don't actually - the reasons for doing this are
    lost in the mists of time, and go back to the way Irix used to drive
    VFS flushing.

    On Linux, this code is only called from two contexts: remount and
    .sync_fs. In the remount case, the call is followed by a metadata
    sync, which unpins and writes the superblock.  In the sync_fs case,
    we only need to force the log to disk to ensure that the superblock
    is correctly on disk, so we don't actually need to write it. Hence
    the functionality is either redundant or superfluous and thus can be
    removed.

    Seeing as xfs_quiesce_data is essentially now just a log force,
    remove it as well and fold the code back into the two callers.
    Neither of them needs the log covering check, either, as that is
    redundant for the remount case, and unnecessary for the .sync_fs
    case.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 5889608df35783590251cfd440fa5d48f1855179
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:05 2012 +1100

    xfs: syncd workqueue is no more

    With the syncd functions moved to the log and/or removed, the syncd
    workqueue is the only bit left. It is used by the log
    covering/ail pushing work, as well as by the inode reclaim work.

    Given how cheap workqueues are these days, give the log and inode
    reclaim work their own work queues and kill the syncd work queue.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 9aa05000f2b7cab4be582afba64af10b2d74727e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:04 2012 +1100

    xfs: xfs_sync_data is redundant.

    We don't do any data writeback from XFS any more - the VFS is
    completely responsible for that, including for freeze. We can
    replace the remaining caller with a VFS level function that
    achieves the same thing, but without conflicting with current
    writeback work.

    This means we can remove the flush_work and xfs_flush_inodes() - the
    VFS functionality completely replaces the internal flush queue for
    doing this writeback work in a separate context to avoid stack
    overruns.

    This does have one complication - it cannot be called with page
    locks held.  Hence move the flushing of delalloc space when ENOSPC
    occurs back up into xfs_file_aio_buffered_write when we don't hold
    any locks that will stall writeback.

    Unfortunately, writeback_inodes_sb_if_idle() is not sufficient to
    trigger delalloc conversion fast enough to prevent spurious ENOSPC
    when there are hundreds of writers, thousands of small files and GBs
    of free RAM.  Hence we need to use sync_sb_inodes() to block callers
    while we wait for writeback like the previous xfs_flush_inodes
    implementation did.

    That means we have to hold the s_umount lock here, but because this
    call can nest inside i_mutex (the parent directory in the create
    case, held by the VFS), we have to use down_read_trylock() to avoid
    potential deadlocks. In practice, this trylock will succeed on
    almost every attempt as unmount/remount type operations are
    exceedingly rare.
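
    The trylock pattern described above can be sketched as follows.
    This is a single-threaded userspace model for illustration only:
    the toy rwsem, the sketch_ helpers and the flush counter are all
    hypothetical stand-ins, and the model is not thread-safe. It only
    shows the control flow of skipping the flush when s_umount cannot
    be taken for read without blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer semaphore; hypothetical and not thread-safe. */
struct sketch_rwsem {
    int  readers;
    bool writer;
};

/* Returns 1 if the read lock was taken, 0 if a writer holds the lock. */
static int sketch_down_read_trylock(struct sketch_rwsem *sem)
{
    if (sem->writer)
        return 0;
    sem->readers++;
    return 1;
}

static void sketch_up_read(struct sketch_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

/* Hypothetical superblock stand-in. */
struct sketch_sb {
    struct sketch_rwsem s_umount;
    int                 flushes; /* counts completed flush passes */
};

/*
 * Flush only if s_umount can be taken without blocking. A blocking
 * read lock here could deadlock when nested inside another lock, so
 * the flush is skipped instead when an unmount/remount is running.
 */
static bool sketch_flush_inodes(struct sketch_sb *sb)
{
    if (!sketch_down_read_trylock(&sb->s_umount))
        return false;
    sb->flushes++; /* stands in for the blocking writeback call */
    sketch_up_read(&sb->s_umount);
    return true;
}
```

    A caller that gets false back simply proceeds without the flush,
    which is acceptable here because the contended case is so rare.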

    Note: we always need to pass a count of zero to
    generic_file_buffered_write() as the previously written byte count.
    Before this patch we only did this by accident, by virtue of ret
    always being zero when there are no errors. Make this explicit
    rather than needing to specifically zero ret in the ENOSPC retry
    case.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Tested-by: Brian Foster <bfoster@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit cf2931db2d189ce0583be7ae880d7e3f8c15f623
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:03 2012 +1100

    xfs: Bring some sanity to log unmounting

    When unmounting the filesystem, there are lots of operations that
    need to be done in a specific order, and they are spread across
    a couple of functions. We have to drain the AIL before we
    write the unmount record, and we have to shut down the background
    log work before we do either of them.

    But this is all split haphazardly across xfs_unmountfs() and
    xfs_log_unmount(). Move all the AIL flushing and log manipulations
    to xfs_log_unmount() so that the responsibilities of each function
    are clear and the operations they perform obvious.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit f661f1e0bf5002bdcc8b5810ad0a184a1841537f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:02 2012 +1100

    xfs: sync work is now only periodic log work

    The only thing the periodic sync work does now is flush the AIL and
    idle the log. These are really functions of the log code, so move
    the work to xfs_log.c and rename it appropriately.

    The only wart that this leaves behind is the xfssyncd_centisecs
    sysctl, otherwise the xfssyncd is dead. Clean up any comments that
    relate to xfssyncd to reflect its passing.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 7f7bebefba152c5bdfe961cd2e97e8695a32998c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:01 2012 +1100

    xfs: don't run the sync work if the filesystem is read-only

    If the filesystem is mounted or remounted read-only, stop the sync
    worker that tries to flush or cover the log if the filesystem is
    dirty. It's read-only, so it isn't dirty. Restart it on a remount,rw
    as necessary. This avoids the need for RO checks in the work.

    Similarly, stop the sync work when the filesystem is frozen, and
    start it again when the filesystem is thawed. This avoids the need
    for special freeze checks in the work.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 7e18530bef6a18a5479690ae7e8256319ecf1300
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:00 2012 +1100

    xfs: rationalise xfs_mount_wq users

    Instead of starting and stopping background work on the xfs_mount_wq
    all at the same time, separate them to where they really are needed
    to start and stop.

    The xfs_sync_worker only needs to be started after all the mount
    processing has completed successfully, while it needs to be stopped
    before the log is unmounted.

    The xfs_reclaim_worker is started on demand, and can be
    stopped before the unmount process does its own inode reclaim pass.

    The xfs_flush_inodes work is run on demand, and so we really only
    need to ensure that it has stopped running before we start
    processing an unmount, freeze or remount,ro.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 33c7a2bc48a81fa714572f8ce29f29bc17e6faf0
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:55:59 2012 +1100

    xfs: xfs_syncd_stop must die

    xfs_syncd_start and xfs_syncd_stop tie together a bunch of unrelated
    functionality that actually has different start and stop
    requirements. Kill these functions and open code the start/stop
    methods for each of the background functions.

    Subsequent patches will move the start/stop functions around to the
    correct places to avoid races and shutdown issues.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/Kconfig                      |    1 +
 fs/xfs/Makefile                     |    4 +-
 fs/xfs/uuid.h                       |    6 +
 fs/xfs/xfs_ag.h                     |    5 +
 fs/xfs/xfs_alloc.c                  |  140 ++++--
 fs/xfs/xfs_alloc.h                  |    3 +
 fs/xfs/xfs_alloc_btree.c            |   77 +++
 fs/xfs/xfs_alloc_btree.h            |    2 +
 fs/xfs/xfs_aops.c                   |   83 ++--
 fs/xfs/xfs_attr.c                   |  103 ++--
 fs/xfs/xfs_attr_leaf.c              |  143 +++---
 fs/xfs/xfs_attr_leaf.h              |    6 +
 fs/xfs/xfs_bmap.c                   |   64 ++-
 fs/xfs/xfs_bmap_btree.c             |   63 +++
 fs/xfs/xfs_bmap_btree.h             |    1 +
 fs/xfs/xfs_btree.c                  |  111 +++--
 fs/xfs/xfs_btree.h                  |   22 +-
 fs/xfs/xfs_buf.c                    |   59 ++-
 fs/xfs/xfs_buf.h                    |   27 +-
 fs/xfs/xfs_cksum.h                  |   63 +++
 fs/xfs/xfs_da_btree.c               |  141 +++++-
 fs/xfs/xfs_da_btree.h               |   10 +-
 fs/xfs/xfs_dfrag.c                  |   13 +-
 fs/xfs/xfs_dir2_block.c             |  436 ++++++++++-------
 fs/xfs/xfs_dir2_data.c              |  170 +++++--
 fs/xfs/xfs_dir2_leaf.c              |  172 +++++--
 fs/xfs/xfs_dir2_node.c              |  288 +++++++----
 fs/xfs/xfs_dir2_priv.h              |   19 +-
 fs/xfs/xfs_dquot.c                  |  134 ++++-
 fs/xfs/xfs_dquot.h                  |    2 +
 fs/xfs/xfs_export.c                 |    1 +
 fs/xfs/xfs_file.c                   |   42 +-
 fs/xfs/xfs_fs.h                     |   33 +-
 fs/xfs/xfs_fs_subr.c                |   96 ----
 fs/xfs/xfs_fsops.c                  |  141 ++++--
 fs/xfs/xfs_globals.c                |    4 +-
 fs/xfs/xfs_ialloc.c                 |   83 +++-
 fs/xfs/xfs_ialloc.h                 |    4 +-
 fs/xfs/xfs_ialloc_btree.c           |   55 +++
 fs/xfs/xfs_ialloc_btree.h           |    2 +
 fs/xfs/{xfs_sync.c => xfs_icache.c} |  914 ++++++++++++++++++++++++-----------
 fs/xfs/{xfs_sync.h => xfs_icache.h} |   28 +-
 fs/xfs/xfs_iget.c                   |  705 ---------------------------
 fs/xfs/xfs_inode.c                  |  437 ++++++++++++++---
 fs/xfs/xfs_inode.h                  |   12 +-
 fs/xfs/xfs_ioctl.c                  |   21 +
 fs/xfs/xfs_iomap.c                  |   31 +-
 fs/xfs/xfs_iops.c                   |    8 +-
 fs/xfs/xfs_itable.c                 |    4 +-
 fs/xfs/xfs_linux.h                  |    2 +
 fs/xfs/xfs_log.c                    |  241 +++++++--
 fs/xfs/xfs_log.h                    |    4 +
 fs/xfs/xfs_log_priv.h               |   12 +-
 fs/xfs/xfs_log_recover.c            |  146 +++---
 fs/xfs/xfs_mount.c                  |  163 ++++---
 fs/xfs/xfs_mount.h                  |   13 +-
 fs/xfs/xfs_qm.c                     |   22 +-
 fs/xfs/xfs_qm_syscalls.c            |    6 +-
 fs/xfs/xfs_rtalloc.c                |   16 +-
 fs/xfs/xfs_sb.h                     |    7 +
 fs/xfs/xfs_super.c                  |  148 ++++--
 fs/xfs/xfs_super.h                  |    1 +
 fs/xfs/xfs_sysctl.c                 |    9 +
 fs/xfs/xfs_sysctl.h                 |    1 +
 fs/xfs/xfs_trace.h                  |   60 ++-
 fs/xfs/xfs_trans.h                  |   19 +-
 fs/xfs/xfs_trans_buf.c              |    9 +-
 fs/xfs/xfs_vnodeops.c               |  168 +++++--
 fs/xfs/xfs_vnodeops.h               |    9 +-
 69 files changed, 3737 insertions(+), 2308 deletions(-)
 create mode 100644 fs/xfs/xfs_cksum.h
 delete mode 100644 fs/xfs/xfs_fs_subr.c
 rename fs/xfs/{xfs_sync.c => xfs_icache.c} (55%)
 rename fs/xfs/{xfs_sync.h => xfs_icache.h} (64%)
 delete mode 100644 fs/xfs/xfs_iget.c

hooks/post-receive
-- 
XFS development tree

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs