[XFS updates] XFS development tree branch, for-next, updated. v3.10-rc1-22-g56c19e8

xfs@xxxxxxxxxxx · Thu, 30 May 2013 13:56:00 -0500 (CDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, for-next has been updated
  56c19e8 xfs: kill suid/sgid through the truncate path.
  74137ff xfs: add fsgeom flag for v5 superblock support.
  02f7540 xfs: disable swap extents ioctl on CRC enabled filesystems
  709da6a xfs: fix split buffer vector log recovery support
  321a958 xfs: fix incorrect remote symlink block count
  3451018 xfs: don't emit v5 superblock warnings on write
      from  ad1858d77771172e08016890f0eb2faedec3ecee (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 56c19e89b38618390addfc743d822f99519055c6
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 27 16:38:25 2013 +1000

    xfs: kill suid/sgid through the truncate path.

    XFS has failed to kill suid/sgid bits correctly when truncating
    files of non-zero size since commit c4ed4243 ("xfs: split
    xfs_setattr") introduced in the 3.1 kernel. Fix it.

    Fix it.

    cc: stable kernel <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 74137fff067961c9aca1e14d073805c3de8549bd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 27 16:38:26 2013 +1000

    xfs: add fsgeom flag for v5 superblock support.

    Currently userspace has no way of determining that a filesystem is
    CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to
    indicate that the filesystem has v5 superblock support enabled.
    This will allow xfs_info to correctly report the state of the
    filesystem.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
    Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 02f75405a75eadfb072609f6bf839e027de6a29a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 27 16:38:24 2013 +1000

    xfs: disable swap extents ioctl on CRC enabled filesystems

    Currently, swapping extents from one inode to another is a simple
    act of switching data and attribute forks from one inode to another.
    This, unfortunately in no longer so simple with CRC enabled
    filesystems as there is owner information embedded into the BMBT
    blocks that are swapped between inodes. Hence swapping the forks
    between inodes results in the inodes having mapping blocks that
    point to the wrong owner and hence are considered corrupt.

    To fix this we need an extent tree block or record based swap
    algorithm so that the BMBT block owner information can be updated
    atomically in the swap transaction. This is a significant piece of
    new work, so for the moment simply don't allow swap extent
    operations to succeed on CRC enabled filesystems.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 709da6a61aaf12181a8eea8443919ae5fc1b731d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 27 16:38:23 2013 +1000

    xfs: fix split buffer vector log recovery support

    A long time ago in a galaxy far away....

    .. the was a commit made to fix some ilinux specific "fragmented
    buffer" log recovery problem:

    http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=b29c0bece51da72fb3ff3b61391a391ea54e1603

    That problem occurred when a contiguous dirty region of a buffer was
    split across across two pages of an unmapped buffer. It's been a
    long time since that has been done in XFS, and the changes to log
    the entire inode buffers for CRC enabled filesystems has
    re-introduced that corner case.

    And, of course, it turns out that the above commit didn't actually
    fix anything - it just ensured that log recovery is guaranteed to
    fail when this situation occurs. And now for the gory details.

    xfstest xfs/085 is failing with this assert:

    XFS (vdb): bad number of regions (0) in inode log format
    XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 1583

    Largely undocumented factoid #1: Log recovery depends on all log
    buffer format items starting with this format:

    struct foo_log_format {
    	__uint16_t	type;
    	__uint16_t	size;
    	....

    As recoery uses the size field and assumptions about 32 bit
    alignment in decoding format items.  So don't pay much attention to
    the fact log recovery thinks that it decoding an inode log format
    item - it just uses them to determine what the size of the item is.

    But why would it see a log format item with a zero size? Well,
    luckily enough xfs_logprint uses the same code and gives the same
    error, so with a bit of gdb magic, it turns out that it isn't a log
    format that is being decoded. What logprint tells us is this:

    Oper (130): tid: a0375e1a  len: 28  clientid: TRANS  flags: none
    BUF:  #regs: 2   start blkno: 144 (0x90)  len: 16  bmap size: 2  flags: 0x4000
    Oper (131): tid: a0375e1a  len: 4096  clientid: TRANS  flags: none
    BUF DATA
    ----------------------------------------------------------------------------
    Oper (132): tid: a0375e1a  len: 4096  clientid: TRANS  flags: none
    xfs_logprint: unknown log operation type (4e49)
    **********************************************************************
    * ERROR: data block=2                                                 *
    **********************************************************************

    That we've got a buffer format item (oper 130) that has two regions;
    the format item itself and one dirty region. The subsequent region
    after the buffer format item and it's data is them what we are
    tripping over, and the first bytes of it at an inode magic number.
    Not a log opheader like there is supposed to be.

    That means there's a problem with the buffer format item. It's dirty
    data region is 4096 bytes, and it contains - you guessed it -
    initialised inodes. But inode buffers are 8k, not 4k, and we log
    them in their entirety. So something is wrong here. The buffer
    format item contains:

    (gdb) p /x *(struct xfs_buf_log_format *)in_f
    $22 = {blf_type = 0x123c, blf_size = 0x2, blf_flags = 0x4000,
           blf_len = 0x10, blf_blkno = 0x90, blf_map_size = 0x2,
           blf_data_map = {0xffffffff, 0xffffffff, .... }}

    Two regions, and a signle dirty contiguous region of 64 bits.  64 *
    128 = 8k, so this should be followed by a single 8k region of data.
    And the blf_flags tell us that the type of buffer is a
    XFS_BLFT_DINO_BUF. It contains inodes. And because it doesn't have
    the XFS_BLF_INODE_BUF flag set, that means it's an inode allocation
    buffer. So, it should be followed by 8k of inode data.

    But we know that the next region has a header of:

    (gdb) p /x *ohead
    $25 = {oh_tid = 0x1a5e37a0, oh_len = 0x100000, oh_clientid = 0x69,
           oh_flags = 0x0, oh_res2 = 0x0}

    and so be32_to_cpu(oh_len) = 0x1000 = 4096 bytes. It's simply not
    long enough to hold all the logged data. There must be another
    region. There is - there's a following opheader for another 4k of
    data that contains the other half of the inode cluster data - the
    one we assert fail on because it's not a log format header.

    So why is the second part of the data not being accounted to the
    correct buffer log format structure? It took a little more work with
    gdb to work out that the buffer log format structure was both
    expecting it to be there but hadn't accounted for it. It was at that
    point I went to the kernel code, as clearly this wasn't a bug in
    xfs_logprint and the kernel was writing bad stuff to the log.

    First port of call was the buffer item formatting code, and the
    discontiguous memory/contiguous dirty region handling code
    immediately stood out. I've wondered for a long time why the code
    had this comment in it:

                            vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
                            vecp->i_len = nbits * XFS_BLF_CHUNK;
                            vecp->i_type = XLOG_REG_TYPE_BCHUNK;
    /*
     * You would think we need to bump the nvecs here too, but we do not
     * this number is used by recovery, and it gets confused by the boundary
     * split here
     *                      nvecs++;
     */
                            vecp++;

    And it didn't account for the extra vector pointer. The case being
    handled here is that a contiguous dirty region lies across a
    boundary that cannot be memcpy()d across, and so has to be split
    into two separate operations for xlog_write() to perform.

    What this code assumes is that what is written to the log is two
    consecutive blocks of data that are accounted in the buf log format
    item as the same contiguous dirty region and so will get decoded as
    such by the log recovery code.

    The thing is, xlog_write() knows nothing about this, and so just
    does it's normal thing of adding an opheader for each vector. That
    means the 8k region gets written to the log as two separate regions
    of 4k each, but because nvecs has not been incremented, the buf log
    format item accounts for only one of them.

    Hence when we come to log recovery, we process the first 4k region
    and then expect to come across a new item that starts with a log
    format structure of some kind that tells us whenteh next data is
    going to be. Instead, we hit raw buffer data and things go bad real
    quick.

    So, the commit from 2002 that commented out nvecs++ is just plain
    wrong. It breaks log recovery completely, and it would seem the only
    reason this hasn't been since then is that we don't log large
    contigous regions of multi-page unmapped buffers very often. Never
    would be a closer estimate, at least until the CRC code came along....

    So, lets fix that by restoring the nvecs accounting for the extra
    region when we hit this case.....

    .... and there's the problemin log recovery it is apparently working
    around:

    XFS: Assertion failed: i == item->ri_total, file: fs/xfs/xfs_log_recover.c, line: 2135

    Yup, xlog_recover_do_reg_buffer() doesn't handle contigous dirty
    regions being broken up into multiple regions by the log formatting
    code. That's an easy fix, though - if the number of contiguous dirty
    bits exceeds the length of the region being copied out of the log,
    only account for the number of dirty bits that region covers, and
    then loop again and copy more from the next region. It's a 2 line
    fix.

    Now xfstests xfs/085 passes, we have one less piece of mystery
    code, and one more important piece of knowledge about how to
    structure new log format items..

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 321a95839e65db3759a07a3655184b0283af90fe
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 27 16:38:20 2013 +1000

    xfs: fix incorrect remote symlink block count

    When CRCs are enabled, the number of blocks needed to hold a remote
    symlink on a 1k block size filesystem may be 2 instead of 1. The
    transaction reservation for the allocated blocks was not taking this
    into account and only allocating one block. Hence when trying to
    read or invalidate such symlinks, we are mapping a hole where there
    should be a block and things go bad at that point.

    Fix the reservation to use the correct block count, clean up the
    block count calculation similar to the remote attribute calculation,
    and add a debug guard to detect when we don't write the entire
    symlink to disk.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 34510185abeaa5be9b178a41c0a03d30aec3db7e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 27 16:38:19 2013 +1000

    xfs: don't emit v5 superblock warnings on write

    We write the superblock every 30s or so which results in the
    verifier being called. Right now that results in this output
    every 30s:

    XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
    Use of these features in this kernel is at your own risk!

    And spamming the logs.

    We don't need to check for whether we support v5 superblocks or
    whether there are feature bits we don't support set as these are
    only relevant when we first mount the filesytem. i.e. on superblock
    read. Hence for the write verification we can just skip all the
    checks (and hence verbose output) altogether.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/xfs_buf_item.c    |  7 +------
 fs/xfs/xfs_dfrag.c       |  8 ++++++++
 fs/xfs/xfs_fs.h          |  1 +
 fs/xfs/xfs_fsops.c       |  4 +++-
 fs/xfs/xfs_iops.c        | 47 ++++++++++++++++++++++++++++++++---------------
 fs/xfs/xfs_log_recover.c | 11 +++++++++++
 fs/xfs/xfs_mount.c       | 18 +++++++++++-------
 fs/xfs/xfs_symlink.c     | 20 ++++++--------------
 8 files changed, 73 insertions(+), 43 deletions(-)

hooks/post-receive
-- 
XFS development tree

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs