Re: [PATCH v6A 00/19] xfs: online scrub support

On Sat, Mar 11, 2017 at 12:35 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Sat, Mar 11, 2017 at 1:19 AM, Darrick J. Wong
> <darrick.wong@xxxxxxxxxx> wrote:
>> Hi all,
>>
>> [Yes, this is a pre-LSFMM patch dump.]
>>
>> This is the sixth revision of a patchset that adds to XFS kernel support
>> for online metadata scrubbing and repair.  There aren't any on-disk
>> format changes.  Changes since v5 include bug fixes to the repair code
>> to eliminate weird hangs and to do a better job of temporarily stopping
>> access to the filesystem in the rare event that we need to do so to
>> rebuild something.  For my own dogfooding amusement, I now perform
>> automated periodic scans of the XFS filesystems on my development
>> workstations, which (so far) haven't destroyed anything or blown up.
>>
>> Online scrub/repair support consists of four major pieces -- first, an
>> ioctl that maps physical extents to their owners (GETFSMAP; queued for
>> 4.12); second, various in-kernel metadata scrubbing ioctls to examine
>> metadata records and cross-reference them with other filesystem
>> metadata; third, an in-kernel mechanism for rebuilding damaged metadata
>> objects and btrees; and fourth, a userspace component to coordinate
>> scrubbing and repair operations.
>>
>> This new utility, xfs_scrub, is separate from the existing offline
>> xfs_repair tool.  The program uses GETFSMAP and various XFS ioctls to
>> iterate all XFS metadata and asks the kernel to check the metadata and
>> repair it if necessary.
>>
>> Per reviewer request, the v6 patch series has been broken into four
>> smaller series -- this first one to add the minimum code necessary to
>> scrub objects; a second one to add the ability to cross reference with
>> other metadata; a third one containing the rebuilding code; and a fourth
>> series with the userspace tool code.
>>
>> If you're going to start using this mess, you probably ought to just
>> pull from my git trees.  The kernel patches[1] should apply against
>> 4.11-rc1.  xfsprogs[2] and xfstests[3] can be found in their usual
>> places.  The git trees contain all four series' worth of changes.
>>
>> This is an extraordinary way to eat your data.  Enjoy!
>> Comments and questions are, as always, welcome.
>>
>> --D
>>
>> [1] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
>> [2] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel
>> [3] https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel
>
> Hi Darrick,
>
> My first attempt to run the dangerous_scrub tests did not go so well.
>
> 1. For some reason, xfsprogs configure does not correctly detect that my system
>     include files are missing FICLONE and friends, so I had to manually add:
> --- a/include/builddefs.in
> +++ b/include/builddefs.in
> @@ -178,6 +178,10 @@ ifeq ($(PKG_PLATFORM)_$(HAVE_SYS_GETFSMAP),linux_)
>  PCFLAGS+= -DOVERRIDE_GETFSMAP
>  endif
>
> +PCFLAGS+= -DOVERRIDE_FICLONE
> +PCFLAGS+= -DOVERRIDE_FICLONERANGE
> +PCFLAGS+= -DOVERRIDE_FIDEDUPERANGE
> +PCFLAGS+= -DOVERRIDE_GETFSMAP
>
> I'll investigate this next week.
>

This was my mistake. I needed to run make realclean first.

> 2. On the first attempt to run -g xfs/dangerous_scrub, 1378 triggered an
> ASSERT, so I modified:
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -335,7 +335,7 @@ static inline __uint64_t howmany_64(__uint64_t x, __uint32_t y)
>
>  #ifdef DEBUG
>  #define ASSERT(expr)   \
> -       (likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
> +       (likely(expr) ? (void)0 : asswarn(#expr, __FILE__, __LINE__))
>
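
For readers unfamiliar with the change in item 2: as far as I understand the XFS debug code, assfail() ends in BUG() while asswarn() only logs the failed expression with a warning, so swapping them lets a test run continue past a failed assertion. The following is a minimal userspace sketch of that warn-and-continue behaviour. The names sketch_asswarn, SKETCH_ASSERT, warn_count, and clamp_nonnegative are hypothetical and are not the kernel's implementation.

```c
#include <stdio.h>

/* Hypothetical counter so callers can see how many assertions failed. */
int warn_count;

/* Userspace stand-in for a warn-only assertion handler (not kernel code). */
void sketch_asswarn(const char *expr, const char *file, int line)
{
	fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n",
		expr, file, line);
	warn_count++;	/* report and keep going instead of aborting */
}

/* Same shape as the macro in the diff, minus the likely() hint. */
#define SKETCH_ASSERT(expr) \
	((expr) ? (void)0 : sketch_asswarn(#expr, __FILE__, __LINE__))

/* Example caller: execution continues after a failed assertion. */
int clamp_nonnegative(int x)
{
	SKETCH_ASSERT(x >= 0);
	return x < 0 ? 0 : x;
}
```

Calling clamp_nonnegative(-1) prints the failed expression to stderr, increments warn_count, and still returns 0. The trade-off is that code after the assertion now runs with a condition the author assumed could not happen.
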
> 3. The second attempt did not get much further. The scratch mount could
> not be unmounted after 262
>     (attached out.bad full and dmesg of this run)
>
> 4. On the third attempt I ran only 350, which hit a kernel page fault while
>     fuzzing logsunit (attached full output and dmesg of this run)
>

This page fault is reproducible on my system.
Test 350 hits the page fault during the logsunit middlebit verb, the same as in the previous run.

This is my scratch setup (100GB LV on rotating drive):

$ xfs_info /mnt/scratch
meta-data=/dev/mapper/storage-scratch isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=1
         =                       reflink=1
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
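
As a quick sanity check on the geometry above, the sketch below recomputes the data and log sizes from the reported fields. It assumes every allocation group is full-sized, which holds for this output but not for every filesystem. The macro and function names are illustrative only and do not come from xfsprogs.

```c
#include <stdint.h>

/* Figures copied from the xfs_info output above. */
#define AG_COUNT       4ULL        /* agcount */
#define AG_SIZE_BLKS   6553600ULL  /* agsize, in filesystem blocks */
#define FS_BLOCK_SIZE  4096ULL     /* bsize, in bytes */
#define LOG_BLKS       12800ULL    /* internal log size, in blocks */

/* Total data blocks, assuming every AG is full-sized. */
uint64_t data_blocks(void)
{
	return AG_COUNT * AG_SIZE_BLKS;
}

/* Convert a count of filesystem blocks to bytes. */
uint64_t blocks_to_bytes(uint64_t blocks)
{
	return blocks * FS_BLOCK_SIZE;
}
```

Here data_blocks() gives 26214400, matching the blocks= field, and blocks_to_bytes() of that is 107374182400 bytes, i.e. exactly 100 GiB, consistent with the 100GB LV. The internal log works out to blocks_to_bytes(LOG_BLKS) = 52428800 bytes, or 50 MiB.
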

This is my kernel xfs config:

CONFIG_JFS_STATISTICS=y
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_DEBUG=y


Do you need any more info about my setup?


