Re: mount before xfs_repair hangs

Bart Brashers <bart.brashers@xxxxxxxxx> · Wed, 11 Mar 2020 16:11:27 -0700

After working fine for 2 days, it happened again. Drives went offline
for no apparent reason, and a logicaldevice (as arcconf calls them)
failed. arcconf listed the hard drives as all online by the time I had
logged on.

The server connected to the JBOD had rebooted by the time I noticed the problem.

There are two xfs filesystems on this server. I can mount one of them,
and ran xfs_repair on it.

I first tried mounting the other read-only,no-recovery. That worked.
Trying to mount normally hangs. I see in ps aux | grep mount that it's
not using CPU. Here's the mount command I gave:

mount -t xfs -o inode64,logdev=/dev/md/nvme2 /dev/volgrp4TB/lvol4TB
/export/lvol4TB/

I did an echo w > /proc/sysrc-trigger while I was watching the
console, it said "SysRq : Show Blocked State". Here's what the output
of dmesg looks like, starting with that line. Then it gives blocks
about what's happening on each CPU, some of which mention "xfs".

[  228.927915] SysRq : Show Blocked State
[  228.928525]   task                        PC stack   pid father
[  228.928605] mount           D ffff96f79a553150     0 11341  11254 0x00000080
[  228.928609] Call Trace:
[  228.928617]  [<ffffffffb0b7f1c9>] schedule+0x29/0x70
[  228.928624]  [<ffffffffb0b7cb51>] schedule_timeout+0x221/0x2d0
[  228.928626]  [<ffffffffb0b7f57d>] wait_for_completion+0xfd/0x140
[  228.928633]  [<ffffffffb04da0b0>] ? wake_up_state+0x20/0x20
[  228.928667]  [<ffffffffc04c599e>] ? xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  228.928682]  [<ffffffffc04c3217>] xfs_buf_iowait+0x27/0xb0 [xfs]
[  228.928696]  [<ffffffffc04c599e>] xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  228.928712]  [<ffffffffc04f2a9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
[  228.928727]  [<ffffffffc04f2e59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
[  228.928742]  [<ffffffffc04f2ed1>] xlog_do_recover+0x31/0x180 [xfs]
[  228.928758]  [<ffffffffc04f3fef>] xlog_recover+0xbf/0x190 [xfs]
[  228.928772]  [<ffffffffc04e658f>] xfs_log_mount+0xff/0x310 [xfs]
[  228.928801]  [<ffffffffc04dd1b0>] xfs_mountfs+0x520/0x8e0 [xfs]
[  228.928814]  [<ffffffffc04e02a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
[  228.928818]  [<ffffffffb064c893>] mount_bdev+0x1b3/0x1f0
[  228.928831]  [<ffffffffc04dfe90>] ?
xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
[  228.928842]  [<ffffffffc04deaa5>] xfs_fs_mount+0x15/0x20 [xfs]
[  228.928845]  [<ffffffffb064d1fe>] mount_fs+0x3e/0x1b0
[  228.928850]  [<ffffffffb066b377>] vfs_kern_mount+0x67/0x110
[  228.928852]  [<ffffffffb066dacf>] do_mount+0x1ef/0xce0
[  228.928855]  [<ffffffffb064521a>] ? __check_object_size+0x1ca/0x250
[  228.928858]  [<ffffffffb062368c>] ? kmem_cache_alloc_trace+0x3c/0x200
[  228.928860]  [<ffffffffb066e903>] SyS_mount+0x83/0xd0
[  228.928863]  [<ffffffffb0b8bede>] system_call_fastpath+0x25/0x2a
[  228.928884] Sched Debug Version: v0.11, 3.10.0-1062.el7.x86_64 #1
[  228.928886] ktime                                   : 228605.351961
[  228.928887] sched_clk                               : 228928.883526
[  228.928888] cpu_clk                                 : 228928.883743
[  228.928889] jiffies                                 : 4294895902
[  228.928891] sched_clock_stable()                    : 1

[  228.928893] sysctl_sched
[  228.928894]   .sysctl_sched_latency                    : 24.000000
[  228.928896]   .sysctl_sched_min_granularity            : 10.000000
[  228.928897]   .sysctl_sched_wakeup_granularity         : 15.000000
[  228.928898]   .sysctl_sched_child_runs_first           : 0
[  228.928899]   .sysctl_sched_features                   : 56955
[  228.928900]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)

Every 120 seconds, it adds to dmesg:

[  241.320468] INFO: task mount:11341 blocked for more than 120 seconds.
[  241.321253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  241.321862] mount           D ffff96f79a553150     0 11341  11254 0x00000080
[  241.321866] Call Trace:
[  241.321873]  [<ffffffffb0b7f1c9>] schedule+0x29/0x70
[  241.321879]  [<ffffffffb0b7cb51>] schedule_timeout+0x221/0x2d0
[  241.321881]  [<ffffffffb0b7f57d>] wait_for_completion+0xfd/0x140
[  241.321887]  [<ffffffffb04da0b0>] ? wake_up_state+0x20/0x20
[  241.321931]  [<ffffffffc04c599e>] ? xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  241.321945]  [<ffffffffc04c3217>] xfs_buf_iowait+0x27/0xb0 [xfs]
[  241.321962]  [<ffffffffc04c599e>] xfs_buf_delwri_submit+0x5e/0xf0 [xfs]
[  241.321976]  [<ffffffffc04f2a9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
[  241.321990]  [<ffffffffc04f2e59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
[  241.322003]  [<ffffffffc04f2ed1>] xlog_do_recover+0x31/0x180 [xfs]
[  241.322017]  [<ffffffffc04f3fef>] xlog_recover+0xbf/0x190 [xfs]
[  241.322030]  [<ffffffffc04e658f>] xfs_log_mount+0xff/0x310 [xfs]
[  241.322043]  [<ffffffffc04dd1b0>] xfs_mountfs+0x520/0x8e0 [xfs]
[  241.322057]  [<ffffffffc04e02a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
[  241.322064]  [<ffffffffb064c893>] mount_bdev+0x1b3/0x1f0
[  241.322077]  [<ffffffffc04dfe90>] ?
xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
[  241.322090]  [<ffffffffc04deaa5>] xfs_fs_mount+0x15/0x20 [xfs]
[  241.322092]  [<ffffffffb064d1fe>] mount_fs+0x3e/0x1b0
[  241.322095]  [<ffffffffb066b377>] vfs_kern_mount+0x67/0x110
[  241.322097]  [<ffffffffb066dacf>] do_mount+0x1ef/0xce0
[  241.322099]  [<ffffffffb064521a>] ? __check_object_size+0x1ca/0x250
[  241.322102]  [<ffffffffb062368c>] ? kmem_cache_alloc_trace+0x3c/0x200
[  241.322104]  [<ffffffffb066e903>] SyS_mount+0x83/0xd0
[  241.322107]  [<ffffffffb0b8bede>] system_call_fastpath+0x25/0x2a

Can anyone suggest what is causing mount to hang?

Bart

On Sun, Mar 8, 2020 at 6:32 PM Bart Brashers <bart.brashers@xxxxxxxxx> wrote:
>
> Thanks Dave!
>
> We had what I think was a power fluctuation, and several more drives
> went offline in my JBOD. I had to power-cycle the JBOD to make them
> show "online" again. I unmounted the arrays first, though.
>
> After doing the "echo w > /proc/sysrq-trigger" I was able to mount the
> problematic filesystem directly, no having to read dmesg output. If
> that was due to the power cycling and forcing logicalvolumes to be
> "optimal" (online) again, I don't know.
>
> I was able to run xfs_repair on both filesystems, and have tons of
> files in lost+found to parse now, but at least I have most of my data
> back.
>
> Thanks!
>
> Bart
>
>
> Bart
> ---
> Bart Brashers
> 3039 NW 62nd St
> Seattle WA 98107
> 206-789-1120 Home
> 425-412-1812 Work
> 206-550-2606 Mobile
>
>
> On Sun, Mar 8, 2020 at 3:26 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Sun, Mar 08, 2020 at 12:43:29PM -0700, Bart Brashers wrote:
> > > An update:
> > >
> > > Mounting the degraded xfs filesystem still hangs, so I can't replay
> > > the journal, so I don't yet want to run xfs_repair.
> >
> > echo w > /proc/sysrq-trigger
> >
> > and dump demsg to find where it is hung. If it is not hung and is
> > instead stuck in a loop, use 'echo l > /proc/sysrq-trigger'.
> >
> > > I can mount the degraded xfs filesystem like this:
> > >
> > > $ mount -t xfs -o ro,norecovery,inode64,logdev=/dev/md/nvme2
> > > /dev/volgrp4TB/lvol4TB /export/lvol4TB/
> > >
> > > If I do a "du" on the contents, I see 3822 files with either
> > > "Structure needs cleaning" or "No such file or directory".
> >
> > TO be expected - you mounted an inconsistent filesystem image and
> > it's falling off the end of structures that are incomplete and
> > require recovery to make consistent.
> >
> > > Is what I mounted what I would get if I used the xfs_repair -L option,
> > > and discarded the journal? Or would there be more corruption, e.g. to
> > > the directory structure?
> >
> > Maybe. Maybe more, maybe less. Maybe.
> >
> > > Some of the instances of "No such file or directory" are for files
> > > that are not in their correct directory - I can tell by the filetype
> > > and the directory name. Does that by itself imply directory
> > > corruption?
> >
> > Maybe.
> >
> > It also may imply log recovery has not been run and so things
> > like renames are not complete on disk, and recvoery would fix that.
> >
> > But keep in mind your array had a triple disk failure, so there is
> > going to be -something- lost and not recoverable. That may well be
> > in the journal, at which point repair is your only option...
> >
> > > At this point, can I do a backup, either using rsync or xfsdump or
> > > xfs_copy?
> >
> > Do it any way you want.
> >
> > > I have a separate RAID array on the same server where I
> > > could put the 7.8 TB of data, though the destination already has data
> > > on it - so I don't think xfs_copy is right. Is xfsdump to a directory
> > > faster/better than rsync? Or would it be best to use something like
> > >
> > > $ tar cf - /export/lvol4TB/directory | (cd /export/lvol6TB/ ; tar xfp -)
> >
> > Do it how ever you are confident the data gets copied reliably in
> > the face of filesystem traversal errors.
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@xxxxxxxxxxxxx