grep -E 'Dirty|Write' /proc/meminfo

The above shows the amount of outstanding write data still waiting to be
flushed to disk; I would expect it to be significant on your system.

sysctl -a 2>/dev/null | grep -iE 'dirty_ratio|dirty_bytes|dirty_background'
vm.dirty_background_bytes = 3000000
vm.dirty_background_ratio = 0
vm.dirty_bytes = 5000000
vm.dirty_ratio = 0

By default you will be using the ratio settings, and a ratio is a
percentage of RAM. dirty_(ratio|bytes) is the high water mark (the
background variants are the low water mark): the kernel stops writers
when the high water mark is hit and does not let them resume until dirty
data drops back to the low water mark. With spinning disks, a tar
extraction, and the default settings, that is gigabytes of dirty data
requiring a significant number of seeks, so draining from the high water
mark down to the low water mark can take a really long time (many
minutes, or even hours, if you have enough RAM).

In my experience, about all the default settings accomplish is to make
it appear that I/O has finished when it really has not (I believe the
original intent may have been to improve benchmark results when the
benchmark did not force a sync). The downsides are that at any given
time a much larger amount of data may exist only in RAM and not on
disk, and if a crash (hardware, software, or power) happens, a lot of
that data can be lost. Because of that I set these values much lower
(see above).

It appears that you do need some write I/O cache, but it really does
not need to be huge. The settings above stop writers when 5 MB of dirty
data is reached and let them resume at 3 MB (likely a pause of well
under a second, rather than locking up the machine for a really long
time). A short sketch of one way to apply these settings follows the
quoted message below.

On Wed, Jun 5, 2024 at 1:41 PM Zack Weinberg <zack@xxxxxxxxxxxx> wrote:
>
> I am experimenting with the use of dm-integrity underneath dm-raid,
> to get around the problem where, if a RAID 1 or RAID 5 array is
> inconsistent, you may not know which copy is the good one. I have found
> a reproducible hard lockup involving XFS, RAID 5 and dm-integrity.
>
> My test array consists of three spinning HDDs (each 2 decimal
> terabytes), each with dm-integrity laid directly onto the disk
> (no partition table), using SHA-256 checksums. On top of this is
> an MD-RAID array (raid5), and on top of *that* is an ordinary XFS
> filesystem.
>
> Extracting a large tar archive (970 G) into the filesystem causes a hard
> lockup -- the entire system becomes unresponsive -- after some tens of
> gigabytes have been extracted. I have reproduced the lockup using
> kernel versions 6.6.21 and 6.9.3. No error messages make it to the
> console, but with 6.9.3 I was able to extract almost all of a lockdep
> report from pstore. I don't fully understand lockdep reports, but it
> *looks* like it might be a livelock rather than a deadlock, with all
> available kworker threads so bogged down with dm-integrity chores that
> an XFS log flush is blocked long enough to hit the hung task timeout.
>
> Attached are:
>
> - what I have of the lockdep report (kernel 6.9.3) (only a couple
>   of lines at the very top are missing)
> - kernel .config (6.9.3, lockdep enabled)
> - dmesg up till userspace starts (6.6.21, lockdep not enabled)
> - details of the test array configuration
>
> Please advise if there is any more information you need. I am happy to
> test patches. I'm not subscribed to either dm-devel or linux-xfs.
>
> zw
>
> p.s. Incidentally, why doesn't the dm-integrity superblock record the
> checksum algorithm in use?
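
In case it is useful, here is a minimal sketch of one way to apply and
persist the thresholds I quoted above. The sysctl names and the
3 MB / 5 MB values are the ones shown at the top of this message; the
drop-in file name (99-dirty-writeback.conf) is just an example, and the
commands need to be run as root.

# Apply immediately (takes effect at once, but is lost on reboot).
# Writing the *_bytes knobs automatically zeroes the matching *_ratio
# knobs, which is why the ratios in my output above read 0.
sysctl -w vm.dirty_background_bytes=3000000
sysctl -w vm.dirty_bytes=5000000

# Persist across reboots with a sysctl.d drop-in (the file name is
# arbitrary; the 99- prefix just makes it sort last).
cat > /etc/sysctl.d/99-dirty-writeback.conf <<'EOF'
vm.dirty_background_bytes = 3000000
vm.dirty_bytes = 5000000
EOF

# Reload all sysctl configuration and verify.
sysctl --system
sysctl vm.dirty_background_bytes vm.dirty_bytes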