Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

On 9/13/23 16:16, Dragan Stancevic wrote:
Hi Donald-

On 9/13/23 04:08, Donald Buczek wrote:
On 9/5/23 3:54 PM, Dragan Stancevic wrote:
On 9/4/23 22:50, Yu Kuai wrote:
Hi,

On 2023/08/30 9:36, Yu Kuai wrote:
Hi,

On 2023/08/29 4:32, Dragan Stancevic wrote:

Just a follow-up on 6.1 testing. I tried reproducing this problem for 5 days with the 6.1.42 kernel without your patches, and I was not able to reproduce it.

Oops, I forgot that you need to backport this patch first to reproduce
this problem:

https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@xxxxxxxxxxxxxxx/

The patch fixes the deadlock as well, but it introduces some regressions.

We've just got an unplanned lock-up on a "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we've used the patch I originally proposed [1] with multiple kernel versions. But this no longer seems to be valid, or maybe it's even destructive in combination with the other changes.

But I've totally lost track of the further development. As I understand it, there are patches queued up in mainline which should fix the problem and which might go into 6.1, too, but have not landed there yet?

Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could probably fix my problem and also test those patches for you on production systems with a load which tends to run into that problem easily?

Here is a list of changes for 6.1:

e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
f71209b1f21c md: enhance checking in md_check_recovery()
753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"

Thanks!

I've put these patches on v6.1.52. A few hours ago I started a script which transitions the three md devices of a very active backup server through idle->check->idle every 6 minutes. It has gone through ~400 iterations so far. No lock-ups yet.
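
The loop is something along these lines (a minimal sketch, not the exact script; /sys/block/<dev>/md/sync_action is the standard md sysfs knob, while the device names and the 6-minute cadence are just taken from the description above):

#!/bin/bash
# Cycle each array through idle -> check -> idle forever.
# Writing "check" starts a data-check; writing "idle" aborts it --
# the "check" -> "idle" transition is the one that used to hang.
while true; do
    for md in md0 md1 md2; do
        echo check > /sys/block/$md/md/sync_action
    done
    sleep 360   # let the check run for ~6 minutes
    for md in md0 md1 md2; do
        echo idle > /sys/block/$md/md/sync_action
    done
done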

LGTM !

  Donald

buczek@done:~$ dmesg|grep "data-check of RAID array"|wc
    393    2820   18864
buczek@done:~$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md2 : active raid6 sdc[0] sdo[15] sdn[14] sdm[13] sdl[12] sdk[11] sdj[10] sdi[9] sdh[8] sdg[7] sdf[6] sde[5] sdd[4] sdr[3] sdq[2] sdp[1]
      109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
      [=========>...........]  check = 47.1% (3681799128/7813894144) finish=671.8min speed=102496K/sec
      bitmap: 0/59 pages [0KB], 65536KB chunk

md1 : active raid6 sdaa[0] sdz[15] sdy[14] sdx[13] sdw[12] sdv[11] sdu[10] sdt[16] sds[8] sdah[7] sdag[17] sdaf[5] sdae[4] sdad[3] sdac[2] sdab[1]
      109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
      [=======>.............]  check = 38.5% (3009484896/7813894144) finish=811.0min speed=98720K/sec
      bitmap: 0/59 pages [0KB], 65536KB chunk

md0 : active raid6 sdai[0] sdax[15] sdaw[16] sdav[13] sdau[12] sdat[11] sdas[10] sdar[9] sdaq[8] sdap[7] sdao[6] sdan[17] sdam[4] sdal[3] sdak[2] sdaj[1]
      109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
      [========>............]  check = 42.3% (3311789940/7813894144) finish=911.9min speed=82272K/sec
      bitmap: 6/59 pages [24KB], 65536KB chunk

unused devices: <none>
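
As a quick sanity check, the finish estimates above are just the remaining work divided by the current speed (positions and totals in /proc/mdstat are in 1K blocks), e.g. for md2:

$ echo $(( (7813894144 - 3681799128) / 102496 / 60 ))   # remaining KB / (K/sec) / 60
671

i.e. ~671 minutes, matching the finish=671.8min shown above.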

You can get them from the following tree:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
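
The backport can be done roughly like this (a hypothetical sketch: it assumes a stable-tree checkout that has the v6.1.52 tag, with linux-next added as a remote so the hashes above are reachable, and that the list above is in git-log order, newest first, so the commits are cherry-picked bottom-to-top; conflicts may still need hand-resolution):

git remote add linux-next \
    https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git fetch linux-next
git checkout -b md-6.1-fixes v6.1.52
git cherry-pick a865b96c513b 64e5e09afc14 6f56f0c4f124 \
    130443d60b1b 753260ed0b46 f71209b1f21c e5e9b9cb71a0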

Thanks

   Donald

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@xxxxxxxxxxxxx/

Ha, jinx :) I was about to email you that, with the testing over the weekend, I isolated the change that made it more difficult to reproduce on 6.1, and that the original change must be reverted :)

Thanks,
Kuai


It seems that 6.1 has some other code that prevents this from happening.


I see that there are lots of patches for raid456 between 5.10 and 6.1.
However, I remember that I used to reproduce the deadlock after 6.1, and
it's true that it's not easy to reproduce; see below:

https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@xxxxxxxxxxxxxxx/

My guess is that the deadlock is harder to reproduce on 6.1 than on 5.10
due to some changes inside raid456.

By the way, raid10 had a similar deadlock that can be fixed the same
way, so it makes sense to backport these patches.

https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@xxxxxxxxxxxxxxx

Thanks,
Kuai


On 5.10 I can reproduce it within minutes to an hour.


--
Donald Buczek
buczek@xxxxxxxxxxxxx
Tel: +49 30 8413 1433
