Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files

Hi,

> On 8. Aug 2024, at 08:55, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> 
> Since 6.10 is the same, I took a closer look at this.

Much appreciated, thanks!

> First of all, is this a new problem or a new scenario?

Both? ;)

The user-level scenario (rsyncing those files) is a regular task that has been working fine so far. The only new aspect of the scenario is that we’re now using NVMe devices. The problem has not been observed before.

>> [ 7497.019235] INFO: task .backy-wrapped:2706 blocked for more than 122 seconds.
>> [ 7497.027265]       Not tainted 6.10.3 #1-NixOS
>> [ 7497.032173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 7497.040974] task:.backy-wrapped  state:D stack:0     pid:2706  tgid:2706  ppid:1      flags:0x00000002
>> [ 7497.040979] Call Trace:
>> [ 7497.040981]  <TASK>
>> [ 7497.040987]  __schedule+0x3fa/0x1550
>> [ 7497.040996]  ? xfs_iextents_copy+0xec/0x1b0 [xfs]
>> [ 7497.041085]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 7497.041089]  ? xlog_copy_iovec+0x30/0x90 [xfs]
>> [ 7497.041168]  schedule+0x27/0xf0
>> [ 7497.041171]  io_schedule+0x46/0x70
>> [ 7497.041173]  folio_wait_bit_common+0x13f/0x340
>> [ 7497.041180]  ? __pfx_wake_page_function+0x10/0x10
>> [ 7497.041187]  folio_wait_writeback+0x2b/0x80
>> [ 7497.041191]  truncate_inode_partial_folio+0x5b/0x190
>> [ 7497.041194]  truncate_inode_pages_range+0x1de/0x400
>> [ 7497.041207]  evict+0x1b0/0x1d0
>> [ 7497.041212]  __dentry_kill+0x6e/0x170
>> [ 7497.041216]  dput+0xe5/0x1b0
>> [ 7497.041218]  do_renameat2+0x386/0x600
>> [ 7497.041226]  __x64_sys_rename+0x43/0x50
>> [ 7497.041229]  do_syscall_64+0xb7/0x200
>> [ 7497.041234]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> [ 7497.041236] RIP: 0033:0x7f4be586f75b
>> [ 7497.041265] RSP: 002b:00007fffd2706538 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
>> [ 7497.041267] RAX: ffffffffffffffda RBX: 00007fffd27065d0 RCX: 00007f4be586f75b
>> [ 7497.041269] RDX: 0000000000000000 RSI: 00007f4bd6f73e50 RDI: 00007f4bd6f732d0
>> [ 7497.041270] RBP: 00007fffd2706580 R08: 00000000ffffffff R09: 0000000000000000
>> [ 7497.041271] R10: 00007fffd27067b0 R11: 0000000000000246 R12: 00000000ffffff9c
>> [ 7497.041273] R13: 00000000ffffff9c R14: 0000000037fb4ab0 R15: 00007f4be5814810
>> [ 7497.041277]  </TASK>
>> [ 7497.041281] INFO: task kworker/u131:1:12780 blocked for more than 122 seconds.
>> [ 7497.049410]       Not tainted 6.10.3 #1-NixOS
>> [ 7497.054317] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 7497.063124] task:kworker/u131:1  state:D stack:0     pid:12780 tgid:12780 ppid:2      flags:0x00004000
>> [ 7497.063131] Workqueue: kcryptd-253:4-1 kcryptd_crypt [dm_crypt]
>> [ 7497.063140] Call Trace:
>> [ 7497.063141]  <TASK>
>> [ 7497.063145]  __schedule+0x3fa/0x1550
>> [ 7497.063154]  schedule+0x27/0xf0
>> [ 7497.063156]  md_bitmap_startwrite+0x14f/0x1c0
> 
> From code review, the counter for the bit has reached COUNTER_MAX, which
> means there is already a lot of write IO issued in the range represented
> by this bit, and md_bitmap_startwrite() is waiting for that IO to
> complete before issuing new IO. Hence either IO is being handled too
> slowly or a deadlock is triggered.
> 
> So for the next step, can you do the following tests for problem
> identification?
> 
> 1) With the hung tasks, are the underlying disks idle? (Check with
> iostat.) And can you please collect /sys/block/[disk]/inflight for both
> the raid and the underlying disks.

I will try that.
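
For reference, this is roughly what I plan to capture while a hang is
active (the device names below are placeholders for our setup and may
need adjusting):

  # overall utilization while the tasks are hung
  iostat -x 1 10

  # in-flight request counts (reads / writes) for the md array,
  # the dm-crypt devices and the underlying NVMe disks
  grep . /sys/block/md*/inflight /sys/block/dm-*/inflight /sys/block/nvme*n1/inflight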

> 2) Can you still reproduce the problem with raid1/raid10?
> 3) Can you still reproduce the problem with the bitmap disabled, by
> adding md-bitmap=none while creating the array?

Here’s where debugging gets tricky: there is valuable data on this machine. If we can get around re-creating the array, that would be great. Otherwise I’ll have to move the data to a different host first - that can be done, but it will take some time.
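
One question, purely from my reading of the mdadm man page (so please
correct me if this is wrong): would removing the write-intent bitmap
from the existing array be sufficient for test 3, or does the array
really have to be created without one? I’m thinking of something along
these lines (mdX is a placeholder for our array):

  # drop the bitmap from the live array for the duration of the test ...
  mdadm --grow --bitmap=none /dev/mdX
  # ... and add it back afterwards
  mdadm --grow --bitmap=internal /dev/mdX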

I could take the hot-spare out of the cluster and do something with it, but I guess that doesn’t give you any insight, right?
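
If it would help after all, one thing I could do with the spare is to
carve it into a few partitions, build a small throwaway array on those
with the same stack (dm-crypt + XFS) and try to reproduce the rsync
load there. Roughly (all names are placeholders, not our real devices):

  # scratch raid6 across partitions of the spare NVMe
  mdadm --create /dev/md99 --level=6 --raid-devices=4 \
      /dev/nvmeXn1p1 /dev/nvmeXn1p2 /dev/nvmeXn1p3 /dev/nvmeXn1p4
  cryptsetup luksFormat /dev/md99
  cryptsetup open /dev/md99 scratch
  mkfs.xfs /dev/mapper/scratch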

I’ll get back to you with the information from question 1.

Christian

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
