Re: regression: CPU soft lockup with raid10: check slab-out-of-bounds in md_bitmap_get_counter

On 4/25/24 12:52 PM, Song Liu wrote:
On Thu, Apr 25, 2024 at 5:10 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:

On 4/24/24 2:57 AM, Yu Kuai wrote:
Hi, Nigel

On 2024/04/21 20:30, Nigel Croxon wrote:
On 4/20/24 2:09 AM, Yu Kuai wrote:
Hi,

On 2024/04/20 3:49, Nigel Croxon wrote:
There is a problem with this commit: it causes a CPU#x soft lockup.

commit 301867b1c16805aebbc306aafa6ecdc68b73c7e5
Author: Li Nan <linan122@xxxxxxxxxx>
Date:   Mon May 15 21:48:05 2023 +0800
md/raid10: check slab-out-of-bounds in md_bitmap_get_counter

Did you find this commit by bisecting?

Yes, I found this issue by bisecting.

Message from syslogd@rhel9 at Apr 19 14:14:55 ...
   kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976]

dmesg:

[  104.245585] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1
[  104.245588] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[  104.245590] RIP: 0010:_raw_spin_unlock_irq+0x13/0x30
[  104.245598] Code: 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 90 90 90 fb 65 ff 0d 95 9f 75 76 <74> 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc cc cc cc cc cc
[  104.245601] RSP: 0018:ffffb2d74a81bbf8 EFLAGS: 00000246
[  104.245603] RAX: 0000000000000000 RBX: 0000000001000000 RCX: 000000000000000c
[  104.245604] RDX: 0000000000000000 RSI: 0000000001000000 RDI: ffff926160ccd200
[  104.245606] RBP: ffffb2d74a81bcd0 R08: 0000000000000013 R09: 0000000000000000
[  104.245607] R10: 0000000000000000 R11: ffffb2d74a81bad8 R12: 0000000000000000
[  104.245608] R13: 0000000000000000 R14: ffff926160ccd200 R15: ffff926151019000
[  104.245611] FS:  0000000000000000(0000) GS:ffff9273f9580000(0000) knlGS:0000000000000000
[  104.245613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.245614] CR2: 00007f23774d2584 CR3: 0000000104098003 CR4: 0000000000370ef0
[  104.245616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  104.245617] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  104.245618] Call Trace:
[  104.245620]  <IRQ>
[  104.245623]  ? watchdog_timer_fn+0x1e3/0x260
[  104.245630]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  104.245634]  ? __hrtimer_run_queues+0x112/0x2a0
[  104.245638]  ? hrtimer_interrupt+0xff/0x240
[  104.245640]  ? sched_clock+0xc/0x30
[  104.245644]  ? __sysvec_apic_timer_interrupt+0x54/0x140
[  104.245649]  ? sysvec_apic_timer_interrupt+0x6c/0x90
[  104.245652]  </IRQ>
[  104.245653]  <TASK>
[  104.245654]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[  104.245659]  ? _raw_spin_unlock_irq+0x13/0x30
[  104.245661]  md_bitmap_start_sync+0x6b/0xf0

Could you test the following patch as well? I believe this is the root
cause of page > bitmap->pages: dm-raid is using the wrong bitmap size.

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index abe88d1e6735..d9c65ef9c9fb 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -4052,7 +4052,8 @@ static int raid_preresume(struct dm_target *ti)
                mddev->bitmap_info.chunksize != to_bytes(rs->requested_bitmap_chunk_sectors)))) {
                 int chunksize = to_bytes(rs->requested_bitmap_chunk_sectors) ?: mddev->bitmap_info.chunksize;

-               r = md_bitmap_resize(mddev->bitmap, mddev->dev_sectors, chunksize, 0);
+               r = md_bitmap_resize(mddev->bitmap, mddev->resync_max_sectors,
+                                    chunksize, 0);
                 if (r)
                         DMERR("Failed to resize bitmap");
         }

Thanks,
Kuai

Hello Kuai,

Tested and found no issues. Good to go.

-Nigel

Thanks for the fixes and the tests.

For the next step, do we need both patches or just one of them?

Song

Each patch fixes the problem on its own, without the other.

-Nigel
