Hi all,

We recently hit a kernel BUG in raid5 on Red Hat 7.2 and would like to know
whether anyone else has encountered the same problem.
The stack trace is as follows:
[78008.110094] ------------[ cut here ]------------
[78008.110898] kernel BUG at drivers/md/raid5.c:4966!
[78008.111718] invalid opcode: 0000 [#1] SMP
[78008.131948] task: ffff882f2ab28b80 ti: ffff882f2ab30000 task.ti: ffff882f2ab30000
[78008.132734] RIP: 0010:[<ffffffffa01d584e>] [<ffffffffa01d584e>] handle_active_stripes.isra.41+0x4de/0x4f0 [raid456]
[78008.133776] RSP: 0018:ffff882f2ab33cc8 EFLAGS: 00010086
[78008.134330] RAX: 00000000ffffffff RBX: ffff882f36ff5400 RCX: dead000000000200
[78008.135025] RDX: ffff882f36ff5488 RSI: 00000000ffffffff RDI: ffff882f2b7d8508
[78008.135804] RBP: ffff882f2ab33d68 R08: ffff882f2b7d8508 R09: ffff882f36ff5428
[78008.136723] R10: 0000000000000000 R11: 0000000000000000 R12: ffff882f2b7d84f8
[78008.137936] R13: 0000000000000000 R14: ffff882f36ff5498 R15: ffff882f2b7d8508
[78008.139151] FS:  0000000000000000(0000) GS:ffff882f7fc80000(0000) knlGS:0000000000000000
[78008.140496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[78008.141441] CR2: 00007f888240d000 CR3: 0000002af30c0000 CR4: 00000000003407e0
[78008.142620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[78008.143815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[78008.145027] Stack:
[78008.145383] 0000000000000046 0000000000011628 0000000000011628 ffff882f36ff5670
[78008.146700] ffffffffffffffd8 ffffffff2b5a31c0 ffff882f2b5a0ee8 ffff882f2b177740
[78008.148079] ffff882f2b7d89f0 ffff882f2ae8ca88 ffff882f2b170ee8 ffff882f2b5a7740
[78008.149385] ffff882f2a84acb8 ffff882f2b5a31b0 000000009f37ab80 0000000000000000
[78008.150709] ffff882f2ab33dd0 ffff882f36ff5400 ffff882f36ff5670 ffff882f2ab28b80
[78008.152009] Call Trace:
[78008.152450] [<ffffffffa01d5d68>] raid5d+0x508/0x760 [raid456]
[78008.153343] [<ffffffff814c68d5>] md_thread+0x155/0x1a0
[78008.154214] [<ffffffff810a71d0>] ? wake_up_atomic_t+0x30/0x30
[78008.155181] [<ffffffff814c6780>] ? md_safemode_timeout+0x50/0x50
[78008.156194] [<ffffffff810a61af>] kthread+0xcf/0xe0
[78008.157015] [<ffffffff810a60e0>] ? kthread_create_on_node+0x140/0x140
[78008.158128] [<ffffffff816523d8>] ret_from_fork+0x58/0x90
[78008.161126] [<ffffffff810a60e0>] ? kthread_create_on_node+0x140/0x140
[78008.164297] Code: c6 f8 a4 1d a0 4c 0f 44 c2 4c 39 e0 48 0f 44 ca 48 c7 c2 50 84 1d a0 31 c0 e8 df 6d 14 e1 49 8b 04 24 44 8b 5d 88 e9 a8 fb ff ff <0f> 0b e8 cb 57 ea e0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[78008.172435] RIP  [<ffffffffa01d584e>] handle_active_stripes.isra.41+0x4de/0x4f0 [raid456]
[78008.174520] RSP <ffff882f2ab33cc8>
[78008.178475] ---[ end trace 7f1857a07ac12adf ]---
[78008.181447] Kernel panic - not syncing: Fatal exception
The die event has been recorded. The BUG_ON code is as follows:
4893 static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
4894 {
4895         struct stripe_head *sh = NULL, *tmp;
4896         struct list_head *handle_list = NULL;
4897         struct r5worker_group *wg = NULL;
......
4957
4958         if (!sh)
4959                 return NULL;
4960
4961         if (wg) {
4962                 wg->stripes_cnt--;
4963                 sh->group = NULL;
4964         }
4965         list_del_init(&sh->lru);
4966         BUG_ON(atomic_inc_return(&sh->count) != 1);  /* BUG_ON here */
4967         return sh;
4968 }
# mdadm -D /dev/md6
/dev/md6:
Version : 1.2
Creation Time : Mon Aug 27 12:45:44 2018
Raid Level : raid5
Array Size : 3516065792 (3353.18 GiB 3600.45 GB)
Used Dev Size : 1758032896 (1676.59 GiB 1800.23 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Sep 13 03:57:33 2018
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : host-172-16-39-106:6 (local to host host-172-16-39-106)
UUID : 80e6ca07:7218a374:32c672c5:f516eb4e
Events : 4241
    Number   Major   Minor   RaidDevice State
       0      65       32        0      active sync   /dev/sds
       1      65       48        1      active sync   /dev/sdt
       3      65       64        2      active sync   /dev/sdu
# mdadm -E /dev/sds
/dev/sds:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 80e6ca07:7218a374:32c672c5:f516eb4e
Name : host-172-16-39-106:6 (local to host host-172-16-39-106)
Creation Time : Mon Aug 27 12:45:44 2018
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3516066224 (1676.59 GiB 1800.23 GB)
Array Size : 3516065792 (3353.18 GiB 3600.45 GB)
Used Dev Size : 3516065792 (1676.59 GiB 1800.23 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=432 sectors
State : clean
Device UUID : 0eacd8fd:5d427c77:8000c2b4:035928de
Internal Bitmap : 8 sectors from superblock
Update Time : Thu Sep 13 03:59:33 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : d7946e32 - correct
Events : 4241
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
Any suggestions are welcome.
Thanks a lot,
Yufen