Re: raid5: BUG_ON(atomic_inc_return(&sh->count) != 1)

yuyufen> We recently hit a BUG in raid5 on redhat 7.2. Has anyone
yuyufen> encountered the same problem?

Are you talking about RHEL 7.2 (current-ish) or Red Hat Linux 7.2,
which is something like 15 years out of date?  In any case, since this
is a Red Hat supported kernel, you'll have to bring it up with them.
In general, I'd suggest you upgrade to RHEL 7.5 and see if the problem
still happens.

You would also need to tell us which kernel you're running.  If it is
not something recent like 4.18 or newer, then the first thing to do is
to upgrade your kernel and see if that fixes the issue.  Either way,
please provide more details.

John


yuyufen> The stack is as follows:

yuyufen> [78008.110094] ------------[ cut here ]------------
yuyufen> [78008.110898] kernel BUG at drivers/md/raid5.c:4966!
yuyufen> [78008.111718] invalid opcode: 0000 [#1] SMP
yuyufen> [78008.131948] task: ffff882f2ab28b80 ti: ffff882f2ab30000 task.ti: ffff882f2ab30000
yuyufen> [78008.132734] RIP: 0010:[<ffffffffa01d584e>] [<ffffffffa01d584e>] handle_active_stripes.isra.41+0x4de/0x4f0 [raid456]
yuyufen> [78008.133776] RSP: 0018:ffff882f2ab33cc8  EFLAGS: 00010086
yuyufen> [78008.134330] RAX: 00000000ffffffff RBX: ffff882f36ff5400 RCX: dead000000000200
yuyufen> [78008.135025] RDX: ffff882f36ff5488 RSI: 00000000ffffffff RDI: ffff882f2b7d8508
yuyufen> [78008.135804] RBP: ffff882f2ab33d68 R08: ffff882f2b7d8508 R09: ffff882f36ff5428
yuyufen> [78008.136723] R10: 0000000000000000 R11: 0000000000000000 R12: ffff882f2b7d84f8
yuyufen> [78008.137936] R13: 0000000000000000 R14: ffff882f36ff5498 R15: ffff882f2b7d8508
yuyufen> [78008.139151] FS:  0000000000000000(0000) GS:ffff882f7fc80000(0000) knlGS:0000000000000000
yuyufen> [78008.140496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
yuyufen> [78008.141441] CR2: 00007f888240d000 CR3: 0000002af30c0000 CR4: 00000000003407e0
yuyufen> [78008.142620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
yuyufen> [78008.143815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
yuyufen> [78008.145027] Stack:
yuyufen> [78008.145383]  0000000000000046 0000000000011628 0000000000011628 ffff882f36ff5670
yuyufen> [78008.146700]  ffffffffffffffd8 ffffffff2b5a31c0 ffff882f2b5a0ee8 ffff882f2b177740
yuyufen> [78008.148079]  ffff882f2b7d89f0 ffff882f2ae8ca88 ffff882f2b170ee8 ffff882f2b5a7740
yuyufen> [78008.149385]  ffff882f2a84acb8 ffff882f2b5a31b0 000000009f37ab80 0000000000000000
yuyufen> [78008.150709]  ffff882f2ab33dd0 ffff882f36ff5400 ffff882f36ff5670 ffff882f2ab28b80
yuyufen> [78008.152009] Call Trace:
yuyufen> [78008.152450]  [<ffffffffa01d5d68>] raid5d+0x508/0x760 [raid456]
yuyufen> [78008.153343]  [<ffffffff814c68d5>] md_thread+0x155/0x1a0
yuyufen> [78008.154214]  [<ffffffff810a71d0>] ? wake_up_atomic_t+0x30/0x30
yuyufen> [78008.155181]  [<ffffffff814c6780>] ? md_safemode_timeout+0x50/0x50
yuyufen> [78008.156194]  [<ffffffff810a61af>] kthread+0xcf/0xe0
yuyufen> [78008.157015]  [<ffffffff810a60e0>] ? kthread_create_on_node+0x140/0x140
yuyufen> [78008.158128]  [<ffffffff816523d8>] ret_from_fork+0x58/0x90
yuyufen> [78008.161126]  [<ffffffff810a60e0>] ? kthread_create_on_node+0x140/0x140
yuyufen> [78008.164297] Code: c6 f8 a4 1d a0 4c 0f 44 c2 4c 39 e0 48 0f 44 ca 48 c7 c2 50 84 1d a0 31 c0 e8 df 6d 14 e1 49 8b 04 24 44 8b 5d 88 e9 a8 fb ff ff <0f> 0b e8 cb 57 ea e0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
yuyufen> [78008.172435] RIP  [<ffffffffa01d584e>] handle_active_stripes.isra.41+0x4de/0x4f0 [raid456]
yuyufen> [78008.174520]  RSP <ffff882f2ab33cc8>
yuyufen> [78008.178475] ---[ end trace 7f1857a07ac12adf ]---
yuyufen> [78008.181447] Kernel panic - not syncing: Fatal exception
yuyufen> [78008.184775] die even has been record!


yuyufen> The BUG_ON code is as follows:

yuyufen> 4893 static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
yuyufen> 4894 {
yuyufen> 4895     struct stripe_head *sh = NULL, *tmp;
yuyufen> 4896     struct list_head *handle_list = NULL;
yuyufen> 4897     struct r5worker_group *wg = NULL;
yuyufen> ......
yuyufen> 4957
yuyufen> 4958     if (!sh)
yuyufen> 4959         return NULL;
yuyufen> 4960
yuyufen> 4961     if (wg) {
yuyufen> 4962         wg->stripes_cnt--;
yuyufen> 4963         sh->group = NULL;
yuyufen> 4964     }
yuyufen> 4965     list_del_init(&sh->lru);
yuyufen> 4966     BUG_ON(atomic_inc_return(&sh->count) != 1); //BUG_ON here
yuyufen> 4967     return sh;
yuyufen> 4968 }
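
For anyone reading the quoted code: atomic_inc_return() returns the
value after the increment, so the check at line 4966 only passes if the
stripe's reference count was 0 when it was taken off the handle list.
Below is a minimal userspace sketch of that invariant, using C11 atomics
in place of the kernel's atomic_t.  struct toy_stripe, toy_inc_return(),
toy_take_stripe() and toy_selfcheck() are names made up for this
illustration; none of them are kernel APIs, and the sketch says nothing
about why the count was wrong in this particular crash.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct stripe_head; only the
 * reference count matters for this illustration. */
struct toy_stripe {
    atomic_int count;
};

/* Userspace analogue of atomic_inc_return(): atomically add 1 and return
 * the NEW value. atomic_fetch_add() returns the old value, hence the +1. */
static int toy_inc_return(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

/* Same condition as the quoted BUG_ON, but reported instead of crashing:
 * true if the stripe was idle (count went 0 -> 1), false in the case
 * where the kernel would hit BUG(). */
static bool toy_take_stripe(struct toy_stripe *sh)
{
    return toy_inc_return(&sh->count) == 1;
}

/* Exercises both outcomes. */
static void toy_selfcheck(void)
{
    struct toy_stripe idle, busy;

    atomic_init(&idle.count, 0);   /* nobody holds a reference */
    atomic_init(&busy.count, 1);   /* somebody still holds one */

    assert(toy_take_stripe(&idle));    /* 0 -> 1: check passes */
    assert(!toy_take_stripe(&busy));   /* 1 -> 2: BUG_ON would fire */
}
```

Calling toy_selfcheck() from a main() exercises both cases.  Any starting
value other than 0, including a negative one, makes the check fail.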


yuyufen> # mdadm -D /dev/md6
yuyufen> /dev/md6:
yuyufen>          Version : 1.2
yuyufen>    Creation Time : Mon Aug 27 12:45:44 2018
yuyufen>       Raid Level : raid5
yuyufen>       Array Size : 3516065792 (3353.18 GiB 3600.45 GB)
yuyufen>    Used Dev Size : 1758032896 (1676.59 GiB 1800.23 GB)
yuyufen>     Raid Devices : 3
yuyufen>    Total Devices : 3
yuyufen>      Persistence : Superblock is persistent

yuyufen>    Intent Bitmap : Internal

yuyufen>      Update Time : Thu Sep 13 03:57:33 2018
yuyufen>            State : clean
yuyufen>   Active Devices : 3
yuyufen> Working Devices : 3
yuyufen>   Failed Devices : 0
yuyufen>    Spare Devices : 0

yuyufen>           Layout : left-symmetric
yuyufen>       Chunk Size : 512K

yuyufen>             Name : host-172-16-39-106:6  (local to host host-172-16-39-106)
yuyufen>             UUID : 80e6ca07:7218a374:32c672c5:f516eb4e
yuyufen>           Events : 4241

yuyufen>      Number   Major   Minor   RaidDevice State
yuyufen>         0      65       32        0      active sync   /dev/sds
yuyufen>         1      65       48        1      active sync   /dev/sdt
yuyufen>         3      65       64        2      active sync   /dev/sdu

yuyufen> # mdadm -E /dev/sds
yuyufen> /dev/sds:
yuyufen>            Magic : a92b4efc
yuyufen>          Version : 1.2
yuyufen>      Feature Map : 0x1
yuyufen>       Array UUID : 80e6ca07:7218a374:32c672c5:f516eb4e
yuyufen>             Name : host-172-16-39-106:6  (local to host host-172-16-39-106)
yuyufen>    Creation Time : Mon Aug 27 12:45:44 2018
yuyufen>       Raid Level : raid5
yuyufen>     Raid Devices : 3

yuyufen>   Avail Dev Size : 3516066224 (1676.59 GiB 1800.23 GB)
yuyufen>       Array Size : 3516065792 (3353.18 GiB 3600.45 GB)
yuyufen>    Used Dev Size : 3516065792 (1676.59 GiB 1800.23 GB)
yuyufen>      Data Offset : 262144 sectors
yuyufen>     Super Offset : 8 sectors
yuyufen>     Unused Space : before=262056 sectors, after=432 sectors
yuyufen>            State : clean
yuyufen>      Device UUID : 0eacd8fd:5d427c77:8000c2b4:035928de

yuyufen> Internal Bitmap : 8 sectors from superblock
yuyufen>      Update Time : Thu Sep 13 03:59:33 2018
yuyufen>    Bad Block Log : 512 entries available at offset 72 sectors
yuyufen>         Checksum : d7946e32 - correct
yuyufen>           Events : 4241

yuyufen>           Layout : left-symmetric
yuyufen>       Chunk Size : 512K

yuyufen>     Device Role : Active device 0
yuyufen>     Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
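
The sizes in the two mdadm outputs are consistent with each other once
the units are taken into account.  Here is a small sketch of the
arithmetic, assuming the -D sizes are in KiB and the -E "Used Dev Size"
is in 512-byte sectors, which is what the GiB figures in parentheses
imply.  The helper names are invented for this example and are not part
of mdadm.

```c
#include <assert.h>
#include <stdint.h>

/* RAID5 usable capacity: one member's worth of space goes to parity, so
 * (raid_devices - 1) members carry data. */
static uint64_t raid5_array_kib(uint64_t used_dev_kib, unsigned raid_devices)
{
    return used_dev_kib * (uint64_t)(raid_devices - 1);
}

/* Convert 512-byte sectors to KiB (two sectors per KiB). */
static uint64_t sectors_to_kib(uint64_t sectors)
{
    return sectors / 2;
}

/* Checks the numbers quoted in the mdadm -D and -E output. */
static void raid5_size_selfcheck(void)
{
    /* -E: Used Dev Size 3516065792 sectors == -D: 1758032896 KiB */
    assert(sectors_to_kib(3516065792ULL) == 1758032896ULL);

    /* 3 devices, 2 of them data: 2 * 1758032896 KiB == Array Size */
    assert(raid5_array_kib(1758032896ULL, 3) == 3516065792ULL);
}
```

So the array geometry itself looks sane: three members, two members'
worth of data, and matching sizes in both views.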



yuyufen> Any suggestions are welcome.

yuyufen> Thanks a lot
yuyufen> Yufen



