Re: raid5: BUG_ON(atomic_inc_return(&sh->count) != 1)

yuyufen wrote on 2018-09-15 16:08:
Hi, guys

Hi,

We recently hit a BUG in raid5 on Red Hat 7.2. Has anyone else
encountered the same problem?

I just hit the same BUG_ON() in __get_priority_stripe() on 4.18.16-200.fc28.

Photo of stacktrace: http://onse.fi/files/raid5-sh-count-4.18.16.jpg

Nothing special was going on (i.e. no resync etc.), but the array was under regular R/W load. The system had also been running fine for a long time, so the issue seems rare.

I'll run v4.19 next, though I see no post-4.18 changes to this code. I'll report if I encounter this again.


Array details:

$ sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Mon Jul 21 06:30:25 2014
     Raid Level : raid5
     Array Size : 486035776 (463.52 GiB 497.70 GB)
  Used Dev Size : 243017888 (231.76 GiB 248.85 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jan  1 05:55:01 2019
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 32K

           Name : redacted:2  (local to host redacted)
           UUID : c3971148:b56063bc:bb259a8a:d6531cfb
         Events : 1827

    Number   Major   Minor   RaidDevice State
       0       8       50        0      active sync   /dev/sdd2
       1       8       82        1      active sync   /dev/sdf2
       3       8       66        2      active sync   /dev/sde2

$ sudo mdadm --examine /dev/sdd2
/dev/sdd2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : c3971148:b56063bc:bb259a8a:d6531cfb
           Name : redacted:2  (local to host redacted)
  Creation Time : Mon Jul 21 06:30:25 2014
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 486035824 (231.76 GiB 248.85 GB)
     Array Size : 486035776 (463.52 GiB 497.70 GB)
  Used Dev Size : 486035776 (231.76 GiB 248.85 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=48 sectors
          State : clean
    Device UUID : 458fc191:453a4de5:6a1e6272:374b0f03

    Update Time : Tue Jan  1 06:02:31 2019
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 530b4275 - correct
         Events : 1827

         Layout : left-symmetric
     Chunk Size : 32K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)



The stack trace is as follows:

[78008.110094] ------------[ cut here ]------------
[78008.110898] kernel BUG at drivers/md/raid5.c:4966!
[78008.111718] invalid opcode: 0000 [#1] SMP
[78008.131948] task: ffff882f2ab28b80 ti: ffff882f2ab30000 task.ti: ffff882f2ab30000
[78008.132734] RIP: 0010:[<ffffffffa01d584e>] [<ffffffffa01d584e>] handle_active_stripes.isra.41+0x4de/0x4f0 [raid456]
[78008.133776] RSP: 0018:ffff882f2ab33cc8  EFLAGS: 00010086
[78008.134330] RAX: 00000000ffffffff RBX: ffff882f36ff5400 RCX: dead000000000200
[78008.135025] RDX: ffff882f36ff5488 RSI: 00000000ffffffff RDI: ffff882f2b7d8508
[78008.135804] RBP: ffff882f2ab33d68 R08: ffff882f2b7d8508 R09: ffff882f36ff5428
[78008.136723] R10: 0000000000000000 R11: 0000000000000000 R12: ffff882f2b7d84f8
[78008.137936] R13: 0000000000000000 R14: ffff882f36ff5498 R15: ffff882f2b7d8508
[78008.139151] FS:  0000000000000000(0000) GS:ffff882f7fc80000(0000) knlGS:0000000000000000
[78008.140496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[78008.141441] CR2: 00007f888240d000 CR3: 0000002af30c0000 CR4: 00000000003407e0
[78008.142620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[78008.143815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[78008.145027] Stack:
[78008.145383]  0000000000000046 0000000000011628 0000000000011628 ffff882f36ff5670
[78008.146700]  ffffffffffffffd8 ffffffff2b5a31c0 ffff882f2b5a0ee8 ffff882f2b177740
[78008.148079]  ffff882f2b7d89f0 ffff882f2ae8ca88 ffff882f2b170ee8 ffff882f2b5a7740
[78008.149385]  ffff882f2a84acb8 ffff882f2b5a31b0 000000009f37ab80 0000000000000000
[78008.150709]  ffff882f2ab33dd0 ffff882f36ff5400 ffff882f36ff5670 ffff882f2ab28b80
[78008.152009] Call Trace:
[78008.152450]  [<ffffffffa01d5d68>] raid5d+0x508/0x760 [raid456]
[78008.153343]  [<ffffffff814c68d5>] md_thread+0x155/0x1a0
[78008.154214]  [<ffffffff810a71d0>] ? wake_up_atomic_t+0x30/0x30
[78008.155181]  [<ffffffff814c6780>] ? md_safemode_timeout+0x50/0x50
[78008.156194]  [<ffffffff810a61af>] kthread+0xcf/0xe0
[78008.157015]  [<ffffffff810a60e0>] ? kthread_create_on_node+0x140/0x140
[78008.158128]  [<ffffffff816523d8>] ret_from_fork+0x58/0x90
[78008.161126]  [<ffffffff810a60e0>] ? kthread_create_on_node+0x140/0x140
[78008.164297] Code: c6 f8 a4 1d a0 4c 0f 44 c2 4c 39 e0 48 0f 44 ca 48 c7 c2 50 84 1d a0 31 c0 e8 df 6d 14 e1 49 8b 04 24 44 8b 5d 88 e9 a8 fb ff ff <0f> 0b e8 cb 57 ea e0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[78008.172435] RIP  [<ffffffffa01d584e>] handle_active_stripes.isra.41+0x4de/0x4f0 [raid456]
[78008.174520]  RSP <ffff882f2ab33cc8>
[78008.178475] ---[ end trace 7f1857a07ac12adf ]---
[78008.181447] Kernel panic - not syncing: Fatal exception
[78008.184775] die even has been record!


The BUG_ON code is as follows:

4893 static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
4894 {
4895     struct stripe_head *sh = NULL, *tmp;
4896     struct list_head *handle_list = NULL;
4897     struct r5worker_group *wg = NULL;
......
4957
4958     if (!sh)
4959         return NULL;
4960
4961     if (wg) {
4962         wg->stripes_cnt--;
4963         sh->group = NULL;
4964     }
4965     list_del_init(&sh->lru);
4966     BUG_ON(atomic_inc_return(&sh->count) != 1); //BUG_ON here
4967     return sh;
4968 }


# mdadm -D /dev/md6
/dev/md6:
        Version : 1.2
  Creation Time : Mon Aug 27 12:45:44 2018
     Raid Level : raid5
     Array Size : 3516065792 (3353.18 GiB 3600.45 GB)
  Used Dev Size : 1758032896 (1676.59 GiB 1800.23 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Sep 13 03:57:33 2018
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : host-172-16-39-106:6 (local to host host-172-16-39-106)
           UUID : 80e6ca07:7218a374:32c672c5:f516eb4e
         Events : 4241

    Number   Major   Minor   RaidDevice State
       0      65       32        0      active sync   /dev/sds
       1      65       48        1      active sync   /dev/sdt
       3      65       64        2      active sync   /dev/sdu

# mdadm -E /dev/sds
/dev/sds:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 80e6ca07:7218a374:32c672c5:f516eb4e
           Name : host-172-16-39-106:6 (local to host host-172-16-39-106)
  Creation Time : Mon Aug 27 12:45:44 2018
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3516066224 (1676.59 GiB 1800.23 GB)
     Array Size : 3516065792 (3353.18 GiB 3600.45 GB)
  Used Dev Size : 3516065792 (1676.59 GiB 1800.23 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=432 sectors
          State : clean
    Device UUID : 0eacd8fd:5d427c77:8000c2b4:035928de

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Sep 13 03:59:33 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : d7946e32 - correct
         Events : 4241

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)



Any suggestions are welcome.

Thanks a lot
Yufen

--
Anssi Hannula


