Re: PROBLEM: double fault in md_end_io

I was able to reproduce this bug on a third machine. This time it's a
Xeon Gold 6230 CPU + 3x INTEL SSDPE2KE076T8. This machine has been
running the same type of workload constantly for almost two years and
it was fine with Ubuntu's 5.4 and older kernels. After upgrading the
kernel to 5.8.0-50-generic (with the backported "md: improve io stats
accounting" patch) it crashed after 42 hours of running.

This time the ERST data looks slightly different (the call trace
timestamps come before the panic timestamp and the panic message is
different):

dmesg-erst-6953966727021461507:
Panic#2 Part1
48180.542749]  ? bio_endio+0xef/0x150
<4>[148180.542749]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542750]  ? md_end_io+0x5d/0x70
<4>[148180.542750]  ? bio_endio+0xef/0x150
<4>[148180.542750]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542751]  ? md_end_io+0x5d/0x70
<4>[148180.542751]  ? bio_endio+0xef/0x150
<4>[148180.542751]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542751]  ? md_end_io+0x5d/0x70
...
<4>[148180.542791]  ? md_end_io+0x5d/0x70
<4>[148180.542792]  ? bio_endio+0xef/0x150
<4>[148180.542792]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542792]  ? md_end_io+0x5d/0x70
<4>[148180.542792]  ? bio_endio+0xef/0x150
<4>[148180.542793]  ? bio_chain_endio+0x2d/
<4>[148180.542794] Lost 353 message(s)!
<4>[148180.683385] ---[ end trace 00bf232b0f939095 ]---
<4>[148180.683385] RIP: 0010:__slab_free+0x26/0x330
<4>[148180.683386] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89
c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0
48 83 ec 70 <48> 89 54 24 28 f7 47 08 00 0d 21 00 0f 85 ac 01 00 00 41
8b 46 20
<4>[148180.683387] RSP: 0018:ffffae2b19088fb0 EFLAGS: 00010082
<4>[148180.683388] RAX: ffff978cc0b31080 RBX: ffff978c29f25bc0 RCX:
ffff978c29f25bc0
<4>[148180.683389] RDX: ffff978c29f25bc0 RSI: ffffdb4638a7c940 RDI:
ffff978cc0407800
<4>[148180.683389] RBP: ffffae2b19089048 R08: 0000000000000001 R09:
ffffffffb0e345ee
<4>[148180.683390] R10: ffff978c29f25bc0 R11: 0000000000000001 R12:
ffffdb4638a7c940
<4>[148180.683390] R13: ffff978cc0407800 R14: ffff978cc0407800 R15:
0000000000000000
<4>[148180.683391] FS:  00007fcfdf7fe700(0000)
GS:ffff978cc0b00000(0000) knlGS:0000000000000000
<4>[148180.683391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[148180.683392] CR2: ffffae2b19088fa8 CR3: 0000007a379ac001 CR4:
00000000007606e0
<4>[148180.683392] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
<4>[148180.683392] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
<4>[148180.683393] PKRU: 55555554
<0>[148180.683393] Kernel panic - not syncing: Fatal exception in interrupt
<0>[148180.683447] Kernel Offset: 0x2fc00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
dmesg-erst-6953966727021461505:
Oops#1 Part1
0x70
<4>[148180.542739]  ? bio_endio+0xef/0x150
<4>[148180.542740]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542740]  ? md_end_io+0x5d/0x70
<4>[148180.542740]  ? bio_endio+0xef/0x150
<4>[148180.542741]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542741]  ? md_end_io+0x5d/0x70
<4>[148180.542741]  ? bio_endio+0xef/0x150
<4>[148180.542741]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542742]  ? md_end_io+0x5d/0x70
...
<4>[148180.542792]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542792]  ? md_end_io+0x5d/0x70
<4>[148180.542792]  ? bio_endio+0xef/0x150
<4>[148180.542793]  ? bio_chain_endio+0x2d/
<4>[148180.542794] Lost 353 message(s)!

If you cannot reproduce it, maybe there's a way I could help you
solve this issue? For example, I could change dumpstack.c to skip
printing repeated frames, or something along those lines.
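
To sketch what I have in mind (this is only a rough userspace
illustration of the idea, not an actual dumpstack.c patch; the trace[]
contents are copied from the log above, and MAX_CYCLE and all names
are made up for the example), a loop like the one below folds a
repeating bio_endio/bio_chain_endio/md_end_io cycle into one block
plus a repeat count:

/*
 * Toy userspace illustration of collapsing a repeating cycle of
 * call-trace frames before printing them.
 */
#include <stdio.h>
#include <string.h>

#define MAX_CYCLE 4

static const char *trace[] = {
        "md_end_io+0x5d/0x70",
        "bio_endio+0xef/0x150",
        "bio_chain_endio+0x2d/0x40",
        "md_end_io+0x5d/0x70",
        "bio_endio+0xef/0x150",
        "bio_chain_endio+0x2d/0x40",
        "md_end_io+0x5d/0x70",
        "bio_endio+0xef/0x150",
        "bio_chain_endio+0x2d/0x40",
        "__slab_free+0x26/0x330",
};

static const size_t nframes = sizeof(trace) / sizeof(trace[0]);

/* true if the cycle-long blocks starting at a and b are identical */
static int blocks_equal(size_t a, size_t b, size_t cycle)
{
        for (size_t k = 0; k < cycle; k++)
                if (strcmp(trace[a + k], trace[b + k]))
                        return 0;
        return 1;
}

int main(void)
{
        size_t i = 0;

        while (i < nframes) {
                size_t best = 0, reps = 0;

                /* look for a short cycle starting at i that repeats */
                for (size_t cycle = 1; cycle <= MAX_CYCLE; cycle++) {
                        size_t r = 0;

                        while (i + (r + 2) * cycle <= nframes &&
                               blocks_equal(i, i + (r + 1) * cycle, cycle))
                                r++;
                        if (r > reps) {
                                reps = r;
                                best = cycle;
                        }
                }

                if (reps) {
                        /* print the cycle once, then a repeat count */
                        for (size_t j = 0; j < best; j++)
                                printf(" ? %s\n", trace[i + j]);
                        printf(" ... previous %zu frame(s) repeated %zu more time(s)\n",
                               best, reps);
                        i += best * (reps + 1);
                } else {
                        printf(" ? %s\n", trace[i++]);
                }
        }
        return 0;
}

With something like this the lost part of the trace would fit into the
ERST record instead of being truncated.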

Regards,
Pawel Wiejacha


On Thu, 15 Apr 2021 at 17:35, Paweł Wiejacha
<pawel.wiejacha@xxxxxxxxxxxx> wrote:
>
> > Pawel, have you tried to repro with md-next branch?
>
> Yes, and it also crashes:
>
> $ git log --decorate --format=oneline
> ca882ba4c0478c68f927fa7b622ec17bde9361ce (HEAD -> md-next-pw2) using
> slab_pool for md_io_pool
> 2715e61834586cef8292fcaa457cbf2da955a3b8 (song-md/md-next) md/bitmap:
> wait for external bitmap writes to complete during tear down
> c84917aa5dfc4c809634120fb429f0ff590a1f75 md: do not return existing
> mddevs from mddev_find_or_alloc
>
> <0>[ 2086.596361] traps: PANIC: double fault, error_code: 0x0
> <4>[ 2086.596364] double fault: 0000 [#1] SMP NOPTI
> <4>[ 2086.596365] CPU: 40 PID: 0 Comm: swapper/40 Tainted: G
> OE     5.12.0-rc3-md-next-pw-fix2 #1
> <4>[ 2086.596365] Hardware name: empty S8030GM2NE/S8030GM2NE, BIOS
> V4.00 03/11/2021
> <4>[ 2086.596366] RIP: 0010:__slab_free+0x26/0x380
> <4>[ 2086.596367] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89
> c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0
> 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c 24 20
> 49 8b 5c
> <4>[ 2086.596368] RSP: 0018:ffff96fa4d1c8f90 EFLAGS: 00010086
> <4>[ 2086.596369] RAX: ffff892f4fc35300 RBX: ffff890fdca3ff78 RCX:
> ffff890fdca3ff78
> <4>[ 2086.596370] RDX: ffff890fdca3ff78 RSI: fffff393c1728fc0 RDI:
> ffff892fd3a9b300
> <4>[ 2086.596370] RBP: ffff96fa4d1c9030 R08: 0000000000000001 R09:
> ffffffffb66500a7
> <4>[ 2086.596371] R10: ffff890fdca3ff78 R11: 0000000000000001 R12:
> fffff393c1728fc0
> <4>[ 2086.596371] R13: fffff393c1728fc0 R14: ffff892fd3a9b300 R15:
> 0000000000000000
> <4>[ 2086.596372] FS:  0000000000000000(0000)
> GS:ffff892f4fc00000(0000) knlGS:0000000000000000
> <4>[ 2086.596373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[ 2086.596373] CR2: ffff96fa4d1c8f88 CR3: 00000010c8210000 CR4:
> 0000000000350ee0
> <4>[ 2086.596373] Call Trace:
> <4>[ 2086.596374]  <IRQ>
> <4>[ 2086.596374]  ? kmem_cache_free+0x3d2/0x420
> <4>[ 2086.596374]  ? mempool_free_slab+0x17/0x20
> <4>[ 2086.596375]  ? mempool_free_slab+0x17/0x20
> <4>[ 2086.596375]  ? mempool_free+0x2f/0x80
> <4>[ 2086.596376]  ? md_end_io+0x47/0x60
> <4>[ 2086.596376]  ? bio_endio+0xee/0x140
> <4>[ 2086.596376]  ? bio_chain_endio+0x2d/0x40
> <4>[ 2086.596377]  ? md_end_io+0x59/0x60
> <4>[ 2086.596377]  ? bio_endio+0xee/0x140
> <4>[ 2086.596378]  ? bio_chain_endio+0x2d/0x40
> ...
> <4>[ 2086.596485] Lost 340 message(s)!
>
> The last output dumped by `watch -n0.1 ...`:
>
> stat 451 0 13738 76 67823521 0 17162428974 801228700 351 1886784
> 801228776 0 0 0 0 0 0
> inflight 0 354
>
> md_io/aliases 4
> md_io/align 8
> md_io/cache_dma 0
> md_io/cpu_partial 30
> md_io/cpu_slabs 1235 N0=288 N1=239 N2=463 N3=245
> md_io/destroy_by_rcu 0
> md_io/hwcache_align 0
> md_io/min_partial 5
> md_io/objects 126582 N0=26928 N1=28254 N2=44064 N3=27336
> md_io/object_size 40
> md_io/objects_partial 0
> md_io/objs_per_slab 102
> md_io/order 0
> md_io/partial 8 N2=4 N3=4
> md_io/poison 0
> md_io/reclaim_account 0
> md_io/red_zone 0
> md_io/remote_node_defrag_ratio 100
> md_io/sanity_checks 0
> md_io/slabs 1249 N0=264 N1=277 N2=436 N3=272
> md_io/slabs_cpu_partial 1172(1172) C0=14(14) C1=17(17) C2=17(17)
> C3=17(17) C4=18(18) C5=17(17) C6=18(18) C7=17(17) C8=21(21) C9=4(4)
> C10=21(21) C11=22(22) C12=21(21) C13=25(25) C14=15(15) C15
> =19(19) C16=24(24) C17=19(19) C18=22(22) C19=15(15) C20=28(28)
> C21=18(18) C22=18(18) C23=24(24) C24=17(17) C25=27(27) C26=8(8)
> C27=18(18) C28=17(17) C29=18(18) C30=21(21) C31=21(21) C32=9(9)
> C33=9(9) C34=24(24) C35=11(11) C36=19(19) C37=14(14) C38=18(18)
> C39=4(4) C40=23(23) C41=20(20) C42=18(18) C43=22(22) C44=21(21)
> C45=24(24) C46=20(20) C47=15(15) C48=17(17) C49=15(15) C50=16(1
> 6) C51=26(26) C52=21(21) C53=19(19) C54=16(16) C55=18(18) C56=26(26)
> C57=14(14) C58=18(18) C59=23(23) C60=18(18) C61=20(20) C62=19(19)
> C63=17(17)
> md_io/slab_size 40
> md_io/store_user 0
> md_io/total_objects 127398 N0=26928 N1=28254 N2=44472 N3=27744
> md_io/trace 0
> md_io/usersize 0
>
> > I am not able to reproduce the issue [...]
>
> It crashes on two different machines. Next week I'm going to upgrade the
> distro on an older machine (with Intel NVMe disks, a different
> motherboard, and a Xeon instead of an EPYC 2 CPU) that is currently
> running linux-5.4 without this issue. So I will let you know whether
> switching to a newer kernel with "improved io stats accounting" makes
> it unstable or not.
>
> Best regards,
> Pawel Wiejacha.
>
> On Thu, 15 Apr 2021 at 08:35, Song Liu <song@xxxxxxxxxx> wrote:
> >
> > On Wed, Apr 14, 2021 at 5:36 PM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha
> > > <pawel.wiejacha@xxxxxxxxxxxx> wrote:
> > > >
> > > > Hello Song,
> > > >
> > > > That code does not compile, but I guess what you meant was
> > > > something like this:
> > >
> > > Yeah.. I am really sorry for the noise.
> > >
> > > >
> > > > diff --git drivers/md/md.c drivers/md/md.c
> > > > index 04384452a..cbc97a96b 100644
> > > > --- drivers/md/md.c
> > > > +++ drivers/md/md.c
> > > > @@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
> > > >
> > > >  static struct kobj_type md_ktype;
> > > >
> > > > +struct kmem_cache *md_io_cache;
> > > >  struct md_cluster_operations *md_cluster_ops;
> > > >  EXPORT_SYMBOL(md_cluster_ops);
> > > >  static struct module *md_cluster_mod;
> > > > @@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
> > > >          */
> > > >         mddev->hold_active = UNTIL_STOP;
> > > >
> > > > -   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > > -                     sizeof(struct md_io));
> > > > +   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > > +                     md_io_cache);
> > > >     if (error)
> > > >         goto abort;
> > > >
> > > > @@ -9542,6 +9543,10 @@ static int __init md_init(void)
> > > >  {
> > > >     int ret = -ENOMEM;
> > > >
> > > > +   md_io_cache = KMEM_CACHE(md_io, 0);
> > > > +   if (!md_io_cache)
> > > > +       goto err_md_io_cache;
> > > > +
> > > >     md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
> > > >     if (!md_wq)
> > > >         goto err_wq;
> > > > @@ -9578,6 +9583,8 @@ static int __init md_init(void)
> > > >  err_misc_wq:
> > > >     destroy_workqueue(md_wq);
> > > >  err_wq:
> > > > +   kmem_cache_destroy(md_io_cache);
> > > > +err_md_io_cache:
> > > >     return ret;
> > > >  }
> > > >
> > > > @@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
> > > >     destroy_workqueue(md_rdev_misc_wq);
> > > >     destroy_workqueue(md_misc_wq);
> > > >     destroy_workqueue(md_wq);
> > > > +   kmem_cache_destroy(md_io_cache);
> > > >  }
> > > >
> > > >  subsys_initcall(md_init);
> > >
> > > [...]
> > >
> > > >
> > > > $ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
> > > > MemTotal:       528235648 kB    MemFree:        20002732 kB
> > > > MemAvailable:   483890268 kB    Buffers:            7356 kB
> > > > Cached:         495416180 kB    SwapCached:            0 kB
> > > > Active:         96396800 kB     Inactive:       399891308 kB
> > > > Active(anon):      10976 kB     Inactive(anon):   890908 kB
> > > > Active(file):   96385824 kB     Inactive(file): 399000400 kB
> > > > Unevictable:       78768 kB     Mlocked:           78768 kB
> > > > SwapTotal:             0 kB     SwapFree:              0 kB
> > > > Dirty:          88422072 kB     Writeback:        948756 kB
> > > > AnonPages:        945772 kB     Mapped:            57300 kB
> > > > Shmem:             26300 kB     KReclaimable:    7248160 kB
> > > > Slab:            7962748 kB     SReclaimable:    7248160 kB
> > > > SUnreclaim:       714588 kB     KernelStack:       18288 kB
> > > > PageTables:        10796 kB     NFS_Unstable:          0 kB
> > > > Bounce:                0 kB     WritebackTmp:          0 kB
> > > > CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
> > > > VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
> > > > VmallocChunk:          0 kB     Percpu:            65792 kB
> > > > HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
> > > > ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
> > > > FileHugePages:         0 kB     FilePmdMapped:         0 kB
> > > > HugePages_Total:       0        HugePages_Free:        0
> > > > HugePages_Rsvd:        0        HugePages_Surp:        0
> > > > Hugepagesize:       2048 kB     Hugetlb:               0 kB
> > > > DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
> > > > DirectMap1G:    525336576 kB
> > > >
> > >
> > > And thanks for this information.
> > >
> > > I have set up a system to run the test; the code I am using is the top
> > > of the md-next branch. I will update on the status later tonight.
> >
> > I am not able to reproduce the issue after 6 hours. Maybe that is because
> > I am running the tests on 3 partitions of the same NVMe SSD. I will try on
> > a different host with multiple SSDs.
> >
> > Pawel, have you tried to repro with md-next branch?
> >
> > Thanks,
> > Song



