Re: [PATCH for v4.9] fs: don't scan the inode cache before SB_BORN is set

On Mon, Jan 28, 2019 at 09:20:45PM +0800, Aaron Lu wrote:
> One of our servers recently hit a kernel crash and the callstack is:
> 
> [6469391.997662] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> [6469392.005693] IP: [<ffffffff811cad80>] shmem_unused_huge_count+0x10/0x20
> [6469392.012412] PGD 1000c21067
> [6469392.015203] PUD ffc306067
> [6469392.018089] PMD 0
> [6469392.018627]
> [6469392.020303] Oops: 0000 [#1] SMP
> [6469392.023621] Modules linked in: kpatch_6iljwh9b(OE) memcg_force_swapin(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl [last unloaded: memcg_force_swapin]
> [6469392.040177] CPU: 2 PID: 89058 Comm: ilogtail Tainted: G           OE K 4.9.93-010.ali3000.alios7.x86_64 #1
> [6469392.049996] Hardware name: Inventec     K900-1G                         /B900G2-1G       , BIOS A2.32 10/09/2014
> [6469392.060334] task: ffff8802217b1800 task.stack: ffffc9004ea88000
> [6469392.066418] RIP: 0010:[<ffffffff811cad80>]  [<ffffffff811cad80>] shmem_unused_huge_count+0x10/0x20
> [6469392.075563] RSP: 0018:ffffc9004ea8b6c0  EFLAGS: 00010282
> [6469392.081041] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000001
> [6469392.088339] RDX: 0000000000000001 RSI: ffffc9004ea8b780 RDI: ffff881749bd2000
> [6469392.095635] RBP: ffffc9004ea8b6c0 R08: 28f5c28f5c28f5c3 R09: ffff88173bf3fce0
> [6469392.102934] R10: ffff88207ffd4000 R11: 0000000000000000 R12: ffff881749bd24c0
> [6469392.110233] R13: ffffc9004ea8b780 R14: 0000000000000000 R15: ffff88207ffd4000
> [6469392.117533] FS:  00007fe260420700(0000) GS:ffff88103fa80000(0000) knlGS:0000000000000000
> [6469392.125792] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [6469392.131703] CR2: 0000000000000070 CR3: 00000005bb46d000 CR4: 00000000001606f0
> [6469392.138999] Stack:
> [6469392.141185]  ffffc9004ea8b6f0 ffffffff81247bee 0000000000000020 0000000000000400
> [6469392.148811]  ffff881749bd24c0 0000000000000000 ffffc9004ea8b7d0 ffffffff811c431c
> [6469392.156436]  0000000000000020 0000000000000000 ffff88207b82c000 0000000000000001
> [6469392.164063] Call Trace:
> [6469392.166692]  [<ffffffff81247bee>] super_cache_count+0x3e/0xe0
> [6469392.172607]  [<ffffffff811c431c>] shrink_slab.part.38+0x11c/0x420
> [6469392.178875]  [<ffffffff811c4649>] shrink_slab+0x29/0x30
> [6469392.184273]  [<ffffffff811c93cf>] shrink_node+0xff/0x300
> [6469392.189756]  [<ffffffff811c96dd>] do_try_to_free_pages+0x10d/0x330
> [6469392.196104]  [<ffffffff811c9b65>] try_to_free_mem_cgroup_pages+0xd5/0x1b0
> [6469392.203063]  [<ffffffff81230b5d>] try_charge+0x14d/0x720
> [6469392.208551]  [<ffffffff8121b8e3>] ? kmem_cache_alloc+0xd3/0x1a0
> [6469392.214642]  [<ffffffff811b14e5>] ? mempool_alloc_slab+0x15/0x20
> [6469392.220825]  [<ffffffff81235b4e>] mem_cgroup_try_charge+0x6e/0x1b0
> [6469392.227177]  [<ffffffff811ae174>] __add_to_page_cache_locked+0x64/0x220
> [6469392.233961]  [<ffffffff811ae39e>] add_to_page_cache_lru+0x4e/0xe0
> [6469392.240242]  [<ffffffffa03ce2d1>] ext4_mpage_readpages+0x151/0x980 [ext4]
> [6469392.247211]  [<ffffffffa037edb5>] ext4_readpages+0x35/0x40 [ext4]
> [6469392.253474]  [<ffffffff811be9e7>] __do_page_cache_readahead+0x197/0x240
> [6469392.260260]  [<ffffffff811ae45c>] ? pagecache_get_page+0x2c/0x2a0
> [6469392.266523]  [<ffffffff811b0f4b>] filemap_fault+0x4db/0x590
> [6469392.272282]  [<ffffffffa0388fd6>] ext4_filemap_fault+0x36/0x50 [ext4]
> [6469392.278896]  [<ffffffff811e4a90>] __do_fault+0x80/0x170
> [6469392.284292]  [<ffffffff811e87b2>] do_fault+0x4c2/0x720
> [6469392.289603]  [<ffffffff8111513f>] ? futex_wait_queue_me+0x9f/0x120
> [6469392.295954]  [<ffffffff811e9162>] handle_mm_fault+0x512/0xc90
> [6469392.301874]  [<ffffffff8106eb8b>] __do_page_fault+0x24b/0x4d0
> [6469392.307796]  [<ffffffff811184c5>] ? SyS_futex+0x85/0x170
> [6469392.313280]  [<ffffffff8106ee40>] do_page_fault+0x30/0x80
> [6469392.318850]  [<ffffffff81003bf4>] ? do_syscall_64+0x74/0x180
> [6469392.324679]  [<ffffffff81722b68>] page_fault+0x28/0x30
> [6469392.329986] Code: 00 48 83 43 38 01 4c 89 e7 c6 <48> 8b 40 70 5d c3 66 2e 0f 1f 84
> [6469392.338183] RIP  [<ffffffff811cad80>] shmem_unused_huge_count+0x10/0x20
> [6469392.344990]  RSP <ffffc9004ea8b6c0>
> [6469392.348656] CR2: 0000000000000070
> 
> Google showed me Dave Chinner's fix, and I think it is the right fix for
> our problem (it is not easy to reproduce in our production environment, so I
> haven't been able to confirm).
> 
> Unfortunately, this commit was only backported to the v4.14 and v4.16 stable
> kernels, not the v4.9 stable kernel, presumably due to the rename of MS_BORN
> to SB_BORN starting from v4.14. To make this patch work on v4.9, I have
> made one minor change to Dave's commit: it keeps using MS_BORN. I think
> this is correct, but since I know very little about fs code, please
> kindly review. Thanks a lot for your time.

Backport looks good to me, thanks for the patch, I'll go queue it up
now.

greg k-h
