On Wed, Nov 07 2018, Rafael David Tinoco wrote: > On Wed, Nov 7, 2018 at 7:22 PM, Rafael David Tinoco > <rafael.tinoco@xxxxxxxxxx> wrote: >> During our functional tests, we have ran into the following issue for i686: >> >> [ 19.301067] BUG: unable to handle kernel NULL pointer dereference at 0000028a >> >> which might have triggered false positive for: >> >> [ 48.544348] BUG: sleeping function called from invalid context at >> /srv/oe/build/tmp-rpb-glibc/work-shared/intel-core2-32/kernel-source/include/linux/percpu-rwsem.h:34 >> >> For latest 4.20.0-rc1-next-20181107, when running fsync01 LTP test in >> this kernel. >> >> ---- >> >> [ 19.120300] BUG: unable to handle kernel NULL pointer dereference at 0000028a >> [ 19.127433] *pde = 00000000 >> [ 19.130311] Oops: 0000 [#1] SMP >> [ 19.133449] CPU: 2 PID: 2557 Comm: ata_id Not tainted >> 4.20.0-rc1-next-20181107 #1 >> [ 19.140920] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS >> 2.0b 07/27/2017 >> [ 19.148391] EIP: locks_remove_flock+0x60/0xb0 >> [ 19.152749] Code: 10 e8 b4 d3 ff ff 8b 43 14 83 4d 80 40 8b 70 58 >> 85 f6 74 46 8d 8d 58 ff ff ff ba 07 00 00 00 89 d8 ff d6 8b 45 d8 85 >> c0 74 0f <8b> 50 04 85 d2 74 08 8d 85 58 ff ff ff ff d2 8b 45 f0 65 33 >> 05 14 >> [ 19.171484] EAX: 00000286 EBX: f501c140 ECX: 00000000 EDX: ffffffff >> [ 19.177742] ESI: 00000000 EDI: f53978c8 EBP: f37dff14 ESP: f37dfe6c >> [ 19.183999] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010202 >> [ 19.190776] CR0: 80050033 CR2: 0000028a CR3: 343f9000 CR4: 003406d0 >> [ 19.197035] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 >> [ 19.203299] DR6: fffe0ff0 DR7: 00000400 >> [ 19.207152] Call Trace: >> [ 19.209604] ? lockdep_hardirqs_on+0xe4/0x190 >> [ 19.213961] ? ktime_get_coarse_real_ts64+0x4d/0xd0 >> [ 19.218838] ? trace_hardirqs_on+0x43/0xe0 >> [ 19.222930] ? __audit_syscall_entry+0xb2/0xf0 >> [ 19.227375] locks_remove_file+0x3e/0x1e0 >> [ 19.231380] __fput+0xc2/0x1f0 >> [ 19.234440] ____fput+0xd/0x10 >> [ 19.237499] task_work_run+0x7f/0xb0 >> [ 19.241078] exit_to_usermode_loop+0x9d/0xa0 >> [ 19.245349] do_fast_syscall_32+0x257/0x2a0 >> [ 19.249527] entry_SYSENTER_32+0x70/0xc8 >> [ 19.253444] EIP: 0xb7ee9991 >> [ 19.256234] Code: 8b 98 60 cd ff ff 85 d2 89 c8 74 02 89 0a 5b 5d >> c3 8b 04 24 c3 8b 14 24 c3 8b 1c 24 c3 8b 3c 24 c3 51 52 55 89 e5 0f >> 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 >> 8d 76 >> [ 19.274971] EAX: 00000000 EBX: 00000004 ECX: 00000043 EDX: b7e9a894 >> [ 19.281228] ESI: b7eb8000 EDI: 00000016 EBP: b7ccc6b0 ESP: bf9c7830 >> [ 19.287485] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246 >> [ 19.294263] Modules linked in: fuse >> [ 19.297754] CR2: 000000000000028a >> [ 19.301066] ---[ end trace 30875d5239ac018c ]--- >> >> And, with EIP: locks_remove_flock+0x60/0xb0, we have: >> >> c12777c0 <locks_remove_flock>: >> c12777c0: e8 eb 26 dd ff call c1049eb0 <__fentry__> >> c12777c5: 55 push %ebp >> c12777c6: 83 c2 20 add $0x20,%edx >> c12777c9: 89 e5 mov %esp,%ebp >> c12777cb: 57 push %edi >> c12777cc: 56 push %esi >> c12777cd: 53 push %ebx >> c12777ce: 89 c3 mov %eax,%ebx >> c12777d0: 81 ec 9c 00 00 00 sub $0x9c,%esp >> c12777d6: 65 a1 14 00 00 00 mov %gs:0x14,%eax >> c12777dc: 89 45 f0 mov %eax,-0x10(%ebp) >> c12777df: 31 c0 xor %eax,%eax >> c12777e1: 8b 02 mov (%edx),%eax >> c12777e3: 39 c2 cmp %eax,%edx >> c12777e5: 74 48 je c127782f >> <locks_remove_flock+0x6f> >> c12777e7: 8d 8d 58 ff ff ff lea -0xa8(%ebp),%ecx >> c12777ed: ba 08 00 00 00 mov $0x8,%edx >> c12777f2: 89 d8 mov %ebx,%eax >> c12777f4: 8b 7b 10 mov 0x10(%ebx),%edi >> c12777f7: e8 b4 d3 ff ff call c1274bb0 <flock_make_lock> >> c12777fc: 8b 43 14 mov 0x14(%ebx),%eax >> -------------- EIP >> c12777ff: 83 4d 80 40 orl $0x40,-0x80(%ebp) >> c1277803: 8b 70 58 mov 0x58(%eax),%esi >> c1277806: 85 f6 test %esi,%esi >> >> ---- >> >> Pointing to: >> >> locks_remove_flock(struct file *filp, struct file_lock_context *flctx) >> { >> struct file_lock fl; >> ... >> flock_make_lock(filp, LOCK_UN, &fl); >> ... >> } >> >> Since flock_make_lock() calls locks_alloc_lock() and kmem_cache_zalloc()... >> >> shouldn't "fl" be declared as *fl pointer ? > > NM for this one, just saw flock_make_lock() can return a ptr to struct > file_lock *, after creating it from slab, or just populate a stack > variable, like it is doing here. > > For: > > ... > flock_make_lock(filp, LOCK_UN, &fl); > fl.fl_flags |= FL_CLOSE; > ... > > I wonder if, for x86, we are just missing an initialization: > > memset(&fl, 0, sizeof(struct file_lock)); > > in the beginning of locks_remove_flock(). Yes we are, though I think locks_init_lock() is the better initialization. I'm not sure how I missed that when writing the patch. Thanks for testing! NeilBrown > > If not, then could filp->f_op->flock() (NFSv4 for this test) be > playing tricks with a "fl" coming from the stack ? > >> ---- >> >> Link: https://bugs.linaro.org/show_bug.cgi?id=4056 >> Full dmesg: https://lkft.validation.linaro.org/scheduler/job/497741#L964
Attachment:
signature.asc
Description: PGP signature