On Wed, Nov 7, 2018 at 7:22 PM, Rafael David Tinoco <rafael.tinoco@xxxxxxxxxx> wrote: > During our functional tests, we have ran into the following issue for i686: > > [ 19.301067] BUG: unable to handle kernel NULL pointer dereference at 0000028a > > which might have triggered false positive for: > > [ 48.544348] BUG: sleeping function called from invalid context at > /srv/oe/build/tmp-rpb-glibc/work-shared/intel-core2-32/kernel-source/include/linux/percpu-rwsem.h:34 > > For latest 4.20.0-rc1-next-20181107, when running fsync01 LTP test in > this kernel. > > ---- > > [ 19.120300] BUG: unable to handle kernel NULL pointer dereference at 0000028a > [ 19.127433] *pde = 00000000 > [ 19.130311] Oops: 0000 [#1] SMP > [ 19.133449] CPU: 2 PID: 2557 Comm: ata_id Not tainted > 4.20.0-rc1-next-20181107 #1 > [ 19.140920] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS > 2.0b 07/27/2017 > [ 19.148391] EIP: locks_remove_flock+0x60/0xb0 > [ 19.152749] Code: 10 e8 b4 d3 ff ff 8b 43 14 83 4d 80 40 8b 70 58 > 85 f6 74 46 8d 8d 58 ff ff ff ba 07 00 00 00 89 d8 ff d6 8b 45 d8 85 > c0 74 0f <8b> 50 04 85 d2 74 08 8d 85 58 ff ff ff ff d2 8b 45 f0 65 33 > 05 14 > [ 19.171484] EAX: 00000286 EBX: f501c140 ECX: 00000000 EDX: ffffffff > [ 19.177742] ESI: 00000000 EDI: f53978c8 EBP: f37dff14 ESP: f37dfe6c > [ 19.183999] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010202 > [ 19.190776] CR0: 80050033 CR2: 0000028a CR3: 343f9000 CR4: 003406d0 > [ 19.197035] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > [ 19.203299] DR6: fffe0ff0 DR7: 00000400 > [ 19.207152] Call Trace: > [ 19.209604] ? lockdep_hardirqs_on+0xe4/0x190 > [ 19.213961] ? ktime_get_coarse_real_ts64+0x4d/0xd0 > [ 19.218838] ? trace_hardirqs_on+0x43/0xe0 > [ 19.222930] ? __audit_syscall_entry+0xb2/0xf0 > [ 19.227375] locks_remove_file+0x3e/0x1e0 > [ 19.231380] __fput+0xc2/0x1f0 > [ 19.234440] ____fput+0xd/0x10 > [ 19.237499] task_work_run+0x7f/0xb0 > [ 19.241078] exit_to_usermode_loop+0x9d/0xa0 > [ 19.245349] do_fast_syscall_32+0x257/0x2a0 > [ 19.249527] entry_SYSENTER_32+0x70/0xc8 > [ 19.253444] EIP: 0xb7ee9991 > [ 19.256234] Code: 8b 98 60 cd ff ff 85 d2 89 c8 74 02 89 0a 5b 5d > c3 8b 04 24 c3 8b 14 24 c3 8b 1c 24 c3 8b 3c 24 c3 51 52 55 89 e5 0f > 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 > 8d 76 > [ 19.274971] EAX: 00000000 EBX: 00000004 ECX: 00000043 EDX: b7e9a894 > [ 19.281228] ESI: b7eb8000 EDI: 00000016 EBP: b7ccc6b0 ESP: bf9c7830 > [ 19.287485] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246 > [ 19.294263] Modules linked in: fuse > [ 19.297754] CR2: 000000000000028a > [ 19.301066] ---[ end trace 30875d5239ac018c ]--- > > And, with EIP: locks_remove_flock+0x60/0xb0, we have: > > c12777c0 <locks_remove_flock>: > c12777c0: e8 eb 26 dd ff call c1049eb0 <__fentry__> > c12777c5: 55 push %ebp > c12777c6: 83 c2 20 add $0x20,%edx > c12777c9: 89 e5 mov %esp,%ebp > c12777cb: 57 push %edi > c12777cc: 56 push %esi > c12777cd: 53 push %ebx > c12777ce: 89 c3 mov %eax,%ebx > c12777d0: 81 ec 9c 00 00 00 sub $0x9c,%esp > c12777d6: 65 a1 14 00 00 00 mov %gs:0x14,%eax > c12777dc: 89 45 f0 mov %eax,-0x10(%ebp) > c12777df: 31 c0 xor %eax,%eax > c12777e1: 8b 02 mov (%edx),%eax > c12777e3: 39 c2 cmp %eax,%edx > c12777e5: 74 48 je c127782f > <locks_remove_flock+0x6f> > c12777e7: 8d 8d 58 ff ff ff lea -0xa8(%ebp),%ecx > c12777ed: ba 08 00 00 00 mov $0x8,%edx > c12777f2: 89 d8 mov %ebx,%eax > c12777f4: 8b 7b 10 mov 0x10(%ebx),%edi > c12777f7: e8 b4 d3 ff ff call c1274bb0 <flock_make_lock> > c12777fc: 8b 43 14 mov 0x14(%ebx),%eax > -------------- EIP > c12777ff: 83 4d 80 40 orl $0x40,-0x80(%ebp) > c1277803: 8b 70 58 mov 0x58(%eax),%esi > c1277806: 85 f6 test %esi,%esi > > ---- > > Pointing to: > > locks_remove_flock(struct file *filp, struct file_lock_context *flctx) > { > struct file_lock fl; > ... > flock_make_lock(filp, LOCK_UN, &fl); > ... > } > > Since flock_make_lock() calls locks_alloc_lock() and kmem_cache_zalloc()... > > shouldn't "fl" be declared as *fl pointer ? NM for this one, just saw flock_make_lock() can return a ptr to struct file_lock *, after creating it from slab, or just populate a stack variable, like it is doing here. For: ... flock_make_lock(filp, LOCK_UN, &fl); fl.fl_flags |= FL_CLOSE; ... I wonder if, for x86, we are just missing an initialization: memset(&fl, 0, sizeof(struct file_lock)); in the beginning of locks_remove_flock(). If not, then could filp->f_op->flock() (NFSv4 for this test) be playing tricks with a "fl" coming from the stack ? > ---- > > Link: https://bugs.linaro.org/show_bug.cgi?id=4056 > Full dmesg: https://lkft.validation.linaro.org/scheduler/job/497741#L964