On 15/08/2019 03.40, Jeff Layton wrote:
On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote:
Jeff, the oops seems to be a NULL dereference in ceph_lock_message().
Please take a look.
(sorry for duplicate mail -- the other one ended up in moderation)
Thanks Ilya,
That function is pretty straightforward. We don't do a whole lot of
pointer chasing in there, so I'm a little unclear on where this would
have crashed. Right offhand, that kernel is probably missing
1b52931ca9b5b87 (ceph: remove duplicated filelock ref increase), but
that seems unlikely to result in an oops.
Hector, if you have the debuginfo for this kernel installed on one of
these machines, could you run gdb against the ceph.ko module and then
do:
gdb> list *(ceph_lock_message+0x212)
That may give me a better hint as to what went wrong.
This is what I get:
(gdb) list *(ceph_lock_message+0x212)
0xd782 is in ceph_lock_message (/build/linux-hwe-B83fOS/linux-hwe-4.18.0/fs/ceph/locks.c:116).
111			req->r_wait_for_completion = ceph_lock_wait_for_completion;
112
113		err = ceph_mdsc_do_request(mdsc, inode, req);
114
115		if (operation == CEPH_MDS_OP_GETFILELOCK) {
116			fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
117			if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type)
118				fl->fl_type = F_RDLCK;
119			else if (CEPH_LOCK_EXCL == req->r_reply_info.filelock_reply->type)
120				fl->fl_type = F_WRLCK;
Disasm:
0x000000000000d77b <+523>: mov 0x250(%rbx),%rdx
0x000000000000d782 <+530>: mov 0x20(%rdx),%rdx
0x000000000000d786 <+534>: neg %edx
0x000000000000d788 <+536>: mov %edx,0x48(%r15)
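The step from these four instructions to line 116 can be checked with a small user-space C sketch. It assumes the wire layout of struct ceph_filelock from include/linux/ceph/ceph_fs.h (little-endian 64-bit fields start, length, client, owner, pid), mirrored here as the hypothetical struct ceph_filelock_mirror. The 0x250 offset of r_reply_info.filelock_reply inside struct ceph_mds_request is specific to that build and is not reproduced. If pid sits at 0x20, the instruction at <+530> loads ->pid through the pointer read from 0x250(%rbx), so a NULL pointer there faults on address 0x20. The neg %edx then negates only the low 32 bits, which matches assigning the negated value to the int-sized fl_pid stored at 0x48(%r15).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space mirror of the wire struct ceph_filelock.
 * Field order is assumed from include/linux/ceph/ceph_fs.h; fields
 * after 'pid' do not change its offset and are omitted.
 */
struct ceph_filelock_mirror {
	uint64_t start;  /* file offset where the lock begins */
	uint64_t length; /* bytes covered; 0 means "to end of file" */
	uint64_t client; /* client holding the lock */
	uint64_t owner;  /* lock owner on that client */
	uint64_t pid;    /* holder's pid, read at locks.c:116 */
} __attribute__((packed));

/* 'mov 0x20(%rdx),%rdx' loads ->pid, so pid must be at offset 0x20. */
static_assert(offsetof(struct ceph_filelock_mirror, pid) == 0x20,
	      "pid is the fifth 64-bit field");

/* gdb's "+0x212" and the disassembly's "<+530>" are the same offset. */
static_assert(0x212 == 530, "hex and decimal offsets agree");

/*
 * C equivalent of line 116 on a little-endian host: load ->pid (the
 * faulting load when 'reply' is NULL) and negate its low 32 bits.
 */
static int negated_pid(const struct ceph_filelock_mirror *reply)
{
	return -(int)reply->pid;
}
```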
That means req->r_reply_info.filelock_reply was NULL.
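In the listing, the return value of ceph_mdsc_do_request() at line 113 is stored in err, but lines 116-120 read filelock_reply without consulting it. One way to avoid the dereference, assuming a NULL filelock_reply only happens when the request did not complete successfully (the thread does not establish the cause), is to fill in the GETFILELOCK result only on success. This is a user-space sketch with hypothetical mock types and constants, not the kernel's; returning -EIO for a missing reply body and falling back to an unlock type are likewise assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants and types. */
enum { MOCK_LOCK_SHARED = 1, MOCK_LOCK_EXCL = 2 };
enum { MOCK_F_RDLCK, MOCK_F_WRLCK, MOCK_F_UNLCK };

struct mock_filelock_reply { uint64_t pid; uint8_t type; };
struct mock_reply_info { const struct mock_filelock_reply *filelock_reply; };
struct mock_file_lock { int fl_pid; int fl_type; };

/*
 * Copy a GETFILELOCK result into fl only if the request succeeded and a
 * reply body is present; on any failure fl is left untouched.
 */
static int fill_getlk_result(int err, const struct mock_reply_info *info,
			     struct mock_file_lock *fl)
{
	const struct mock_filelock_reply *reply;

	assert(info != NULL && fl != NULL); /* caller bugs, not runtime errors */
	if (err)
		return err;        /* request failed: do not read the reply */
	reply = info->filelock_reply;
	if (reply == NULL)
		return -EIO;       /* assumed error code for a missing body */

	fl->fl_pid = -(int)reply->pid;
	if (reply->type == MOCK_LOCK_SHARED)
		fl->fl_type = MOCK_F_RDLCK;
	else if (reply->type == MOCK_LOCK_EXCL)
		fl->fl_type = MOCK_F_WRLCK;
	else
		fl->fl_type = MOCK_F_UNLCK;
	return 0;
}
```

Checking err first means the reply is only touched on the path where the MDS actually answered; the explicit NULL check is an extra safeguard in case a successful request can still arrive without a filelock body.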
--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com