Re: RBD Kernel panic rbd_dev_refresh

Alex Elder <elder@xxxxxxxx> · Thu, 12 Feb 2015 17:34:10 -0600

On 02/12/2015 10:19 AM, Ilya Dryomov wrote:
On Thu, Feb 12, 2015 at 4:24 PM, Hannes Landeholm <hannes@xxxxxxxxxxxxxx> wrote:
We don't have any debug symbols but here is a dump of the .ko at this address:

https://gist.github.com/hannes-landeholm/b4664e2e7e37ad13177c

It's likely this line (rbd_dev_refresh+0xcb):

3d5b:       4c 89 60 50             mov    %r12,0x50(%rax)

%rax here is NULL, which causes the invalid write to address 0000000000000050.

Judging by the context (after the spinlock and the shr, before the call
to revalidate_disk), I'm pretty sure the offending line in rbd.c is the
following:

set_capacity(rbd_dev->disk, size);

I concur with Hannes.  rbd_dev->disk->part0.nr_sects is at offset
0x50 from the rbd_dev->disk pointer.
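The address arithmetic behind that observation can be sketched in plain
userspace C. The two structs below are hypothetical stand-ins, not the
kernel's definitions; their padding is chosen only so that the sector
count lands at offset 0x50, matching the disassembly above. The sketch
shows why a store through a NULL base pointer faults at exactly the
field's offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's disk structures.
 * The padding is an assumption made to reproduce the 0x50 offset. */
struct layout_part {
    uint64_t start_sect;
    uint64_t nr_sects;
};

struct layout_disk {
    unsigned char before_part0[0x48]; /* fields that precede part0 */
    struct layout_part part0;
};

/* Compile-time check that the mock layout matches the assumption. */
static_assert(offsetof(struct layout_disk, part0) == 0x48,
              "part0 is expected at offset 0x48 in this mock layout");

/* Address that a store to disk->part0.nr_sects would touch,
 * given the numeric value of the disk pointer. */
uintptr_t nr_sects_store_address(uintptr_t disk_base)
{
    return disk_base
         + offsetof(struct layout_disk, part0)
         + offsetof(struct layout_part, nr_sects);
}
```

With a base of 0 the function returns 0x50, which is the faulting
address reported in the oops.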

I'll file a ticket and look into this.

Looking at the code, there is a race between checking the REMOVING
flag and the disk getting removed.  The cost of set_capacity is
low and could be done inside the spinlock, but that doesn't help
the revalidate_disk() call.  You probably need to coordinate either
with a semaphore or with another rbd_dev->flags bit.
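One way to picture the coordination described above is the following
userspace sketch. It is an illustration under stated assumptions, not
the fix that went into rbd.c: every type and function name is
hypothetical, a pthread mutex stands in for a sleeping lock such as a
semaphore, a counter stands in for revalidate_disk(), and the shift by
9 assumes 512-byte sectors. The point is only that the flag check, the
capacity update and the revalidation happen under one lock that the
removal path also takes.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mock_disk {
    uint64_t capacity;     /* in 512-byte sectors (assumed sector size) */
    int revalidations;     /* counts stand-in revalidate calls */
};

struct mock_rbd_dev {
    pthread_mutex_t refresh_lock; /* sleeping lock held for the whole update */
    bool removing;                /* plays the role of the REMOVING flag */
    struct mock_disk *disk;       /* NULL once the disk has been removed */
};

void mock_dev_init(struct mock_rbd_dev *dev, struct mock_disk *disk)
{
    pthread_mutex_init(&dev->refresh_lock, NULL);
    dev->removing = false;
    dev->disk = disk;
}

/* Returns true if the capacity was updated, false if the device is
 * being removed (in which case the disk is never dereferenced). */
bool mock_dev_refresh(struct mock_rbd_dev *dev, uint64_t size_bytes)
{
    bool updated = false;

    pthread_mutex_lock(&dev->refresh_lock);
    if (!dev->removing && dev->disk != NULL) {
        dev->disk->capacity = size_bytes >> 9; /* bytes to sectors */
        dev->disk->revalidations++;            /* stand-in for revalidation */
        updated = true;
    }
    pthread_mutex_unlock(&dev->refresh_lock);
    return updated;
}

/* Removal takes the same lock, so it cannot interleave with a refresh
 * between the flag check and the use of the disk pointer. */
void mock_dev_remove(struct mock_rbd_dev *dev)
{
    pthread_mutex_lock(&dev->refresh_lock);
    dev->removing = true;
    dev->disk = NULL;
    pthread_mutex_unlock(&dev->refresh_lock);
}
```

A spinlock would not work for this in the kernel because the
revalidation step may sleep, which is why a sleeping lock or a second
flag bit is suggested.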

					-Alex

Thanks,

                 Ilya

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
