On Wed, Jul 20, 2016 at 1:23 AM, Patrick McLean <patrickm@xxxxxxxxxx> wrote:
> We got this on our rbd clients this morning, it is not actually a
> panic, but networking seems to have died on those boxes and since they
> are netbooted, they were dead.
>
> It looks like a crash in rbd_watch_cb:
>
> [68479.925931] Call Trace:
> [68479.928632] [<ffffffff81140f2e>] ? wq_worker_sleeping+0xe/0x90
> [68479.934793] [<ffffffff819738bc>] __schedule+0x50c/0xb90
> [68479.940349] [<ffffffff81445db5>] ? put_io_context_active+0xa5/0xc0
> [68479.946866] [<ffffffff81973f7c>] schedule+0x3c/0x90
> [68479.952077] [<ffffffff81126554>] do_exit+0x7b4/0xc60
> [68479.957373] [<ffffffff8109891c>] oops_end+0x9c/0xd0
> [68479.962579] [<ffffffff81098d8b>] die+0x4b/0x70
> [68479.967357] [<ffffffff81095d15>] do_general_protection+0xe5/0x1b0
> [68479.973780] [<ffffffff8197b988>] general_protection+0x28/0x30
> [68479.979857] [<ffffffffa01be161>] ? rbd_watch_cb+0x21/0x100 [rbd]
> [68479.986202] [<ffffffff81173f8f>] ? up_read+0x1f/0x40
> [68479.991506] [<ffffffffa00c3c09>] do_watch_notify+0x99/0x170 [libceph]
> [68479.998279] [<ffffffff8114003a>] process_one_work+0x1da/0x660
> [68480.004352] [<ffffffff8113ffac>] ? process_one_work+0x14c/0x660
> [68480.010601] [<ffffffff8114050e>] worker_thread+0x4e/0x490
> [68480.016326] [<ffffffff811404c0>] ? process_one_work+0x660/0x660
> [68480.022578] [<ffffffff811404c0>] ? process_one_work+0x660/0x660
> [68480.028823] [<ffffffff81146c51>] kthread+0x101/0x120
> [68480.034149] [<ffffffff81979baf>] ret_from_fork+0x1f/0x40
> [68480.039796] [<ffffffff81146b50>] ? kthread_create_on_node+0x250/0x250

The attached log starts with [68479.452369]. Do you have an earlier
chunk? I'm specifically looking for whether the rbd_watch_cb() splat
was the first one.

Any luck reproducing the hang with logging enabled?

Thanks,

                Ilya
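
A minimal sketch of one way to capture that logging on the client,
assuming debugfs is mounted at /sys/kernel/debug and the rbd/libceph
debug messages are ordinary pr_debug() call sites reachable through
dynamic debug (the exact logging Ilya has in mind may differ):

    #!/usr/bin/env python3
    # Sketch: enable verbose dynamic-debug output for the Ceph kernel
    # client modules before retrying the reproducer, so the next splat
    # comes with libceph/rbd debug messages in the kernel log.
    # Assumes root and a mounted debugfs; adjust the path if yours differs.
    CONTROL = "/sys/kernel/debug/dynamic_debug/control"

    for module in ("libceph", "rbd"):
        with open(CONTROL, "w") as ctl:
            # "+p" switches on the pr_debug() call sites in the module.
            ctl.write("module {} +p\n".format(module))

Writing "module <name> -p" to the same control file turns the messages
back off once the logs have been collected.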