Re: occasional failure to unmap rbd

Ilya Dryomov <idryomov@xxxxxxxxx> · Fri, 25 Sep 2015 22:30:47 +0300

On Fri, Sep 25, 2015 at 7:41 PM, Jeff Epstein
<jeff.epstein@xxxxxxxxxxxxxxxx> wrote:
> On 09/25/2015 12:38 PM, Ilya Dryomov wrote:
>>
>> On Fri, Sep 25, 2015 at 7:17 PM, Jeff Epstein
>> <jeff.epstein@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> We occasionally have a situation where we are unable to unmap an rbd.
>>> This occurs intermittently, with no obvious cause. For the most part,
>>> rbds can be unmapped fine, but sometimes we get this:
>>>
>>> # rbd unmap /dev/rbd450
>>> rbd: sysfs write failed
>>> rbd: unmap failed: (16) Device or resource busy
>>
>> Does it persist, i.e. can you unmap a few seconds after this?
>
> It seems to persist. We've been struggling with this for a few days.
>>>
>>> Is there any way to determine what exactly is blocking the unmap? Is
>>> there a way to force unmap?
>>
>> No, there is no way to force unmap.  The most likely reason for -EBUSY
>> is a positive open_count, meaning something has that device opened at
>> the time you do unmap.  I guess we could start outputting open_count to
>> dmesg in these cases, just to be sure.
>
>
> Is there any way to query the open_count? Or to forcibly reset it if it
> becomes inaccurate?

No, you can't reset it.  An inaccurate open_count is a bug.

There is no way to query it, at least for now.  As I said, we could
start dumping it to the kernel log, but it wouldn't tell you much: it's
pretty much always going to be 1.  It's just an int; it doesn't tell
you who to point the finger at.

You said "The given rbd has an associated jbd2 process, but no
kworker."  Can you elaborate on that?  That could be the source of the
problem.

What's the output of "fuser -amv /dev/rbd450"?
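
If fuser isn't handy, here is a rough Python sketch of the user-space
side of such a check.  It is an illustration, not fuser's actual
implementation: find_openers() and find_mounts() are hypothetical
helper names, and it assumes the standard Linux /proc layout.  It only
sees user-space file descriptors and mounts visible in the current
mount namespace, so an in-kernel holder would not show up in the
first list.

```python
import os


def find_openers(device, proc_root="/proc"):
    """Return sorted (pid, fd) pairs whose fd symlink points at `device`."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if target == device:
                hits.append((int(entry), int(fd)))
    return sorted(hits)


def find_mounts(device, mounts_file="/proc/mounts"):
    """Return the mount points at which `device` is currently mounted."""
    mount_points = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == device:
                mount_points.append(fields[1])
    return mount_points


if __name__ == "__main__":
    dev = "/dev/rbd450"
    print("open fds:", find_openers(dev))
    print("mounted at:", find_mounts(dev))
```

Run it as root, otherwise other users' fd directories are unreadable
and get skipped silently.  A non-empty mount list with no open fds
would point at a filesystem that is still mounted on the device.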

Thanks,

                Ilya