On Thu, Jul 11, 2013 at 4:38 PM, Mandell Degerness <mandell@xxxxxxxxxxxxxxx> wrote:
> I'm not certain what the correct behavior should be in this case, so
> maybe it is not a bug, but here is what is happening:
>
> When an OSD becomes full, a process fails, and we unmount the rbd and
> attempt to remove the lock associated with the rbd for the process.
> The unmount works fine, but removing the lock is failing right now
> because the list_lockers() function call never returns.
>
> Here is a code snippet I tried with a fake rbd lock on a test cluster:
>
> import rbd
> import rados
> with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
>     with cluster.open_ioctx('rbd') as ioctx:
>         with rbd.Image(ioctx, 'msd1') as image:
>             image.list_lockers()
>
> The process never returns, even after the ceph cluster is returned to
> healthy. The only indication of the error is a message in the
> /var/log/messages file:
>
> Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793
> 7ffc66d72700  0 client.6911.objecter FULL, paused modify
> 0x7ffc687c6050 tid 2
>
> Any help would be greatly appreciated.
>
> ceph version:
>
> ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)

Interesting. Updating the lock state requires write access to the object,
which is why it blocks when the cluster gets full; removing that
requirement would be a lot of work for very little gain. However, the
request should get woken up once the cluster is no longer full! Here's a
ticket: http://tracker.ceph.com/issues/5615

Josh or Yehuda, do you have any thoughts on obvious causes before we dig
into the librados/objecter code?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
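
[Editor's note: as a client-side mitigation while the paused-op wakeup issue is open, a sketch of how the snippet from the thread could be made to fail instead of hang. This is not from the thread itself: it assumes the librados build honours the rados_osd_op_timeout option (a newer client setting that may not exist in 0.61.x), and it reuses the hypothetical image name 'msd1' and pool 'rbd' from the original snippet.]

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    # Assumption: this librados client supports rados_osd_op_timeout.
    # With it set, an op paused because the cluster is full should
    # return an error after ~30s instead of blocking indefinitely.
    cluster.conf_set('rados_osd_op_timeout', '30')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')
        try:
            with rbd.Image(ioctx, 'msd1') as image:
                try:
                    # list_lockers() updates/reads lock state on the header
                    # object; on a full cluster it may otherwise hang.
                    print(image.list_lockers())
                except (rados.Error, rbd.Error) as e:
                    # A timed-out or failed op surfaces here rather than
                    # leaving the caller stuck forever.
                    print('list_lockers failed: %s' % e)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()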