On Thu, Jul 11, 2013 at 4:38 PM, Mandell Degerness <mandell@xxxxxxxxxxxxxxx> wrote:
> I'm not certain what the correct behavior should be in this case, so
> maybe it is not a bug, but here is what is happening:
>
> When an OSD becomes full, a process fails, and we unmount the rbd and
> attempt to remove the lock associated with the rbd for the process.
> The unmount works fine, but removing the lock is failing right now
> because the list_lockers() function call never returns.
>
> Here is a code snippet I tried with a fake rbd lock on a test cluster:
>
> import rbd
> import rados
> with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
>     with cluster.open_ioctx('rbd') as ioctx:
>         with rbd.Image(ioctx, 'msd1') as image:
>             image.list_lockers()
>
> The process never returns, even after the ceph cluster is returned to
> healthy. The only indication of the error is a message in the
> /var/log/messages file:
>
> Jul 11 23:25:05 node-172-16-0-13 python: 2013-07-11 23:25:05.826793
> 7ffc66d72700  0 client.6911.objecter FULL, paused modify
> 0x7ffc687c6050 tid 2
>
> Any help would be greatly appreciated.
>
> ceph version:
>
> ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)

Interesting. Updating the lock state requires write access to the object,
which is why it blocks when the cluster gets full; removing that
requirement would be a lot of work for very little gain. However, the
request should get woken up once the cluster is no longer full! Here's a
ticket: http://tracker.ceph.com/issues/5615

Josh or Yehuda, do you have any thoughts on obvious causes before we dig
into the librados/objecter code?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
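
[Editor's note: as a client-side mitigation while the paused-op wakeup issue is open, a sketch of how the snippet from the thread could be made to fail instead of hang. This is not from the thread itself: it assumes the librados build honours the rados_osd_op_timeout option (a newer client setting that may not exist in 0.61.x), and it reuses the hypothetical image name 'msd1' and pool 'rbd' from the original snippet.]

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    # Assumption: this librados client supports rados_osd_op_timeout.
    # With it set, an op paused because the cluster is full should
    # return an error after ~30s instead of blocking indefinitely.
    cluster.conf_set('rados_osd_op_timeout', '30')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')
        try:
            with rbd.Image(ioctx, 'msd1') as image:
                try:
                    # list_lockers() updates/reads lock state on the header
                    # object; on a full cluster it may otherwise hang.
                    print(image.list_lockers())
                except (rados.Error, rbd.Error) as e:
                    # A timed-out or failed op surfaces here rather than
                    # leaving the caller stuck forever.
                    print('list_lockers failed: %s' % e)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()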