Hmm, hard to say off the top of my head. If you could enable "debug
librbd = 20" logging on the buggy client that owns the lock, create a
new snapshot, and attempt to delete it, it would be interesting to
verify that the image is being properly refreshed.

On Wed, Oct 25, 2017 at 9:23 AM, Piotr Dałek <piotr.dalek@xxxxxxxxxxxx> wrote:
> On 17-10-25 02:39 PM, Jason Dillaman wrote:
>>
>> That log shows that a snap remove request was proxied from a client
>> that couldn't acquire the lock to the client that currently owns the
>> lock. The client that currently owns the lock responded with an
>> -ENOENT error indicating that the snapshot doesn't exist. Depending
>> on the maintenance operation requested, different error codes are
>> filtered out to handle the case where Ceph delivers the request
>> message to the lock owner twice (or more). Normally this isn't an
>> issue, since the local client pre-checks the image state before
>> sending the RPC message (i.e. snap remove will first locally ensure
>> the snap exists and respond with -ENOENT if it doesn't).
>>
>> Therefore, in this case, the question is: who is this rogue client
>> that still owns the lock and is responding to a snap remove request,
>> but hasn't refreshed its state to learn that the snapshot exists?
>
> Thanks, that makes things clear.
>
> It seems we have some Cinder nodes using Infernalis (9.2.1) librbd.
> Are you aware of any bugs in 9.2.x that could cause such behavior?
> We've seen this for the first time...
>
> --
> Piotr Dałek
> piotr.dalek@xxxxxxxxxxxx
> https://www.ovh.com/us/

--
Jason
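
P.S. For concreteness, here's a minimal sketch of the repro steps
above. The pool/image/snapshot names and the log path are placeholders;
adjust them for your environment. On the client that owns the lock, set
the following in its ceph.conf (the client needs to pick up the new
settings, e.g. by restarting it):

    [client]
        debug librbd = 20
        log file = /var/log/ceph/$name.$pid.log

Then exercise the snapshot from any node:

    $ rbd snap create rbd/test-image@snap-test
    $ rbd snap rm rbd/test-image@snap-test

If the lock owner is behaving, its log should show the image being
refreshed before it handles the proxied snap remove.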
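
P.P.S. To illustrate the local pre-check described above, here's a
rough sketch using the rbd Python bindings (the pool and image names
are made up, and this only demonstrates the client-facing behavior, not
the librbd internals):

    import rados
    import rbd

    # Connect to the cluster and open the pool holding the image.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    with rbd.Image(ioctx, 'test-image') as image:
        try:
            # A healthy client consults its own (refreshed) view of the
            # image first, so removing a nonexistent snapshot fails
            # locally with -ENOENT and no request needs to be proxied
            # to the exclusive lock owner.
            image.remove_snap('does-not-exist')
        except rbd.ObjectNotFound:
            print('snap missing per local image state (-ENOENT)')

    ioctx.close()
    cluster.shutdown()

The bug you're seeing would be the mirror image of this: the lock owner
answers -ENOENT for a snapshot that does exist, because its cached
image state is stale.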