On Thu, 2018-03-01 at 14:11 -0500, Jeff Layton wrote:
> nfs-ganesha can store its client recovery database in a RADOS object. To
> do this it uses librados with a write_op to store the value in the omap.
>
> This generally works, but I've been playing with containerizing ganesha
> and there I've noticed that occasionally the string that is stored in
> the value field of the omap is truncated.
>
> On my test rig, the value should be 63 bytes, but ends up only being 29.
> Looking at the wire traffic, it's clear that the omap set operation is
> not done until the daemon is being shut down, well after the
> rados_write_op_operate call was issued.
>
> What I think is happening is that we generally end up with exclusive
> caps on this object, and the client just caches the write operation.
> When we go to shut down, we don't shut down the connection properly
> (currently) and that causes it to miss writing out the object.
>
> I think I can probably fix this by shutting things down cleanly in the
> clean shutdown case, but in this case, we really do need to
> synchronously write out the omap key to the database even if we hold
> exclusive caps on the object. We're storing info to be used after a
> major outage, and we need this data to be stored properly before we can
> issue state based on it.
>
> So with all of that... how do I force the librados client to not buffer
> things when rados_write_op_operate is called? I had an initial hope that
> LIBRADOS_OPERATION_IGNORE_CACHE might do the right thing, but that seems
> to have more to do with internal OSD operation.

Oof, I completely misunderstood this problem! I think it's still a relevant
question (as I'm not certain anything guarantees this), but the problem
actually seems to be that the client is sending a truncated omap value
update just before it dies in some cases.

I've opened a tracker bug here: http://tracker.ceph.com/issues/23194

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
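
For reference, the omap update under discussion follows roughly the librados
write_op pattern sketched below. This is a minimal, self-contained sketch and
not nfs-ganesha's actual recovery-backend code: the pool name ("nfs-ganesha"),
the object name ("rec-epoch-1"), and the key/value strings are placeholders,
and error handling is pared down to the bare minimum.

/*
 * Minimal sketch: set one omap key/value on an object with a librados
 * write_op.  Pool, object, key, and value names are placeholders.
 * Assumes a reachable cluster and a ceph.conf in the default locations.
 */
#include <string.h>
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    rados_write_op_t op;
    /* placeholder recovery key/value, not ganesha's real format */
    char const * const keys[] = { "rec-0000000000000001" };
    char const * const vals[] = { "client-owner-string" };
    size_t lens[] = { strlen(vals[0]) };
    int ret;

    if (rados_create(&cluster, NULL) < 0)
        return 1;
    rados_conf_read_file(cluster, NULL);
    if (rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "nfs-ganesha", &ioctx) < 0)
        return 1;

    /* build a compound write op containing a single omap_set */
    op = rados_create_write_op();
    rados_write_op_omap_set(op, keys, vals, lens, 1);

    /*
     * rados_write_op_operate() is the blocking variant: it returns only
     * once the op has completed.  The flags argument takes
     * LIBRADOS_OPERATION_* values (e.g. LIBRADOS_OPERATION_IGNORE_CACHE),
     * which, as noted above, influence OSD-side handling rather than any
     * client-side buffering.
     */
    ret = rados_write_op_operate(op, ioctx, "rec-epoch-1", NULL,
                                 LIBRADOS_OPERATION_NOFLAG);
    rados_release_write_op(op);

    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return ret < 0 ? 1 : 0;
}

The asynchronous counterpart is rados_aio_write_op_operate(), and as far as I
know librados itself keeps no client-side write-back cache, which fits the
follow-up above: the truncation would have to be in the value packed into the
op rather than in any write buffering on the client.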