On Fri, Apr 16, 2021 at 2:10 AM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
>
> On Thu, Apr 15, 2021 at 02:42:17PM -0700, Neha Ojha wrote:
>
> > > We were seeing at least two types of such "messy" transactions:
> > >
> > > 1) On rados writing a new object, one of the first transaction
> > > operations is OP_TOUCH. It creates the object file, tries to split the
> > > directory, aborts and skips creating the object spill_out attribute
> > > due to this.
> > >
> > > 2) On rados deleting an object, one of the transaction's operations is
> > > OP_COLL_MOVE_RENAME, which creates a temporary link, which triggers the
> > > directory split and the error; the op is aborted in the middle, leaving
> > > the original object file not removed.
> > >
> > > So it looks like a bug and could be improved, but the question is if
> > > upstream is still interested in improving the filestore in this
> > > area? Should I report it to the tracker?
> >
> > Please feel free to create a tracker for it. Though we are not
> > actively developing against filestore, if the fix for this issue isn't
> > too invasive, I don't see any issues in merging it.
>
> Thank you, Neha!
>
> Here it is: https://tracker.ceph.com/issues/50395
>
> We would like to work on the fix and I am thinking about how to approach it.
>
> I think it is not correct that the result of the directory splitting
> (only partially related to the operation) aborts a transaction operation
> in the middle with the error code from the directory split. And then the
> transaction uses this error code to make a decision how to proceed
> with the entire transaction (expecting that the error code came from
> the operation itself).
>
> Options that I see:
>
> 1) Consider the failed directory split not critical enough to abort the
> operation, i.e. ignore the error and proceed.
>
> 2) Consider it critical and just abort the osd.

I think it makes sense to handle it the way an EIO (with
filestore_fail_eio enabled) is handled in filestore and abort.

Neha

> 3) Replace the error code returned by the directory split with some
> special error code, so the upper layer (FileStore::_do_transaction)
> could make an appropriate decision (abort the transaction?).
>
> 4) Combination of (1) and (2): depending on the returned code, consider
> it either non-critical and ignore it, or critical and abort the osd.
>
> 5) Either (1) or (2) depending on a config option.
>
> I would vote for (1) or (4) but I would love to learn what other
> people think about the issue and how it could be approached.
>
> Also, it would be nice to know why it was considered that ENODATA
> could be ignored in the transaction. I suppose it is because it was
> expected that ENODATA could be returned by getxattr only, when the
> attribute does not exist, and it was not expected to affect the
> transaction result?
>
> --
> Mykola Golub
> _______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
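
As a rough illustration of options (3) and (4) discussed in this thread, the
sketch below tags each result with the step that produced it, so the caller
never mistakes a directory-split error for an error from the operation
itself. This is not Ceph code: OpResult, Source, Action and decide() are
hypothetical names invented for the example, and treating -EIO from the split
as the only critical case is an assumption, not something stated in the
thread.

```cpp
#include <cerrno>

// Hypothetical sketch, not actual Ceph/FileStore code.
// Every type and function name here is invented for illustration.

// Which step produced the error code.
enum class Source { Operation, DirSplit };

struct OpResult {
  int err;        // 0 on success, negative errno on failure
  Source source;  // step that produced err
};

// What the transaction layer does next.
enum class Action { Continue, FailTransaction, AbortOsd };

// Option (4): a failed directory split is either ignored or fatal,
// and is never interpreted as the result of the operation itself.
Action decide(const OpResult& r) {
  if (r.err == 0)
    return Action::Continue;

  if (r.source == Source::DirSplit) {
    // Assumption: only -EIO from the split is treated as critical.
    return r.err == -EIO ? Action::AbortOsd : Action::Continue;
  }

  // Errors from the operation itself. The thread notes that ENODATA
  // was considered ignorable by the transaction; this mirrors that.
  if (r.err == -ENODATA)
    return Action::Continue;

  return Action::FailTransaction;
}
```

With this shape, decide({-ENOSPC, Source::DirSplit}) continues the
transaction, decide({-EIO, Source::DirSplit}) aborts the osd, and an error
such as -ENOENT from the operation itself fails the transaction. Option (5)
could be sketched the same way by replacing the -EIO check with a config
flag.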