Re: filestore: ENODATA error after directory split confuses transaction

On Fri, Apr 16, 2021 at 2:10 AM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
>
> On Thu, Apr 15, 2021 at 02:42:17PM -0700, Neha Ojha wrote:
>
> > > We were seeing at least two types of such "messy" transactions:
> > >
> > > 1) When rados writes a new object, one of the first transaction
> > > operations is OP_TOUCH. It creates the object file, tries to split the
> > > directory, aborts, and therefore skips creating the object's spill_out
> > > attribute.
> > >
> > > 2) When rados deletes an object, one of the transaction's operations is
> > > OP_COLL_MOVE_RENAME, which creates a temporary link. This triggers the
> > > directory split and the error, and the op is aborted in the middle,
> > > leaving the original object file in place.
> > >
> > > So it looks like a bug and could be improved, but the question is
> > > whether upstream is still interested in improving the filestore in
> > > this area. Should I report it to the tracker?
> >
> > Please feel free to create a tracker for it. Though we are not
> > actively developing against filestore, if the fix for this issue isn't
> > too invasive, I don't see any issues in merging it.
>
> Thank you, Neha!
>
> Here it is: https://tracker.ceph.com/issues/50395
>
> We would like to work on the fix and I am thinking about how to approach it.
>
> I think it is not correct that the result of the directory split
> (which is only partially related to the operation) aborts a transaction
> operation in the middle with the error code from the directory split,
> and that the transaction then uses this error code to decide how to
> proceed with the entire transaction (expecting that the error code came
> from the operation itself).
>
> Options that I see:
>
> 1) Consider a failed directory split not critical enough to abort the
> operation, i.e. ignore the error and proceed.
>
> 2) Consider it critical and just abort the osd.

I think it makes sense to handle it the way an EIO (with
filestore_fail_eio enabled) is handled in filestore, and abort.

Neha


>
> 3) Replace the error code returned by the directory split with some
>    special error code, so the upper layer (FileStore::_do_transaction)
>    could make an appropriate decision (abort the transaction?).
>
> 4) Combination of (1) and (2): depending on the returned code, consider
>    it either non-critical and ignore it, or critical and abort the osd.
>
> 5) Either (1) or (2) depending on a config option.
>
> I would vote for (1) or (4) but I would love to learn what other
> people think about the issue and how it could be approached.
>
> Also, it would be nice to know why it was considered that ENODATA
> could be ignored in the transaction. I suppose it is because it was
> expected that ENODATA could be returned by getxattr only, when the
> attribute does not exist, and it was not expected to affect the
> transaction result?
>
> --
> Mykola Golub
>
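
For readers following the thread, the failure mode and options (1), (2)
and (4) can be modelled in a few lines. This is only a toy sketch in
Python, not filestore code: op_touch, do_transaction, OsdAbort, the
policy names and the NON_CRITICAL set are all hypothetical, and the
choice of which codes count as non-critical is an arbitrary placeholder.
It assumes, as the thread describes, that the transaction tolerates
ENODATA returned by an operation.

```python
import errno

# Placeholder assumption: which split errors option (4) would treat as
# non-critical. The thread does not say; this set is purely illustrative.
NON_CRITICAL = {errno.ENODATA}


class OsdAbort(Exception):
    """Stands in for aborting the OSD process (option 2)."""


def op_touch(obj, split_rc, policy):
    """Toy OP_TOUCH: create the file, attempt a split, then set spill_out.

    split_rc is the (negative errno) result of the simulated directory split.
    """
    obj["file"] = True
    if split_rc < 0:
        if policy == "leak":
            # Behaviour described in the thread: the split's error code
            # becomes the operation's result and the op stops half-way.
            return split_rc
        if policy == "abort":
            raise OsdAbort(split_rc)
        if policy == "by_code" and -split_rc not in NON_CRITICAL:
            raise OsdAbort(split_rc)
        # "ignore", or a non-critical code under "by_code": carry on.
    obj["spill_out"] = True
    return 0


def do_transaction(obj, split_rc, policy):
    """Toy transaction that tolerates -ENODATA coming back from an op."""
    rc = op_touch(obj, split_rc, policy)
    if rc == -errno.ENODATA:
        return 0  # assumed harmless, so the transaction reports success
    return rc
```

With policy "leak" and a split result of -ENODATA, the toy transaction
reports success although spill_out was never set, which mirrors case (1)
above. With "ignore" the op completes; with "abort" the error surfaces
as OsdAbort instead of being misread by the transaction.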
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


