UBIFS question: Power-cuts after ubifs_leb_unmap()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am Montag, 30. Juli 2018, 08:55:24 CEST schrieb Adrian Hunter:
> On 26/07/18 01:29, Richard Weinberger wrote:
> > Artem, Adrian,
> > 
> > Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
> >> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
> >> for you but here are some thoughts.
> > 
> > Both of your thoughts are much appreciated!
> > 
> >> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
> >>> Artem, Adrian,
> >>>
> >>> While playing with a new UBI/UBIFS test framework I managed to hit
> >>> this error, 
> >>> with lprops self-checks enabled:
> >>>
> >>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
> >>> accounting of 
> >>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
> >>> 0
> >>
> >> So it has 126976 - 118072 = 8904 of used space.
> >>
> > 
> > [...]
> > 
> >> 1. If you believe some specific place like the one you pointed contains
> >> a bug, then emulate a power cut in that place and try to reproduce the
> >> bug.
> >>
> >> 2. Try to focus on recovery replay, this is the main suspect. Try to
> >> figure out what is that used space in that LEB?
> > 
> > After spending the third night in a row staring at hexdumps, I think I know
> > what is going on.
> > 
> > I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
> > as used.
> > To double check that value I've analyzed the index tree. To my surprise it was 
> > referencing nodes at LEB 11!
> > Summing up the length of these nodes is exactly 8904. So, the LPT is correct.
> > 
> > All nodes on LEB 11 that are referenced by the index tree are of type 
> > UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
> > So, our problem is xattr related. We forget xattr inodes and the free space 
> > accounting goes nuts sooner or later.
> > 
> > Inspecting the journal gave more details. Only buds of head type BASEHD and 
> > DATAHD are present. No GCHD. Therefore GC was not interrupted.
> > With the IO transaction log from before the power-cut I was able to figure 
> > where the orphaned xattr inodes belong to.
> > And indeed, the host inode of said xattr is in the journal for deletion.
> > 
> > In theory during replay, removal of the host inode should also remove all
> > xattr inodes and we are good.
> > So the question is, why fails UBIFS to remove the xattr inodes even if we have 
> > the host inode in the journal?
> > 
> > Another round of digging though the index uncovered the problem. The 
> > UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
> > This means upon replay, when UBIFS tries to locate all xattr entries
> > of the host inode, it faces the dangling branch case and the loop in 
> > ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
> > removed, but the xattr inodes stay.
> > 
> > Therefore the root of the problem is that UBIFS places only the host inode 
> > into the journal upon removal of a file with xattrs and assumes that upon 
> > replay we can discover all xattr inodes. That will fail if due to GC the LEB 
> > got already unmapped.
> > 
> > As intermediate solution I suggest adding also xattr inodes to the journal
> > upon unlink. This will make unlink more expensive but for now it is IMHO ok
> > until a more sophisticated solution is found.
> > Files usually have only a few xattr, so the cost is not that bad.
> > 
> > What do you guys think? Does my analysis make sense to you?
> 
> It doesn't seem like there is a way to avoid adding more information to the
> journal upon deletion, but you probably need to cater for the possibility
> that there is a power cut while writing that information e.g. it wouldn't be
> good to remove half the xattrs but leave the rest

With grouped nodes a power-cut is okay, the bigger problem is the number of xattrs.
We support up to 2^16. Do delete an inode via journal we need UBIFS_INO_NODE_SZ bytes
in the journal.
So, in worst case we need to write 2^16 times UBIFS_INO_NODE_SZ bytes to the journal.
I'm still looking into this option right now, just to be sure that this is really the
solution to the problem I see. Finding a stable reproducer is also not easy...

Another possibility is playing with ubifs_tnc_remove_ino(), upon unlink() it removes
all entries from the TNC and marks them as dirty.
The problem is that also all xattr inodes are marked as dirty.
If we manage to find a way that these get marked only as dirty when we are sure that
they are gone the GC will no longer unmap a "used" LEB.
But that all is very tricky.

Thanks,
//richard



[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux