UBIFS question: Power-cuts after ubifs_leb_unmap()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26/07/18 01:29, Richard Weinberger wrote:
> Artem, Adrian,
> 
> Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
>> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
>> for you but here are some thoughts.
> 
> Both of your thoughts are much appreciated!
> 
>> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
>>> Artem, Adrian,
>>>
>>> While playing with a new UBI/UBIFS test framework I managed to hit
>>> this error, 
>>> with lprops self-checks enabled:
>>>
>>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
>>> accounting of 
>>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
>>> 0
>>
>> So it has 126976 - 118072 = 8904 of used space.
>>
> 
> [...]
> 
>> 1. If you believe some specific place like the one you pointed contains
>> a bug, then emulate a power cut in that place and try to reproduce the
>> bug.
>>
>> 2. Try to focus on recovery replay, this is the main suspect. Try to
>> figure out what is that used space in that LEB?
> 
> After spending the third night in a row staring at hexdumps, I think I know
> what is going on.
> 
> I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
> as used.
> To double check that value I've analyzed the index tree. To my surprise it was 
> referencing nodes at LEB 11!
> Summing up the length of these nodes is exactly 8904. So, the LPT is correct.
> 
> All nodes on LEB 11 that are referenced by the index tree are of type 
> UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
> So, our problem is xattr related. We forget xattr inodes and the free space 
> accounting goes nuts sooner or later.
> 
> Inspecting the journal gave more details. Only buds of head type BASEHD and 
> DATAHD are present. No GCHD. Therefore GC was not interrupted.
> With the IO transaction log from before the power-cut I was able to figure 
> where the orphaned xattr inodes belong to.
> And indeed, the host inode of said xattr is in the journal for deletion.
> 
> In theory during replay, removal of the host inode should also remove all
> xattr inodes and we are good.
> So the question is, why fails UBIFS to remove the xattr inodes even if we have 
> the host inode in the journal?
> 
> Another round of digging though the index uncovered the problem. The 
> UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
> This means upon replay, when UBIFS tries to locate all xattr entries
> of the host inode, it faces the dangling branch case and the loop in 
> ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
> removed, but the xattr inodes stay.
> 
> Therefore the root of the problem is that UBIFS places only the host inode 
> into the journal upon removal of a file with xattrs and assumes that upon 
> replay we can discover all xattr inodes. That will fail if due to GC the LEB 
> got already unmapped.
> 
> As intermediate solution I suggest adding also xattr inodes to the journal
> upon unlink. This will make unlink more expensive but for now it is IMHO ok
> until a more sophisticated solution is found.
> Files usually have only a few xattr, so the cost is not that bad.
> 
> What do you guys think? Does my analysis make sense to you?

It doesn't seem like there is a way to avoid adding more information to the
journal upon deletion, but you probably need to cater for the possibility
that there is a power cut while writing that information e.g. it wouldn't be
good to remove half the xattrs but leave the rest



[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux