UBIFS question: Power-cuts after ubifs_leb_unmap()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 30/07/18 10:28, Richard Weinberger wrote:
> Am Montag, 30. Juli 2018, 08:55:24 CEST schrieb Adrian Hunter:
>> On 26/07/18 01:29, Richard Weinberger wrote:
>>> Artem, Adrian,
>>>
>>> Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
>>>> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
>>>> for you but here are some thoughts.
>>>
>>> Both of your thoughts are much appreciated!
>>>
>>>> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
>>>>> Artem, Adrian,
>>>>>
>>>>> While playing with a new UBI/UBIFS test framework I managed to hit
>>>>> this error, 
>>>>> with lprops self-checks enabled:
>>>>>
>>>>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
>>>>> accounting of 
>>>>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
>>>>> 0
>>>>
>>>> So it has 126976 - 118072 = 8904 of used space.
>>>>
>>>
>>> [...]
>>>
>>>> 1. If you believe some specific place like the one you pointed contains
>>>> a bug, then emulate a power cut in that place and try to reproduce the
>>>> bug.
>>>>
>>>> 2. Try to focus on recovery replay, this is the main suspect. Try to
>>>> figure out what is that used space in that LEB?
>>>
>>> After spending the third night in a row staring at hexdumps, I think I know
>>> what is going on.
>>>
>>> I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
>>> as used.
>>> To double check that value I've analyzed the index tree. To my surprise it was 
>>> referencing nodes at LEB 11!
>>> Summing up the length of these nodes is exactly 8904. So, the LPT is correct.
>>>
>>> All nodes on LEB 11 that are referenced by the index tree are of type 
>>> UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
>>> So, our problem is xattr related. We forget xattr inodes and the free space 
>>> accounting goes nuts sooner or later.
>>>
>>> Inspecting the journal gave more details. Only buds of head type BASEHD and 
>>> DATAHD are present. No GCHD. Therefore GC was not interrupted.
>>> With the IO transaction log from before the power-cut I was able to figure 
>>> where the orphaned xattr inodes belong to.
>>> And indeed, the host inode of said xattr is in the journal for deletion.
>>>
>>> In theory during replay, removal of the host inode should also remove all
>>> xattr inodes and we are good.
>>> So the question is, why fails UBIFS to remove the xattr inodes even if we have 
>>> the host inode in the journal?
>>>
>>> Another round of digging though the index uncovered the problem. The 
>>> UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
>>> This means upon replay, when UBIFS tries to locate all xattr entries
>>> of the host inode, it faces the dangling branch case and the loop in 
>>> ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
>>> removed, but the xattr inodes stay.
>>>
>>> Therefore the root of the problem is that UBIFS places only the host inode 
>>> into the journal upon removal of a file with xattrs and assumes that upon 
>>> replay we can discover all xattr inodes. That will fail if due to GC the LEB 
>>> got already unmapped.
>>>
>>> As intermediate solution I suggest adding also xattr inodes to the journal
>>> upon unlink. This will make unlink more expensive but for now it is IMHO ok
>>> until a more sophisticated solution is found.
>>> Files usually have only a few xattr, so the cost is not that bad.
>>>
>>> What do you guys think? Does my analysis make sense to you?
>>
>> It doesn't seem like there is a way to avoid adding more information to the
>> journal upon deletion, but you probably need to cater for the possibility
>> that there is a power cut while writing that information e.g. it wouldn't be
>> good to remove half the xattrs but leave the rest
> 
> With grouped nodes a power-cut is okay, the bigger problem is the number of xattrs.
> We support up to 2^16. Do delete an inode via journal we need UBIFS_INO_NODE_SZ bytes
> in the journal.
> So, in worst case we need to write 2^16 times UBIFS_INO_NODE_SZ bytes to the journal.
> I'm still looking into this option right now, just to be sure that this is really the
> solution to the problem I see. Finding a stable reproducer is also not easy...
> 
> Another possibility is playing with ubifs_tnc_remove_ino(), upon unlink() it removes
> all entries from the TNC and marks them as dirty.
> The problem is that also all xattr inodes are marked as dirty.
> If we manage to find a way that these get marked only as dirty when we are sure that
> they are gone the GC will no longer unmap a "used" LEB.
> But that all is very tricky.

What about: add the inode to be deleted to orphans, then delete the xattrs
one at a time, then when they are all gone, delete the inode and remove it
from orphans.



[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux