Re: ubifs: mounting fails due to error in orphan file handling

Miquel Raynal <miquel.raynal@xxxxxxxxxxx> · Wed, 5 Feb 2020 09:22:02 +0100

Hi Jef,

"Driesen Jef (JDI)" <Jef.Driesen@xxxxxxx> wrote on Tue, 28
Jan 2020 10:51:39 +0000:

> Hi,
> 
> We're experiencing some kind of corruption of the UBIFS file system
> after power cuts. The problem shows up as an error during mount:
> 
> # mount -t ubifs ubi0:home /home
> mount: /home: special device ubi0:home does not exist.
> 
> The underlying UBI volumes are all fine:
> 
> # mtdinfo /dev/mtd0
> mtd0
> Name:                           ubi
> Type:                           nand
> Eraseblock size:                131072 bytes, 128.0 KiB
> Amount of eraseblocks:          8192 (1073741824 bytes, 1024.0 MiB)
> Minimum input/output unit size: 2048 bytes
> Sub-page size:                  2048 bytes
> OOB size:                       64 bytes
> Character device major/minor:   90:0
> Bad blocks are allowed:         true
> Device is writable:             true
> 
> # ubinfo -a
> UBI version:                    1
> Count of UBI devices:           1
> UBI control device major/minor: 10:58
> Present UBI devices:            ubi0
> 
> ubi0
> Volumes count:                           3
> Logical eraseblock size:                 126976 bytes, 124.0 KiB
> Total amount of logical eraseblocks:     8192 (1040187392 bytes, 992.0 MiB)
> Amount of available logical eraseblocks: 0 (0 bytes)
> Maximum count of volumes                 128
> Count of bad physical eraseblocks:       0
> Count of reserved physical eraseblocks:  160
> Current maximum erase counter value:     36
> Minimum input/output unit size:          2048 bytes
> Character device major/minor:            246:0
> Present volumes:                         0, 1, 2
> 
> Volume ID:   0 (on ubi0)
> Type:        dynamic
> Alignment:   1
> Size:        2676 LEBs (339787776 bytes, 324.0 MiB)
> State:       OK
> Name:        rfs2
> Character device major/minor: 246:1
> -----------------------------------
> Volume ID:   1 (on ubi0)
> Type:        dynamic
> Alignment:   1
> Size:        2676 LEBs (339787776 bytes, 324.0 MiB)
> State:       OK
> Name:        rfs3
> Character device major/minor: 246:2
> -----------------------------------
> Volume ID:   2 (on ubi0)
> Type:        dynamic
> Alignment:   1
> Size:        2674 LEBs (339533824 bytes, 323.8 MiB)
> State:       OK
> Name:        home
> Character device major/minor: 246:3
> 
> 
> I already debugged the ubifs kernel module to locate where exactly the 
> error is returned, and the call chain is:
> 
> ubifs_mount -> ubifs_fill_super -> mount_ubifs -> ubifs_mount_orphans ->
> kill_orphans -> do_kill_orphans -> ubifs_tnc_lookup -> ubifs_tnc_locate
> 
> The ubifs_tnc_locate function fails with -ENOENT because the 
> ubifs_lookup_level0 function returns 0.
> 
> If I patch the mount_ubifs function to call ubifs_mount_orphans with 
> zero for the unclean parameter (instead of the value of 
> c->need_recovery), then the mounting succeeds. Afterwards, when 
> rebooting once more with the original unpatched kernel, the file system 
> appears to be fixed again, and mounting succeeds.
> 
> I'm not really sure what's going on under the hood, but it looks like a 
> problem with the handling of orphan files. With this knowledge, we
> are now able to reproduce the problem reliably by doing a power cut
> while running the attached script. The script creates many files in a
> loop, keeps them all open and removes them again. With this approach we
> hit the problem about once every two attempts.
> 
> The problem appeared for the first time after we switched from kernel
> v4.7 to v5.3. I also tried v5.4 and master, in case the problem had
> already been fixed, but they show the same behavior. After some
> bisecting, this commit appears to have introduced the problem:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs/orphan.c?id=ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e
> 
> How can we fix this?
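
The attached script is not included in this archive. As a rough illustration of the access pattern described above, here is a hypothetical C sketch. It assumes the files are unlinked while their descriptors are still open, so that each inode becomes an orphan that UBIFS has to track until the last close. The function names `make_open_orphans` and `close_all` and the `orphan-<i>` file naming are made up for this example; this is not the original script.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "<dir>/orphan-<i>" (hypothetical naming). Returns 0 on success. */
static int orphan_path(char *buf, size_t len, const char *dir, int i)
{
    int n = snprintf(buf, len, "%s/orphan-%d", dir, i);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Close the first `n` descriptors in `fds`. */
void close_all(int *fds, int n)
{
    for (int i = 0; i < n; i++)
        close(fds[i]);
}

/*
 * Hypothetical reproducer: create `count` files in `dir`, keep every
 * descriptor open, then unlink all names. Each unlinked-but-open inode
 * is an orphan. Returns `count` on success, or -1 on error; on error the
 * descriptors opened so far are closed (files already created may remain).
 */
int make_open_orphans(const char *dir, int count, int *fds)
{
    char path[4096];

    assert(dir != NULL && fds != NULL && count >= 0);

    for (int i = 0; i < count; i++) {
        if (orphan_path(path, sizeof(path), dir, i) != 0) {
            close_all(fds, i);
            return -1;
        }
        fds[i] = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
        if (fds[i] < 0) {
            perror("open");
            close_all(fds, i);
            return -1;
        }
        /* Write one byte so each inode also has some data. */
        if (write(fds[i], "x", 1) != 1) {
            perror("write");
            close_all(fds, i + 1);
            return -1;
        }
    }

    /* Remove the names while every file is still open. */
    for (int i = 0; i < count; i++) {
        if (orphan_path(path, sizeof(path), dir, i) != 0 || unlink(path) != 0) {
            perror("unlink");
            close_all(fds, count);
            return -1;
        }
    }
    return count;
}
```

To approximate the reported test, one would presumably call `make_open_orphans()` followed by `close_all()` in an endless loop on a directory of the UBIFS mount (for example a hypothetical `/home/test`) and cut power at a random moment. The file count and directory are arbitrary choices, not values from the original report.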

Just adding Richard to the loop; he is not available right now but
will probably be interested in this issue. On my side, I have no clue :)

Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/