Hi,

We're experiencing file system corruption with UBIFS after power cuts. The
problem shows up as an error during mount:

  # mount -t ubifs ubi0:home /home
  mount: /home: special device ubi0:home does not exist.

The underlying UBI volumes are all fine:

  # mtdinfo /dev/mtd0
  mtd0
  Name:                           ubi
  Type:                           nand
  Eraseblock size:                131072 bytes, 128.0 KiB
  Amount of eraseblocks:          8192 (1073741824 bytes, 1024.0 MiB)
  Minimum input/output unit size: 2048 bytes
  Sub-page size:                  2048 bytes
  OOB size:                       64 bytes
  Character device major/minor:   90:0
  Bad blocks are allowed:         true
  Device is writable:             true

  # ubinfo -a
  UBI version:                    1
  Count of UBI devices:           1
  UBI control device major/minor: 10:58
  Present UBI devices:            ubi0

  ubi0
  Volumes count:                           3
  Logical eraseblock size:                 126976 bytes, 124.0 KiB
  Total amount of logical eraseblocks:     8192 (1040187392 bytes, 992.0 MiB)
  Amount of available logical eraseblocks: 0 (0 bytes)
  Maximum count of volumes                 128
  Count of bad physical eraseblocks:       0
  Count of reserved physical eraseblocks:  160
  Current maximum erase counter value:     36
  Minimum input/output unit size:          2048 bytes
  Character device major/minor:            246:0
  Present volumes:                         0, 1, 2

  Volume ID:   0 (on ubi0)
  Type:        dynamic
  Alignment:   1
  Size:        2676 LEBs (339787776 bytes, 324.0 MiB)
  State:       OK
  Name:        rfs2
  Character device major/minor: 246:1
  -----------------------------------
  Volume ID:   1 (on ubi0)
  Type:        dynamic
  Alignment:   1
  Size:        2676 LEBs (339787776 bytes, 324.0 MiB)
  State:       OK
  Name:        rfs3
  Character device major/minor: 246:2
  -----------------------------------
  Volume ID:   2 (on ubi0)
  Type:        dynamic
  Alignment:   1
  Size:        2674 LEBs (339533824 bytes, 323.8 MiB)
  State:       OK
  Name:        home
  Character device major/minor: 246:3

I already debugged the ubifs kernel module to locate exactly where the error
is returned. The call chain is:

  ubifs_mount
   -> ubifs_fill_super
   -> mount_ubifs
   -> ubifs_mount_orphans
   -> kill_orphans
   -> do_kill_orphans
   -> ubifs_tnc_lookup
   -> ubifs_tnc_locate

The ubifs_tnc_locate function fails with -ENOENT because the
ubifs_lookup_level0 function returns 0.

If I patch the mount_ubifs function to call ubifs_mount_orphans with zero for
the unclean parameter (instead of the value of c->need_recovery), then
mounting succeeds. Afterwards, when rebooting once more with the original
unpatched kernel, the file system appears to be fixed, and mounting succeeds.

I'm not really sure what's going on under the hood, but it looks like a
problem with the handling of orphan files. With this knowledge, we are now
able to reproduce the problem reliably by doing a power cut while running the
attached script. The script creates many files in a loop, keeps them all
open, and removes them again. With this approach we hit the problem about
once every two attempts.

The problem first appeared after we switched from kernel v4.7 to v5.3. I
tried v5.4 and master too, in case we were hitting a problem that is already
fixed, but they show the same problem. After bisecting, this commit appears
to have introduced the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs/orphan.c?id=ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e

How can we fix this?

Jef
Attachment: ubifs.sh
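The attachment itself is not reproduced in this archive. For readers who want
a feel for the workload described above (create many files, keep them all
open, remove them), here is a minimal sketch. It is not the original ubifs.sh:
the function name churn_orphans, the orphan.N file names, and the use of bash
(4.1 or newer, for the {fd}> syntax) are all assumptions made for
illustration.

```bash
#!/bin/bash
# Hypothetical sketch, NOT the attached ubifs.sh.
# Requires bash >= 4.1 for automatic file descriptor allocation ({fd}>file).

# Create COUNT files in DIR, keep every one of them open, unlink them all
# while they are still open (so they become orphans), then close them.
# Prints the number of files that were handled.
churn_orphans() {
    local dir=$1 count=$2 i fd
    local fds=()

    mkdir -p "$dir" || return 1

    for ((i = 0; i < count; i++)); do
        # Open the file on a fresh descriptor and leave it open.
        exec {fd}>"$dir/orphan.$i" || return 1
        echo "data $i" >&"$fd"
        fds+=("$fd")
    done

    # Remove the directory entries while the descriptors are still open.
    rm -f "$dir"/orphan.*
    sync

    # Closing the descriptors releases the now-unlinked inodes.
    for fd in "${fds[@]}"; do
        exec {fd}>&-
    done

    echo "${#fds[@]}"
}
```

Calling the function in an endless loop on the affected volume, for example
"while true; do churn_orphans /home/test 200 >/dev/null; done", and cutting
power at a random moment would approximate the scenario from the report. The
directory and the count of 200 are placeholders; the count has to stay below
the per-process open file limit.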
______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/