Hi Jef,

"Driesen Jef (JDI)" <Jef.Driesen@xxxxxxx> wrote on Tue, 28 Jan 2020
10:51:39 +0000:

> Hi,
>
> We're experiencing some kind of file system corruption with the UBIFS
> file system after power cuts. The problem shows up as an error during mount:
>
> # mount -t ubifs ubi0:home /home
> mount: /home: special device ubi0:home does not exist.
>
> The underlying UBI volumes are all fine:
>
> # mtdinfo /dev/mtd0
> mtd0
> Name: ubi
> Type: nand
> Eraseblock size: 131072 bytes, 128.0 KiB
> Amount of eraseblocks: 8192 (1073741824 bytes, 1024.0 MiB)
> Minimum input/output unit size: 2048 bytes
> Sub-page size: 2048 bytes
> OOB size: 64 bytes
> Character device major/minor: 90:0
> Bad blocks are allowed: true
> Device is writable: true
>
> # ubinfo -a
> UBI version: 1
> Count of UBI devices: 1
> UBI control device major/minor: 10:58
> Present UBI devices: ubi0
>
> ubi0
> Volumes count: 3
> Logical eraseblock size: 126976 bytes, 124.0 KiB
> Total amount of logical eraseblocks: 8192 (1040187392 bytes, 992.0 MiB)
> Amount of available logical eraseblocks: 0 (0 bytes)
> Maximum count of volumes 128
> Count of bad physical eraseblocks: 0
> Count of reserved physical eraseblocks: 160
> Current maximum erase counter value: 36
> Minimum input/output unit size: 2048 bytes
> Character device major/minor: 246:0
> Present volumes: 0, 1, 2
>
> Volume ID: 0 (on ubi0)
> Type: dynamic
> Alignment: 1
> Size: 2676 LEBs (339787776 bytes, 324.0 MiB)
> State: OK
> Name: rfs2
> Character device major/minor: 246:1
> -----------------------------------
> Volume ID: 1 (on ubi0)
> Type: dynamic
> Alignment: 1
> Size: 2676 LEBs (339787776 bytes, 324.0 MiB)
> State: OK
> Name: rfs3
> Character device major/minor: 246:2
> -----------------------------------
> Volume ID: 2 (on ubi0)
> Type: dynamic
> Alignment: 1
> Size: 2674 LEBs (339533824 bytes, 323.8 MiB)
> State: OK
> Name: home
> Character device major/minor: 246:3
>
>
> I already debugged the ubifs kernel module to locate where exactly the
> error is returned, and the call chain is:
>
> ubifs_mount -> ubifs_fill_super -> mount_ubifs -> ubifs_mount_orphans ->
> kill_orphans -> do_kill_orphans -> ubifs_tnc_lookup -> ubifs_tnc_locate
>
> The ubifs_tnc_locate function fails with -ENOENT because the
> ubifs_lookup_level0 function returns 0.
>
> If I patch the mount_ubifs function to call ubifs_mount_orphans with
> zero for the unclean parameter (instead of the value of
> c->need_recovery), then the mounting succeeds. Afterwards, when
> rebooting once more with the original unpatched kernel, the file system
> appears to be fixed again, and mounting succeeds.
>
> I'm not really sure what's going on under the hood, but it looks like a
> problem with the handling of the orphan files. With this knowledge, we
> are now able to reproduce the problem reliably, by doing a power cut
> while running the attached script. The script creates many files in a
> loop, keeps them all open and removes them again. With this approach we
> hit the problem about once every two attempts.
>
> The problem appeared for the first time after we switched from kernel
> v4.7 to v5.3. I tried with v5.4 and master too, in case we are hitting a
> problem that is already fixed, but they show the same problem. After
> doing some bisecting, this commit appears to have introduced the problem:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs/orphan.c?id=ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e
>
> How can we fix this?

Just adding Richard to the loop. He is not available right now but
will probably be interested in this issue.

On my side, I have no clue :)

Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
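
Note for archive readers: the attached reproducer script is not included
in this message. Going only by the description in the report (create many
files in a loop, keep them all open, then remove them), the workload could
look roughly like the Python sketch below. This is not the original
script. The function name, file names, default count and payload size are
all made up for illustration.

```python
import os


def churn_orphans(directory, count=256, payload=b"x" * 4096):
    """Create `count` files, unlink them while still open, then close them.

    Hypothetical stand-in for the attached script, not the original.
    Returns the number of files that were open and unlinked at the same time.
    """
    opened = []  # (fd, path) pairs for files that are still open
    try:
        # Step 1: create the files and keep every descriptor open.
        for i in range(count):
            path = os.path.join(directory, f"orphan-{i}")
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            opened.append((fd, path))
            os.write(fd, payload)

        # Step 2: remove the names while the descriptors are still open.
        # An inode with no links that is still open is what UBIFS has to
        # track as an orphan until the last descriptor is closed.
        for _, path in opened:
            os.unlink(path)

        return len(opened)
    finally:
        # Step 3: close everything, which lets the inodes be freed.
        for fd, _ in opened:
            os.close(fd)
```

To approximate the test from the report, one would call
churn_orphans("/home") repeatedly in a loop on the affected UBIFS mount
and cut power while it is running. Keep `count` below the per-process
open file limit (see `ulimit -n`).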