Timo,

On Thu, Feb 27, 2020 at 2:04 PM Timo Ketola <Timo.Ketola@xxxxxxxxxx> wrote:
> We have a few i.MX6D devices which have corrupted their UBIFS filesystem
> on power cut and refuse to mount them any more.
>
> The log says:
>
> > [ 10.382580] UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 158
> > [ 10.408838] UBIFS (ubi1:0): recovery needed
> > [ 10.802070] UBIFS error (ubi1:0 pid 157): ubifs_scan: corrupt empty space at LEB 99:114688
> > [ 10.809054] UBIFS error (ubi1:0 pid 157): ubifs_scanned_corruption: corruption at LEB 99:114688
> > [ 10.816471] UBIFS error (ubi1:0 pid 157): ubifs_scanned_corruption: first 8192 bytes from LEB 99:114688
> > [ 10.824585] 00000000: 06101831 713b7e1b 002e0640 00000000 000000a0 00000200 00000554 00000000 1....~;q@...............T.......
> > [ 10.824601] 00000020: 00000000 00000000 0001585b 00000000 0008c48d 00000000 5d512897 00000000 ........[X...............(Q]....
> > ...
>
> > [ 10.827751] UBIFS error (ubi1:0 pid 157): ubifs_scan: LEB 99 scanning failed
> > [ 10.834615] UBIFS (ubi1:0): background thread "ubifs_bgt1_0" stops
>
> I think I found the culprit from the mtdblock contents.
> Fragment from hexdump:
>
> > 3ca20000 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 04 |UBI#............|
> > 3ca20010 00 00 08 00 00 00 10 00 0c 4d 7c ed 00 00 00 00 |.........M|.....|
> > 3ca20020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca20030 00 00 00 00 00 00 00 00 00 00 00 00 cb 5d 1f 01 |.............]..|
> > 3ca20040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 3ca20800 55 42 49 21 01 01 00 00 00 00 00 00 00 00 00 63 |UBI!...........c|
> > 3ca20810 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca20820 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8d 07 |................|
> > 3ca20830 00 00 00 00 00 00 00 00 00 00 00 00 91 2b 87 87 |.............+..|
> > 3ca20840 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 3ca21000 31 18 10 06 30 3c 6d 96 cd 05 2e 00 00 00 00 00 |1...0<m.........|
> > 3ca21010 a0 00 00 00 00 02 00 00 54 05 00 00 00 00 00 00 |........T.......|
> > ...
>
> > 3ca3b8c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 3ca3c000 31 18 10 06 7b 71 87 8f 3c 06 2e 00 00 00 00 00 |1...{q..<.......|
> > 3ca3c010 a0 00 00 00 00 02 00 00 54 05 00 00 00 00 00 00 |........T.......|
> > 3ca3c020 00 00 00 00 00 00 00 00 5b 58 01 00 00 00 00 00 |........[X......|
> > 3ca3c030 79 c3 08 00 00 00 00 00 97 28 51 5d 00 00 00 00 |y........(Q]....|
> > 3ca3c040 19 58 6d 38 00 00 00 00 19 58 6d 38 00 00 00 00 |.Xm8.....Xm8....|
> > 3ca3c050 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
> > 3ca3c060 eb 03 00 00 eb 03 00 00 a4 81 00 00 01 00 00 00 |................|
> > 3ca3c070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3c080 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3c090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3c0a0 31 18 10 06 84 13 e1 a0 00 00 00 00 00 00 00 00 |1...............|
> > 3ca3c0b0 1c 00 00 00 05 00 00 00 44 07 00 00 00 00 00 00 |........D.......|
> > 3ca3c0c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 3ca3c800 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
> > *
> > 3ca3d000 31 18 10 06 1b 7e 3b 71 40 06 2e 00 00 00 00 00 |1....~;q@.......|

So, in there is a whole 2KiB area of 0xFF. It is also aligned, so it could be a whole page.
> > 3ca3d010 a0 00 00 00 00 02 00 00 54 05 00 00 00 00 00 00 |........T.......|
> > 3ca3d020 00 00 00 00 00 00 00 00 5b 58 01 00 00 00 00 00 |........[X......|
> > 3ca3d030 8d c4 08 00 00 00 00 00 97 28 51 5d 00 00 00 00 |.........(Q]....|
> > 3ca3d040 19 58 6d 38 00 00 00 00 19 58 6d 38 00 00 00 00 |.Xm8.....Xm8....|
> > 3ca3d050 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
> > 3ca3d060 eb 03 00 00 eb 03 00 00 a4 81 00 00 01 00 00 00 |................|
> > 3ca3d070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3d080 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3d090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3d0a0 31 18 10 06 84 13 e1 a0 00 00 00 00 00 00 00 00 |1...............|
> > 3ca3d0b0 1c 00 00 00 05 00 00 00 44 07 00 00 00 00 00 00 |........D.......|
> > 3ca3d0c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 3ca3d800 31 18 10 06 c1 6b e6 57 42 06 2e 00 00 00 00 00 |1....k.WB.......|
> > 3ca3d810 a0 00 00 00 00 02 00 00 54 05 00 00 00 00 00 00 |........T.......|
> > 3ca3d820 00 00 00 00 00 00 00 00 5b 58 01 00 00 00 00 00 |........[X......|
> > 3ca3d830 0d c5 08 00 00 00 00 00 97 28 51 5d 00 00 00 00 |.........(Q]....|
> > 3ca3d840 19 58 6d 38 00 00 00 00 19 58 6d 38 00 00 00 00 |.Xm8.....Xm8....|
> > 3ca3d850 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
> > 3ca3d860 eb 03 00 00 eb 03 00 00 a4 81 00 00 01 00 00 00 |................|
> > 3ca3d870 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3d880 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3d890 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > 3ca3d8a0 31 18 10 06 84 13 e1 a0 00 00 00 00 00 00 00 00 |1...............|
> > 3ca3d8b0 1c 00 00 00 05 00 00 00 44 07 00 00 00 00 00 00 |........D.......|
> > 3ca3d8c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> > *
> > 3ca3e000 31 18 10 06 0b 75 3d 9e 44 06 2e 00 00 00 00 00 |1....u=.D.......|
>
> IIUC, ubifs_scan finds empty space at 3ca3c800, stops scanning and
> checks the rest of the LEB for being empty but finds something else at
> 3ca3d000. Then recovery aborts and mounting fails.
>
> Do I understand correctly that empty space should always be continuous
> at the end of the LEB?

Correct.

> How could this kind of corruption happen?

Hard to say. Maybe bad timing settings which cause writes to have no effect.
But usually this leads to ECC errors.
If you can share the image with me I can have a look and with some luck
we find traces.

Is this a mainline kernel? Wonky drivers can lead to all kinds of "interesting"
results. :->

> Is there any way to recover from this?

Not really. UBIFS' IO model got violated and it gives up.

> Storage is NAND with 0x20000 erase block size and the kernel is 4.9.88.

I guess 2KiB page size?

--
Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
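
To make the offsets in this thread easier to cross-check, here is a minimal Python sketch. It is not the kernel's ubifs_scan logic, only an illustration of the rule confirmed above (empty space must be one contiguous run at the end of the LEB). It assumes a 2 KiB page size (only guessed at in the thread) and assumes LEB data starts 0x1000 bytes into the PEB, which is inferred from the dump (the "UBI#" and "UBI!" headers sit at 0x3ca20000 and 0x3ca20800, and the first UBIFS node at 0x3ca21000). The helper names `leb_offset` and `find_hole` are hypothetical.

```python
PAGE_SIZE = 0x800        # assumption: 2 KiB NAND page
PEB_START = 0x3CA20000   # flash offset of the "UBI#" header in the dump
DATA_OFFSET = 0x1000     # assumption: LEB data follows the two header pages


def leb_offset(flash_addr, peb_start=PEB_START, data_offset=DATA_OFFSET):
    """Translate a raw flash offset into an offset inside the LEB."""
    return flash_addr - peb_start - data_offset


def find_hole(leb, page_size=PAGE_SIZE):
    """Look for an all-0xFF page that is followed by a non-0xFF page.

    Returns (first_empty_offset, data_resumes_offset), or None when all
    empty space is one contiguous run at the end of the LEB.
    """
    first_empty = None
    for off in range(0, len(leb), page_size):
        page = leb[off:off + page_size]
        is_empty = page == b"\xff" * len(page)
        if is_empty and first_empty is None:
            first_empty = off
        elif not is_empty and first_empty is not None:
            return first_empty, off
    return None
```

Under these assumptions, `leb_offset(0x3ca3c800)` gives 112640 for the empty page and `leb_offset(0x3ca3d000)` gives 114688, which matches the "LEB 99:114688" position reported in the log.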