Hou Tao,

On Thu, Mar 5, 2020 at 10:15 AM Hou Tao <houtao1@xxxxxxxxxx> wrote:
>
> Carson Li reports the following error:
>
> UBIFS error: ubifs_read_node_wbuf: expected node type 0
> Not a node, first 24 bytes:
> Kernel panic - not syncing
> CPU: 1 PID: 943 Comm: http-thread 4.4.83 #1
> panic+0x70/0x1e4
> ubifs_dump_node+0x6c/0x9a0
> ubifs_read_node_wbuf+0x350/0x384
> ubifs_tnc_read_node+0x54/0x214
> ubifs_tnc_locate+0x118/0x1b4
> ubifs_iget+0xb8/0x68c
> ubifs_lookup+0x1b4/0x258
> lookup_real+0x30/0x4c
> __lookup_hash+0x34/0x3c
> walk_component+0xec/0x2a0
> path_lookupat+0x80/0xfc
> filename_lookup+0x5c/0xfc
> vfs_fstatat+0x4c/0x9c
> SyS_stat64+0x14/0x30
> ret_fast_syscall+0x0/0x34
>
> It seems the LEB used as DATA journal head is GC'ed, and ubifs_tnc_locate()
> read an invalid node. But now the property of journal head LEB has
> LPROPS_TAKEN flag set and GC will skip these LEBs.
>
> The actual situation of the problem is the LEB is GCed, freed and then
> reused as journal head, and finally ubifs_tnc_locate() reads
> an invalid node. And it can be reproduced by the following steps:
> * create 128 empty files
> * overwrite 8 files in background repeatedly to trigger GC
> * drop inode cache and stat these 128 empty files repeatedly
>
> We can simply fix the problem by removing the optimization of reading
> wbuf when possible. But because taking spin lock and memcpying from
> wbuf is much less time-consuming than reading from MTD device, so we fix
> the logic of wbuf reading instead.

I'm digging into that issue now.
Did you also experiment with reading while tnc_mutex is locked, i.e. with
no race at all (having safely = 1 by default)?
Just to make sure we don't fix a no-longer-needed optimization.

The code is already anything but trivial, and adding more code makes me nervous.

--
Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
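
For reference, the three reproduction steps quoted above could be scripted roughly as follows. This is only a sketch, not the reproducer that was actually used: the file names, the 64 KiB write size, the round count, and the helper names are all assumptions made for illustration, and the cache drop only works as root on Linux.

```python
import os
import threading


def create_empty_files(directory, count=128):
    """Step 1: create `count` empty files and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"empty-{i:03d}")
        with open(path, "w"):
            pass
        paths.append(path)
    return paths


def overwrite_files(directory, stop, count=8, size=64 * 1024):
    """Step 2: rewrite `count` data files until `stop` is set.

    The repeated overwrites produce obsolete nodes, which should
    eventually push UBIFS into garbage collection (size is an assumption).
    """
    while not stop.is_set():
        for i in range(count):
            with open(os.path.join(directory, f"data-{i}"), "wb") as f:
                f.write(os.urandom(size))
                f.flush()
                os.fsync(f.fileno())


def drop_caches():
    """Step 3a: ask the kernel to drop dentries and inodes.

    Returns False if the write is not permitted (not root, not Linux).
    """
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("2\n")
        return True
    except OSError:
        return False


def reproduce(directory, rounds=1000):
    """Run all three steps; `rounds` is an arbitrary illustrative value."""
    paths = create_empty_files(directory)
    stop = threading.Event()
    writer = threading.Thread(target=overwrite_files, args=(directory, stop))
    writer.start()
    try:
        for _ in range(rounds):
            drop_caches()
            # Step 3b: stat every empty file, forcing fresh inode lookups.
            for path in paths:
                os.stat(path)
    finally:
        stop.set()
        writer.join()
```

It would be called as `reproduce("/mnt/ubifs")`, where the mount point is a placeholder for wherever the UBIFS volume under test is mounted.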