Hi Michael,

As far as I can see, the nilfs thread is simply sleeping:

[5187529.914580] [<ffffffff80d6ab5c>] ? __schedule+0x4a2/0x6ba
[5187529.914580] [<ffffffff8054415d>] ? radix_tree_lookup_slot+0x10/0x24
[5187529.914580] [<ffffffff802b9a59>] ? find_get_entry+0x15/0x63
[5187529.914580] [<ffffffff802b9b6d>] ? pagecache_get_page+0x74/0x15c
[5187529.914580] [<ffffffff80458a81>] ? nilfs_grab_buffer+0xa2/0xd9

The most probable reason is the crash of another kernel thread. But it's hard to say whether the shared bug is the direct cause of this issue with the nilfs thread. Usually, the radix tree has some memory reservation. But if the memory subsystem was severely corrupted, then it could affect the radix tree too. There is not enough information to draw any conclusions. But I suspect that the root cause of the issue lives in the memory subsystem code.

Thanks,
Viacheslav Dubeyko.

---- Original Message ----
Subject: Re: Kernel Bug on Linux 4.1.31, possibly nilfs, not sure
Sent: Nov 28, 2016 10:02 AM
From: Michael Conrad <mconrad@xxxxxxxxxxxxxxx>
To: linux-nilfs@xxxxxxxxxxxxxxx
Cc:

Hello,

I found a bug that might be related.

https://patchwork.kernel.org/patch/9339843/

On several of my crash dumps, I saw it die in "memcg_drain_all_list_lrus" and found this mentioned in a bug where FUSE was calling functions to modify the radix tree directly in a way that other kernel developers weren't expecting. (I don't fully understand the details.)

Our systems that are crashing are not using the FUSE driver for anything, but the crashes still coincide with nilfs2 activity. Is there a chance nilfs is affected by this same situation?

I've attached another example crash log. We're still on 4.1.31, but will move to 4.1.35 soon.

-Mike

On 9/28/2016 4:59 PM, Michael Conrad wrote:
> Hi, I've started getting "kernel bug" messages on a few systems. At
> first I wasn't sure if it was related to faulty hardware or not.
> The kernel stack traces do not always mention nilfs, but they roughly
> relate to the times when we make nilfs snapshots of our system (rsync
> from ext4 into nilfs).
>
> We use the same kernel image on several dozen systems, and aside from
> one server, there have only been two crashes like this in the past two
> months. (That "one server", though, has crashed about four or five
> times, which is why I suspected hardware at first.)
>
> However, I just had one happen on a Linode, which is pretty reliable
> hardware, so I figured I'd post and see what people think. The end of
> the kernel log is attached.
>
> I'm worried that something is corrupting kernel memory, and then
> causing crashes in unrelated parts of the kernel. I'm really not
> sure how to narrow it down other than turning off the nilfs snapshots
> and seeing if it continues to happen, though then I have to come up
> with another backup solution in the meantime.
>
> It's worth noting that the server crashing frequently also has the
> largest tree of files. Also, the nilfs filesystems on these systems
> date back to various versions of nilfs from the 3.* kernel line. It's
> possible that an old bug is lurking on the filesystem structure, but I
> don't believe nilfs has a check tool yet, correct?
>
> Thanks,
> Michael Conrad