Re: kernel BUG at fs/ext4/inode.c:1914 - page_buffers()

Ivan Zahariev <famzah@xxxxxxxxxxx> · Mon, 5 Dec 2022 23:50:48 +0200

On 5.12.2022 at 23:10, Theodore Ts'o wrote:
> Is it fair to say that your workload is using data=journaled and is
> frequently truncating files that might have been recently modified (hence
> triggering the race between truncate and journalled writepages)?

The servers host hundreds of users who run their own tasks, and we
have neither control over their usage patterns nor a way to observe them
closely, unless you can point us to a way to debug this.

The "data=journal" mount option is definitely in place on all servers.

> I wonder if you could come up with a more reliable reproducer so we
> can test a particular patch.

We have already tried different parallel combinations of mmap()'ed reading,
direct and regular write(), drop_caches, sync(), etc., but we could not
trigger the panic.

If you have any suggestions for what we should try next as a reproducer,
please share them and we will implement and run it.

Did I understand correctly that a possible reproducer would be a loop of 
heavy write() followed by truncate() of the same file? Should we 
randomly sync() and/or "echo 3 > /proc/sys/vm/drop_caches" to increase 
the chance of hitting the bug?
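
To make the question concrete, here is a minimal sketch of the loop I have in mind. It assumes Python 3 on Linux and a test file on an ext4 file system mounted with data=journal. The names writer, truncater and run are only illustrative, and this is not a confirmed reproducer, just one way to race buffered write() against truncate() on the same file with optional sync() and drop_caches.

```python
import os
import random
import threading
import time


def writer(path, stop, chunk=64 * 1024, writes_per_pass=16):
    """Keep dirtying the file with buffered write() calls."""
    buf = os.urandom(chunk)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while not stop.is_set():
            os.lseek(fd, 0, os.SEEK_SET)
            for _ in range(writes_per_pass):
                os.write(fd, buf)
    finally:
        os.close(fd)


def truncater(path, stop, use_sync=True, drop_caches=False):
    """Truncate the same file while the writer is dirtying it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while not stop.is_set():
            os.ftruncate(fd, 0)
            if use_sync and random.random() < 0.1:
                os.sync()  # request writeback of dirty data
            if drop_caches and random.random() < 0.05:
                # Needs root; same as "echo 3 > /proc/sys/vm/drop_caches".
                with open("/proc/sys/vm/drop_caches", "w") as f:
                    f.write("3\n")
    finally:
        os.close(fd)


def run(path, seconds=60.0, use_sync=True, drop_caches=False):
    """Run one writer and one truncater against path for a fixed time."""
    stop = threading.Event()
    threads = [
        threading.Thread(target=writer, args=(path, stop)),
        threading.Thread(target=truncater,
                         args=(path, stop, use_sync, drop_caches)),
    ]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
```

We would start it with something like run("/path/on/journalled/fs/testfile", seconds=600, drop_caches=True) as root, probably with several instances in parallel on different files. The path here is a placeholder.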

Best regards.
--Ivan