On 5.12.2022 23:10, Theodore Ts'o wrote:
Is it fair to say that your workload is using data=journaled and is frequently truncating files that might have been recently modified (hence triggering the race between truncate and journalled writepages)?
The servers host hundreds of users who run their own tasks, and we have neither control over their usage patterns nor a way to observe them closely, unless you can point us to a way to debug this.
"data=journaled" is definitely in place for all servers.
I wonder if you could come up with a more reliable reproducer so we can test a particular patch.
We have already tried different parallel combinations of mmap()'ed reads, direct and regular write(), drop_caches, sync(), etc., but we could not trigger the panic.
If you have any suggestions for what we should try next as a reproducer, please share them and we will implement and run it.
Did I understand correctly that a possible reproducer would be a loop of heavy write() followed by truncate() of the same file? Should we randomly sync() and/or "echo 3 > /proc/sys/vm/drop_caches" to increase the chance of hitting the bug?
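To make the question concrete, here is a minimal sketch of the loop I have in mind. It is only an assumption about what might trigger the race, not a confirmed reproducer. The names write_truncate_loop and drop_caches, and all parameters and defaults, are made up for illustration. The target file is assumed to live on an ext4 filesystem mounted with journalled data, and writing to /proc/sys/vm/drop_caches needs root.

```python
import os
import random


def drop_caches():
    """Equivalent of 'echo 3 > /proc/sys/vm/drop_caches'.

    Needs root; returns False instead of failing when not permitted.
    """
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
        return True
    except OSError:
        return False


def write_truncate_loop(path, iterations=1000, chunk_size=1 << 20,
                        sync_prob=0.1, drop_prob=0.0, seed=None):
    """Repeatedly write a chunk to `path`, then truncate the same file.

    Hypothetical helper: between the write and the truncate it randomly
    calls sync() and/or drops caches, hoping that writeback of the
    freshly dirtied pages overlaps with the truncate.
    Returns the final file size in bytes.
    """
    rng = random.Random(seed)
    buf = os.urandom(chunk_size)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for _ in range(iterations):
            # Dirty the start of the file with fresh data.
            os.pwrite(fd, buf, 0)
            if rng.random() < sync_prob:
                os.sync()
            if rng.random() < drop_prob:
                drop_caches()
            # Cut the file at a random offset inside the data just written.
            os.ftruncate(fd, rng.randrange(chunk_size))
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

We would run many of these in parallel on different files, e.g. write_truncate_loop("/path/on/ext4/testfile", drop_prob=0.01), where the path is a placeholder. Please tell us if the write/truncate ordering or the file sizes should be different.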
Best regards. --Ivan