On 5/7/20 8:44 PM, Chris Murphy wrote:
> I would change very little until you track this down, if the goal is to track it down and get it fixed. I'm not sure if LVM thinp is supported with LVM raid still, which if it's not supported yet then I can understand using mdadm raid5 instead of LVM raid5.
My apologies if this idea was considered and discarded already, but the bug being hard to reproduce right after reboot and the error being exactly the size of a page sounds like a use-after-free memory bug or something similar.
A debug kernel build with one or more of these options may find the problem:

CONFIG_DEBUG_PAGEALLOC
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
CONFIG_PAGE_POISONING + page_poison=1
CONFIG_KASAN

--Sarah
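
P.S. In case it saves a rebuild, here is a rough sketch for checking whether a given kernel already has the options above turned on. It only parses text you hand it; the function names (enabled_options, page_poison_on) are my own, and where the config file lives (often /boot/config-<version> or /proc/config.gz) depends on the distribution, so treat those paths as assumptions.

```python
import re

# Config symbols named in this message. page_poison=1 is a boot
# parameter rather than a config symbol, so it is checked separately.
DEBUG_OPTIONS = [
    "CONFIG_DEBUG_PAGEALLOC",
    "CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT",
    "CONFIG_PAGE_POISONING",
    "CONFIG_KASAN",
]


def enabled_options(config_text, options=DEBUG_OPTIONS):
    """Return {option: bool}; True means the option is built in (=y)."""
    enabled = set()
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match.
        m = re.match(r"^(CONFIG_\w+)=y$", line.strip())
        if m:
            enabled.add(m.group(1))
    return {opt: opt in enabled for opt in options}


def page_poison_on(cmdline):
    """True if the kernel command line requests page poisoning."""
    # "page_poison=1" is the form used in this message; "on" is the
    # spelling used in the kernel-parameters documentation.
    wanted = ("page_poison=1", "page_poison=on")
    return any(tok in wanted for tok in cmdline.split())
```

Feed enabled_options() the contents of the kernel config file and page_poison_on() the contents of /proc/cmdline. Anything that comes back False would need a rebuild (for the CONFIG_ symbols) or a changed boot line (for page_poison).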