On Thu, 19 Sept 2024 at 08:35, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
>
> Happy to! I see there’s still some back and forth on the specific
> patches. Let me know which kernel version and which patches I should
> start trying out. I’m losing track while following the discussion.

Yeah, right now Jens is still going to run some more testing, but I
think the plan is to just backport

  a4864671ca0b ("lib/xarray: introduce a new helper xas_get_order")
  6758c1128ceb ("mm/filemap: optimize filemap folio adding")

and I think we're at the point where you might as well start testing
that if you have the cycles for it.

Jens is mostly trying to confirm the root cause, but even without
that, I think you running your load with those two changes back-ported
is worth it.

(Or even just try running it on plain 6.10 or 6.11, both of which
already have those commits)

> In preparation: I’m wondering whether the known reproducer gives
> insight into how I might force my load to trigger it more easily? Would
> running the reproducer above and combining that with a running
> PostgreSQL benchmark make sense?
>
> Otherwise we’d likely only be getting insight after weeks of not
> seeing crashes …

So considering how well the reproducer works for Jens and Chris, my
main worry is whether your load might have some _additional_ issue.

Unlikely, but still .. The two commits fix the reproducer, so I think
the important thing to make sure is that they really fix the original
issue too.

And yeah, I'd be surprised if they don't, but at the same time I would
_not_ suggest you try to make your load look more like the case we
already know gets fixed.

So yes, it will be "weeks of not seeing crashes" until we'd be
_really_ confident it's all the same thing, but I'd rather still have
you test that, than test something other than what caused issues
originally, if you see what I mean.

              Linus