Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

Christian Theune <ct@xxxxxxxxxxxxxxx> · Thu, 19 Sep 2024 08:34:37 +0200

> On 19. Sep 2024, at 05:12, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> 
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
> 
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
> 
> Christian?

Happy to! I see there’s still some back and forth on the specific patches. Let me know which kernel version and which patches I should start trying out. I’m losing track while following the discussion.

In preparation: I’m wondering whether the known reproducer gives any insight into how I might force my load to trigger the bug more easily. Would running the reproducer above and combining that with a running PostgreSQL benchmark make sense?

Otherwise we’d likely only gain insight after weeks of not seeing crashes …

Christian

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick