On Thu, Sep 12, 2024 at 11:18:34PM +0200, Christian Theune wrote:
> This bug is very hard to reproduce but has been known to exist as a
> “fluke” for a while already. I have invested a number of days trying
> to come up with workloads to trigger it quicker than that stochastic
> “once every few weeks in a fleet of 1.5k machines", but it eludes
> me so far. I know that this also affects Facebook/Meta as well as
> Cloudflare who are both running newer kernels (at least 6.1, 6.6,
> and 6.9) with the above mentioned patch reverted. I’m from a much
> smaller company and seeing that those guys are running with this patch
> reverted (that now makes their kernel basically an untested/unsupported
> deviation from the mainline) smells like desparation. I’m with a
> much smaller team and company and I’m wondering why this isn’t
> tackled more urgently from more hands to make it shallow (hopefully).

This passive-aggressive nonsense is deeply aggravating. I've known about
this bug for much longer, but like you I am utterly unable to reproduce
it. I've spent months looking for the bug, and I cannot.