On Mon, 13 Sep 2021, Michael Schmitz wrote:
[23982.680000] list_add corruption. next->prev should be prev
(00b51e98), but was 00bb22d8. (next=00b75cd0).
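For readers unfamiliar with this message, the consistency check behind it can be sketched in userspace C. This is a simplified model and not the kernel's own code: struct list_node, list_init(), list_add_valid_sketch() and list_add_checked() are hypothetical names, and the message strings are only modelled on the log line above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical node type, modelled on a circular doubly linked list. */
struct list_node {
    struct list_node *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Before linking a new entry between prev and next, both neighbours
 * must still point at each other. A stray write to either pointer
 * breaks one of these two invariants.
 */
static bool list_add_valid_sketch(const struct list_node *prev,
                                  const struct list_node *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
                (const void *)prev, (const void *)next->prev,
                (const void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
                (const void *)next, (const void *)prev->next,
                (const void *)prev);
        return false;
    }
    return true;
}

/* Insert new_node right after head; refuse if the neighbours look corrupt. */
static bool list_add_checked(struct list_node *new_node,
                             struct list_node *head)
{
    assert(new_node != NULL && head != NULL);

    struct list_node *prev = head;
    struct list_node *next = head->next;

    if (!list_add_valid_sketch(prev, next))
        return false;

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return true;
}
```

In this model, overwriting the prev pointer of an existing entry and then calling list_add_checked() on the same head takes the first branch. The report therefore shows that the entry's back pointer was overwritten, but not who overwrote it.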
I chased a similar list corruption bug (shadow LRU list corrupt in
mm/workingset.c:shadow_lru_isolate()) in 4.10. I believe that was
related to an out-of-bounds memory access - maybe get_reg() from
drivers/char/random.c, but it might have been something else.
That bug had disappeared in 4.12, and I haven't seen it since.
Do all of your builds have BUG_ON_DATA_CORRUPTION and DEBUG_LIST enabled?
Incidentally - have you ever checked whether Al Viro's signal handling
fixes have an impact on these bugs?
I will try that patch series if you think it is related.
So far the problem seems to be confined to one machine. Stress tests on
other Mac models have not yet reproduced the problem.