Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k

Hi Finn,

On 13/09/21 17:22, Finn Thain wrote:
On Mon, 13 Sep 2021, Michael Schmitz wrote:


[23982.680000] list_add corruption. next->prev should be prev
(00b51e98), but was 00bb22d8. (next=00b75cd0).

I chased a similar list corruption bug (shadow LRU list corrupt in
mm/workingset.c:shadow_lru_isolate()) in 4.10. I believe that was
related to an out of bounds memory access - maybe get_reg() from
drivers/char/random.c but it might have been something else.

That bug had disappeared in 4.12, and I haven't seen it since.


Do all of your builds have BUG_ON_DATA_CORRUPTION and DEBUG_LIST enabled?

None had, but that particular list corruption had generated warnings and null pointer accesses. __list_del() uses WRITE_ONCE() now; I can't remember that from 4.10 (but the log for linux/list.h doesn't mention adding WRITE_ONCE(), so I suppose it must have been there).
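
For reference, here is a rough userspace sketch of the kind of consistency check that DEBUG_LIST adds before a list insertion, which is what produces the "list_add corruption" message quoted above. It is only loosely modelled on the kernel's list debugging code; the names list_add_valid(), list_add_checked() and demo() are hypothetical stand-ins, and the real code reports through the kernel's warning machinery rather than stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for the debug check: verify that prev and next
 * are still linked to each other before inserting entry between them. */
static bool list_add_valid(struct list_head *entry,
			   struct list_head *prev,
			   struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	if (entry == prev || entry == next) {
		fprintf(stderr, "list_add double add: entry=%p\n", (void *)entry);
		return false;
	}
	return true;
}

/* Insert entry right after head, refusing to touch a corrupt list. */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(entry, prev, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/* Simulate a stray write to a->prev and show that the next add is refused. */
static int demo(void)
{
	struct list_head head = { &head, &head };
	struct list_head a, b, stray;

	assert(list_add_checked(&a, &head));
	a.prev = &stray;			/* simulated memory corruption */
	assert(!list_add_checked(&b, &head));	/* check catches it */
	return 0;
}
```

Without such a check, the insertion would proceed and write through the bad pointer, so the damage would only show up later and somewhere else.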



Incidentally - have you ever checked whether Al Viro's signal handling
fixes have an impact on these bugs?


I will try that patch series if you think it is related.

Initial tests look promising (but I've said that before).

So far the problem seems to be confined to one machine. Stress tests on
other Mac models did not yet reproduce the problem.

Yes, that's suspicious. I'll keep you posted.

Cheers,

	Michael


