On Thu, Sep 17, 2020 at 10:30:12AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> >
> > Obviously there is other more serious issue, since 568f27006577 is
> > completely reverted in your test, and you still see list corruption
> > issue.
> >
> > So I'd suggest to find the big issue first. Once it is fixed, maybe
> > everything becomes fine.
> > ...
> > Looks it is more like a memory corruption issue, is there any helpful log
> > dumped when running kernel with kasan?
>
> Last night, I ran six VM's using -rc4 with and without KASAN; without
> Kasan, half of them hung. With KASAN enabled, all of the test VM's
> ran to completion.

From your last email, when you run -rc4 with 568f27006577 reverted, you
can reproduce the list corruption easily.

So can you enable KASAN on -rc4 with 568f27006577 reverted and see if
it makes a difference?

>
> This strongly suggests whatever the problem is, it's timing related.
> I'll run a larger set of test runs to see if this pattern is confirmed
> today.

It looks like you have enabled lots of other debug options, such as
lockdep, which adds a much heavier runtime load. Maybe you can disable
all the non-KASAN debug options (non-KASAN memory debug options,
lockdep, ...), keep only KASAN enabled, and see if you get lucky.

Thanks,
Ming