On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > I say that for full disclosure, so you don't wrack your brains > too much, when it may still turn out to be a screwup on my part. Sounds unlikely. If that patch applied even reasonably closely, I don't see how you'd see a list corruption that wasn't due to the patch. You'd have had to use the wrong spinlock by mistake due to munging it, or something crazy like that. The main list-handling change is (a) open-coding of that finish_wait() (b) slightly different heuristics for removal in the wakeup function where (a) was because my original version of finishing the wait needed to do that return code checking. So a normal "finish_wait()" just does list_del_init(&wait->entry); where-as my open-coded one replaced that with if (!list_empty(&wait->entry)) { list_del(&wait->entry); ret = -EINTR; } and apart from that "set return to -EINTR because nobody woke us up", it also uses just a regular "list_del()" rather than a "list_del_init()". That causes the next/prev field to be poisoned rather than re-initialized. But that second change was because the list entry is on the stack, and we're not touching it any more and are about to return, so I removed the "init" part. Anyway, (a) really looks harmless. Unless the poisoning now triggers some racy debug test that had been hidden by the "init". Hmm. In contrast, (b) means that the likely access patterns of irqs removing the wait entry from the list might be very different from before. The old "autoremove" semantics would only remove the entry from the list when try_to_wake_up() actually woke things up. Now, a successful bit state _always_ removes it, which was kind of the point. But it might cause very different list handling patterns. All the actual list handling looks "obviously safe" because it's protected by the spinlock, though... If you do get oopses with the new patch too, try to send me a copy, and maybe I'll stare at exactly where it happens register contents and go "aah". Linus