On Wed, Jan 8, 2025 at 11:14 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Tue, Jan 7, 2025 at 7:56 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> I may have found a simpler "proper" fix than disabling migration,
> please see my suggestion in:
> https://lore.kernel.org/lkml/CAJD7tkYpNNsbTZZqFoRh-FkXDgxONZEUPKk1YQv7-TFMWWQRzQ@xxxxxxxxxxxxxx/

Discovered that thread just now - sorry, too many emails to catch up
on :) Taking a look now.

> >
> > Is this a frequently occurring problem in the wild? If so, we can
> > disable migration to firefight, and then do the proper thing down the
> > line.
>
> I don't believe so. Actually, I think the deadlock introduced by the
> previous fix is more problematic than the UAF it fixes.
>
> Andrew, could you please pick up patch 1 (the revert) while we figure
> out the alternative fix? It's important that it lands in v6.13 to
> avoid the possibility of deadlock. Figuring out an alternative fix is
> less important.

Agree. Let's revert the "fix" first. CPU offlining is a much rarer
event than this deadlocking scenario discovered by syzbot.