On Fri, Apr 19, 2024 at 9:45 AM Takashi Iwai <tiwai@xxxxxxx> wrote:
>
> On Fri, 19 Apr 2024 09:39:09 +0200,
> Harshit Mogalapalli wrote:
> >
> > Hi Takashi,
> >
> > On 19/04/24 12:14, Takashi Iwai wrote:
> > > On Thu, 18 Apr 2024 21:29:57 +0200,
> > > Helge Deller wrote:
> > >>
> > >> On 4/18/24 16:26, Takashi Iwai wrote:
> > >>> On Thu, 18 Apr 2024 16:06:52 +0200,
> > >>> Nam Cao wrote:
> > >>>>
> > >>>> On 2024-04-18 Harshit Mogalapalli wrote:
> > >>>>> While fuzzing 5.15.y kernel with Syzkaller, we noticed a INFO: task hung
> > >>>>> bug in fb_deferred_io_work()
> > >>>>
> > >>>> Which framebuffer device are you using exactly? It is possible that
> > >>>> the problem is with the device driver, not core framebuffer.
> > >>>
> > >>> Note that it was already known that using flush_delayed_work() caused
> > >>> a problem. See the thread of the fix patch:
> > >>> https://lore.kernel.org/all/20230129082856.22113-1-tiwai@xxxxxxx/
> > >>
> > >> Harshit reported the hung tasks with kernel v5.15-stable, and can even reproduce
> > >> that issue with kernel v6.9-rc4 although it has all of your patches from
> > >> that referenced mail thread applied.
> > >> So, what does your statement that "it was already known that it causes problems" exactly mean?
> > >> Can it be fixed? Is someone looking into fixing it?
> > >
> > > My original fix was intentionally with cancel_delayed_work_sync()
> > > because flush_delayed_work() didn't work. We knew that it'd miss some
> > > last-minute queued change, but it's better than crash, so it was
> > > applied in that way.
> > >
> >
> > Thanks for sharing these details.
> >
> > > Then later on, the commit 33cd6ea9c067 changed cancel_*() to
> > > flush_delayed_work() blindly, and the known problem resurfaced again.
> > >
> >
> > I have reverted that commit, but still could see some other task hung
> > message as shared here on other reply:
> >
> > https://lore.kernel.org/all/d2485cb9-277d-4b8e-9794-02f1efababc9@xxxxxxxxxx/
>
> Yes, then it could be a different cause, I suppose.
> The crash with flush_delayed_work() was a real crash, no hanging task,
> IIRC.

Neither cancel_delayed_work_sync() nor flush_delayed_work() prevents new
work from being scheduled after it returns. But cancel_delayed_work_sync()
at least makes sure the queue is empty, so the problem becomes less
apparent. Could this explain what we're seeing?

>
> Can you reproduce the issue with the latest Linus upstream, too?
>
>
> thanks,
>
> Takashi
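
P.S. To make the rescheduling point above concrete, here is a toy userspace
model. It is not kernel code and does not use the workqueue API: every toy_*
name and the shutting_down flag are made up for illustration, and the model
ignores concurrency entirely. It only shows that both a cancel-style and a
flush-style teardown leave the work idle at the moment they return, and that
a later writer can queue it again unless something else refuses the request.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one delayed work item; NOT the kernel's struct delayed_work. */
struct toy_dwork {
	bool pending;        /* a run is currently queued */
	bool shutting_down;  /* hypothetical guard set by the teardown path */
	int runs;            /* how many times the work function has run */
};

/* Loosely models schedule_delayed_work(): queue unless already queued. */
static bool toy_schedule(struct toy_dwork *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Same as above, but refuses once the hypothetical guard is set. */
static bool toy_schedule_guarded(struct toy_dwork *w)
{
	if (w->shutting_down)
		return false;
	return toy_schedule(w);
}

/* Loosely models cancel_delayed_work_sync(): drop the queued run. */
static bool toy_cancel_sync(struct toy_dwork *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Loosely models flush_delayed_work(): run the queued work now. */
static bool toy_flush(struct toy_dwork *w)
{
	if (!w->pending)
		return false;
	w->pending = false;
	w->runs++;
	return true;
}

/* Returns true if the work is queued again after teardown has returned. */
static bool toy_pending_after_teardown(bool use_flush, bool guarded)
{
	struct toy_dwork w = { false, false, 0 };

	toy_schedule(&w);               /* a page was dirtied */

	w.shutting_down = true;         /* teardown begins */
	if (use_flush)
		toy_flush(&w);
	else
		toy_cancel_sync(&w);
	assert(!w.pending);             /* idle when teardown returns */

	/* A late writer dirties another page after teardown returned. */
	if (guarded)
		toy_schedule_guarded(&w);
	else
		toy_schedule(&w);

	return w.pending;
}
```

In this model toy_pending_after_teardown() reports the work as queued again
for both teardown variants when the late writer is unguarded, and as idle
when the guarded scheduling path is used. Whether the real fbdev code has an
equivalent window is exactly the open question above; the sketch does not
answer it.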