Hi Paul, > -----Original Message----- > From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Paul > Menzel > Sent: Thursday, March 31, 2022 4:22 PM > To: Liu, Chuansheng <chuansheng.liu@xxxxxxxxx> > Cc: linux-fbdev@xxxxxxxxxxxxxxx; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>; > akpm@xxxxxxxxxxxxxxxxxxxx; daniel@xxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > netdev@xxxxxxxxxxxxxxx; deller@xxxxxx; x86@xxxxxxxxxx; ast@xxxxxxxxxx; dri- > devel@xxxxxxxxxxxxxxxxxxxxx; andrii@xxxxxxxxxx; Song Liu <song@xxxxxxxxxx>; > Ingo Molnar <mingo@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>; > tzimmermann@xxxxxxx; Borislav Petkov <bp@xxxxxxxxx>; bpf@xxxxxxxxxxxxxxx; > Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx>; kernel-team@xxxxxx > Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption > > Dear Chuansheng, > > > Am 31.03.22 um 02:06 schrieb Liu, Chuansheng: > > >> -----Original Message----- > >> From: Paul Menzel <pmenzel@xxxxxxxxxxxxx> > >> Sent: Thursday, March 31, 2022 12:47 AM > > […] > > >> Am 29.03.22 um 01:58 schrieb Liu, Chuansheng: > >> > >>>> -----Original Message----- > >>>> From: Paul Menzel > >>>> Sent: Monday, March 28, 2022 2:15 PM > >> > >>>> Am 28.03.22 um 02:58 schrieb Liu, Chuansheng: > >>>> > >>>>>> -----Original Message----- > >>>> > >>>>>> Sent: Saturday, March 26, 2022 4:11 PM > >>>> > >>>>>> Am 17.03.22 um 06:46 schrieb Chuansheng Liu: > >>>>>>> Easily hit the below list corruption: > >>>>>>> == > >>>>>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but > >>>>>>> was ffffec604507edc8. (prev=ffffec604507edc8). > >>>>>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26 > >>>>>>> __list_add_valid+0x53/0x80 > >>>>>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G U > >>>>>>> RIP: 0010:__list_add_valid+0x53/0x80 > >>>>>>> Call Trace: > >>>>>>> <TASK> > >>>>>>> fb_deferred_io_mkwrite+0xea/0x150 > >>>>>>> do_page_mkwrite+0x57/0xc0 > >>>>>>> do_wp_page+0x278/0x2f0 > >>>>>>> __handle_mm_fault+0xdc2/0x1590 > >>>>>>> handle_mm_fault+0xdd/0x2c0 > >>>>>>> do_user_addr_fault+0x1d3/0x650 > >>>>>>> exc_page_fault+0x77/0x180 > >>>>>>> ? asm_exc_page_fault+0x8/0x30 > >>>>>>> asm_exc_page_fault+0x1e/0x30 > >>>>>>> RIP: 0033:0x7fd98fc8fad1 > >>>>>>> == > >>>>>>> > >>>>>>> Figure out the race happens when one process is adding &page->lru > into > >>>>>>> the pagelist tail in fb_deferred_io_mkwrite(), another process is > >>>>>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is > >>>>>>> not protected by the lock. > >>>>>>> > >>>>>>> This fix is to init all the page lists one time during initialization, > >>>>>>> it not only fixes the list corruption, but also avoids INIT_LIST_HEAD() > >>>>>>> redundantly. > >>>>>>> > >>>>>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already > enlisted") > >>>>>>> Cc: Thomas Zimmermann <tzimmermann@xxxxxxx> > >>>>>>> Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/video/fbdev/core/fb_defio.c | 9 ++++++++- > >>>>>>> 1 file changed, 8 insertions(+), 1 deletion(-) > >>>>>>> > >>>>>>> diff --git a/drivers/video/fbdev/core/fb_defio.c > b/drivers/video/fbdev/core/fb_defio.c > >>>>>>> index 98b0f23bf5e2..eafb66ca4f28 100644 > >>>>>>> --- a/drivers/video/fbdev/core/fb_defio.c > >>>>>>> +++ b/drivers/video/fbdev/core/fb_defio.c > >>>>>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct > vm_fault *vmf) > >>>>>>> printk(KERN_ERR "no mapping available\n"); > >>>>>>> > >>>>>>> BUG_ON(!page->mapping); > >>>>>>> - INIT_LIST_HEAD(&page->lru); > >>>>>>> page->index = vmf->pgoff; > >>>>>>> > >>>>>>> vmf->page = page; > >>>>>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct > work_struct *work) > >>>>>>> void fb_deferred_io_init(struct fb_info *info) > >>>>>>> { > >>>>>>> struct fb_deferred_io *fbdefio = info->fbdefio; > >>>>>>> + struct page *page; > >>>>>>> + int i; > >>>>>>> > >>>>>>> BUG_ON(!fbdefio); > >>>>>>> mutex_init(&fbdefio->lock); > >>>>>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info) > >>>>>>> INIT_LIST_HEAD(&fbdefio->pagelist); > >>>>>>> if (fbdefio->delay == 0) /* set a default of 1 s */ > >>>>>>> fbdefio->delay = HZ; > >>>>>>> + > >>>>>>> + /* initialize all the page lists one time */ > >>>>>>> + for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) { > >>>>>>> + page = fb_deferred_io_page(info, i); > >>>>>>> + INIT_LIST_HEAD(&page->lru); > >>>>>>> + } > >>>>>>> } > >>>>>>> EXPORT_SYMBOL_GPL(fb_deferred_io_init); > >>>>>>> > >>>>>> Applying your patch on top of current Linus’ master branch, tty0 is > >>>>>> unusable and looks frozen. Sometimes network card still works, > sometimes > >>>>>> not. > >>>>> > >>>>> I don't see how the patch would cause below BUG call stack, need some > time to > >>>>> debug. Just few comments: > >>>>> 1. Will the system work well without this patch? > >>>> > >>>> Yes, the framebuffer works well without the patch. > >>>> > >>>>> 2. When you are sure the patch causes the regression you saw, please get > free > >>>> to submit one reverted patch, thanks : ) > >>>> > >>>> I think you for patch wasn’t submitted yet – at least not pulled by Linus. > >>> The patch has been in drm-tip, could you have a try with the latest drm-tip > to see > >>> if the Framebuffer works well, in that case, we could revert it in drm-tip then. > >> > >> With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration > >> manifest) everything works fine. (I had to disable amdgpu driver, as it > >> failed to build.) Is anyone able to explain that? > > > > My patch is for fixing another patch which is in the drm-tip at least, > > The referenced commit 105a940416fc in the Fixes tag is also in Linus’ > master branch. > > > so I assume applying my patch into Linus tree directly is not > > completely proper. That's my intention of asking your help for > > retesting drm-tip. > If there were such a relation, that would need to be documented in the > commit message. You should have seen it : ) > > > You mean everything working fine means another issue you hit is also > > gone? > No, I just mean the hang when applying your patch. > > Anyway, after figuring out, that drm-tip, is actually not behind Linus’ > master branch, I tried to figure out the differences, and it turns out > it’s also related to commit fac54e2bfb5b (x86/Kconfig: Select > HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP) [1], which is in > Linus’ > master branch, but not drm-tip. Note, I am using a 32-bit user space and > a 64-bit Linux kernel. Reverting commit fac54e2bfb5b, and having your > patch a applied, the hang is gone. Good to know you have figured it out, and the issue you hit is not related to my patch : ) > > I am adding the people involved in the other discussion to make them > aware of this failure case. > > > Kind regards, > > Paul > > > [1]: https://linux-regtracking.leemhuis.info/regzbot/mainline/