Hi Paul, > -----Original Message----- > From: Paul Menzel <pmenzel@xxxxxxxxxxxxx> > Sent: Thursday, March 31, 2022 12:47 AM > To: Liu, Chuansheng <chuansheng.liu@xxxxxxxxx> > Cc: tzimmermann@xxxxxxx; linux-fbdev@xxxxxxxxxxxxxxx; deller@xxxxxx; dri- > devel@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption > > [Cc: -jayalk@xxxxxxxxxxxx as it bounces] > > Dear Chuansheng, > > > Am 29.03.22 um 01:58 schrieb Liu, Chuansheng: > > >> -----Original Message----- > >> From: Paul Menzel > >> Sent: Monday, March 28, 2022 2:15 PM > > >> Am 28.03.22 um 02:58 schrieb Liu, Chuansheng: > >> > >>>> -----Original Message----- > >> > >>>> Sent: Saturday, March 26, 2022 4:11 PM > >> > >>>> Am 17.03.22 um 06:46 schrieb Chuansheng Liu: > >>>>> Easily hit the below list corruption: > >>>>> == > >>>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but > >>>>> was ffffec604507edc8. (prev=ffffec604507edc8). > >>>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26 > >>>>> __list_add_valid+0x53/0x80 > >>>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G U > >>>>> RIP: 0010:__list_add_valid+0x53/0x80 > >>>>> Call Trace: > >>>>> <TASK> > >>>>> fb_deferred_io_mkwrite+0xea/0x150 > >>>>> do_page_mkwrite+0x57/0xc0 > >>>>> do_wp_page+0x278/0x2f0 > >>>>> __handle_mm_fault+0xdc2/0x1590 > >>>>> handle_mm_fault+0xdd/0x2c0 > >>>>> do_user_addr_fault+0x1d3/0x650 > >>>>> exc_page_fault+0x77/0x180 > >>>>> ? asm_exc_page_fault+0x8/0x30 > >>>>> asm_exc_page_fault+0x1e/0x30 > >>>>> RIP: 0033:0x7fd98fc8fad1 > >>>>> == > >>>>> > >>>>> Figure out the race happens when one process is adding &page->lru into > >>>>> the pagelist tail in fb_deferred_io_mkwrite(), another process is > >>>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is > >>>>> not protected by the lock. > >>>>> > >>>>> This fix is to init all the page lists one time during initialization, > >>>>> it not only fixes the list corruption, but also avoids INIT_LIST_HEAD() > >>>>> redundantly. > >>>>> > >>>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted") > >>>>> Cc: Thomas Zimmermann <tzimmermann@xxxxxxx> > >>>>> Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> > >>>>> --- > >>>>> drivers/video/fbdev/core/fb_defio.c | 9 ++++++++- > >>>>> 1 file changed, 8 insertions(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/drivers/video/fbdev/core/fb_defio.c > >> b/drivers/video/fbdev/core/fb_defio.c > >>>>> index 98b0f23bf5e2..eafb66ca4f28 100644 > >>>>> --- a/drivers/video/fbdev/core/fb_defio.c > >>>>> +++ b/drivers/video/fbdev/core/fb_defio.c > >>>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct > vm_fault *vmf) > >>>>> printk(KERN_ERR "no mapping available\n"); > >>>>> > >>>>> BUG_ON(!page->mapping); > >>>>> - INIT_LIST_HEAD(&page->lru); > >>>>> page->index = vmf->pgoff; > >>>>> > >>>>> vmf->page = page; > >>>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct > work_struct *work) > >>>>> void fb_deferred_io_init(struct fb_info *info) > >>>>> { > >>>>> struct fb_deferred_io *fbdefio = info->fbdefio; > >>>>> + struct page *page; > >>>>> + int i; > >>>>> > >>>>> BUG_ON(!fbdefio); > >>>>> mutex_init(&fbdefio->lock); > >>>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info) > >>>>> INIT_LIST_HEAD(&fbdefio->pagelist); > >>>>> if (fbdefio->delay == 0) /* set a default of 1 s */ > >>>>> fbdefio->delay = HZ; > >>>>> + > >>>>> + /* initialize all the page lists one time */ > >>>>> + for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) { > >>>>> + page = fb_deferred_io_page(info, i); > >>>>> + INIT_LIST_HEAD(&page->lru); > >>>>> + } > >>>>> } > >>>>> EXPORT_SYMBOL_GPL(fb_deferred_io_init); > >>>>> > >>>> Applying your patch on top of current Linus’ master branch, tty0 is > >>>> unusable and looks frozen. Sometimes network card still works, sometimes > >>>> not. > >>> > >>> I don't see how the patch would cause below BUG call stack, need some > time to > >>> debug. Just few comments: > >>> 1. Will the system work well without this patch? > >> > >> Yes, the framebuffer works well without the patch. > >> > >>> 2. When you are sure the patch causes the regression you saw, please get > free > >> to submit one reverted patch, thanks : ) > >> > >> I think you for patch wasn’t submitted yet – at least not pulled by Linus. > > The patch has been in drm-tip, could you have a try with the latest drm-tip to > see if the > > Framebuffer works well, in that case, we could revert it in drm-tip then. > > With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration > manifest) everything works fine. (I had to disable amdgpu driver, as it > failed to build.) Is anyone able to explain that? My patch is for fixing another patch which is in the drm-tip at least, so I assume applying my patch into Linus tree directly is not completely proper. That's my intention of asking your help for retesting drm-tip. You mean everything working fine means another issue you hit is also gone? Best Regards Chuansheng