RE: [PATCH] fbdev: defio: fix the pagelist corruption

Hi Paul,

> -----Original Message-----
> From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Paul
> Menzel
> Sent: Saturday, March 26, 2022 4:11 PM
> To: Liu, Chuansheng <chuansheng.liu@xxxxxxxxx>
> Cc: linux-fbdev@xxxxxxxxxxxxxxx; deller@xxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> tzimmermann@xxxxxxx; jayalk@xxxxxxxxxxxx
> Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption
> 
> Dear Chuansheng,
> 
> 
> Am 17.03.22 um 06:46 schrieb Chuansheng Liu:
> > Easily hit the below list corruption:
> > ==
> > list_add corruption. prev->next should be next (ffffffffc0ceb090), but
> > was ffffec604507edc8. (prev=ffffec604507edc8).
> > WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
> > __list_add_valid+0x53/0x80
> > CPU: 65 PID: 3959 Comm: fbdev Tainted: G     U
> > RIP: 0010:__list_add_valid+0x53/0x80
> > Call Trace:
> >   <TASK>
> >   fb_deferred_io_mkwrite+0xea/0x150
> >   do_page_mkwrite+0x57/0xc0
> >   do_wp_page+0x278/0x2f0
> >   __handle_mm_fault+0xdc2/0x1590
> >   handle_mm_fault+0xdd/0x2c0
> >   do_user_addr_fault+0x1d3/0x650
> >   exc_page_fault+0x77/0x180
> >   ? asm_exc_page_fault+0x8/0x30
> >   asm_exc_page_fault+0x1e/0x30
> > RIP: 0033:0x7fd98fc8fad1
> > ==
> >
> > Figure out the race happens when one process is adding &page->lru into
> > the pagelist tail in fb_deferred_io_mkwrite(), another process is
> > re-initializing the same &page->lru in fb_deferred_io_fault(), which is
> > not protected by the lock.
> >
> > This fix is to init all the page lists one time during initialization,
> > it not only fixes the list corruption, but also avoids INIT_LIST_HEAD()
> > redundantly.
> >
> > Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already
> > enlisted")
> > Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
> > Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx>
> > ---
> >   drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/video/fbdev/core/fb_defio.c
> b/drivers/video/fbdev/core/fb_defio.c
> > index 98b0f23bf5e2..eafb66ca4f28 100644
> > --- a/drivers/video/fbdev/core/fb_defio.c
> > +++ b/drivers/video/fbdev/core/fb_defio.c
> > @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault
> *vmf)
> >   		printk(KERN_ERR "no mapping available\n");
> >
> >   	BUG_ON(!page->mapping);
> > -	INIT_LIST_HEAD(&page->lru);
> >   	page->index = vmf->pgoff;
> >
> >   	vmf->page = page;
> > @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct
> *work)
> >   void fb_deferred_io_init(struct fb_info *info)
> >   {
> >   	struct fb_deferred_io *fbdefio = info->fbdefio;
> > +	struct page *page;
> > +	int i;
> >
> >   	BUG_ON(!fbdefio);
> >   	mutex_init(&fbdefio->lock);
> > @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
> >   	INIT_LIST_HEAD(&fbdefio->pagelist);
> >   	if (fbdefio->delay == 0) /* set a default of 1 s */
> >   		fbdefio->delay = HZ;
> > +
> > +	/* initialize all the page lists one time */
> > +	for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
> > +		page = fb_deferred_io_page(info, i);
> > +		INIT_LIST_HEAD(&page->lru);
> > +	}
> >   }
> >   EXPORT_SYMBOL_GPL(fb_deferred_io_init);
> >
> Applying your patch on top of current Linus’ master branch, tty0 is
> unusable and looks frozen. Sometimes network card still works, sometimes
> not.

I don't see how the patch would cause the BUG call stack below; I need some time
to debug it. Just a few comments:
1. Does the system work well without this patch?
2. If you are sure the patch causes the regression you saw, please feel free to submit
a revert patch, thanks : )
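
For reference, the list corruption described in the quoted commit message can be
illustrated in user space. The sketch below is only a model under stated
assumptions: struct list_head and the helpers are simplified re-implementations
written for this example (not the kernel headers), the function names are
hypothetical, and the two racing paths are replayed serially instead of on two
CPUs. It shows why re-initializing a node that is still on the pagelist makes
the next tail insertion fail the prev->next check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same idea as __list_add_valid(): the two neighbours must point at each other. */
static bool list_add_valid(const struct list_head *prev,
			   const struct list_head *next)
{
	return prev->next == next && next->prev == prev;
}

/* Tail insertion that reports corruption instead of emitting a WARNING. */
static bool list_add_tail_checked(struct list_head *node,
				  struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!list_add_valid(prev, head))
		return false;
	node->next = head;
	node->prev = prev;
	prev->next = node;
	head->prev = node;
	return true;
}

/*
 * Serial replay of the problematic interleaving:
 *   1. the "mkwrite" path enlists page_a on the pagelist,
 *   2. the "fault" path re-initializes page_a while it is still enlisted,
 *   3. the "mkwrite" path for page_b appends to the tail.
 * Returns true if step 3 detects the corruption.
 */
static bool reinit_while_enlisted_corrupts(void)
{
	struct list_head pagelist, page_a, page_b;
	bool ok;

	init_list_head(&pagelist);
	init_list_head(&page_a);
	init_list_head(&page_b);

	ok = list_add_tail_checked(&page_a, &pagelist);
	assert(ok);
	(void)ok;

	init_list_head(&page_a);	/* the unlocked re-initialization */

	/* prev is now page_a, but page_a.next points at page_a, not the head */
	return !list_add_tail_checked(&page_b, &pagelist);
}

/* With every node initialized exactly once, both insertions succeed. */
static bool init_once_stays_valid(void)
{
	struct list_head pagelist, page_a, page_b;

	init_list_head(&pagelist);
	init_list_head(&page_a);
	init_list_head(&page_b);

	return list_add_tail_checked(&page_a, &pagelist) &&
	       list_add_tail_checked(&page_b, &pagelist);
}
```

In this model the failing check sees prev->next equal to prev itself, which is
the same shape as the "prev->next should be next ..., but was ... (prev=...)"
line in the quoted warning. Calling both functions from a small main() is enough
to try it out.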

> 
>      $ git log --oneline -nodecorate -2
>      1b351a77ed33 (HEAD -> linus) fbdev: defio: fix the pagelist corruption
>      52d543b5497c (origin/master, origin/HEAD) Merge tag
> 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi
> 
> ```
> [    5.256996] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [    5.269582] page dumped because: VM_BUG_ON_PAGE(compound &&
> compound_order(page) != order)
> [    5.279507] ------------[ cut here ]------------
> [    5.286406] kernel BUG at mm/page_alloc.c:1326!
> [    5.291814] invalid opcode: 0000 [#1] PREEMPT SMP
> [    5.296350] CPU: 0 PID: 167 Comm: systemd-udevd Not tainted
> 5.17.0-10753-g1b351a77ed33 #300
> [    5.304670] Hardware name: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS
> 4.16-337-gb87986e67b 03/25/2022
> [    5.313163] RIP: 0010:free_pcp_prepare+0x295/0x400
> [    5.317930] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48
> 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd
> ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
> [    5.336650] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
> [    5.341849] RAX: 000000000000004e RBX: ffffe4be80000000 RCX:
> 0000000000000000
> [    5.348957] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI:
> 00000000ffffffff
> [    5.356063] RBP: ffffe4be840c0000 R08: 0000000000000000 R09:
> 00000000ffffdfff
> [    5.363170] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12:
> 0000000000000000
> [    5.370277] R13: 0000000000000009 R14: ffff91fd02ebc640 R15:
> ffffe4be840c0000
> [    5.377384] FS:  0000000000000000(0000) GS:ffff91fd7b400000(0063)
> knlGS:00000000f7eea800
> [    5.385443] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> [    5.391164] CR2: 00000000f6f0e840 CR3: 0000000106b60000 CR4:
> 00000000000406f0
> [    5.398272] Call Trace:
> [    5.400697]  <TASK>
> [    5.402778]  free_unref_page+0x1b/0xf0
> [    5.406505]  __vunmap+0x216/0x2c0
> [    5.409798]  drm_fbdev_cleanup+0x5f/0xb0
> [    5.413698]  drm_fbdev_fb_destroy+0x15/0x30
> [    5.417857]  unregister_framebuffer+0x2c/0x40
> [    5.422191]  drm_client_dev_unregister+0x69/0xe0
> [    5.422962] usb usb4: New USB device found, idVendor=1d6b,
> idProduct=0003, bcdDevice= 5.17
> [    5.426784]  drm_dev_unregister+0x2e/0x80
> [    5.439005]  drm_dev_unplug+0x21/0x40
> [    5.442645]  simpledrm_remove+0x11/0x20
> [    5.446458]  platform_remove+0x1f/0x40
> [    5.450185]  __device_release_driver+0x17a/0x250
> [    5.454779]  device_release_driver+0x24/0x30
> [    5.459024]  bus_remove_device+0xd8/0x140
> [    5.463012]  device_del+0x18b/0x3f0
> [    5.466478]  ? idr_alloc_cyclic+0x50/0xb0
> [    5.470466]  platform_device_del.part.0+0x13/0x70
> [    5.475146]  platform_device_unregister+0x1c/0x30
> [    5.479824]  drm_aperture_detach_drivers+0xa1/0xd0
> [    5.484593]  drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x60
> [    5.491179]  radeon_pci_probe+0x54/0xf0 [radeon]
> [    5.495773]  local_pci_probe+0x45/0x80
> [    5.499499]  ? pci_match_device+0xd7/0x130
> [    5.503572]  pci_device_probe+0xc2/0x1e0
> [    5.507474]  really_probe+0x1f5/0x3d0
> [    5.511112]  __driver_probe_device+0xfe/0x180
> [    5.515446]  driver_probe_device+0x1e/0x90
> [    5.519518]  __driver_attach+0xc0/0x1c0
> [    5.523332]  ? __device_attach_driver+0xe0/0xe0
> [    5.527839]  ? __device_attach_driver+0xe0/0xe0
> [    5.532346]  bus_for_each_dev+0x78/0xc0
> [    5.536159]  bus_add_driver+0x149/0x1e0
> [    5.539973]  driver_register+0x8f/0xe0
> [    5.543699]  ? 0xffffffffc0741000
> [    5.546992]  do_one_initcall+0x44/0x200
> [    5.550806]  ? kmem_cache_alloc_trace+0x170/0x2c0
> [    5.555487]  do_init_module+0x4c/0x240
> [    5.559213]  __do_sys_finit_module+0xb4/0x120
> [    5.563547]  __do_fast_syscall_32+0x6b/0xe0
> [    5.567706]  do_fast_syscall_32+0x2f/0x70
> [    5.571693]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
> [    5.577067] RIP: 0023:0xf7efa549
> [    5.580273] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10
> 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 cd 0f 05 cd
> 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
> [    5.582805] usb usb4: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [    5.598992] RSP: 002b:00000000ff831c0c EFLAGS: 00200296 ORIG_RAX:
> 000000000000015e
> [    5.598996] RAX: ffffffffffffffda RBX: 0000000000000011 RCX:
> 00000000f7ed9e09
> [    5.598998] RDX: 0000000000000000 RSI: 0000000056a5c940 RDI:
> 0000000056a5c4c0
> [    5.598999] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [    5.635047] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [    5.642154] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [    5.649264]  </TASK>
> [    5.651427] Modules linked in: crct10dif_pclmul crc32_pclmul
> crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi radeon(+) r8169
> xhci_pci(+) realtek snd_hda_intel drm_ttm_helper snd_intel_dspcfg
> k10temp snd_hda_codec ttm snd_hda_core xhci_hcd snd_pcm sg ohci_hcd
> ehci_pci(+) snd_timer drm_dp_helper snd ehci_hcd soundcore i2c_piix4
> acpi_cpufreq coreboot_table fuse ipv6 autofs4
> [    5.690975] r8169 0000:04:00.0 enp4s0: renamed from eth0
> [    5.691589] ---[ end trace 0000000000000000 ]---
> [    5.704791] RIP: 0010:free_pcp_prepare+0x295/0x400
> [    5.709784] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48
> 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd
> ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
> [    5.731535] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
> [    5.752988] usb usb4: Product: xHCI Host Controller
> [    5.758571] usb usb4: Manufacturer: Linux 5.17.0-10753-g1b351a77ed33
> xhci-hcd
> [    5.767096] usb usb4: SerialNumber: 0000:03:00.0
> [    5.772213] hub 4-0:1.0: USB hub found
> [    5.782383] hub 4-0:1.0: 2 ports detected
> [    5.799251] RAX: 000000000000004e RBX: ffffe4be80000000 RCX:
> 0000000000000000
> [    5.810470] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI:
> 00000000ffffffff
> [    5.817561] RBP: ffffe4be840c0000 R08: 0000000000000000 R09:
> 00000000ffffdfff
> [    5.824680] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12:
> 0000000000000000
> [    5.831739] R13: 0000000000000009 R14: ffff91fd02ebc640 R15:
> ffffe4be840c0000
> [    5.839445] FS:  0000000000000000(0000) GS:ffff91fd7b500000(0063)
> knlGS:00000000f7eea800
> [    5.847905] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> [    5.854025] CR2: 000000005664d26c CR3: 0000000106b60000 CR4:
> 00000000000406e0
> ```
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> PS: For some reason, the lore.kernel.org lists most messages twice [1].
> 
> PPS: I actually wanted to analyze the new regression and thought
> your patch might help, but it made things worse. ;-) (The log excerpt is from
> Linux master.)
> 
> ```
> [    1.738965] BUG: Bad page state in process systemd-udevd  pfn:103003
> [    1.738974] fbcon: Taking over console
> [    1.740459] page:00000000c3b5c591 refcount:0 mapcount:0 mapping:0000000000000000 index:0x3 pfn:0x103003
> [    1.740466] head:000000009b49a8e9 order:9 compound_mapcount:0 compound_pincount:0
> [    1.740468] flags: 0x2fffc000010000(head|node=0|zone=2|lastcpupid=0x3fff)
> [    1.740473] raw: 002fffc000000000 fffff139840c0001 fffff139840c00c8 0000000000000000
> [    1.740475] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [    1.740477] head: 002fffc000010000 0000000000000000 dead000000000122 0000000000000000
> [    1.740479] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [    1.740480] page dumped because: corrupted mapping in tail page
> ```
> 
> I am going to do that in another thread.
> 
> [1]:
> https://lore.kernel.org/all/20220317054602.28846-1-chuansheng.liu@xxxxxxxxx/



