Re: [PATCH] fbdev: defio: fix the pagelist corruption

Dear Chuansheng,


On 28.03.22 at 02:58, Liu, Chuansheng wrote:

-----Original Message-----

Sent: Saturday, March 26, 2022 4:11 PM

On 17.03.22 at 06:46, Chuansheng Liu wrote:
Easily hit the below list corruption:
==
list_add corruption. prev->next should be next (ffffffffc0ceb090), but
was ffffec604507edc8. (prev=ffffec604507edc8).
WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
__list_add_valid+0x53/0x80
CPU: 65 PID: 3959 Comm: fbdev Tainted: G     U
RIP: 0010:__list_add_valid+0x53/0x80
Call Trace:
   <TASK>
   fb_deferred_io_mkwrite+0xea/0x150
   do_page_mkwrite+0x57/0xc0
   do_wp_page+0x278/0x2f0
   __handle_mm_fault+0xdc2/0x1590
   handle_mm_fault+0xdd/0x2c0
   do_user_addr_fault+0x1d3/0x650
   exc_page_fault+0x77/0x180
   ? asm_exc_page_fault+0x8/0x30
   asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fd98fc8fad1
==

The race happens when one process is adding &page->lru to the tail of
the pagelist in fb_deferred_io_mkwrite() while another process is
re-initializing the same &page->lru in fb_deferred_io_fault(), which is
not protected by the lock.

This fix initializes all the page lists once during initialization.
It not only fixes the list corruption, but also avoids redundant
INIT_LIST_HEAD() calls.

Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx>
---
   drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
   1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 98b0f23bf5e2..eafb66ca4f28 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
   		printk(KERN_ERR "no mapping available\n");

   	BUG_ON(!page->mapping);
-	INIT_LIST_HEAD(&page->lru);
   	page->index = vmf->pgoff;

   	vmf->page = page;
@@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
   void fb_deferred_io_init(struct fb_info *info)
   {
   	struct fb_deferred_io *fbdefio = info->fbdefio;
+	struct page *page;
+	int i;

   	BUG_ON(!fbdefio);
   	mutex_init(&fbdefio->lock);
@@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
   	INIT_LIST_HEAD(&fbdefio->pagelist);
   	if (fbdefio->delay == 0) /* set a default of 1 s */
   		fbdefio->delay = HZ;
+
+	/* initialize all the page lists one time */
+	for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
+		page = fb_deferred_io_page(info, i);
+		INIT_LIST_HEAD(&page->lru);
+	}
   }
   EXPORT_SYMBOL_GPL(fb_deferred_io_init);
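
For readers following the thread, here is a minimal userspace sketch of the corruption the commit message describes. It is not kernel code: `struct list_head` and the helpers are simplified stand-ins written for this illustration, `second_add_succeeds()` is a hypothetical name, and the two code paths are replayed sequentially instead of racing. It assumes the sanity check behaves like `__list_add_valid()` in lib/list_debug.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Mirrors the invariant that __list_add_valid() is assumed to check. */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
			   struct list_head *next)
{
	return next->prev == prev && prev->next == next &&
	       new != prev && new != next;
}

/* Tail insertion that refuses to touch a list failing the sanity check. */
static bool list_add_tail_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev, *next = head;

	if (!list_add_valid(new, prev, next))
		return false;
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/*
 * Queue page A, optionally re-initialize A's node while it is still
 * queued (what the fault path did before the patch), then queue page B.
 * Returns whether the second insertion passed the sanity check.
 */
static bool second_add_succeeds(bool reinit_queued_node)
{
	struct list_head pagelist, page_a, page_b;

	init_list_head(&pagelist);
	init_list_head(&page_a);
	init_list_head(&page_b);

	list_add_tail_checked(&page_a, &pagelist);
	if (reinit_queued_node)
		init_list_head(&page_a); /* page_a.next now points at page_a */

	return list_add_tail_checked(&page_b, &pagelist);
}
```

Calling `second_add_succeeds(true)` from a small `main()` reports a failed check because `prev->next` equals `prev` itself, which is the same pattern as in the warning quoted above (`prev->next` was ffffec604507edc8 with prev=ffffec604507edc8). With `false`, the second insertion goes through.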

With your patch applied on top of the current Linus’ master branch, tty0 is
unusable and looks frozen. Sometimes the network card still works, sometimes
not.

I don't see how the patch would cause the BUG call stack below; I need some time to
debug. Just a few comments:
1. Will the system work well without this patch?

Yes, the framebuffer works well without the patch.

2. When you are sure the patch causes the regression you saw, please feel free to submit
a revert patch, thanks : )

I think your patch wasn’t submitted yet, or at least not pulled by Linus.

      $ git log --oneline -nodecorate -2
      1b351a77ed33 (HEAD -> linus) fbdev: defio: fix the pagelist corruption
      52d543b5497c (origin/master, origin/HEAD) Merge tag 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi

```
[    5.256996] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    5.269582] page dumped because: VM_BUG_ON_PAGE(compound && compound_order(page) != order)
[    5.279507] ------------[ cut here ]------------
[    5.286406] kernel BUG at mm/page_alloc.c:1326!
[    5.291814] invalid opcode: 0000 [#1] PREEMPT SMP
[    5.296350] CPU: 0 PID: 167 Comm: systemd-udevd Not tainted 5.17.0-10753-g1b351a77ed33 #300
[    5.304670] Hardware name: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.16-337-gb87986e67b 03/25/2022
[    5.313163] RIP: 0010:free_pcp_prepare+0x295/0x400
[    5.317930] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
[    5.336650] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
[    5.341849] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 0000000000000000
[    5.348957] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 00000000ffffffff
[    5.356063] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 00000000ffffdfff
[    5.363170] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 0000000000000000
[    5.370277] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: ffffe4be840c0000
[    5.377384] FS:  0000000000000000(0000) GS:ffff91fd7b400000(0063) knlGS:00000000f7eea800
[    5.385443] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[    5.391164] CR2: 00000000f6f0e840 CR3: 0000000106b60000 CR4: 00000000000406f0
[    5.398272] Call Trace:
[    5.400697]  <TASK>
[    5.402778]  free_unref_page+0x1b/0xf0
[    5.406505]  __vunmap+0x216/0x2c0
[    5.409798]  drm_fbdev_cleanup+0x5f/0xb0
[    5.413698]  drm_fbdev_fb_destroy+0x15/0x30
[    5.417857]  unregister_framebuffer+0x2c/0x40
[    5.422191]  drm_client_dev_unregister+0x69/0xe0
[    5.422962] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.17
[    5.426784]  drm_dev_unregister+0x2e/0x80
[    5.439005]  drm_dev_unplug+0x21/0x40
[    5.442645]  simpledrm_remove+0x11/0x20
[    5.446458]  platform_remove+0x1f/0x40
[    5.450185]  __device_release_driver+0x17a/0x250
[    5.454779]  device_release_driver+0x24/0x30
[    5.459024]  bus_remove_device+0xd8/0x140
[    5.463012]  device_del+0x18b/0x3f0
[    5.466478]  ? idr_alloc_cyclic+0x50/0xb0
[    5.470466]  platform_device_del.part.0+0x13/0x70
[    5.475146]  platform_device_unregister+0x1c/0x30
[    5.479824]  drm_aperture_detach_drivers+0xa1/0xd0
[    5.484593]  drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x60
[    5.491179]  radeon_pci_probe+0x54/0xf0 [radeon]
[    5.495773]  local_pci_probe+0x45/0x80
[    5.499499]  ? pci_match_device+0xd7/0x130
[    5.503572]  pci_device_probe+0xc2/0x1e0
[    5.507474]  really_probe+0x1f5/0x3d0
[    5.511112]  __driver_probe_device+0xfe/0x180
[    5.515446]  driver_probe_device+0x1e/0x90
[    5.519518]  __driver_attach+0xc0/0x1c0
[    5.523332]  ? __device_attach_driver+0xe0/0xe0
[    5.527839]  ? __device_attach_driver+0xe0/0xe0
[    5.532346]  bus_for_each_dev+0x78/0xc0
[    5.536159]  bus_add_driver+0x149/0x1e0
[    5.539973]  driver_register+0x8f/0xe0
[    5.543699]  ? 0xffffffffc0741000
[    5.546992]  do_one_initcall+0x44/0x200
[    5.550806]  ? kmem_cache_alloc_trace+0x170/0x2c0
[    5.555487]  do_init_module+0x4c/0x240
[    5.559213]  __do_sys_finit_module+0xb4/0x120
[    5.563547]  __do_fast_syscall_32+0x6b/0xe0
[    5.567706]  do_fast_syscall_32+0x2f/0x70
[    5.571693]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[    5.577067] RIP: 0023:0xf7efa549
[    5.580273] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 cd 0f 05 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
[    5.582805] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    5.598992] RSP: 002b:00000000ff831c0c EFLAGS: 00200296 ORIG_RAX: 000000000000015e
[    5.598996] RAX: ffffffffffffffda RBX: 0000000000000011 RCX: 00000000f7ed9e09
[    5.598998] RDX: 0000000000000000 RSI: 0000000056a5c940 RDI: 0000000056a5c4c0
[    5.598999] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    5.635047] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    5.642154] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    5.649264]  </TASK>
[    5.651427] Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi radeon(+) r8169 xhci_pci(+) realtek snd_hda_intel drm_ttm_helper snd_intel_dspcfg k10temp snd_hda_codec ttm snd_hda_core xhci_hcd snd_pcm sg ohci_hcd ehci_pci(+) snd_timer drm_dp_helper snd ehci_hcd soundcore i2c_piix4 acpi_cpufreq coreboot_table fuse ipv6 autofs4
[    5.690975] r8169 0000:04:00.0 enp4s0: renamed from eth0
[    5.691589] ---[ end trace 0000000000000000 ]---
[    5.704791] RIP: 0010:free_pcp_prepare+0x295/0x400
[    5.709784] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
[    5.731535] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
[    5.752988] usb usb4: Product: xHCI Host Controller
[    5.758571] usb usb4: Manufacturer: Linux 5.17.0-10753-g1b351a77ed33 xhci-hcd
[    5.767096] usb usb4: SerialNumber: 0000:03:00.0
[    5.772213] hub 4-0:1.0: USB hub found
[    5.782383] hub 4-0:1.0: 2 ports detected
[    5.799251] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 0000000000000000
[    5.810470] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 00000000ffffffff
[    5.817561] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 00000000ffffdfff
[    5.824680] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 0000000000000000
[    5.831739] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: ffffe4be840c0000
[    5.839445] FS:  0000000000000000(0000) GS:ffff91fd7b500000(0063) knlGS:00000000f7eea800
[    5.847905] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[    5.854025] CR2: 000000005664d26c CR3: 0000000106b60000 CR4: 00000000000406e0
```

PS: For some reason, the lore.kernel.org lists most messages twice [1].

PPS: I actually wanted to analyze the new regression, and thought
your patch might help, but it made things worse. ;-) (The log excerpt is from
Linux master.)

```
[    1.738965] BUG: Bad page state in process systemd-udevd  pfn:103003
[    1.738974] fbcon: Taking over console
[    1.740459] page:00000000c3b5c591 refcount:0 mapcount:0 mapping:0000000000000000 index:0x3 pfn:0x103003
[    1.740466] head:000000009b49a8e9 order:9 compound_mapcount:0 compound_pincount:0
[    1.740468] flags: 0x2fffc000010000(head|node=0|zone=2|lastcpupid=0x3fff)
[    1.740473] raw: 002fffc000000000 fffff139840c0001 fffff139840c00c8 0000000000000000
[    1.740475] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    1.740477] head: 002fffc000010000 0000000000000000 dead000000000122 0000000000000000
[    1.740479] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    1.740480] page dumped because: corrupted mapping in tail page
```

I am going to do that in another thread.

This is [2].


Kind regards,

Paul


[1]: https://lore.kernel.org/all/20220317054602.28846-1-chuansheng.liu@xxxxxxxxx/
[2]: https://lore.kernel.org/bpf/7edcd673-decf-7b4e-1f6e-f2e0e26f757a@xxxxxxxxxxxxx/


