Re: [BUG REPORT][PATCH 1/2] drm/amdgpu: Modify indirect buffer packages for resubmission

On Fri, Jun 30, 2023 at 10:25 AM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
>
> On Fri, Jun 30, 2023 at 10:34 PM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> >
> > On Tue, May 30, 2023 at 06:23:53AM +0000, Zhu, Jiadong wrote:
> > > -----Original Message-----
> > > From: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>
> > > Sent: Friday, May 26, 2023 9:19 AM
> > > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > Cc: Zhu, Jiadong <Jiadong.Zhu@xxxxxxx>
> > > Subject: [PATCH 1/2] drm/amdgpu: Modify indirect buffer packages for resubmission
> > >
> > > From: Jiadong Zhu <Jiadong.Zhu@xxxxxxx>
> > >
> > > When a preempted IB frame is resubmitted to the CP, we need to modify the frame data, including:
> > > 1. Set PRE_RESUME to 1 in CONTEXT_CONTROL.
> > > 2. Use the metadata (DE and CE) read from the CSA in WRITE_DATA.
> > >
> > > Add functions to save the location the first time the IBs are emitted, and a callback to patch the package when resubmission happens.
> > >
> > > Signed-off-by: Jiadong Zhu <Jiadong.Zhu@xxxxxxx>
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c     | 18 ++++++
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h     |  9 +++
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 60 ++++++++++++++++++++
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 15 +++++
> > >  4 files changed, 102 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >
> > <...snip...>
> >
> > > +void amdgpu_ring_mux_ib_mark_offset(struct amdgpu_ring_mux *mux,
> > > +                                   struct amdgpu_ring *ring, u64 offset,
> > > +                                   enum amdgpu_ring_mux_offset_type type) {
> > > +       struct amdgpu_mux_entry *e;
> > > +       struct amdgpu_mux_chunk *chunk;
> > > +
> > > +       e = amdgpu_ring_mux_sw_entry(mux, ring);
> > > +       if (!e) {
> > > +               DRM_ERROR("cannot find entry!\n");
> > > +               return;
> > > +       }
> > > +
> > > +       chunk = list_last_entry(&e->list, struct amdgpu_mux_chunk, entry);
> > > +       if (!chunk) {
> > > +               DRM_ERROR("cannot find chunk!\n");
> > > +               return;
> > > +       }
> > > +
> > > +       switch (type) {
> > > +       case AMDGPU_MUX_OFFSET_TYPE_CONTROL:
> > > +               chunk->cntl_offset = offset;
> > > +               break;
> >
> > Hello folks,
>
> (+Cc'ing Alex Deucher)
>
> > While booting my laptop I just observed a slab out of bounds error from KASAN on this line.
>
> Sorry for the confusion: the offending function is not the one above
> (amdgpu_ring_mux_ib_mark_offset) but amdgpu_sw_ring_ib_mark_offset().
>
> This was tested on top of commit e55e5df193d247a38a
> ("csky: fix up lock_mm_and_find_vma() conversion"),
> so you can check the line numbers against that commit :)
>
> Let me know if you need more information.

I believe this was fixed in this patch:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/ef3c36a6e025e9b16ca3321479ba016841fa17a0

Alex

> thanks!
>
> > This splat disappears after reverting this whole series.
> > config file is attached.
> >
> > Thanks!
> >
> > [   18.223441] ==================================================================
> > [   18.223444] BUG: KASAN: slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset (drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:503 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:428) amdgpu
> > [   18.224601] Write of size 8 at addr ffff8881208eb898 by task kworker/8:1/104
> >
> > [   18.224611] Hardware name: LENOVO 21DL/LNVNB161216, BIOS JPCN20WW(V1.06) 09/20/2022
> > [   18.224614] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
> > [   18.225558] Call Trace:
> > [   18.225561]  <TASK>
> > [   18.225563] dump_stack_lvl (lib/dump_stack.c:108)
> > [   18.225572] print_report (mm/kasan/report.c:365 (discriminator 1) mm/kasan/report.c:475 (discriminator 1))
> > [   18.225579] ? __virt_addr_valid (./include/linux/mmzone.h:1908 (discriminator 1) ./include/linux/mmzone.h:2004 (discriminator 1) arch/x86/mm/physaddr.c:65 (discriminator 1))
> > [   18.225585] ? amdgpu_sw_ring_ib_mark_offset (drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:503 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:428) amdgpu
> > [   18.226530] kasan_report (mm/kasan/report.c:590)
> > [   18.226535] ? amdgpu_sw_ring_ib_mark_offset (drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:503 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:428) amdgpu
> > [   18.227478] amdgpu_sw_ring_ib_mark_offset (drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:503 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:428) amdgpu
> > [   18.228426] gfx_v9_0_ring_emit_ib_gfx (drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h:373 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:5169) amdgpu
> > [   18.229397] amdgpu_ib_schedule (drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c:252 (discriminator 1)) amdgpu
> > [   18.230342] gfx_v9_0_ring_test_ib (drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1052) amdgpu
> > [   18.231295] ? __pfx_gfx_v9_0_ring_test_ib (drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1023) amdgpu
> > [   18.232251] ? lock_acquire (kernel/locking/lockdep.c:467 (discriminator 4) kernel/locking/lockdep.c:5763 (discriminator 4) kernel/locking/lockdep.c:5726 (discriminator 4))
> > [   18.232259] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5729)
> > [   18.232263] ? __pfx_lock_release (kernel/locking/lockdep.c:5769)
> > [   18.232267] amdgpu_ib_ring_tests (drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c:436 (discriminator 1)) amdgpu
> > [   18.233219] ? __pfx_gfx_v9_0_ring_test_ib (drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1023) amdgpu
> > [   18.234180] amdgpu_device_delayed_init_work_handler (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3061) amdgpu
> > [   18.235110] process_one_work (kernel/workqueue.c:2602)
> > [   18.235119] ? __pfx_process_one_work (kernel/workqueue.c:2491)
> > [   18.235123] ? lock_acquired (kernel/locking/lockdep.c:6009 kernel/locking/lockdep.c:6042)
> > [   18.235130] worker_thread (./include/linux/list.h:292 (discriminator 2) kernel/workqueue.c:2749 (discriminator 2))
> > [   18.235136] ? __pfx_worker_thread (kernel/workqueue.c:2691)
> > [   18.235140] kthread (kernel/kthread.c:389)
> > [   18.235144] ? __pfx_kthread (kernel/kthread.c:342)
> > [   18.235148] ret_from_fork (arch/x86/entry/entry_64.S:314)
> > [   18.235155]  </TASK>
> >
> > [   18.235158] Allocated by task 456:
> > [   18.235160] kasan_save_stack (mm/kasan/common.c:46)
> > [   18.235163] kasan_set_track (mm/kasan/common.c:52 (discriminator 1))
> > [   18.235166] __kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
> > [   18.235168] __kmalloc (mm/slab_common.c:986 mm/slab_common.c:998)
> > [   18.235172] amdgpu_ring_mux_init (drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c:155 (discriminator 1)) amdgpu
> > [   18.236112] gfx_v9_0_sw_init (drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2128) amdgpu
> > [   18.237083] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2516 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3993) amdgpu
> > [   18.238021] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
> > [   18.238948] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2188) amdgpu
> > [   18.239904] local_pci_probe (drivers/pci/pci-driver.c:325)
> > [   18.239911] pci_device_probe (drivers/pci/pci-driver.c:392 (discriminator 1) drivers/pci/pci-driver.c:417 (discriminator 1) drivers/pci/pci-driver.c:460 (discriminator 1))
> > [   18.239915] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
> > [   18.239920] __driver_probe_device (drivers/base/dd.c:798)
> > [   18.239923] driver_probe_device (drivers/base/dd.c:828)
> > [   18.239926] __driver_attach (drivers/base/dd.c:1215)
> > [   18.239928] bus_for_each_dev (drivers/base/bus.c:368)
> > [   18.239931] bus_add_driver (drivers/base/bus.c:674)
> > [   18.239934] driver_register (drivers/base/driver.c:246)
> > [   18.239938] do_one_initcall (init/main.c:1232)
> > [   18.239942] do_init_module (kernel/module/main.c:2522)
> > [   18.239947] load_module (kernel/module/main.c:2973)
> > [   18.239950] init_module_from_file (kernel/module/main.c:3149)
> > [   18.239953] __x64_sys_finit_module (./include/linux/file.h:44 kernel/module/main.c:3172 kernel/module/main.c:3154 kernel/module/main.c:3154)
> > [   18.239956] do_syscall_64 (arch/x86/entry/common.c:50 (discriminator 1) arch/x86/entry/common.c:80 (discriminator 1))
> > [   18.239960] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> >
> > [   18.239967] The buggy address belongs to the object at ffff8881208eb800
> > which belongs to the cache kmalloc-128 of size 128
> > [   18.239969] The buggy address is located 24 bytes to the right of
> > allocated 128-byte region [ffff8881208eb800, ffff8881208eb880)
> >
> > [   18.239973] The buggy address belongs to the physical page:
> > [   18.239975] page:000000008ed9aff4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1208ea
> > [   18.239979] head:000000008ed9aff4 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > [   18.239981] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > [   18.239986] page_type: 0xffffffff()
> > [   18.239990] raw: 0017ffffc0010200 ffff8881000428c0 dead000000000122 0000000000000000
> > [   18.239992] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
> > [   18.239994] page dumped because: kasan: bad access detected
> >
> > [   18.239997] Memory state around the buggy address:
> > [   18.239999]  ffff8881208eb780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [   18.240001]  ffff8881208eb800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > [   18.240003] >ffff8881208eb880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [   18.240005]                             ^
> > [   18.240007]  ffff8881208eb900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
> > [   18.240009]  ffff8881208eb980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [   18.240010] ==================================================================
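The numbers in the report above can be cross-checked with a little pointer arithmetic. This is a minimal sketch in plain userspace C; the helper names (`bytes_past_end`, `offset_from_start`) are hypothetical and are not part of the kernel or of KASAN:

```c
#include <stdint.h>

/*
 * Distance between the end of an allocated region [start, end) and a
 * faulting address that lies at or beyond that end.
 */
static uint64_t bytes_past_end(uint64_t region_end, uint64_t addr)
{
	return addr - region_end;
}

/* Offset of the faulting address from the start of the object. */
static uint64_t offset_from_start(uint64_t region_start, uint64_t addr)
{
	return addr - region_start;
}
```

With the values from this splat, `bytes_past_end(0xffff8881208eb880, 0xffff8881208eb898)` gives 24, matching "24 bytes to the right", and `offset_from_start(0xffff8881208eb800, 0xffff8881208eb898)` gives 152, i.e. the 8-byte write begins 152 bytes into a 128-byte object.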
> >
> >
> > --
> > Hyeonggon Yoo
> >
> > Undergraduate | Chungnam National University
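A note on the `if (!chunk)` check in the quoted hunk: the kernel's `list_last_entry()` is a `container_of()` on `head->prev`, so it does not evaluate to NULL on an empty list; it yields a pointer derived from the list head itself. Whether that is what triggered this splat is an assumption and is not confirmed anywhere in this thread. The sketch below uses userspace stand-ins for the list helpers and a hypothetical `demo_chunk` type to show the general behaviour and a `list_empty()` guard:

```c
#include <stddef.h>
#include <stdbool.h>

/* Userspace stand-ins for a few <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_last_entry(head, type, member) \
	container_of((head)->prev, type, member)

static inline void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical chunk type; the field names are illustrative only. */
struct demo_chunk {
	struct list_head entry;
	unsigned long long cntl_offset;
};

/* Guarded lookup: returns the last chunk, or NULL when the list is empty. */
static struct demo_chunk *demo_last_chunk(struct list_head *head)
{
	if (list_empty(head))
		return NULL;
	return list_last_entry(head, struct demo_chunk, entry);
}

/* Unguarded lookup: reports whether the macro result is non-NULL. */
static bool demo_unguarded_is_non_null(struct list_head *head)
{
	return list_last_entry(head, struct demo_chunk, entry) != NULL;
}
```

On an empty list the unguarded lookup still reports a non-NULL pointer, so a `!chunk` test cannot detect the empty case; the guarded variant returns NULL there and the real chunk once one has been added.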



