On Sun, Feb 7, 2021 at 10:28 PM Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> wrote: > > > > On 2/5/21 5:10 PM, Daniel Vetter wrote: > > On Fri, Feb 5, 2021 at 5:22 PM Andrey Grodzovsky > > <Andrey.Grodzovsky@xxxxxxx> wrote: > >> > >> Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev > >> on the same topic I added you to. > > > > Summarizing my take over there for here plus maybe some more > > clarification. There's two problems: > > > > - You must guarantee that after the ->remove callback of your driver > > is finished, there's no more mmio or any other hw access. A > > combination of stopping stuff and drm_dev_enter/exit can help with > > that. This prevents the use-after-free issue. > > > > - For the actual hotunplug time, i.e. anything that can run while your > > driver is used up to the point where ->remove callback has finished > > stopp hw access you must guarantee that code doesn't blow up when it > > gets bogus reads (in the form of 0xff values). drm_dev_enter/exit > > can't help you with that. Plus you should make sure that we're not > > spending forever waiting for a big pile of mmio access all to time out > > because you never bail out - some coarse-grained drm_dev_enter/exit > > might help here. > > > > Plus finally the userspace access problem: You must guarantee that > > after ->remove has finished that none of the uapi or cross-driver > > access points (driver ioctl, dma-buf, dma-fence, anything else that > > hangs around) can reach the data structures/memory mappings/whatever > > which have been released as part of your ->remove callback. > > drm_dev_enter/exit is again the tool of choice here. > > > > So you have to use drm_dev_enter/exit for some of the problems we face > > on hotunplug, but it's not the tool that can handle the actual hw > > hotunplug race conditions for you. > > > > Unfortunately the hw hotunplug race condition is an utter pain to > > test, since essentially you need to validate your driver against > > spurious 0xff reads at any moment. And I don't even have a clever idea > > to simulate this, e.g. by forcefully replacing the iobar mapping: What > > we'd need is a mapping that allows reads (so we can fill a page with > > 0xff and use that everywhere), but instead of rejecting writes, allows > > them, but drops them (so that the 0xff stays intact). Maybe we could > > simulate this with some kernel debug tricks (kinda like mmiotrace) > > with a read-only mapping and dropping every write every time we fault. > > Clarification - as far as I know there are no page fault handlers for kernel > mappings. And we are talking about kernel mappings here, right ? If there were > I could solve all those issues the same as I do for user mappings, by > invalidating all existing mappings in the kernel (both kmaps and ioreamps)and > insert dummy zero or ~0 filled page instead. > Also, I assume forcefully remapping the IO BAR to ~0 filled page would involve > ioremap API and it's not something that I think can be easily done according to > am answer i got to a related topic a few weeks ago > https://www.spinics.net/lists/linux-pci/msg103396.html (that was the only reply > i got) mmiotrace can, but only for debug, and only on x86 platforms: https://www.kernel.org/doc/html/latest/trace/mmiotrace.html Should be feasible (but maybe not worth the effort) to extend this to support fake unplug. > > > But ugh ... > > > > Otoh validating an entire driver like amdgpu without such a trick > > against 0xff reads is practically impossible. So maybe you need to add > > this as one of the tasks here? > > Or I could just for validation purposes return ~0 from all reg reads in the code > and ignore writes if drm_dev_unplugged, this could already easily validate a big > portion of the code flow under such scenario. Hm yeah if your really wrap them all, that should work too. Since iommappings have __iomem pointer type, as long as amdgpu is sparse warning free, should be doable to guarantee this. -Daniel > Andrey > > > -Daniel > > > >> > >> Andrey > >> > >> On 1/29/21 2:25 PM, Christian König wrote: > >>> Am 29.01.21 um 18:35 schrieb Andrey Grodzovsky: > >>>> > >>>> On 1/29/21 10:16 AM, Christian König wrote: > >>>>> Am 28.01.21 um 18:23 schrieb Andrey Grodzovsky: > >>>>>> > >>>>>> On 1/19/21 1:59 PM, Christian König wrote: > >>>>>>> Am 19.01.21 um 19:22 schrieb Andrey Grodzovsky: > >>>>>>>> > >>>>>>>> On 1/19/21 1:05 PM, Daniel Vetter wrote: > >>>>>>>>> [SNIP] > >>>>>>>> So say writing in a loop to some harmless scratch register for many times > >>>>>>>> both for plugged > >>>>>>>> and unplugged case and measure total time delta ? > >>>>>>> > >>>>>>> I think we should at least measure the following: > >>>>>>> > >>>>>>> 1. Writing X times to a scratch reg without your patch. > >>>>>>> 2. Writing X times to a scratch reg with your patch. > >>>>>>> 3. Writing X times to a scratch reg with the hardware physically disconnected. > >>>>>>> > >>>>>>> I suggest to repeat that once for Polaris (or older) and once for Vega or > >>>>>>> Navi. > >>>>>>> > >>>>>>> The SRBM on Polaris is meant to introduce some delay in each access, so it > >>>>>>> might react differently then the newer hardware. > >>>>>>> > >>>>>>> Christian. > >>>>>> > >>>>>> > >>>>>> See attached results and the testing code. Ran on Polaris (gfx8) and > >>>>>> Vega10(gfx9) > >>>>>> > >>>>>> In summary, over 1 million WWREG32 in loop with and without this patch you > >>>>>> get around 10ms of accumulated overhead ( so 0.00001 millisecond penalty for > >>>>>> each WWREG32) for using drm_dev_enter check when writing registers. > >>>>>> > >>>>>> P.S Bullet 3 I cannot test as I need eGPU and currently I don't have one. > >>>>> > >>>>> Well if I'm not completely mistaken that are 100ms of accumulated overhead. > >>>>> So around 100ns per write. And even bigger problem is that this is a ~67% > >>>>> increase. > >>>> > >>>> > >>>> My bad, and 67% from what ? How u calculate ? > >>> > >>> My bad, (308501-209689)/209689=47% increase. > >>> > >>>>> > >>>>> I'm not sure how many write we do during normal operation, but that sounds > >>>>> like a bit much. Ideas? > >>>> > >>>> Well, u suggested to move the drm_dev_enter way up but as i see it the problem > >>>> with this is that it increase the chance of race where the > >>>> device is extracted after we check for drm_dev_enter (there is also such > >>>> chance even when it's placed inside WWREG but it's lower). > >>>> Earlier I propsed that instead of doing all those guards scattered all over > >>>> the code simply delay release of system memory pages and unreserve of > >>>> MMIO ranges to until after the device itself is gone after last drm device > >>>> reference is dropped. But Daniel opposes delaying MMIO ranges unreserve to after > >>>> PCI remove code because according to him it will upset the PCI subsytem. > >>> > >>> Yeah, that's most likely true as well. > >>> > >>> Maybe Daniel has another idea when he's back from vacation. > >>> > >>> Christian. > >>> > >>>> > >>>> Andrey > >>>> > >>>>> > >>>>> Christian. > >>>> _______________________________________________ > >>>> amd-gfx mailing list > >>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx > >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C7810d8d6f03443ce2e0408d8ca22ea99%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481598615581693%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=zTV6FTpL3titmMTVEPxxVT8e5lTKVsLViwZudEsNn%2Bw%3D&reserved=0 > >>>> > >>> > > > > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx