Hi

On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:
> On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:
> > Version 5 of GPU SVM. Thanks to everyone (especially Sima, Thomas,
> > Alistair, Himal) for their numerous reviews on revisions 1, 2, 3
> > and for helping to address many design issues.
> >
> > This version has been tested with IGT [1] on PVC, BMG, and LNL.
> > Also tested with level0 (UMD) PR [2].
>
> What is the plan to deal with not being able to preempt while a page
> fault is pending? This seems like an easy DoS vector. My
> understanding is that SVM is mostly used by compute workloads on
> headless systems. Recent AMD client GPUs don't support SVM, so
> programs that want to run on client systems should not require SVM
> if they wish to be portable.
>
> Given the potential for abuse, I think it would be best to require
> explicit administrator opt-in to enable SVM, along with possibly
> having a timeout to resolve a page fault (after which the context is
> killed). Since I expect most uses of SVM to be in the datacenter
> space (for the reasons mentioned above), I don't believe this will
> be a major limitation in practice. Programs that wish to run on
> client systems already need to use explicit memory transfer or
> pinned userptr, and administrators of compute clusters should be
> willing to enable this feature because only one workload will be
> using a GPU at a time.

While this doesn't directly address the potential DoS issue you
mention, there is a related deadlock possibility that can arise from
not being able to preempt a pending pagefault: a dma-fence job may
require the same resources held up by the pending pagefault, while
servicing of that pagefault in turn depends, in one way or another, on
that dma-fence being signaled.

That deadlock is handled by allowing only one job type at a time,
either pagefaulting jobs or dma-fence jobs, on a resource (hw engine
or hw engine group) that can be used by both, blocking synchronously
in the exec IOCTL until the resource is available for the job type.
That means LR jobs wait for all dma-fence jobs to complete, and
dma-fence jobs wait for all LR jobs to preempt. So a dma-fence job
wait could easily mean "wait for all outstanding pagefaults to be
serviced". (A rough sketch of this gating follows as a PS below.)

Whether, on the other hand, that is a real DoS we need to care about
is probably a topic for debate. The direction we've had so far is that
it's not: nothing is held up indefinitely, whatever is held up can be
Ctrl-C'd by the user, and core mm memory management is not blocked,
since mmu_notifiers can execute to completion and shrinkers / eviction
can run while a pagefault is pending.

Thanks,
Thomas
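
PS: For illustration only, here is a minimal sketch of the kind of
per-resource job-type gating described above. All names here (struct
hw_resource, hw_resource_acquire(), etc.) are made up for this mail
and do not reflect the actual Xe implementation, which also has to
handle engine groups, preemption completion and so on:

#include <linux/spinlock.h>
#include <linux/wait.h>

/*
 * Hypothetical sketch: one job type at a time per resource.
 * Pagefaulting (LR) jobs and dma-fence jobs mutually exclude each
 * other, and the exec IOCTL blocks until the resource is free for
 * the submitted job type.
 */
struct hw_resource {
	spinlock_t lock;
	unsigned int nr_dma_fence_jobs;	/* dma-fence jobs in flight */
	unsigned int nr_lr_jobs;	/* LR jobs running, not preempted */
	wait_queue_head_t wq;
};

static void hw_resource_init(struct hw_resource *res)
{
	spin_lock_init(&res->lock);
	init_waitqueue_head(&res->wq);
	res->nr_dma_fence_jobs = 0;
	res->nr_lr_jobs = 0;
}

/* Try to claim the resource for the given job type. */
static bool hw_resource_free_for(struct hw_resource *res, bool lr_job)
{
	bool ret;

	spin_lock(&res->lock);
	/*
	 * An LR job must wait for all dma-fence jobs to complete;
	 * a dma-fence job must wait for all LR jobs to preempt.
	 */
	ret = lr_job ? !res->nr_dma_fence_jobs : !res->nr_lr_jobs;
	if (ret) {
		if (lr_job)
			res->nr_lr_jobs++;
		else
			res->nr_dma_fence_jobs++;
	}
	spin_unlock(&res->lock);

	return ret;
}

/*
 * Called from the exec IOCTL: blocks synchronously until the
 * resource is available for the job type. Interruptible, so a
 * stuck wait can be Ctrl-C'd by the user.
 */
static int hw_resource_acquire(struct hw_resource *res, bool lr_job)
{
	return wait_event_interruptible(res->wq,
					hw_resource_free_for(res, lr_job));
}

/* Called when a dma-fence job completes or an LR job preempts. */
static void hw_resource_release(struct hw_resource *res, bool lr_job)
{
	spin_lock(&res->lock);
	if (lr_job)
		res->nr_lr_jobs--;
	else
		res->nr_dma_fence_jobs--;
	spin_unlock(&res->lock);
	wake_up_all(&res->wq);
}

A dma-fence job would pair hw_resource_acquire(res, false) in the exec
IOCTL with hw_resource_release(res, false) on job completion; an LR
job pairs acquire(res, true) with release(res, true) once it has been
preempted, which is why a dma-fence job wait can effectively mean
waiting for all outstanding pagefaults to be serviced.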