Hi!

On Fri, 2025-02-14 at 11:14 -0500, Demi Marie Obenour wrote:
> On Fri, Feb 14, 2025 at 09:47:13AM +0100, Thomas Hellström wrote:
> > Hi
> >
> > On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:
> > > On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:
> > > > Version 5 of GPU SVM. Thanks to everyone (especially Sima,
> > > > Thomas, Alistair, Himal) for their numerous reviews on
> > > > revision 1, 2, 3 and for helping to address many design
> > > > issues.
> > > >
> > > > This version has been tested with IGT [1] on PVC, BMG, and
> > > > LNL. Also tested with level0 (UMD) PR [2].
> > >
> > > What is the plan to deal with not being able to preempt while a
> > > page fault is pending? This seems like an easy DoS vector. My
> > > understanding is that SVM is mostly used by compute workloads on
> > > headless systems. Recent AMD client GPUs don't support SVM, so
> > > programs that want to run on client systems should not require
> > > SVM if they wish to be portable.
> > >
> > > Given the potential for abuse, I think it would be best to
> > > require explicit administrator opt-in to enable SVM, along with
> > > possibly having a timeout to resolve a page fault (after which
> > > the context is killed). Since I expect most uses of SVM to be in
> > > the datacenter space (for the reasons mentioned above), I don't
> > > believe this will be a major limitation in practice. Programs
> > > that wish to run on client systems already need to use explicit
> > > memory transfer or pinned userptr, and administrators of compute
> > > clusters should be willing to enable this feature because only
> > > one workload will be using a GPU at a time.
> >
> > While not directly having addressed the potential DoS issue you
> > mention, there is an associated deadlock possibility that may
> > happen due to not being able to preempt a pending pagefault. That
> > is if a dma-fence job is requiring the same resources held up by
> > the pending page-fault, and then the pagefault servicing is
> > dependent on that dma-fence to be signaled in one way or another.
> >
> > That deadlock is handled by only allowing either page-faulting
> > jobs or dma-fence jobs on a resource (hw engine or hw engine
> > group) that can be used by both at a time, blocking synchronously
> > in the exec IOCTL until the resource is available for the job
> > type. That means LR jobs waits for all dma-fence jobs to complete,
> > and dma-fence jobs wait for all LR jobs to preempt. So a dma-fence
> > job wait could easily mean "wait for all outstanding pagefaults to
> > be serviced".
> >
> > Whether, on the other hand, that is a real DoS we need to care
> > about, is probably a topic for debate. The directions we've had so
> > far are that it's not. Nothing is held up indefinitely, what's
> > held up can be Ctrl-C'd by the user and core mm memory management
> > is not blocked since mmu_notifiers can execute to completion and
> > shrinkers / eviction can execute while a page-fault is pending.
>
> The problem is that a program that uses a page-faulting job can lock
> out all other programs on the system from using the GPU for an
> indefinite period of time. In a GUI session, this means a frozen UI,
> which makes recovery basically impossible without drastic measures
> (like rebooting or logging in over SSH). That counts as a quite
> effective denial of service from an end-user perspective, and unless
> I am mistaken it would be very easy to trigger by accident: just
> start a page-faulting job that loops forever.
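To make the exclusion scheme I described above a bit more concrete:
conceptually the exec IOCTL does something along the lines of the
sketch below. This is a much simplified illustration with made-up
names, not the actual xe implementation:

#include <linux/mutex.h>
#include <linux/wait.h>

/*
 * Illustrative sketch only: a hw resource (engine or engine group)
 * runs either dma-fence jobs or long-running (page-faulting) jobs at
 * any one time, never both.
 */
enum job_mode {
        MODE_NONE,
        MODE_DMA_FENCE,         /* jobs that publish dma-fences */
        MODE_LONG_RUNNING,      /* preemptible, page-faulting jobs */
};

struct hw_resource {
        struct mutex lock;
        enum job_mode mode;
        unsigned int active_jobs;
        wait_queue_head_t idle_wq;
};

static void hw_resource_init(struct hw_resource *res)
{
        mutex_init(&res->lock);
        init_waitqueue_head(&res->idle_wq);
        res->mode = MODE_NONE;
        res->active_jobs = 0;
}

/* Called synchronously from the exec IOCTL before the job is queued. */
static int hw_resource_acquire(struct hw_resource *res, enum job_mode mode)
{
        int ret;

        mutex_lock(&res->lock);
        while (res->mode != MODE_NONE && res->mode != mode) {
                mutex_unlock(&res->lock);
                /*
                 * dma-fence jobs wait for all LR jobs to preempt,
                 * LR jobs wait for all dma-fence jobs to complete.
                 * Interruptible, so the user can Ctrl-C out of it.
                 */
                ret = wait_event_interruptible(res->idle_wq,
                                READ_ONCE(res->mode) == MODE_NONE);
                if (ret)
                        return ret;
                mutex_lock(&res->lock);
        }
        WRITE_ONCE(res->mode, mode);
        res->active_jobs++;
        mutex_unlock(&res->lock);
        return 0;
}

/* Called when a dma-fence job completes or an LR job has preempted. */
static void hw_resource_release(struct hw_resource *res)
{
        mutex_lock(&res->lock);
        if (--res->active_jobs == 0) {
                WRITE_ONCE(res->mode, MODE_NONE);
                wake_up_all(&res->idle_wq);
        }
        mutex_unlock(&res->lock);
}

The key point is that the blocking is synchronous and interruptible:
nothing is held up indefinitely, and a stuck wait can be Ctrl-C'd by
the user, as mentioned above.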
I think the easiest remedy for the DoS scenario you describe is that if
a page-faulting job, either on purpose or by mistake, is crafted in
such a way that it holds up preemption when preemption is needed (as in
the case I described, where a dma-fence job is submitted), the driver
will hit a preemption timeout and kill the page-faulting job. (I think
that is already handled in all cases in the xe driver, but I would need
to double-check.) So this would then boil down to the system
administrator configuring the preemption timeout.
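For reference, xe exposes per-engine-class scheduling tunables in
sysfs, so capping the preemption timeout would look something like the
snippet below. The exact path is an assumption from memory and depends
on the card / tile / GT / engine-class layout of the system:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /*
         * Assumed sysfs layout; adjust card, tile, gt and engine class
         * (e.g. ccs, rcs, bcs, vcs) for the actual system.
         */
        const char *path =
                "/sys/class/drm/card0/device/tile0/gt0/engines/ccs/preempt_timeout_us";
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return EXIT_FAILURE;
        }

        /* Example value only: allow at most 640 ms for preemption. */
        fprintf(f, "%u\n", 640000U);

        return fclose(f) ? EXIT_FAILURE : EXIT_SUCCESS;
}

Thanks,
Thomas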