Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation

On Fri, Feb 14, 2025 at 05:26:48PM +0100, Thomas Hellström wrote:
> Hi!
> 
> On Fri, 2025-02-14 at 11:14 -0500, Demi Marie Obenour wrote:
> > On Fri, Feb 14, 2025 at 09:47:13AM +0100, Thomas Hellström wrote:
> > > Hi
> > > 
> > > On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:
> > > > On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:
> > > > > Version 5 of GPU SVM. Thanks to everyone (especially Sima, Thomas,
> > > > > Alistair, Himal) for their numerous reviews on revisions 1, 2, 3 and
> > > > > for helping to address many design issues.
> > > > > 
> > > > > This version has been tested with IGT [1] on PVC, BMG, and LNL. Also
> > > > > tested with level0 (UMD) PR [2].
> > > > 
> > > > What is the plan to deal with not being able to preempt while a page
> > > > fault is pending?  This seems like an easy DoS vector.  My
> > > > understanding is that SVM is mostly used by compute workloads on
> > > > headless systems.  Recent AMD client GPUs don't support SVM, so
> > > > programs that want to run on client systems should not require SVM if
> > > > they wish to be portable.
> > > > 
> > > > Given the potential for abuse, I think it would be best to require
> > > > explicit administrator opt-in to enable SVM, along with possibly
> > > > having a timeout to resolve a page fault (after which the context is
> > > > killed).  Since I expect most uses of SVM to be in the datacenter
> > > > space (for the reasons mentioned above), I don't believe this will be
> > > > a major limitation in practice.  Programs that wish to run on client
> > > > systems already need to use explicit memory transfer or pinned
> > > > userptr, and administrators of compute clusters should be willing to
> > > > enable this feature because only one workload will be using a GPU at
> > > > a time.
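
For concreteness, an administrator opt-in like that could be as small as a
module parameter gating creation of page-faulting contexts.  A minimal
sketch, purely hypothetical -- "enable_svm" is not an existing xe parameter
and the -EPERM check is invented:

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical knob: off by default, read-only after module load. */
static bool enable_svm;
module_param(enable_svm, bool, 0444);
MODULE_PARM_DESC(enable_svm,
		 "Allow page-faulting (SVM) GPU contexts (default: N)");

/* Hypothetical check in the IOCTL that creates a page-faulting VM/context. */
static int check_svm_allowed(bool wants_page_faults)
{
	if (wants_page_faults && !enable_svm)
		return -EPERM;
	return 0;
}
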
> > > 
> > > While not directly addressing the potential DoS issue you mention,
> > > there is an associated deadlock possibility that may happen due to not
> > > being able to preempt a pending pagefault: a dma-fence job requires
> > > the same resources held up by the pending page-fault, and the
> > > pagefault servicing in turn depends, one way or another, on that
> > > dma-fence being signaled.
> > > 
> > > That deadlock is handled by only allowing either page-faulting jobs or
> > > dma-fence jobs at a time on a resource (hw engine or hw engine group)
> > > that can be used by both, blocking synchronously in the exec IOCTL
> > > until the resource is available for that job type.  That means LR jobs
> > > wait for all dma-fence jobs to complete, and dma-fence jobs wait for
> > > all LR jobs to preempt.  So a dma-fence job wait could easily mean
> > > "wait for all outstanding pagefaults to be serviced".
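
Roughly, that gate could look like the sketch below.  This is not the xe
code, just an illustration; "hw_resource", the mode enum and the
acquire/release helpers are all made-up names:

#include <linux/mutex.h>
#include <linux/wait.h>

enum hw_res_mode {
	HW_RES_IDLE,
	HW_RES_DMA_FENCE,	/* only dma-fence jobs active */
	HW_RES_LONG_RUNNING,	/* only page-faulting (LR) jobs active */
};

struct hw_resource {
	struct mutex lock;
	enum hw_res_mode mode;
	unsigned int active_jobs;
	wait_queue_head_t mode_wq;
};

/* Called synchronously from the exec IOCTL before queuing a job. */
static int hw_resource_acquire(struct hw_resource *res, enum hw_res_mode want)
{
	int ret;

	mutex_lock(&res->lock);
	while (res->mode != HW_RES_IDLE && res->mode != want) {
		/*
		 * Wrong job type active: drop the lock and sleep until the
		 * resource drains (for a dma-fence job this is also where LR
		 * jobs would first be asked to preempt, not shown here).
		 */
		mutex_unlock(&res->lock);
		ret = wait_event_interruptible(res->mode_wq,
					       READ_ONCE(res->mode) == HW_RES_IDLE ||
					       READ_ONCE(res->mode) == want);
		if (ret)
			return ret;	/* interruptible, so Ctrl-C works */
		mutex_lock(&res->lock);
	}
	res->mode = want;
	res->active_jobs++;
	mutex_unlock(&res->lock);
	return 0;
}

/* Called when a job of the current type retires. */
static void hw_resource_release(struct hw_resource *res)
{
	mutex_lock(&res->lock);
	if (--res->active_jobs == 0) {
		res->mode = HW_RES_IDLE;
		wake_up_all(&res->mode_wq);
	}
	mutex_unlock(&res->lock);
}

The interruptible wait is also what makes the "can be Ctrl-C'd" point below
hold for a submitter that is blocked in the exec IOCTL.
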
> > > 
> > > Whether, on the other hand, that is a real DoS we need to care about
> > > is probably a topic for debate.  The directions we've had so far are
> > > that it's not: nothing is held up indefinitely, what's held up can be
> > > Ctrl-C'd by the user, and core mm memory management is not blocked,
> > > since mmu_notifiers can execute to completion and shrinkers / eviction
> > > can run while a page-fault is pending.
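
The "core mm is not blocked" part holds because the notifier side never
waits on GPU work that itself depends on the fault being serviced.  A
stripped-down invalidate callback, using the stock mmu_interval_notifier
API but with the range struct, driver locking and the zap helper invented
or omitted, would look roughly like this:

#include <linux/kernel.h>
#include <linux/mmu_notifier.h>

struct my_svm_range {
	struct mmu_interval_notifier notifier;
	/* driver-side GPU mapping state would live here */
};

/* Hypothetical driver helper: tears down the GPU PTEs for the CPU range. */
static void my_gpu_zap_range(struct my_svm_range *vr,
			     unsigned long start, unsigned long end);

static bool my_svm_invalidate(struct mmu_interval_notifier *mni,
			      const struct mmu_notifier_range *range,
			      unsigned long cur_seq)
{
	struct my_svm_range *vr =
		container_of(mni, struct my_svm_range, notifier);

	if (!mmu_notifier_range_blockable(range))
		return false;

	/* A fault handler racing with this invalidation will notice and retry. */
	mmu_interval_set_seq(mni, cur_seq);

	/*
	 * Unmap from the GPU page tables only; there is no wait on dma-fences
	 * or on the pending page fault itself, so reclaim and eviction can
	 * always make progress.
	 */
	my_gpu_zap_range(vr, range->start, range->end);

	return true;
}

static const struct mmu_interval_notifier_ops my_svm_notifier_ops = {
	.invalidate = my_svm_invalidate,
};
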
> > 
> > The problem is that a program that uses a page-faulting job can lock
> > out all other programs on the system from using the GPU for an
> > indefinite period of time.  In a GUI session, this means a frozen UI,
> > which makes recovery basically impossible without drastic measures
> > (like rebooting or logging in over SSH).  That counts as a quite
> > effective denial of service from an end-user perspective, and unless I
> > am mistaken it would be very easy to trigger by accident: just start a
> > page-faulting job that loops forever.
> 
> I think the easiest remedy for this is the preemption timeout: if a
> page-faulting job is, either on purpose or by mistake, crafted in such a
> way that it holds up preemption when preemption is needed (as in the case
> I described, where a dma-fence job is submitted), the driver will hit a
> preemption timeout and kill the page-faulting job.  (I think that is
> already handled in all cases in the xe driver, but I would need to
> double-check.)  So this would then boil down to the system administrator
> configuring the preemption timeout.

That makes sense!  That turns a DoS into "Don't submit page-faulting jobs
on an interactive system; they won't be reliable."
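
For what it's worth, the kill-on-timeout path could be as simple as the
sketch below.  This is purely illustrative -- the context struct, the
helper functions and the use of a delayed work item are my own invention,
and xe's actual long-running / preempt-fence handling may be structured
quite differently:

#include <linux/workqueue.h>

struct lr_context {
	struct delayed_work preempt_timeout_work;  /* armed when preemption is requested */
	/* ... */
};

/* Hypothetical driver helpers. */
static bool lr_context_preempted(struct lr_context *ctx);
static void lr_context_kill(struct lr_context *ctx);

static void lr_preempt_timeout_fn(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);
	struct lr_context *ctx =
		container_of(dwork, struct lr_context, preempt_timeout_work);

	if (lr_context_preempted(ctx))
		return;		/* preempted in time, nothing to do */

	/*
	 * The page-faulting job did not (or could not) preempt within the
	 * administrator-configured timeout: kill it so dma-fence jobs and
	 * the rest of the system can make progress.
	 */
	lr_context_kill(ctx);
}

The timeout value itself would then be the one knob the administrator has
to tune, as you say.
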
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


