Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation

Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx> · Fri, 14 Feb 2025 11:14:10 -0500

On Fri, Feb 14, 2025 at 09:47:13AM +0100, Thomas Hellström wrote:
> Hi
> 
> On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:
> > On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:
> > > Version 5 of GPU SVM. Thanks to everyone (especially Sima, Thomas,
> > > Alistair, Himal) for their numerous reviews on revision 1, 2, 3 and
> > > for helping to address many design issues.
> > > 
> > > This version has been tested with IGT [1] on PVC, BMG, and LNL. Also
> > > tested with level0 (UMD) PR [2].
> > 
> > What is the plan to deal with not being able to preempt while a page
> > fault is pending?  This seems like an easy DoS vector.  My understanding
> > is that SVM is mostly used by compute workloads on headless systems.
> > Recent AMD client GPUs don't support SVM, so programs that want to run
> > on client systems should not require SVM if they wish to be portable.
> > 
> > Given the potential for abuse, I think it would be best to require
> > explicit administrator opt-in to enable SVM, along with possibly having
> > a timeout to resolve a page fault (after which the context is killed).
> > Since I expect most uses of SVM to be in the datacenter space (for the
> > reasons mentioned above), I don't believe this will be a major
> > limitation in practice.  Programs that wish to run on client systems
> > already need to use explicit memory transfer or pinned userptr, and
> > administrators of compute clusters should be willing to enable this
> > feature because only one workload will be using a GPU at a time.
> 
> While not directly having addressed the potential DoS issue you
> mention, there is an associated deadlock possibility that may happen
> due to not being able to preempt a pending pagefault. That is if a dma-
> fence job is requiring the same resources held up by the pending page-
> fault, and then the pagefault servicing is dependent on that dma-fence
> to be signaled in one way or another.
> 
> That deadlock is handled by only allowing either page-faulting jobs or
> dma-fence jobs on a resource (hw engine or hw engine group) that can be
> used by both at a time, blocking synchronously in the exec IOCTL until
> the resource is available for the job type. That means LR jobs wait
> for all dma-fence jobs to complete, and dma-fence jobs wait for all LR
> jobs to preempt. So a dma-fence job wait could easily mean "wait for
> all outstanding pagefaults to be serviced".
> 
> Whether, on the other hand, that is a real DoS we need to care about,
> is probably a topic for debate. The directions we've had so far are
> that it's not. Nothing is held up indefinitely, what's held up can be
> Ctrl-C'd by the user and core mm memory management is not blocked since
> mmu_notifiers can execute to completion and shrinkers / eviction can
> execute while a page-fault is pending.
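
To make sure I am reading the exclusion scheme above correctly, here is a
toy user-space model of it.  All names (engine_group, group_try_admit,
group_job_done) are hypothetical and are not taken from the Xe driver; a
"false" return stands in for the exec IOCTL blocking synchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; not the Xe driver's actual data structures. */
enum job_kind { JOB_DMA_FENCE, JOB_FAULTING_LR };

struct engine_group {
    enum job_kind mode;   /* kind of job the group currently runs */
    unsigned int active;  /* number of jobs of that kind still running */
};

/*
 * Returns true if a job of the given kind may start now.  Returning false
 * models the exec IOCTL having to wait until the other kind has drained
 * (dma-fence jobs completed, or LR jobs preempted).
 */
bool group_try_admit(struct engine_group *g, enum job_kind kind)
{
    assert(g != NULL);
    if (g->active > 0 && g->mode != kind)
        return false;
    g->mode = kind;
    g->active++;
    return true;
}

/* Called when a job completes (dma-fence) or is preempted (LR). */
void group_job_done(struct engine_group *g)
{
    assert(g != NULL && g->active > 0);
    g->active--;
}
```

In this model, a dma-fence job submitted while a faulting LR job is active
is refused until group_job_done() has been called for every LR job, which
is the case where the wait can become "wait for all outstanding pagefaults
to be serviced".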

The problem is that a program that uses a page-faulting job can lock out
all other programs on the system from using the GPU for an indefinite
period of time.  In a GUI session, this means a frozen UI, which makes
recovery basically impossible without drastic measures (like rebooting
or logging in over SSH).  That counts as a quite effective denial of
service from an end-user perspective, and unless I am mistaken it would
be very easy to trigger by accident: just start a page-faulting job that
loops forever.

The simplest way to prevent this would be to require DRM master
privileges to spawn page-faulting jobs.  Only the Wayland compositor or
X server will normally have these, and they will never submit a
page-faulting job.  My understanding is that other IOCTLs that can mess
up a compositor also require DRM master privileges, and submitting a
page-faulting job seems to qualify.
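
A minimal sketch of the check I have in mind, written as a self-contained
user-space model.  The struct and function names are hypothetical; in the
kernel the test would presumably be made with something like
drm_is_current_master() at exec time, but where exactly it belongs in the
Xe exec path is an assumption on my part.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-open-file state and exec arguments. */
struct client {
    bool is_drm_master;      /* true for the compositor / X server */
};

struct exec_args {
    bool needs_page_faults;  /* job runs in page-faulting (SVM) mode */
};

/*
 * Returns 0 if the submission is allowed, -EACCES otherwise.
 * Only page-faulting jobs are restricted; dma-fence jobs are unaffected.
 */
int exec_check_permission(const struct client *c, const struct exec_args *a)
{
    assert(c != NULL && a != NULL);
    if (a->needs_page_faults && !c->is_drm_master)
        return -EACCES;
    return 0;
}
```

With this policy an ordinary client can still submit preemptable dma-fence
work, and only the page-faulting path is gated.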

There is still a legitimate use-case for running long-running workloads
on a GPU used for an interactive session.  However, DMA fencing compute
jobs can be long-running as long as they are preemptable, and they are
preemptable as long as they don't need page faults.  Sima, Faith, and
Christian have already come up with a solution for long-running Vulkan
compute.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab