Re: [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction

Daniel Vetter <daniel@xxxxxxxx> · Tue, 18 Apr 2023 10:45:33 +0200

On Wed, Apr 05, 2023 at 09:29:02PM +0200, Daniel Vetter wrote:
> On Wed, Apr 05, 2023 at 05:43:01PM +0200, Daniel Vetter wrote:
> > On Tue, Mar 07, 2023 at 11:25:37PM +0900, Asahi Lina wrote:
> > > +/// An armed DRM scheduler job (not yet submitted)
> > > +pub struct ArmedJob<'a, T: JobImpl>(Box<Job<T>>, PhantomData<&'a T>);
> > > +
> > > +impl<'a, T: JobImpl> ArmedJob<'a, T> {
> > > +    /// Returns the job fences
> > > +    pub fn fences(&self) -> JobFences<'_> {
> > > +        JobFences(unsafe { &mut *self.0.job.s_fence })
> > > +    }
> > > +
> > > +    /// Push the job for execution into the scheduler
> > > +    pub fn push(self) {
> > > +        // After this point, the job is submitted and owned by the scheduler
> > > +        let ptr = match self {
> > > +            ArmedJob(job, _) => Box::<Job<T>>::into_raw(job),
> > > +        };
> > 
> > If I get this all right then this all makes sure that drivers can't use
> > the job after push and they don't forgot to call arm.
> > 
> > What I'm not seeing is how we force drivers to call push once they've
> > called arm? I haven't check what the code does, but from the docs it
> > sounds like if you don't call push then drop will get called. Which wreaks
> > the book-keeping on an armed job. Or is there someting that prevents
> > ArmedJob<T> from having the Drop trait and so the only way to not go boom
> > is by pushing it?
> > 
> > Googling for "rust undroppable" seems to indicate that this isn't a thing
> > rust can do?
> 
> Another thing that I just realized: The driver must ensure that the
> arm->push sequence on a given drm_sched_entity isn't interrupte by another
> thread doing the same, i.e. you need to wrap it all in a lock, and it
> always needs to be the same lock for a given entity.
> 
> I have no idea how to guarantee that, but I guess somehow we should?

Ok I was wrong here, pushing the job is optional, but the locking rules
are still the same.

I think we can solve this in rust with:
- passing &mut Entity to a new submit_job function. that way locking rules
  are left to the driver, which I think is best.
- the submit_job also takes a closure, and passes the armed job as a &mut
  ArmedJob to it. That way we guarantee that the armed job never survives
  longer than the mutex guard (or whatever trick the driver is using) for
  the Entity
- that closure probably should have Result return type which submit_job
  just passes on, because some drivers (when you support userptr that is)
  need to be able to bail out. since the ArmedJob is borred it shouldn't
  be able to escape through the return value
- only ArmedJob has push_job

I think with that we fully uphold the drm_sched arm/push_job contract on
the rust side?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch