Re: [RFC PATCH v3 2/4] PCI/doe: Add Data Object Exchange support

Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> · Fri, 14 May 2021 12:15:38 +0100

On Fri, May 14, 2021 at 09:47:55AM +0100, Jonathan Cameron wrote:
> On Thu, 13 May 2021 14:20:38 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> 
> > On Tue, May 11, 2021 at 9:52 AM Jonathan Cameron
> > <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 6 May 2021 14:59:34 -0700
> > > Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
> > >  
> > > > On Tue, Apr 20, 2021 at 12:54:49AM +0800, Jonathan Cameron wrote:  
> > > > > +
> > > > > +static int pci_doe_send_req(struct pci_doe *doe, struct pci_doe_exchange *ex)
> > > > > +{
> > > > > +   struct pci_dev *pdev = doe->pdev;
> > > > > +   u32 val;
> > > > > +   int i;
> > > > > +
> > > > > +   /*
> > > > > +    * Check the DOE busy bit is not set. If it is set, this could indicate
> > > > > +    * someone other than Linux (e.g. firmware) is using the mailbox. Note
> > > > > +    * it is expected that firmware and OS will negotiate access rights via
> > > > > +    * an, as yet to be defined method.
> > > > > +    */
> > > > > +   pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> > > > > +   if (FIELD_GET(PCI_DOE_STATUS_BUSY, val))
> > > > > +           return -EBUSY;  
> > > >
> > > > In discussion with Dan we believe that user space could also be issuing
> > > > commands and would potentially cause us to be locked out.
> > > >
> > > > We agree that firmware should be out of the way here and if it is blocking
> > > > the OS there is not much we can do about it.
> > > >
> > > > However, if user space is using the mailbox we need to synchronize with them
> > > > via pci_cfg_access_[try]lock().  This should avoid this EBUSY condition.  
> > >
> > > Hi Ira, thanks for taking a look.
> > >
> > > So the question here is whether we can ever safely work with a
> > > userspace that is accessing the DOE.  I think the answer is no we can't.
> > >
> > > We'd have no way of knowing that userspace left the DOE in a clean state
> > > without resetting every time we want to use it (which can take 1 second)
> > > or doing significant sanity checking (can we tell if something is
> > > in flight?).  Note that if userspace and kernel were talking different
> > > protocols nothing sensible could be done to prevent them receiving each
> > > other's answers (unless you can rely on userspace holding the lock until
> > > it is done - which you can't as who trusts userspace?)  
> > 
> > There is no ability for userpsace to lock out the kernel, only kernel
> > locking out userspace.
> 
> Hi Dan,
> 
> Got it. Writing userspace to code with arbitrary kernel
> breakage of exchanges userspace initialized is going to be nasty. 
> 
> > 
> > > You could do
> > > something horrible like back off after peeking at the protocol to see
> > > if it might be yours, but even that only works assuming the two are
> > > trying to talk different protocols (talking the same protocol isn't allowed
> > > but no way to enforce that using just pci_cfg_access_lock()).  
> > 
> > Wait why isn't pci_cfg_access_lock() sufficient? The userspace DOE
> > transfer is halted, the kernel validates the state of DOE, does it's
> > work and releases the lock.
> 
> It's that 'validate the state of the DOE' which is the problem.  I 'think'
> the only way to do that is to issue an abort every time and I'm really
> not liking the fact that adds a potential 1 second sleep to every
> DOE access from the kernel.

IIUC an abort would mean game over for *every* transaction in flight,
not sure that's the best way of preventing userspace from mingling
but as you mentioned I don't think there is a way around it with the
current protocol.

I don't see how a lock would solve this issue either - how would it ?

Basically you have to stop userspace from issuing requests for the
duration of a request/response (per-protocol) session, right ?

Lorenzo