On Fri, May 14, 2021 at 09:47:55AM +0100, Jonathan Cameron wrote: > On Thu, 13 May 2021 14:20:38 -0700 > Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > On Tue, May 11, 2021 at 9:52 AM Jonathan Cameron > > <Jonathan.Cameron@xxxxxxxxxx> wrote: > > > > > > On Thu, 6 May 2021 14:59:34 -0700 > > > Ira Weiny <ira.weiny@xxxxxxxxx> wrote: > > > > > > > On Tue, Apr 20, 2021 at 12:54:49AM +0800, Jonathan Cameron wrote: > > > > > + > > > > > +static int pci_doe_send_req(struct pci_doe *doe, struct pci_doe_exchange *ex) > > > > > +{ > > > > > + struct pci_dev *pdev = doe->pdev; > > > > > + u32 val; > > > > > + int i; > > > > > + > > > > > + /* > > > > > + * Check the DOE busy bit is not set. If it is set, this could indicate > > > > > + * someone other than Linux (e.g. firmware) is using the mailbox. Note > > > > > + * it is expected that firmware and OS will negotiate access rights via > > > > > + * an, as yet to be defined method. > > > > > + */ > > > > > + pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val); > > > > > + if (FIELD_GET(PCI_DOE_STATUS_BUSY, val)) > > > > > + return -EBUSY; > > > > > > > > In discussion with Dan we believe that user space could also be issuing > > > > commands and would potentially cause us to be locked out. > > > > > > > > We agree that firmware should be out of the way here and if it is blocking > > > > the OS there is not much we can do about it. > > > > > > > > However, if user space is using the mailbox we need to synchronize with them > > > > via pci_cfg_access_[try]lock(). This should avoid this EBUSY condition. > > > > > > Hi Ira, thanks for taking a look. > > > > > > So the question here is whether we can ever safely work with a > > > userspace that is accessing the DOE. I think the answer is no we can't. > > > > > > We'd have no way of knowing that userspace left the DOE in a clean state > > > without resetting every time we want to use it (which can take 1 second) > > > or doing significant sanity checking (can we tell if something is > > > in flight?). Note that if userspace and kernel were talking different > > > protocols nothing sensible could be done to prevent them receiving each > > > other's answers (unless you can rely on userspace holding the lock until > > > it is done - which you can't as who trusts userspace?) > > > > There is no ability for userpsace to lock out the kernel, only kernel > > locking out userspace. > > Hi Dan, > > Got it. Writing userspace to code with arbitrary kernel > breakage of exchanges userspace initialized is going to be nasty. > > > > > > You could do > > > something horrible like back off after peeking at the protocol to see > > > if it might be yours, but even that only works assuming the two are > > > trying to talk different protocols (talking the same protocol isn't allowed > > > but no way to enforce that using just pci_cfg_access_lock()). > > > > Wait why isn't pci_cfg_access_lock() sufficient? The userspace DOE > > transfer is halted, the kernel validates the state of DOE, does it's > > work and releases the lock. > > It's that 'validate the state of the DOE' which is the problem. I 'think' > the only way to do that is to issue an abort every time and I'm really > not liking the fact that adds a potential 1 second sleep to every > DOE access from the kernel. IIUC an abort would mean game over for *every* transaction in flight, not sure that's the best way of preventing userspace from mingling but as you mentioned I don't think there is a way around it with the current protocol. I don't see how a lock would solve this issue either - how would it ? Basically you have to stop userspace from issuing requests for the duration of a request/response (per-protocol) session, right ? Lorenzo