Re: [PATCH 04/23] cxl/pci: Implement Interface Ready Timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 22, 2021 at 9:54 AM Jonathan Cameron
<Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> On Mon, 22 Nov 2021 09:17:31 -0800
> Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
>
> > On 21-11-22 15:02:27, Jonathan Cameron wrote:
> > > On Fri, 19 Nov 2021 16:02:31 -0800
> > > Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> > >
> > > > The original driver implementation used the doorbell timeout for the
> > > > Mailbox Interface Ready bit to piggy back off of, since the latter
> > > > doesn't have a defined timeout. This functionality, introduced in
> > > > 8adaf747c9f0 ("cxl/mem: Find device capabilities"), can now be improved
> > > > since a timeout has been defined with an ECN to the 2.0 spec.
> > > >
> > > > While devices implemented prior to the ECN could have an arbitrarily
> > > > long wait and still be within spec, the max ECN value (256s) is chosen
> > > > as the default for all devices. All vendors in the consortium agreed to
> > > > this amount and so it is reasonable to assume no devices made will
> > > > exceed this amount.
> > >
> > > Optimistic :)
> > >
> >
> > Reasonable to assume is certainly not the same as "in reality". I can soften
> > this wording.
> >
> > > >
> > > > Signed-off-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> > > > ---
> > > > This patch did not exist in RFCv2
> > > > ---
> > > >  drivers/cxl/pci.c | 29 +++++++++++++++++++++++++++++
> > > >  1 file changed, 29 insertions(+)
> > > >
> > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > index 6c8d09fb3a17..2cef9fec8599 100644
> > > > --- a/drivers/cxl/pci.c
> > > > +++ b/drivers/cxl/pci.c
> > > > @@ -2,6 +2,7 @@
> > > >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > > >  #include <linux/io-64-nonatomic-lo-hi.h>
> > > >  #include <linux/module.h>
> > > > +#include <linux/delay.h>
> > > >  #include <linux/sizes.h>
> > > >  #include <linux/mutex.h>
> > > >  #include <linux/list.h>
> > > > @@ -298,6 +299,34 @@ static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *c
> > > >  static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> > > >  {
> > > >   const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
> > > > + unsigned long timeout;
> > > > + u64 md_status;
> > > > + int rc;
> > > > +
> > > > + /*
> > > > +  * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to
> > > > +  * dictate how long to wait for the mailbox to become ready. For
> > > > +  * simplicity, and to handle devices that might have been implemented
> > >
> > > I'm not keen on the 'for simplicity' argument here.  If the device is advertising
> > > a lower value, then that is what we should use.  It's fine to wait the max time
> > > if nothing is specified.  It'll cost us a few lines of code at most unless
> > > I am missing something...
> > >
> > > Jonathan
> > >
> >
> > Let me pose it a different way, if a device advertises 1s, but for whatever
> > takes 4s to come up, should we penalize it over the device advertising 256s?
>
> Yes, because it is buggy.  A compliance test should have failed on this anyway.
>
> > The
> > way this field is defined in the spec would [IMHO] lead vendors to simply put
> > the max field in there to game the driver, so why not start off with just
> > insisting they don't?
>
> Given reading this value and getting a big number gives the implication that
> the device is meant to be really slow to initialize, I'd expect that to push
> vendors a little in the directly of putting realistic values in).
>
> Maybe we should print the value in the log to make them look silly ;)

A print message on the way to a static default timeout value is about
all a device's self reported timeout is useful.

"device not ready after waiting %d seconds, continuing to wait up to %d seconds"

It's also not clear to me that the Linux default timeout should be so
generous at 256 seconds. It might be suitable to just complain about
devices that are taking more than 60 seconds to initialize with an
option to override that timeout for odd outliers. Otherwise encourage
hardware implementations to beat the Linux timeout value to get
support out of the box.

I notice that not even libata waits more than a minute for a given
device to finish post-reset shenanigans, so might as well set 60
seconds as what the driver will tolerate out of the box.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux