Re: [PATCH 05/23] cxl/pci: Don't poll doorbell for mailbox access

Dan Williams <dan.j.williams@xxxxxxxxx> · Mon, 29 Nov 2021 11:37:34 -0800

On Mon, Nov 29, 2021 at 11:32 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
[..]
> >
> > Right, there's no harm in the check, it just seems overly paranoid to
> > me if it was already checked once. Until a doorbell timeout happens
> > it's an extra MMIO cycle that can be saved for a "what happened?" check
> > after a timeout.
>
> Well I suspect we're just rearranging the deck chairs on the Titanic now, but...

Not so much, just trying to get this driver in line with other error
handling designs.

> I see doorbell timeouts as disconnected from whether or not the mailbox
> interface is ready. If they were the same, we wouldn't need both bits and we
> could just wait extra long for the doorbell when probing.
>
> In other words, I expect if the interface goes unready, doorbell timeout will
> occur, but I don't think we should assume if doorbell timeout occurs, the
> interface is no longer ready. I don't purport to know why a doorbell timeout
> might occur while the interface remains available (a firmware bug, I
> presume).
>
> It does seem interesting to check if the interface is no longer ready on timeout
> though.

So I'm just modeling this on NVMe error handling, where there is a
Controller Fatal Status bit that could be checked on every transaction,
but instead the driver waits until a command timeout to check whether
the device went fatal / not-ready.
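A minimal userspace sketch of that approach follows. Everything in it is hypothetical and invented for illustration: the register model (struct hyp_mbox, HYP_DOORBELL, HYP_MBOX_IF_READY) and the hyp_mbox_send() helper are not the cxl/pci driver's code, and they do not follow the real CXL register layout. The fast path polls only the doorbell. The interface-ready bit is read once, after a doorbell timeout, which keeps Ben's distinction: a not-ready interface (-ENXIO) is reported separately from a timeout where the interface still claims to be ready (-ETIMEDOUT).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified register model; NOT the real CXL layout. */
#define HYP_DOORBELL      (1u << 0) /* set while the device owns the mailbox */
#define HYP_MBOX_IF_READY (1u << 0) /* set while the mailbox interface is usable */

struct hyp_mbox {
	uint32_t ctrl;       /* stands in for a mailbox control register */
	uint32_t status;     /* stands in for a device status register */
	unsigned mmio_reads; /* counts simulated MMIO reads */
	int busy_polls;      /* polls that still see the doorbell; <0 = never clears */
};

/* Simulated MMIO read that also models device firmware behavior. */
static uint32_t hyp_read(struct hyp_mbox *m, const uint32_t *reg)
{
	m->mmio_reads++;
	if (reg == &m->ctrl && (m->ctrl & HYP_DOORBELL)) {
		if (m->busy_polls > 0)
			m->busy_polls--;          /* device still working */
		else if (m->busy_polls == 0)
			m->ctrl &= ~HYP_DOORBELL; /* device completed the command */
		/* busy_polls < 0: firmware never clears the doorbell */
	}
	return *reg;
}

/* Hypothetical send: poll the doorbell only, check readiness on timeout. */
static int hyp_mbox_send(struct hyp_mbox *m, int max_polls)
{
	assert(max_polls > 0);
	m->ctrl |= HYP_DOORBELL; /* hand the mailbox to the device */

	/* Fast path: no per-command readiness check, saving an MMIO read. */
	for (int i = 0; i < max_polls; i++)
		if (!(hyp_read(m, &m->ctrl) & HYP_DOORBELL))
			return 0;

	/* Slow path, only after a timeout: the "what happened?" check. */
	if (!(hyp_read(m, &m->status) & HYP_MBOX_IF_READY))
		return -ENXIO;    /* interface went not-ready */
	return -ETIMEDOUT;        /* still ready, e.g. a firmware bug */
}
```

For example, a device that clears the doorbell on the third poll completes with 0 after three simulated reads and the status register is never touched; a device that never clears it costs max_polls + 1 reads and returns -ETIMEDOUT or -ENXIO depending on the ready bit.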