Re: [PATCH v3 2/2] PCI: rcar: Return all Fs from read which triggered an exception

Marek Vasut <marek.vasut@xxxxxxxxx> · Mon, 24 Jan 2022 06:46:47 +0100

On 1/23/22 17:49, Pali Rohár wrote:

Hi,

[...]

I must admit that this patch has evolved from its initial version into a giant hack...
https://lore.kernel.org/linux-pci/20210514200549.431275-1-marek.vasut@xxxxxxxxx/

During review of the previous patch I asked some important questions
but did not get any answers to them. So here is a reminder:
https://lore.kernel.org/linux-pci/20210805183024.ftdwknkttfwwogks@pali/

So could you please answer what happens when the PCIe controller is in
some non-L* state and an MMIO access, a config read or a config write
happens?

What kind of non-L state ?

E.g. Hot Reset, Detect, Polling, Configuration or Recovery.

Do you have some specific test which fails ?

Yes, by putting the PCIe controller into one of those states. I already
wrote to you in a previous email to trigger a hot reset, as this is the
easiest test and can also be done from userspace (setpci).

The link goes to the Recovery state automatically during link retraining
(e.g. after setting the Retrain Link bit in the PCIe Root Port config
space) and from Recovery to Configuration or directly back to L0. So
testing this path needs precise timing and several repetitions to trigger.

So the easiest test is really via PCIe Hot Reset, by setting the Secondary
Bus Reset bit in the Bridge Control register of the PCIe Root Port. After
this the link is in Hot Reset and does not go back to L0 until you clear
that bit. So in this state you can do all the operations which cause
aborts, like calling the kernel function which reads from the config
space of the device on the other end of the PCIe link, or doing an
MMIO read / write of mapped memory which again belongs to the
other end of the PCIe link.

Or instead of Hot Reset, you can set the Link Disable bit in the config
space of the PCIe Root Port. Then the link also would not be in the L0
state (until you clear that bit), so again you have plenty of time to do
the same tests.

Can you give me the exact setpci invocation ? If so, then I can test 
this for you on the hardware.
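
For reference, a minimal sketch of what such invocations could look like,
written as a small Python helper that only builds the command strings and
does not run them. The root port address 00:00.0 is a placeholder and the
helper names are hypothetical; the bit positions are the standard ones
(Secondary Bus Reset is bit 6 of Bridge Control, Link Disable is bit 4 of
the PCIe Link Control register at offset 0x10 of the PCIe capability), and
the value:mask form is setpci's read-modify-write syntax.

```python
# Standard register bits (PCI bridge header / PCIe capability).
BRIDGE_CTL_SECONDARY_BUS_RESET = 0x0040  # Bridge Control, bit 6
LNKCTL_LINK_DISABLE = 0x0010             # Link Control, bit 4

ROOT_PORT = "00:00.0"  # placeholder address, adjust for the real system


def setpci_cmd(device, register, bit, enable):
    """Build a setpci command that sets or clears one bit (value:mask)."""
    value = bit if enable else 0
    return f"setpci -s {device} {register}={value:04x}:{bit:04x}"


def hot_reset_cmds(device=ROOT_PORT):
    """Enter Hot Reset, then leave it again by clearing the bit."""
    reg = "BRIDGE_CONTROL.w"
    return [
        setpci_cmd(device, reg, BRIDGE_CTL_SECONDARY_BUS_RESET, True),
        setpci_cmd(device, reg, BRIDGE_CTL_SECONDARY_BUS_RESET, False),
    ]


def link_disable_cmds(device=ROOT_PORT):
    """Disable the link, then re-enable it by clearing the bit."""
    reg = "CAP_EXP+10.w"  # Link Control inside the PCIe capability
    return [
        setpci_cmd(device, reg, LNKCTL_LINK_DISABLE, True),
        setpci_cmd(device, reg, LNKCTL_LINK_DISABLE, False),
    ]


if __name__ == "__main__":
    for cmd in hot_reset_cmds() + link_disable_cmds():
        print(cmd)
```

The config read or MMIO access under test would go between the first and
second command of each pair, while the link is held out of L0.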

This patch addresses the case where the link transition to L1 state has to
be completed manually. If the CPU accesses the config space before that
happened, you get an imprecise data abort.
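
As a toy model of the behaviour named in the subject line (a read which
triggered an exception returns all Fs), something along these lines; it
does not reproduce how the driver actually intercepts the abort, and
ImpreciseAbort, faulting_read and read_config are hypothetical names used
only for illustration:

```python
class ImpreciseAbort(Exception):
    """Stands in for the data abort raised by a faulting access."""


def faulting_read(offset):
    """Hypothetical raw accessor for a link that is not usable."""
    raise ImpreciseAbort(f"config read at {offset:#x} aborted")


def read_config(raw_read, offset, size=4):
    """Return the value read, or all ones of the given width on abort."""
    try:
        return raw_read(offset)
    except ImpreciseAbort:
        # All Fs is what a caller sees for a failed PCI config read.
        return (1 << (8 * size)) - 1


if __name__ == "__main__":
    print(hex(read_config(faulting_read, 0x00)))
    print(hex(read_config(lambda offset: 0x1234, 0x00)))
```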

Yes, I see. But the transition does not have to complete, and the
question is how this case is handled... That is why it is necessary to
know what happens in such cases.

And IIRC you cannot go from L1 state directly to L0, but only via
Recovery state. And from Recovery you may end up in Detect state.
(e.g. after hot unplug or if some buggy card with kernel quirk is used)

It is really important to know this fact.
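
The transitions mentioned in this thread can be written down as a small
graph, as a rough sketch. It lists only the edges named above (with
retraining assumed to start from L0), not the full LTSSM from the PCIe
specification, and shortest_path is a hypothetical helper:

```python
from collections import deque

# Only the LTSSM transitions mentioned in this thread; the real state
# machine has more states and edges.
TRANSITIONS = {
    "L0": ["Recovery"],  # link retraining (assumed to start from L0)
    "L1": ["Recovery"],  # no direct L1 -> L0 edge
    "Recovery": ["L0", "Configuration", "Detect"],
}


def shortest_path(start, goal):
    """Breadth-first search over the transitions listed above."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in TRANSITIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(shortest_path("L1", "L0"))
    print(shortest_path("L1", "Detect"))
```

Every path out of L1 passes through Recovery, and from there the link can
land in Detect instead of L0, which is the case the question is about.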

I'm under the impression that this patch is still not enough, as similar
issues also exist in other PCIe controllers which I know...

Do you have a suggestion for a patch which would be enough on this
hardware ?

I do not have enough information.

I see