Re: [PATCH v13 2/4] EINJ: Add CXL error type support

Dan Williams <dan.j.williams@xxxxxxxxx> · Wed, 21 Feb 2024 12:41:33 -0800

Ben Cheatham wrote:
> 
> 
> On 2/21/24 11:43 AM, Dan Williams wrote:
> > Ben Cheatham wrote:
> >> Remove CXL protocol error types from the EINJ module and move them to
> >> a new einj_cxl module. The einj_cxl module implements the necessary
> >> handling for CXL protocol error injection and exposes an API for the
> >> CXL core to use said functionality. Because the CXL error types
> >> require special handling, only allow them to be injected through the
> >> einj_cxl module and return an error when attempting to inject through
> >> "regular" EINJ.
> > 
> > So Robustness Principle says be conservative in what you send and
> > liberal in what you accept. So cleaning up the reporting of CXL
> > capabilities over to the new interface is consistent with that
> > principle, but not removing the ability to inject via the legacy
> > interface. Especially since that has been the status quo for a few
> > kernel cycles is there a good reason to actively prevent usage of that
> > path?
> > 
> 
> For CXL 2.0+ ports it's fine since EINJ only expects an SBDF which is
> pretty readily accessible by the user. CXL 1.1/1.0 ports however, it's a bit
> of a headache. It would require the user to find the address of the RCRB
> for the port and supply that to the EINJ module. I originally had this option
> anyway, but I think it got shot down for being too obtuse to use (I think by
> you, but it's been a while xD). If you think it's still worthwhile I can
> remove the restriction for both types of ports or just the 2.0+ ports.

So, to be clear, yes I NAKd that being the primary interface, and I am
not changing my mind on it being too obtuse to inflict on end users.
However, just because it is too obtuse to be the primary interface does
not mean that if someone runs that gauntlet, or has expert knowledge of
where RCRB is located, that they be tripped up at the last moment from
specifying that parameter via the legacy path.

So consider it an undocumented feature, and I think it only ends up
being a one line change to let that parameter through, if it can be done
safely.

That said, if there is any concern about input validation and safety
then keep the policy as you have it.

> For CXL 1.0/1.1 ports there's also the security issue of being able to inject
> to any address since the way it works is by skipping the memory address
> checks, but since this is a debug module I don't think it's that big
> of a deal.

Say more here, this seems like just the safety issue I just mentioned
that would justify forcing folks through the CXL interface. Depending on
how severe this is it might also require backporting the removal of CXL
injection from older kernels.

The main concern would be whether einj needs to disabled when kernel
lockdown is enabled.