RE: [RFC PATCH 0/1] DOE usage with pcie/portdrv

"Hindman, Gavin" <gavin.hindman@xxxxxxxxx> · Wed, 11 May 2022 20:22:19 +0000

>-----Original Message-----
>From: Dan Williams <dan.j.williams@xxxxxxxxx>
>Sent: Wednesday, May 11, 2022 12:42 PM
>To: Lukas Wunner <lukas@xxxxxxxxx>
>Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>; Hindman, Gavin
><gavin.hindman@xxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>; Weiny, Ira
><ira.weiny@xxxxxxxxx>; Linux PCI <linux-pci@xxxxxxxxxxxxxxx>;
>linux-cxl@xxxxxxxxxxxxxxx; CHUCK_LEVER <chuck.lever@xxxxxxxxxx>
>Subject: Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
>
>On Wed, May 11, 2022 at 12:14 PM Lukas Wunner <lukas@xxxxxxxxx> wrote:
>>
>> On Mon, May 09, 2022 at 10:48:06AM +0100, Jonathan Cameron wrote:
>> > On Sat, 7 May 2022 12:18:48 +0200 Lukas Wunner <lukas@xxxxxxxxx> wrote:
>> > > I'm still somewhat undecided on the kernel vs. user space question.
>> >
>> > Likewise.  I feel a few more prototypes are needed to come to clear
>> > conclusion.
>>
>> Gavin Hindman (+cc) raised an important point off-list:
>>
>> When an IDE-capable device is runtime suspended to D3hot and later
>> runtime resumed to D0, it may not preserve its internal state.
>> (The No_Soft_Reset bit in the Power Management Control/Status Register
>> tells us whether the device is capable of preserving internal state
>> over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)
>
>I think power-management effects relative to IDE is a soft spot of the
>specification. If the link goes down then yes, IDE needs to be re-established,
>but as far as I can see that's a policy tradeoff to support runtime reset or
>support link encryption.
>
>> Likewise, when an IDE-capable device is reset (e.g. due to Downstream
>> Port Containment, AER or a bus reset initiated by user space),
>> internal state is lost and must be reconstructed by pci_restore_state().
>> That state includes the SPDM session or IDE encryption.
>>
>> If setting up an SPDM session is dependent on user space, the kernel
>> would have to leave a device in an inoperable state after runtime
>> resume or reset, until user space gets around to initiate SPDM.
>
>Yes, this seems acceptable from the perspective of server platforms that can
>make the power management vs security tradeoff.
>

Agree, though more and more we need to be thinking about sustainability and cost-of-ownership, and having to keep devices awake in order to meet security goals is somewhat contrary to that objective.  I fully realize those are not technical constraints, but IMO they should still be considered.  My original consideration was latency for deadline-driven tasks, not just security: power-management features commonly get turned off because of resume latency, and this appears to have the potential to extend resume latency even with SPDM handled in-kernel, let alone when waiting for a user-space response.  Again, obviously not a hard design constraint, but it seems worthy of consideration.

>>
>> I think that would be a terrible user experience.  We've gone to great
>> lengths to make reset recovery as seamless and quick as possible.
>> (E.g. hot-plugged NVMe drives survive a reset without the driver being
>> unbound, those would be prime candidates for IDE encryption.) It won't
>> help the acceptance of IDE if it breaks that seamlessness.
>>
>> So that's a strong argument for an in-kernel SPDM implementation.
>
>The SPDM message passing will always need to be supported in-kernel.
>It's the certificate parsing and attestation flow that is proposed to be in
>userspace. So perform CMA with userspace up-calls, and then insert a key-id
>into the kernel for ongoing SPDM message passing.