Re: [PATCH v2 11/12] PM, libnvdimm: Add 'mem-quiet' state and callback for firmware activation

Dan Williams <dan.j.williams@xxxxxxxxx> · Thu, 9 Jul 2020 08:56:02 -0700

On Thu, Jul 9, 2020 at 8:01 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > The runtime firmware activation capability of Intel NVDIMM devices
> > requires memory transactions to be disabled for hundreds of
> > microseconds. This window is long enough to cause in-flight DMA to
> > fail and to trigger other application-detectable timeouts. Arrange for
> > firmware activation to be executed while the system is "quiesced",
> > with all processes and device DMA frozen.
> >
> > It is already required that invoking device ->freeze() callbacks is
> > sufficient to cease DMA. A device that continues memory writes outside
> > of user direction violates the PM core's expectation that it can
> > establish a coherent hibernation image.
> >
> > That said, RDMA devices are an example of devices that access memory
> > outside of user-process direction. RDMA drivers also typically assume
> > the system they are operating in will never be hibernated. A solution
> > for RDMA collisions with firmware activation is outside the scope of
> > this change and may need to rely on being able to survive the platform
> > imposed memory controller quiesce period.
>
> Yikes.  I don't think we should support such a broken runtime firmware
> activation.

Yikes indeed, that matches my initial reaction.

So I can say that the platform folks recognize that this situation is
untenable, and you can see in the interface that it is built to support
future platforms that can activate without a memory quiesce period.
The question is what to do in the meantime. It turns out that, despite
my initial skeptical reaction, a significant number of users are
willing to manage this quiesce (or even race this quiesce!) to avoid a
server reboot, which is basically guaranteed to knock a "9" off of a "5
nines" uptime system. My minimum requirement for supporting this in
Linux was to at least have a safe way to mitigate the risks of a race,
and that's what led me to the hibernate path.
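For anyone following along, the ordering the hibernate path buys us can be modeled roughly as follows: freeze every device, run the activation only once all of them have reported that DMA has stopped, and thaw in reverse order whether or not activation succeeds. This is a toy userspace sketch under stated assumptions, not PM core code. `Device`, its `can_freeze` flag, and `activate_in_quiesce()` are hypothetical names. Process freezing is assumed to have already happened. A device that cannot guarantee quiescence (the RDMA case above) is assumed to report failure rather than silently keep writing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Device:
    """Hypothetical stand-in for a device with PM freeze/thaw callbacks."""
    name: str
    can_freeze: bool = True   # assumption: a device that can't quiesce says so
    dma_active: bool = True

    def freeze(self) -> bool:
        # Contract assumed from the patch: a successful freeze means DMA ceased.
        if not self.can_freeze:
            return False
        self.dma_active = False
        return True

    def thaw(self) -> None:
        self.dma_active = True


def activate_in_quiesce(devices: List[Device],
                        activate: Callable[[], None]) -> bool:
    """Run activate() only while every device is frozen; return True if it ran."""
    frozen: List[Device] = []
    for dev in devices:
        if not dev.freeze():
            # Abort: unwind in reverse order and never activate with DMA live.
            for d in reversed(frozen):
                d.thaw()
            return False
        frozen.append(dev)
    # All devices have ceased DMA, so the memory-quiesce window is safe here.
    try:
        activate()
    finally:
        # Thaw in reverse freeze order even if activation raises.
        for d in reversed(frozen):
            d.thaw()
    return True
```

The point of the model is the abort path: if any device refuses to freeze, activation is skipped entirely instead of racing the quiesce period.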