Re: [GIT PULL] device-dax for 5.1: PMEM as RAM

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 15 Mar 2019 10:33:41 -0700

On Mon, Mar 11, 2019 at 5:08 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 11, 2019 at 8:37 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >
> > Another feature the userspace tooling can support for the PMEM as RAM
> > case is the ability to complete an Address Range Scrub of the range
> > before it is added to the core-mm. I.e at least ensure that previously
> > encountered poison is eliminated.
>
> Ok, so this at least makes sense as an argument to me.
>
> In the "PMEM as filesystem" part, the errors have long-term history,
> while in "PMEM as RAM" the memory may be physically the same thing,
> but it doesn't have the history and as such may not be prone to
> long-term errors the same way.
>
> So that validly argues that yes, when used as RAM, the likelihood for
> errors is much lower because they don't accumulate the same way.

Hi Linus,

The question about a new enumeration mechanism for this has been
raised, but I don't expect a response before the merge window closes.
While it percolates, how do you want to proceed in the meantime?

Could the kernel export its knowledge of the situation in
/sys/devices/system/cpu/vulnerabilities?

Otherwise, the exposure can be reduced in the volatile-RAM case by
scanning for and clearing errors before the range is onlined as RAM. The
userspace tooling for that can be in place before v5.1-final. There are
also runtime notifications of errors via acpi_nfit_uc_error_notify()
from background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.
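To make the "scan and clear before onlining" gate concrete, here is a
minimal sketch of the check userspace tooling might perform. It assumes
known poison is reported as badblocks-style lines of "<start-sector>
<length>" in 512-byte sectors, relative to the start of the namespace;
the function names are hypothetical and not part of any existing tool.

```python
SECTOR_SIZE = 512  # assumption: badblocks entries are counted in 512-byte sectors


def parse_badblocks(text):
    """Parse '<start-sector> <length>' lines into half-open byte ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, length = (int(field) for field in line.split())
        ranges.append((start * SECTOR_SIZE, (start + length) * SECTOR_SIZE))
    return ranges


def safe_to_online(region_start, region_size, badblocks_text):
    """Hypothetical gate: True only if no known-bad range overlaps
    [region_start, region_start + region_size), in bytes."""
    region_end = region_start + region_size
    for bad_start, bad_end in parse_badblocks(badblocks_text):
        # Two half-open ranges overlap iff each starts before the other ends.
        if bad_start < region_end and region_start < bad_end:
            return False
    return True


# A single bad sector at sector 8 (bytes 4096..4607):
print(safe_to_online(0, 4096, "8 1\n"))  # first 4 KiB is clean
print(safe_to_online(0, 8192, "8 1\n"))  # first 8 KiB contains the bad sector
```

In this sketch a failed check would mean "scrub and clear first, then
re-check", rather than onlining the range with poison still present.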

I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the same exposure already exists via DAX
in the PMEM case. Volatile-RAM is arguably the safer use case, since
it's possible to repair pages there, whereas the persistent case needs
active application coordination.

Please take another look at merging this for v5.1, or otherwise let me
know what software changes you'd like to see to move this forward. I'm
also open to the idea of just teaching memcpy_mcsafe() to use rep; mov
as if it were always recoverable, and relying on the error being mapped
out after reboot if it was not recoverable. At reboot the driver is
notified of the physical addresses that caused the previous crash, so
software can avoid consuming them again.
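The "map it out after reboot" step amounts to turning the reported
physical addresses into page frames that stay offline. A minimal sketch,
assuming 4 KiB pages and a plain list of reported addresses; both
function names are hypothetical illustrations, not kernel interfaces.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def pages_to_keep_offline(poison_addrs, page_size=PAGE_SIZE):
    """Hypothetical helper: sorted page frame numbers containing any
    reported poisoned physical address."""
    return sorted({addr // page_size for addr in poison_addrs})


def pages_to_online(first_pfn, nr_pages, poison_addrs, page_size=PAGE_SIZE):
    """Hypothetical helper: PFNs in [first_pfn, first_pfn + nr_pages)
    that were not reported as poisoned and so may be onlined."""
    bad = set(pages_to_keep_offline(poison_addrs, page_size))
    return [pfn for pfn in range(first_pfn, first_pfn + nr_pages) if pfn not in bad]


# Addresses 0x1000 and 0x1fff fall in the same page, so only PFNs 1 and 3 are excluded.
print(pages_to_keep_offline([0x1000, 0x1fff, 0x3000]))
print(pages_to_online(0, 4, [0x1000, 0x3000]))
```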

git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/devdax-for-5.1