Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Fri, 2 Feb 2024 14:29:05 -0500

On 2024-02-02 12:37, Dan Williams wrote:
Mathieu Desnoyers wrote:
[...]


The alternative route I intend to take is to audit all callers
of alloc_dax() and make sure they all save the alloc_dax() return
value in a struct dax_device * local variable first for the sake
of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
initialized to NULL in the error case and simplify the rest of
error checking.

I could maybe get on board with that, but it needs a comment somewhere
about the asymmetric subtlety.

Is this "somewhere" at every alloc_dax() call site, or do you have
something else in mind ?
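
The call-site pattern discussed above can be sketched in userspace as follows. This is only an illustration under stated assumptions: ERR_PTR()/IS_ERR()/PTR_ERR() are simplified stand-ins for the kernel helpers, and mock_alloc_dax(), struct xyz and xyz_setup_dax() are hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dax_device { int id; };

/* Hypothetical driver state; "xyz" mirrors the placeholder used above. */
struct xyz { struct dax_device *dax_dev; };

/* Mock of alloc_dax(): returns ERR_PTR(-EOPNOTSUPP) when 'supported' is 0. */
static struct dax_device *mock_alloc_dax(int supported)
{
	static struct dax_device dev = { .id = 1 };

	return supported ? &dev : ERR_PTR(-EOPNOTSUPP);
}

static int xyz_setup_dax(struct xyz *x, int supported)
{
	/* Keep the return value in a local until it is known to be valid. */
	struct dax_device *dax_dev = mock_alloc_dax(supported);

	assert(x != NULL);
	if (IS_ERR(dax_dev))
		return (int)PTR_ERR(dax_dev);	/* x->dax_dev stays NULL */
	x->dax_dev = dax_dev;
	return 0;
}
```

With this shape, later code only has to test x->dax_dev against NULL and never sees an error pointer stored in the structure.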




   		return;
   
   	if (dax_dev->holder_data != NULL)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..b69c9e442cf4 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
   	dax_dev = alloc_dax(pmem, &pmem_dax_ops);
   	if (IS_ERR(dax_dev)) {
   		rc = PTR_ERR(dax_dev);
-		goto out;
+		if (rc != -EOPNOTSUPP)
+			goto out;

If I compare the before / after this change, if previously
pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
result in a NULL pointer dereference.

No, alloc_dax() only returns NULL in the CONFIG_DAX=n case, not in the
CONFIG_FS_DAX=n case.

Indeed, I was wrong there.

So that means that pmem devices on ARM have been
possible without FS_DAX. So, in order for alloc_dax() returning
ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
path needs to be changed.

Good point. We're moving the "depends on !(ARM || MIPS || SPARC)" from the
FS_DAX Kconfig to a runtime check in alloc_dax(), which is used whenever
DAX=y, and that includes configurations that previously had FS_DAX=n.

I'll change the error path in pmem_attach_disk() to treat an -EOPNOTSUPP
return value from alloc_dax() as non-fatal.
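
As a rough userspace sketch of that error path (not the actual pmem code): mock_alloc_dax() and mock_pmem_attach_disk() are hypothetical, the error-pointer helpers are simplified stand-ins, and the "disk_added" field only exists here to show that the block device is still set up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dax_device { int id; };
struct pmem { struct dax_device *dax_dev; int disk_added; };

/* Mock of alloc_dax(): fails with 'err' when it is non-zero. */
static struct dax_device *mock_alloc_dax(long err)
{
	static struct dax_device dev;

	return err ? ERR_PTR(err) : &dev;
}

static int mock_pmem_attach_disk(struct pmem *pmem, long alloc_err)
{
	struct dax_device *dax_dev = mock_alloc_dax(alloc_err);

	assert(pmem != NULL);
	if (IS_ERR(dax_dev)) {
		int rc = (int)PTR_ERR(dax_dev);

		if (rc != -EOPNOTSUPP)
			return rc;	/* real failure: abort the attach */
		dax_dev = NULL;		/* DAX unsupported: carry on without it */
	}
	pmem->dax_dev = dax_dev;
	pmem->disk_added = 1;		/* block device remains available */
	return 0;
}
```

The point of the sketch is that only -EOPNOTSUPP is downgraded; any other alloc_dax() failure still aborts the attach.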


This would be an error handling fix all by itself. Do we really want
to return successfully if dax is unsupported, or should we return
an error here ?

Per above, there is no error handling fix, and pmem block device
availability should not depend on alloc_dax() succeeding.

I agree on treating alloc_dax() failure as non-fatal. There is
however one error handling fix to nvdimm/pmem which I plan to
introduce as an initial patch before this change:

    nvdimm/pmem: Fix leak on dax_add_host() failure
    
    Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
    before setting pmem->dax_dev, which therefore issues the two following
    calls on NULL pointers:
    
    out_cleanup_dax:
            kill_dax(pmem->dax_dev);
            put_dax(pmem->dax_dev);
    
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..9fe358090720 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
 	set_dax_nomc(dax_dev);
 	if (is_nvdimm_sync(nd_region))
 		set_dax_synchronous(dax_dev);
+	pmem->dax_dev = dax_dev;
 	rc = dax_add_host(dax_dev, disk);
 	if (rc)
 		goto out_cleanup_dax;
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
-	pmem->dax_dev = dax_dev;
-
 	rc = device_add_disk(dev, disk, pmem_attribute_groups);
 	if (rc)
 		goto out_remove_host;
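
To make the leak concrete, here is a small userspace model of the ordering problem. Everything in it is a mock: mock_alloc_dax(), mock_put_dax() and mock_attach() are hypothetical, "live_objects" is just a counter standing in for the reference that put_dax() would drop, and the "assign_early" parameter selects between the old and the fixed ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dax_device { int alive; };
struct pmem { struct dax_device *dax_dev; };

static int live_objects;	/* number of allocated, not yet released devices */

static struct dax_device *mock_alloc_dax(void)
{
	static struct dax_device dev;

	dev.alive = 1;
	live_objects++;
	return &dev;
}

/* Modelled so that a NULL argument is a silent no-op. */
static void mock_put_dax(struct dax_device *dax_dev)
{
	if (!dax_dev)
		return;
	dax_dev->alive = 0;
	live_objects--;
}

static int mock_attach(struct pmem *pmem, int assign_early, int add_host_rc)
{
	struct dax_device *dax_dev = mock_alloc_dax();

	assert(pmem != NULL);
	if (assign_early)
		pmem->dax_dev = dax_dev;	/* fixed ordering */
	if (add_host_rc)
		goto out_cleanup_dax;
	pmem->dax_dev = dax_dev;		/* old ordering */
	return 0;

out_cleanup_dax:
	/* With the old ordering this still sees NULL and releases nothing. */
	mock_put_dax(pmem->dax_dev);
	return add_host_rc;
}
```

In this model, a failing "add host" step with assign_early == 0 leaves live_objects at 1, while assign_early == 1 brings it back to 0.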


The real question is what to do about device-dax. I *think* it is not
affected by cpu_dcache aliasing because it never accesses user mappings
through a kernel alias. I doubt device-dax is in use on these platforms,
but we might need another fixup for that if someone screams about the
alloc_dax() behavior change making them lose device-dax access.

By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.

Based on your analysis, is alloc_dax() still the right spot where
to place this runtime check ? Which call sites are responsible
for invoking alloc_dax() for device-dax ?

If we know which call sites do not intend to use the kernel linear
mapping, we could introduce a flag (or a new variant of the alloc_dax()
API) that would either enforce or skip the check.
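
One possible shape for such a flag, purely as a strawman: the flag name, mock_alloc_dax_flags() and the "mock_dcache_aliasing" variable below are all hypothetical, and the variable merely stands in for whatever runtime cache-aliasing query the check ends up using.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical flag: caller never touches user mappings via a kernel alias. */
#define DAX_ALLOC_SKIP_ALIAS_CHECK	(1U << 0)

struct dax_device { void *private; };

/* Stand-in for a runtime "is the data cache aliasing?" query. */
static int mock_dcache_aliasing;

static struct dax_device *mock_alloc_dax_flags(void *private, unsigned int flags)
{
	struct dax_device *dax_dev;

	/* Enforce the check unless the caller explicitly opted out. */
	if (mock_dcache_aliasing && !(flags & DAX_ALLOC_SKIP_ALIAS_CHECK))
		return ERR_PTR(-EOPNOTSUPP);

	dax_dev = calloc(1, sizeof(*dax_dev));
	if (!dax_dev)
		return ERR_PTR(-ENOMEM);
	dax_dev->private = private;
	return dax_dev;
}
```

Whether an opt-out flag or a separate API variant is preferable depends on which call sites are known not to use the kernel linear mapping, which is exactly the open question above.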

[...]


Here is what I'm seeing so far:

- devm_release_mem_region() is never called after devm_request_mem_region(),
    neither on error nor on teardown,

devm_release_mem_region() is called from virtio_fs_probe() context. That

I guess you mean "devm_request_mem_region()" here.

means that when virtio_fs_probe() returns an error the driver core will
automatically call devm_request_mem_region().

And "devm_release_mem_region()" here.


- pgmap is never freed on error after devm_kzalloc.

That is what the "devm_" in devm_kzalloc() does, free the memory on
driver-probe failure, or after the driver remove callback is invoked.

Got it.



   {
+	struct dax_device *dax_dev __free(cleanup_dax) = NULL;
   	struct virtio_shm_region cache_reg;
   	struct dev_pagemap *pgmap;
   	bool have_cache;
@@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
   	if (!IS_ENABLED(CONFIG_FUSE_DAX))
   		return 0;
   
+	dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
+	if (IS_ERR(dax_dev)) {
+		int rc = PTR_ERR(dax_dev);
+
+		if (rc == -EOPNOTSUPP)
+			return 0;
+		return rc;
+	}

What is gained by moving this allocation here ?

The gain is to fail early in virtio_fs_setup_dax() since the fundamental
dependency of alloc_dax() success is not met. For example why let the
setup progress to devm_memremap_pages() when alloc_dax() is going to
return ERR_PTR(-EOPNOTSUPP).

What I don't know is whether there is a dependency that requires calling
devm_request_mem_region(), devm_kzalloc() and devm_memremap_pages()
before calling alloc_dax() ?

Those 3 calls are used to populate:

        fs->window_phys_addr = (phys_addr_t) cache_reg.addr;
        fs->window_len = (phys_addr_t) cache_reg.len;

and then alloc_dax() takes "fs" as private data parameter. So it's
unclear to me whether we can swap the invocation order. I suspect
that it is not an issue because it is only used to populate
dax_dev->private, but I prefer to confirm this with you just to be
on the safe side.
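
For readers unfamiliar with the __free() annotation used in the hunk above, the following userspace sketch shows the same early-return behaviour using the GCC/Clang cleanup attribute directly. It is an approximation only: AUTO_CLEANUP_DAX, cleanup_dax() as written here, setup() and the "cleanup_calls" counter are hypothetical and are not the definitions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct dax_device { int id; };

static int cleanup_calls;	/* counts how often a device was really freed */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void cleanup_dax(struct dax_device **p)
{
	if (*p) {
		free(*p);
		cleanup_calls++;
	}
}

/* Userspace approximation of scope-based cleanup (GCC/Clang extension). */
#define AUTO_CLEANUP_DAX __attribute__((cleanup(cleanup_dax)))

static int setup(int fail_late, struct dax_device **out)
{
	struct dax_device *dax_dev AUTO_CLEANUP_DAX = NULL;

	dax_dev = calloc(1, sizeof(*dax_dev));
	if (!dax_dev)
		return -ENOMEM;
	if (fail_late)
		return -EIO;	/* cleanup_dax() frees dax_dev on the way out */

	*out = dax_dev;
	dax_dev = NULL;		/* ownership handed over; cleanup sees NULL */
	return 0;
}
```

Every early return after the allocation releases the device automatically, so allocating first does not add error-path code to the steps that follow.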

[...]

Thanks,

Mathieu



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com