Re: enclosure: fix sysfs symlinks creation when using multipath

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/16/2017 10:41 AM, Douglas Miller wrote:
On 03/16/2017 01:49 PM, James Bottomley wrote:
On Wed, 2017-03-15 at 19:39 -0400, Martin K. Petersen wrote:
Maurizio Lombardi <mlombard@xxxxxxxxxx> writes:

With multipath, it may happen that the same device is passed to
enclosure_add_device() multiple times and that the
enclosure_add_links() function fails to create the symlinks because
the device's sysfs directory entry is still NULL.  In this case,
the
links will never be created because all the subsequent calls to
enclosure_add_device() will immediately fail with EEXIST.
James?
Well I don't think the patch is the correct way to do this.  The
problem is that if we encounter an error creating the links, we
shouldn't add the device to the enclosure.  There's no need of a
links_created variable (see below).

However, more interesting is why the link creation failed in the first
place.  The device clearly seems to exist because it was added to sysfs
at time index 19.2 and the enclosure didn't try to use it until 60.0.
  Can you debug this a bit more, please?  I can't see anything specific
to multipath in the trace, so whatever this is looks like it could
happen in the single path case as well.

James

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 65fed71..ae89082 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
               struct device *dev)
  {
      struct enclosure_component *cdev;
+    int err;
        if (!edev || component >= edev->components)
          return -EINVAL;
@@ -384,12 +385,15 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
      if (cdev->dev == dev)
          return -EEXIST;
  -    if (cdev->dev)
+    if (cdev->dev) {
          enclosure_remove_links(cdev);
-
-    put_device(cdev->dev);
-    cdev->dev = get_device(dev);
-    return enclosure_add_links(cdev);
+        put_device(cdev->dev);
+        cdev->dev = NULL;
+    }
+    err = enclosure_add_links(cdev);
+    if (!err)
+        cdev->dev = get_device(dev);
+    return err;
  }
  EXPORT_SYMBOL_GPL(enclosure_add_device);
After stumbling across the NULL pointer panic, I was able to use Maurizio's second patch below:

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 65fed71..6ac07ea 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
                         struct device *dev)
 {
        struct enclosure_component *cdev;
+       int err;

        if (!edev || component >= edev->components)
                return -EINVAL;
@@ -384,12 +385,17 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
        if (cdev->dev == dev)
                return -EEXIST;

-       if (cdev->dev)
+       if (cdev->dev) {
                enclosure_remove_links(cdev);
-
-       put_device(cdev->dev);
+               put_device(cdev->dev);
+       }
        cdev->dev = get_device(dev);
-       return enclosure_add_links(cdev);
+       err = enclosure_add_links(cdev);
+       if (err) {
+               cdev->dev = NULL;
+               put_device(cdev->dev);
+       }
+       return err;
 }
 EXPORT_SYMBOL_GPL(enclosure_add_device);


I am able to pass my testing with this patch. I don't see an official submit of this patch, but will respond to it when I see one. Again, I am seeing the problem even without multipath.

Just to respond to James' question on the cause. What I observed was a race condition between udevd (ses_init()) and a worker thread (do_scan_async()), where the worker thread is creating the directories that are the target of the symlinks being created by udevd. Something was happening when udevd caught up with the worker thread (so the target directory did not exist) and it seemed the worker thread either got preempted or else just could not stay ahead of udevd. This means that udevd started failing to create symlinks even though the worker thread eventually got them all created. I did observe what appeared to be preemption, as the creation of directories stopped until udevd finished failing all the (rest of the) symlinks. Although there may have been other explanations for what I saw.




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux