On Mon, 2024-03-04 at 22:47 -0800, Saravana Kannan wrote:
> On Mon, Mar 4, 2024 at 8:49 AM Herve Codina <herve.codina@xxxxxxxxxxx> wrote:
> >
> > Hi Rob,
> >
> > On Mon, 4 Mar 2024 09:22:02 -0600
> > Rob Herring <robh@xxxxxxxxxx> wrote:
> >
> > ...
> >
> > > > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
> > > > >  {
> > > > >         int i;
> > > > >
> > > > > +       /*
> > > > > +        * Wait for any ongoing device link removals before removing some of
> > > > > +        * nodes. Drop the global lock while waiting
> > > > > +        */
> > > > > +       mutex_unlock(&of_mutex);
> > > > > +       device_link_wait_removal();
> > > > > +       mutex_lock(&of_mutex);
> > > >
> > > > I'm still not convinced we need to drop the lock. What happens if
> > > > someone else grabs the lock while we are in
> > > > device_link_wait_removal()? Can we guarantee that we can't screw
> > > > things badly?
> > >
> > > It is also just ugly because it's the callers of
> > > free_overlay_changeset() that hold the lock and now we're releasing it
> > > behind their back.
> > >
> > > As device_link_wait_removal() is called before we touch anything, can't
> > > it be called before we take the lock? And do we need to call it if
> > > applying the overlay fails?
>
> Rob,
>
> This[1] scenario Luca reported seems like a reason for the
> device_link_wait_removal() to be where Herve put it. That example
> seems reasonable.
>
> [1] - https://lore.kernel.org/all/20231220181627.341e8789@booty/
>

I'm still not totally convinced about that. Why not put the check right
before checking the kref in __of_changeset_entry_destroy()? I'll contradict
myself a bit because this is just theory, but if we look at pci_stop_dev(),
which AFAIU could be reached from a sysfs write(), we have:

    device_release_driver(&dev->dev);
    ...
    of_pci_remove_node(dev);
        of_changeset_revert(np->data);
        of_changeset_destroy(np->data);

So looking at the above, we would hit the same issue if we flush the queue in
free_overlay_changeset() - as the queue won't be flushed at all and we could
have devlink removal due to device_release_driver(). Right?

Again, completely theoretical, but it seems like a reasonable scenario, plus
I don't understand the push against having the flush in
__of_changeset_entry_destroy(). Conceptually, it looks like the best place to
me, but I may be missing some issue in doing it there?

> > >
> >
> > Indeed, having device_link_wait_removal() is not needed when applying the
> > overlay fails.
> >
> > I can call device_link_wait_removal() from the caller of_overlay_remove()
> > but not before the lock is taken.
> > We need to call it between __of_changeset_revert_notify() and
> > free_overlay_changeset() and so, the lock is taken.
> >
> > This lead to the following sequence:
> > --- 8< ---
> > int of_overlay_remove(int *ovcs_id)
> > {
> >         ...
> >         mutex_lock(&of_mutex);
> >         ...
> >
> >         ret = __of_changeset_revert_notify(&ovcs->cset);
> >         ...
> >
> >         ret_tmp = overlay_notify(ovcs, OF_OVERLAY_POST_REMOVE);
> >         ...
> >
> >         mutex_unlock(&of_mutex);
> >         device_link_wait_removal();
> >         mutex_lock(&of_mutex);
> >
> >         free_overlay_changeset(ovcs);
> >         ...
> >         mutex_unlock(&of_mutex);
> >         ...
> > }
> > --- 8< ---
> >
> > In this sequence, the question is:
> > Do we need to release the mutex lock while device_link_wait_removal() is
> > called ?
>
> In general I hate these kinds of sequences that release a lock and
> then grab it again quickly. It's not always a bug, but my personal
> take on that is 90% of these introduce a bug.
>
> Drop the unlock/lock and we'll deal a deadlock if we actually hit one.
> I'm also fairly certain that device_link_wait_removal() can't trigger
> something else that can cause an OF overlay change while we are in the
> middle of one.
> And like Rob said, I'm not sure this unlock/lock is a
> good solution for that anyway.

Totally agree. Unless we really see a deadlock, this is a very bad idea (IMHO).
Even in the PCI code, it seems to me that we're never destroying a changeset
from a device/kobj_type release callback. That would be super weird, right?

- Nuno Sá
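
For reference, a minimal userspace sketch of the placement argued for above:
flush pending device link removals right before the refcount check, with the
lock held throughout. This is a toy model, not kernel code. Every model_*
name is hypothetical, the lock is modelled as a plain flag, and the sketch
assumes each queued link removal pins one node reference until it runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: stand-ins, not the kernel implementations. */
static bool of_mutex_held;         /* models of_mutex */
static int pending_link_removals;  /* models queued device link release work */
static int node_refcount;          /* models the node's kref */

/*
 * Hypothetical helper: one base reference plus one reference assumed to be
 * pinned by each still-queued device link removal.
 */
static void model_reset(int queued)
{
	of_mutex_held = false;
	pending_link_removals = queued;
	node_refcount = 1 + queued;
}

/*
 * Models device_link_wait_removal(): drain the queue; each completed
 * removal drops the node reference it was holding.
 */
static void model_device_link_wait_removal(void)
{
	while (pending_link_removals > 0) {
		pending_link_removals--;
		node_refcount--;
	}
	assert(node_refcount >= 1);
}

/*
 * Models __of_changeset_entry_destroy(): returns true when only the base
 * reference is left, i.e. the node could be freed without a leak warning.
 */
static bool model_entry_destroy(bool flush_first)
{
	assert(of_mutex_held);  /* the caller holds the lock for the whole call */
	if (flush_first)
		model_device_link_wait_removal();
	return node_refcount == 1;
}

/* Models a removal path that takes the lock once and never drops it. */
static bool model_overlay_remove(bool flush_first)
{
	bool clean;

	of_mutex_held = true;
	clean = model_entry_destroy(flush_first);
	of_mutex_held = false;
	return clean;
}
```

In this model, model_overlay_remove(false) with two queued removals reports
a non-clean destroy, while model_overlay_remove(true) drains the queue first
and reports a clean one. Whether the real flush can safely run with of_mutex
held is exactly the open question in this thread; the sketch does not answer
it.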