On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote: > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote: > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote: > > > From: Leon Romanovsky <leonro@xxxxxxxxxx> > > > > > > After changes to allow dynamically set the reload_up/_down callbacks, > > > we ensure that properly supported devlink ops are not accessible before > > > devlink_register, which is last command in the initialization sequence. > > > > > > It makes devlink_reload_enable/_disable not relevant anymore and can be > > > safely deleted. > > > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> > > > > [...] > > > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c > > > index cb6645012a30..09e48fb232a9 100644 > > > --- a/drivers/net/netdevsim/dev.c > > > +++ b/drivers/net/netdevsim/dev.c > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev) > > > > > > nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY; > > > devlink_register(devlink); > > > - devlink_reload_enable(devlink); > > > return 0; > > > > > > err_psample_exit: > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev) > > > struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev); > > > struct devlink *devlink = priv_to_devlink(nsim_dev); > > > > > > - devlink_reload_disable(devlink); > > > devlink_unregister(devlink); > > > - > > > nsim_dev_reload_destroy(nsim_dev); > > > > > > nsim_bpf_dev_exit(nsim_dev); > > > > I didn't remember why devlink_reload_{enable,disable}() were added in > > the first place so it was not clear to me from the commit message why > > they can be removed. It is described in commit a0c76345e3d3 ("devlink: > > disallow reload operation during device cleanup") with a reproducer. > > It was added because devlink ops were accessible by the user space very > early in the driver lifetime. All my latest devlink patches are the > attempt to fix this arch/design/implementation issue. The reproducer in the commit message executed the reload after the device was fully initialized. IIRC, the problem there was that nothing prevented these two tasks from racing: devlink dev reload netdevsim/netdevsim10 echo 10 > /sys/bus/netdevsim/del_device The title also talks about forbidding reload during device cleanup. > > > > > Tried the reproducer with this series and I cannot reproduce the issue. > > Wasn't quite sure why, but it does not seem to be related to "changes to > > allow dynamically set the reload_up/_down callbacks", as this seems to > > be specific to mlx5. > > You didn't reproduce because of my series that moved > devlink_register()/devlink_unregister() to be last/first commands in > .probe()/.remove() flows. Agree, that is what I wrote in the next paragraph of my reply. > > Patch to allow dynamically set ops was needed because mlx5 had logic > like this: > if(something) > devlink_reload_enable() > > And I needed a way to keep this if ... condition. > > > > > IIUC, the reason that the race described in above mentioned commit can > > no longer happen is related to the fact that devlink_unregister() is > > called first in the device dismantle path, after your previous patches. > > Since both the reload operation and devlink_unregister() hold > > 'devlink_mutex', it is not possible for the reload operation to race > > with device dismantle. > > > > Agree? If so, I think it would be good to explain this in the commit > > message unless it's clear to everyone else. > > I don't agree for very simple reason that devlink_mutex is going to be > removed very soon and it is really not a reason why devlink reload is > safer now when before. > > The reload can't race due to: > 1. devlink_unregister(), which works as a barrier to stop accesses > from the user space. > 2. reference counting that ensures that all in-flight commands are counted. > 3. wait_for_completion that blocks till all commands are done. So the wait_for_completion() is what prevents the race, not 'devlink_mutex' that is taken later. This needs to be explained in the commit message to make it clear why the removal is safe. Thanks