On 01/21/2020 08:09 AM, Josef Bacik wrote: > On 1/20/20 7:45 AM, Sun Ke wrote: >> Open /dev/nbdX first, the config_refs will be 1 and >> the pointers in nbd_device are still null. Disconnect >> /dev/nbdX, then reference a null recv_workq. The >> protection by config_refs in nbd_genl_disconnect is useless. >> >> To fix it, just add a check for a non null task_recv in >> nbd_genl_disconnect. >> >> Signed-off-by: Sun Ke <sunke32@xxxxxxxxxx> >> --- >> v1 -> v2: >> >> add an omitted mutex_unlock. >> --- >> drivers/block/nbd.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c >> index b4607dd96185..668bc9cb92ed 100644 >> --- a/drivers/block/nbd.c >> +++ b/drivers/block/nbd.c >> @@ -2008,6 +2008,10 @@ static int nbd_genl_disconnect(struct sk_buff >> *skb, struct genl_info *info) >> index); >> return -EINVAL; >> } >> + if (!nbd->task_recv) { >> + mutex_unlock(&nbd_index_mutex); >> + return -EINVAL; >> + } >> if (!refcount_inc_not_zero(&nbd->refs)) { >> mutex_unlock(&nbd_index_mutex); >> printk(KERN_ERR "nbd: device at index %d is going down\n", >> > > This doesn't even really protect us, we need to have the > nbd->config_lock held here to make sure it's ok. The IOCTL path is safe > because it creates the device on open so it's sure to exist by the time > we get to the disconnect, we don't have that for genl_disconnect. So > I'd add the config_mutex before getting the config_ref, and then do the > check, something like > > mutex_lock(&nbd->config_lock); > if (!refcount_inc_not_zero(&nbd->refs)) { > } > if (!nbd->recv_workq) { > } > mutex_unlock(&nbd->config_lock); > We will be doing a mix of checks/behavior. Maybe we want to settle on one? It seems the code, before my patch, would let you do a open() then do a nbd_genl_disconnect. This function would then try to cleanup what it could and return success. To keep the current behavior/style in nbd_disconnect_and_put would you want to do: nbd_disconnect_and_put() .... if (nbd->task_recv) flush_workqueue(nbd->recv_workq); ? Alternatively, I think if we want to make it so calling nbd_genl_disconnect is not allowed on a device that we have not done a successful nbd_genl_connect/nbd_start_device call on then we want to add the new state bit to indicate nbd_start_device was successful. Or, we could stick to one variable that gets set at start and always use that to indicate nbd_start_device was called ok. For example, for nbd_genl_reconfigure we already check if task_recv is set to check if nbd_start_device was called successfully.