From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Monday, September 9, 2024 9:47 AM > > If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is > fully initialized, we can hit the panic below: > > hv_utils: Registering HyperV Utility Driver > hv_vmbus: registering driver hv_utils > ... > BUG: kernel NULL pointer dereference, address: 0000000000000000 > CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1 > RIP: 0010:hv_pkt_iter_first+0x12/0xd0 > Call Trace: > ... > vmbus_recvpacket > hv_kvp_onchannelcallback > vmbus_on_event > tasklet_action_common > tasklet_action > handle_softirqs > irq_exit_rcu > sysvec_hyperv_stimer0 > </IRQ> > <TASK> > asm_sysvec_hyperv_stimer0 > ... > kvp_register_done > hvt_op_read > vfs_read > ksys_read > __x64_sys_read > > This can happen because the KVP/VSS channel callback can be invoked > even before the channel is fully opened: > 1) as soon as hv_kvp_init() -> hvutil_transport_init() creates > /dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and > register itself to the driver by writing a message KVP_OP_REGISTER1 to the > file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and > reading the file for the driver's response, which is handled by > hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done(). > > 2) the problem with kvp_register_done() is that it can cause the > channel callback to be called even before the channel is fully opened, > and when the channel callback is starting to run, util_probe()-> > vmbus_open() may have not initialized the ringbuffer yet, so the > callback can hit the panic of NULL pointer dereference. > > To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in > __vmbus_open(), just before the first hv_ringbuffer_init(), and then we > unload and reload the driver hv_utils, and run the daemon manually within > the 10 seconds. > > Fix the panic by checking the channel state in the channel callback. > To avoid the race condition with __vmbus_open(), we disable and enable > the channel callback temporarily in __vmbus_open(). > > The channel callbacks of the other VMBus devices don't need to check > the channel state since they can't run before the channels are fully > initialized. > > Note: we would also need to fix the fcopy driver code, but that has > been removed in commit ec314f61e4fc ("Drivers: hv: Remove fcopy driver") in > the mainline kernel since v6.10. For old 6.x LTS kernels, and the 5.x > and 4.x LTS kernels, the fcopy driver needs to be fixed when the > fix is backported to the stable kernel branches. An alternate approach occurs to me. util_probe() does these three things in order: 1) Allocates the receive buffer 2) Calls the util_init() function, which for KVP and VSS creates the char dev 3) Sets up the VMBus channel, including calling vmbus_open() What if the order of #2 and #3 were swapped in util_probe()? I don't immediately see any interdependency between #2 and #3 for KVP and VSS, nor for Shutdown and Timesync. With the swap, the VMBus channel would be fully open by the time the /dev entry appears and the user space daemon can do anything. I haven't though too deeply about this, so maybe there's a problem somewhere. But if not, it seems a lot cleaner. Michael > > Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx> > --- > drivers/hv/channel.c | 11 +++++++++++ > drivers/hv/hv_kvp.c | 3 +++ > drivers/hv/hv_snapshot.c | 3 +++ > 3 files changed, 17 insertions(+) > > diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c > index fb8cd8469328..685e407a3fdf 100644 > --- a/drivers/hv/channel.c > +++ b/drivers/hv/channel.c > @@ -657,6 +657,14 @@ static int __vmbus_open(struct vmbus_channel > *newchannel, > return -ENOMEM; > } > > + /* > + * The channel callbacks of KVP/VSS may run before __vmbus_open() > + * finishes (see kvp_register_done() -> ... -> kvp_poll_wrapper()), so > + * they check newchannel->state to tell the ringbuffer has been fully > + * initialized or not. Disable and enable the tasklet to avoid the race. > + */ > + tasklet_disable(&newchannel->callback_event); > + > newchannel->state = CHANNEL_OPENING_STATE; > newchannel->onchannel_callback = onchannelcallback; > newchannel->channel_callback_context = context; > @@ -750,6 +758,8 @@ static int __vmbus_open(struct vmbus_channel *newchannel, > } > > newchannel->state = CHANNEL_OPENED_STATE; > + tasklet_enable(&newchannel->callback_event); > + > kfree(open_info); > return 0; > > @@ -766,6 +776,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel, > hv_ringbuffer_cleanup(&newchannel->inbound); > vmbus_free_requestor(&newchannel->requestor); > newchannel->state = CHANNEL_OPEN_STATE; > + tasklet_enable(&newchannel->callback_event); > return err; > } > > diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c > index d35b60c06114..ec098067e579 100644 > --- a/drivers/hv/hv_kvp.c > +++ b/drivers/hv/hv_kvp.c > @@ -662,6 +662,9 @@ void hv_kvp_onchannelcallback(void *context) > if (kvp_transaction.state > HVUTIL_READY) > return; > > + if (channel->state != CHANNEL_OPENED_STATE) > + return; > + > if (vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 4, &recvlen, &requestid)) { > pr_err_ratelimited("KVP request received. Could not read into recv buf\n"); > return; > diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c > index 0d2184be1691..f7924c2fc62e 100644 > --- a/drivers/hv/hv_snapshot.c > +++ b/drivers/hv/hv_snapshot.c > @@ -301,6 +301,9 @@ void hv_vss_onchannelcallback(void *context) > if (vss_transaction.state > HVUTIL_READY) > return; > > + if (channel->state != CHANNEL_OPENED_STATE) > + return; > + > if (vmbus_recvpacket(channel, recv_buffer, VSS_MAX_PKT_SIZE, &recvlen, &requestid)) { > pr_err_ratelimited("VSS request received. Could not read into recv buf\n"); > return; > -- > 2.25.1