> Subject: [EXTERNAL] [PATCH] Drivers: hv: kvp/vss: Avoid accessing a ringbuffer > not initialized yet > > If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is > fully initialized, we can hit the panic below: > > hv_utils: Registering HyperV Utility Driver > hv_vmbus: registering driver hv_utils > ... > BUG: kernel NULL pointer dereference, address: 0000000000000000 > CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1 > RIP: 0010:hv_pkt_iter_first+0x12/0xd0 > Call Trace: > ... > vmbus_recvpacket > hv_kvp_onchannelcallback > vmbus_on_event > tasklet_action_common > tasklet_action > handle_softirqs > irq_exit_rcu > sysvec_hyperv_stimer0 > </IRQ> > <TASK> > asm_sysvec_hyperv_stimer0 > ... > kvp_register_done > hvt_op_read > vfs_read > ksys_read > __x64_sys_read > > This can happen because the KVP/VSS channel callback can be invoked even > before the channel is fully opened: > 1) as soon as hv_kvp_init() -> hvutil_transport_init() creates > /dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately > and register itself to the driver by writing a message KVP_OP_REGISTER1 to > the file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and > reading the file for the driver's response, which is handled by hvt_op_read(), > which calls hvt->on_read(), i.e. kvp_register_done(). > > 2) the problem with kvp_register_done() is that it can cause the channel > callback to be called even before the channel is fully opened, and when the > channel callback is starting to run, util_probe()-> > vmbus_open() may have not initialized the ringbuffer yet, so the callback can > hit the panic of NULL pointer dereference. > > To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in > __vmbus_open(), just before the first hv_ringbuffer_init(), and then we > unload and reload the driver hv_utils, and run the daemon manually within > the 10 seconds. > > Fix the panic by checking the channel state in the channel callback. > To avoid the race condition with __vmbus_open(), we disable and enable the > channel callback temporarily in __vmbus_open(). > > The channel callbacks of the other VMBus devices don't need to check the > channel state since they can't run before the channels are fully initialized. > > Note: we would also need to fix the fcopy driver code, but that has been > removed in commit ec314f61e4fc ("Drivers: hv: Remove fcopy driver") in the > mainline kernel since v6.10. For old 6.x LTS kernels, and the 5.x and 4.x LTS > kernels, the fcopy driver needs to be fixed when the fix is backported to the > stable kernel branches. > > Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons > registration") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx> > --- > drivers/hv/channel.c | 11 +++++++++++ > drivers/hv/hv_kvp.c | 3 +++ > drivers/hv/hv_snapshot.c | 3 +++ > 3 files changed, 17 insertions(+) > > diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index > fb8cd8469328..685e407a3fdf 100644 > --- a/drivers/hv/channel.c > +++ b/drivers/hv/channel.c > @@ -657,6 +657,14 @@ static int __vmbus_open(struct vmbus_channel > *newchannel, > return -ENOMEM; > } > > + /* > + * The channel callbacks of KVP/VSS may run before __vmbus_open() > + * finishes (see kvp_register_done() -> ... -> kvp_poll_wrapper()), so > + * they check newchannel->state to tell the ringbuffer has been fully > + * initialized or not. Disable and enable the tasklet to avoid the race. > + */ > + tasklet_disable(&newchannel->callback_event); > + > newchannel->state = CHANNEL_OPENING_STATE; > newchannel->onchannel_callback = onchannelcallback; > newchannel->channel_callback_context = context; @@ -750,6 +758,8 > @@ static int __vmbus_open(struct vmbus_channel *newchannel, > } > > newchannel->state = CHANNEL_OPENED_STATE; > + tasklet_enable(&newchannel->callback_event); > + > kfree(open_info); > return 0; > > @@ -766,6 +776,7 @@ static int __vmbus_open(struct vmbus_channel > *newchannel, > hv_ringbuffer_cleanup(&newchannel->inbound); > vmbus_free_requestor(&newchannel->requestor); > newchannel->state = CHANNEL_OPEN_STATE; > + tasklet_enable(&newchannel->callback_event); > return err; > } > > diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c index > d35b60c06114..ec098067e579 100644 > --- a/drivers/hv/hv_kvp.c > +++ b/drivers/hv/hv_kvp.c > @@ -662,6 +662,9 @@ void hv_kvp_onchannelcallback(void *context) > if (kvp_transaction.state > HVUTIL_READY) > return; > > + if (channel->state != CHANNEL_OPENED_STATE) > + return; > + > if (vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 4, > &recvlen, &requestid)) { > pr_err_ratelimited("KVP request received. Could not read into > recv buf\n"); > return; > diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c index > 0d2184be1691..f7924c2fc62e 100644 > --- a/drivers/hv/hv_snapshot.c > +++ b/drivers/hv/hv_snapshot.c > @@ -301,6 +301,9 @@ void hv_vss_onchannelcallback(void *context) > if (vss_transaction.state > HVUTIL_READY) > return; > > + if (channel->state != CHANNEL_OPENED_STATE) > + return; > + > if (vmbus_recvpacket(channel, recv_buffer, VSS_MAX_PKT_SIZE, > &recvlen, &requestid)) { > pr_err_ratelimited("VSS request received. Could not read into > recv buf\n"); > return; > -- > 2.25.1 > Thanks for the fix. Reviewed-by: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx>