On 2/27/2025 11:54 AM, Naman Jain wrote:
On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
On regular bootup, devices get registered to vmbus first, so when
uio_hv_generic driver for a particular device type is probed,
the device is already initialized and added, so sysfs creation in
uio_hv_generic probe works fine. However, when device is removed
and brought back, the channel rescinds and device again gets
registered to vmbus. However this time, the uio_hv_generic
driver is
already registered to probe for that device and in this case sysfs
creation is tried before the device gets initialized completely.
Fix this by moving the core logic of sysfs creation for ring
buffer,
from uio_hv_generic to HyperV's vmbus driver, where rest of the
sysfs
attributes for the channels are defined. While doing that, make
use
of attribute groups and macros, instead of creating sysfs
directly,
to ensure better error handling and code flow.
<snip>
+static int hv_mmap_ring_buffer_wrapper(struct file *filp,
struct kobject *kobj,
+ const struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ struct vmbus_channel *channel = container_of(kobj, struct
vmbus_channel, kobj);
+
+ if (!channel->mmap_ring_buffer)
+ return -ENODEV;
+ return channel->mmap_ring_buffer(channel, vma);
What is preventing mmap_ring_buffer from being set to NULL right
after
checking it and then calling it here? I see no locks here or
where you
are assigning this variable at all, so what is preventing these
types of
races?
thanks,
greg k-h
Thank you so much for reviewing.
I spent some time to understand if this race condition can happen
and it
seems execution flow is pretty sequential, for a particular
channel of a
device.
Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL)
can be
called in parallel to hv_uio_probe (which had set
channel->mmap_ring_buffer to non NULL), I doubt race can happen
here.
Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
vmbus_device_register
device_register
hv_uio_probe
hv_create_ring_sysfs (W to non NULL)
sysfs_update_group
vmbus_chan_attr_is_visible (R)
vmbus_add_channel_kobj
sysfs_create_group
vmbus_chan_attr_is_visible (R)
hv_mmap_ring_buffer_wrapper (critical section)
hv_uio_remove
hv_remove_ring_sysfs (W to NULL)
Yes, and right in here someone mmaps the file.
I think you can race here, no locks at all feels wrong.
Messing with sysfs groups and files like this is rough, and almost
never
a good idea, why can't you just do this all at once with the default
groups, why is this being added/removed out-of-band?
thanks,
greg k-h
The decision to avoid creating a "ring" sysfs attribute by default
likely stems from a specific use case where it wasn't needed for every
device. By creating it automatically, it keeps the uio_hv_generic
driver simpler and helps prevent potential race conditions.
However, it
has an added cost of having ring buffer for all the channels, where it
is not required. I am trying to find if there are any more
implications
of it.
You do know about the "is_visable" attribute callback, right? Why not
just use that instead of manually mucking around with the
adding/removing of sysfs attributes at all? That is what it was
designed for.
thanks,
greg k-h
Hi Greg,
Yes, I am utilizing that in my patch. For differentiating channels of a
uio_hv_generic device, and for *selectively* creating sysfs, we
introduced this field in channel struct "channel->mmap_ring_buffer",
which we were setting only in uio_hv_generic. But, by the time we set
this in uio_hv_generic driver, the sysfs creation has already gone
through and sysfs doesn't get updated dynamically. That's where there
was a need to call sysfs_update_group. I thought the better place to
keep sysfs_update_group would be in vmbus driver, where we are creating
the original sysfs entries, hence I had to add the wrapper functions.
This led us to the race condition we are trying to address now.
@@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
kobject *kobj,
attr == &chan_attr_monitor_id.attr))
return 0;
+ /* Hide ring attribute if channel's mmap_ring_buffer function is
not yet
initialised */
+ if (attr == &chan_attr_ring_buffer.attr && !channel-
>mmap_ring_buffer)
+ return 0;
+
return attr->mode;
Ok, that's good. BUT you need to change the detection of this to be
before the device is set up in the driver. Why can't you do it in the
device creation logic itself instead of after-the-fact when you will
ALWAYS race with userspace?
thanks,
greg k-h
Sure, will check more on this. Thanks.
Regards,
Naman
Hi Greg,
I understand this is deviating from the discussions that we have had
till now, but I wanted to kindly request your opinion on the following
approach, which came up in one of our internal discussions.
By moving the sysfs creation logic from hv_uio_probe to hv_uio_open, I
believe we can address this problem. Here are some of the benefits of
this approach:
* This approach involves minimal changes and provides a localized
solution.
* Since the use-case of ring sysfs is specific to uio_hv_generic and
DPDK, this will give us the flexibility to modify it based on
requirements. For example, ring_buffer_bin_attr.size should depend on
the ring buffer's allocated size, which is easier to manage if the
current code resides in uio_hv_generic.
* The use-case of DPDK is such that at any given time, either the
hv_netvsc kernel driver or the userspace (DPDK) will be controlling this
HV_NIC device. We do not want to expose the ring buffer to userspace
when hv_netvsc is using the device. This is where the "awareness" of the
current user comes into play, and we need a way to control the
visibility of the "ring" sysfs from uio_hv_generic.
Alternatively, I could focus on resolving the race condition you
mentioned and proceed with refining the patch. This approach addresses
most of the general practice concerns you highlighted.
Regards,
Naman