From: Boqun Feng <boqun.feng@xxxxxxxxx> Sent: Tuesday, February 20, 2024 9:29 AM > > On Mon, Feb 19, 2024 at 10:30:07PM -0800, Saurabh Singh Sengar wrote: > > On Mon, Feb 12, 2024 at 10:19:59PM -0800, mhkelley58@xxxxxxxxx wrote: > > > From: Michael Kelley <mhklinux@xxxxxxxxxxx> > > > > > > The VMBUS_RING_SIZE macro adds space for a ring buffer header to the > > > requested ring buffer size. The header size is always 1 page, and so > > > its size varies based on the PAGE_SIZE for which the kernel is built. > > > If the requested ring buffer size is a large power-of-2 size and the header > > > size is small, the resulting size is inefficient in its use of memory. > > > For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in > > > a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory > > > allocator, and wastes 508 Kbytes of memory. > > > > > > In such situations, the exact size of the ring buffer isn't that important, > > > and it's OK to allocate the 4 Kbyte header at the beginning of the 512 > > > Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory > > > allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted. > > > > > > Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer > > > sizes. "Large" is somewhat arbitrarily defined as 8 times the size of > > > the ring buffer header (which is of size PAGE_SIZE). For example, for > > > 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first > > > 4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers > > > of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer > > > header. In both cases, smaller sizes add space for the header so > > > the ring size isn't reduced too much by using part of the space for > > > the header. For example, with a 64 Kbyte page size, we don't want > > > a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half > > > of the space for the header. In such a case, the memory allocation > > > is less efficient, but it's the best that can be done. > > > > > > Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL") > > > Signed-off-by: Michael Kelley <mhklinux@xxxxxxxxxxx> > > > --- > > > include/linux/hyperv.h | 22 +++++++++++++++++++++- > > > 1 file changed, 21 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h > > > index 2b00faf98017..6ef0557b4bff 100644 > > > --- a/include/linux/hyperv.h > > > +++ b/include/linux/hyperv.h > > > @@ -164,8 +164,28 @@ struct hv_ring_buffer { > > > u8 buffer[]; > > > } __packed; > > > > > > + > > > +/* > > > + * If the requested ring buffer size is at least 8 times the size of the > > > + * header, steal space from the ring buffer for the header. Otherwise, add > > > + * space for the header so that is doesn't take too much of the ring buffer > > > + * space. > > > + * > > > + * The factor of 8 is somewhat arbitrary. The goal is to prevent adding a > > > + * relatively small header (4 Kbytes on x86) to a large-ish power-of-2 ring > > > + * buffer size (such as 128 Kbytes) and so end up making a nearly twice as > > > + * large allocation that will be almost half wasted. As a contrasting example, > > > + * on ARM64 with 64 Kbyte page size, we don't want to take 64 Kbytes for the > > > + * header from a 128 Kbyte allocation, leaving only 64 Kbytes for the ring. > > > + * In this latter case, we must add 64 Kbytes for the header and not worry > > > + * about what's wasted. > > > + */ > > > +#define VMBUS_HEADER_ADJ(payload_sz) \ > > > + ((payload_sz) >= 8 * sizeof(struct hv_ring_buffer) ? \ > > > + 0 : sizeof(struct hv_ring_buffer)) > > > + > > > /* Calculate the proper size of a ringbuffer, it must be page-aligned */ > > > -#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(sizeof(struct hv_ring_buffer) + \ > > > +#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(VMBUS_HEADER_ADJ(payload_sz) + \ > > > (payload_sz)) > > I generally see the point of this patch, however, it changes the > semantics of VMBUS_RING_SIZE() (similiar as Saurabh mentioned below), > before VMBUS_RING_SIZE() will give you a ring buffer size which has at > least "payload_sz" bytes, but after the change, you may not get "enough" > bytes for the vmbus ring buffer. Storvsc and netvsc were previously not using VMBUS_RING_SIZE(), so space for the ring buffer header was already being "stolen" from the specified ring size. But if this new version of VMBUS_RING_SIZE() still allows the ring buffer header space to be "stolen" from the large-ish ring buffers, I don't see that as a problem. The ring buffer sizes have always been a swag value for the low-speed devices with small ring buffers. For the high-speed devices (netvsc and storvsc) the ring buffer sizes have been somewhat determined by perf measurements, but the ring buffers are much bigger, so stealing a little bit of space doesn't have any noticeable effect. Even with the perf measurements, the sizes aren't very exact. Performance just isn't sensitive to the size of the ring buffer except at a gross level. > > One cause of the waste memory is using alloc_pages() to get physical > continuous, however, after a quick look into GPADL, looks like it also > supports uncontinuous pages. Maybe that's the longer-term solution? Yes, that seems like a desirable long-term solution. While the GPADL code handles vmalloc memory correctly (because the netvsc send and receive buffers are vmalloc memory), hv_ringbuffer_init() assumes physically contiguous memory. It would need to use vmalloc_to_page() in building the pages_wraparound array instead of just indexing into the struct page array. But that's an easy fix. Another problem is the uio_hv_generic.c driver, where hv_uio_ring_mmap() assumes a physically contiguous ring. Maybe there's a straightforward way to fix it as well, but it isn't immediately obvious to me. Using vmalloc memory for ring buffers has another benefit as well. Today, adding a new NIC to a VM that has been running for a while could fail because of not being able to allocate 1 Mbyte contiguous memory for the ring buffers used by each VMBus channel. There could be plenty of memory available, but fragmentation could prevent getting 1 Mbyte contiguous. Using vmalloc'ed ring buffers would solve this problem. Michael > > Regards, > Boqun > > > > > > > struct hv_ring_buffer_info { > > > > Thanks for the patch. > > It's worth noting that this will affect the size of ringbuffer calculation for > > some of the drivers: netvsc, storvsc_drv, hid-hyperv, and hyperv-keyboard.c. > > It will be nice to have this comment added in commit for future reference. > > > > Looks a good improvement to me, > > Reviewed-by: Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> > > > > > -- > > > 2.25.1 > > >