On Thu, Jun 09, 2022 at 12:33:32PM +0000, Arseniy Krasnov wrote:
On 09.06.2022 11:54, Stefano Garzarella wrote:
Hi Arseniy,
I left some comments in the patches, and I'm adding something also here:
Thanks for the comments
On Fri, Jun 03, 2022 at 05:27:56AM +0000, Arseniy Krasnov wrote:
INTRODUCTION
Hello, this is an experimental implementation of virtio vsock zerocopy
receive. It was inspired by TCP zerocopy receive by Eric Dumazet. This API uses
the same idea: call 'mmap()' on the socket's descriptor, then every
'getsockopt()' will fill the provided vma area with pages of virtio RX buffers.
After the received data has been processed by the user, the pages must be freed
by a 'madvise()' call with the MADV_DONTNEED flag set (if the user doesn't call
'madvise()', the next 'getsockopt()' will fail).
If it is not too time-consuming, can we have a table/list to compare this and the TCP zerocopy?
You mean compare the APIs in more detail?
Yes, maybe a comparison from the user's point of view to do zero-copy
with TCP and VSOCK.
DETAILS
Here is exactly how the mapping with mapped pages looks: the first page of the
mapping contains an array of trimmed virtio vsock packet headers (each contains
only the length of data on the corresponding pages and the 'flags' field):
struct virtio_vsock_usr_hdr {
	uint32_t length;
	uint32_t flags;
	uint32_t copy_len;
};
Field 'length' tells the user the exact size of the payload within each sequence
of pages, and 'flags' allows the user to handle SOCK_SEQPACKET flags (such as
message bounds or record bounds). Field 'copy_len' is described below in the
'v1->v2' part. All other pages are data pages from the RX queue.
     Page 0      Page 1      Page N
[ hdr1 .. hdrN ][ data ] .. [ data ]
   |        |       ^           ^
   |        |       |           |
   |        *-------------------*
   |                |
   |                |
   *----------------*
Of course, a single header could represent an array of pages (when the packet's
buffer is bigger than one page). So here is an example of a detailed mapping
layout for some set of packets. Let's consider that we have the following
sequence of packets: 56 bytes, 4096 bytes and 8200 bytes. All pages: 0, 1, 2, 3,
4 and 5 will be inserted into the user's vma (the vma is large enough).
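The page arithmetic in this example can be checked with a small sketch. It
assumes 4 KiB pages, assumes that each packet's data starts on its own page
right after the previous packet's pages, and ignores 'copy_len' entirely. The
helper names 'hdr_data_pages()' and 'layout_pages()' are made up for
illustration and are not part of the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumption: 4 KiB pages, as in the example */

struct virtio_vsock_usr_hdr {
	uint32_t length;
	uint32_t flags;
	uint32_t copy_len;
};

/* Number of data pages needed to hold 'length' bytes (rounded up). */
static uint32_t hdr_data_pages(const struct virtio_vsock_usr_hdr *h)
{
	return h->length / PAGE_SZ + (h->length % PAGE_SZ != 0);
}

/*
 * Hypothetical helper: page 0 holds the header array, so data starts at
 * page 1. Stores the first data page of every packet in first_page[] and
 * returns the total number of pages used, header page included.
 */
static uint32_t layout_pages(const struct virtio_vsock_usr_hdr *hdrs,
			     size_t n, uint32_t *first_page)
{
	uint32_t next = 1;

	assert(hdrs != NULL && first_page != NULL);

	for (size_t i = 0; i < n; i++) {
		first_page[i] = next;
		next += hdr_data_pages(&hdrs[i]);
	}

	return next;
}
```

For the 56, 4096 and 8200 byte packets this gives data starting at pages 1, 2
and 3, with the last packet spanning three pages, so six pages (0 to 5) in
total.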
In order to have a "userspace polling-friendly approach" and reduce the number of syscalls, can we allow, for example, the userspace to mmap at least the first header before packets arrive?
Then the userspace can poll a flag or other fields in the header to understand that there are new packets.
You mean that, to avoid the 'poll()' syscall, the user will spin on some flag provided by the kernel on some mapped page? I think yes, this is ok. Also, I think I can avoid the 'madvise()' call
that clears the memory mapping before each 'getsockopt()': let 'getsockopt()' do the 'madvise()' job by removing the pages from the previous data. In this case only one system call is needed: 'getsockopt()'.
Yes, that's right. I mean to support both, poll() for interrupt-based
applications and the ability to actively poll a variable in the shared
memory for applications that want to minimize latency.
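A minimal userspace sketch of the active-polling side, only to illustrate the
idea. It assumes the kernel would publish a sequence counter somewhere in the
mapped header page. The 'struct rx_notify' layout and the 'rx_spin_for_data()'
helper are hypothetical and do not exist in the posted patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: a counter the producer bumps (with release semantics)
 * after new packet headers become valid in the shared mapping.
 */
struct rx_notify {
	_Atomic uint32_t seq;
};

/*
 * Spin at most max_spins times waiting for seq to differ from last_seen.
 * Returns true and stores the new value in *cur_out if it changed.
 * On false, an interrupt-based application would fall back to poll().
 */
static bool rx_spin_for_data(struct rx_notify *n, uint32_t last_seen,
			     unsigned int max_spins, uint32_t *cur_out)
{
	assert(n != NULL && cur_out != NULL);

	for (unsigned int i = 0; i < max_spins; i++) {
		/* Acquire pairs with the producer's release store. */
		uint32_t cur = atomic_load_explicit(&n->seq,
						    memory_order_acquire);

		if (cur != last_seen) {
			*cur_out = cur;
			return true;
		}
	}

	return false;
}
```

A bounded spin followed by a blocking 'poll()' keeps both modes available:
latency-sensitive users pick a large 'max_spins', others pass 0 and always
sleep.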
Thanks,
Stefano