Re: [PATCH 1/1] eventfd new tag EFD_VPOLL: generate epoll events

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Roman,

	 I sorry for the delay in my answer, but I needed to set up a minimal
tutorial to show what I am working on and why I need a feature like the
one I am proposing.

Please, have a look of the README.md page here:
https://github.com/virtualsquare/vuos
(everything can be downloaded and tested)

On Fri, May 31, 2019 at 01:48:39PM +0200, Roman Penyaev wrote:
> Since each such a stack has a set of read/write/etc functions you always
> can extend you stack with another call which returns you event mask,
> specifying what exactly you have to do, e.g.:
> 
>     nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
>     for (n = 0; n < nfds; ++n) {
>          struct sock *sock;
> 
>          sock = events[n].data.ptr;
>          events = sock->get_events(sock, &events[n]);
> 
>          if (events & EPOLLIN)
>              sock->read(sock);
>          if (events & EPOLLOUT)
>              sock->write(sock);
>     }
> 
> 
> With such a virtual table you can mix all userspace stacks and even
> with normal sockets, for which 'get_events' function can be declared as
> 
> static poll_t kernel_sock_get_events(struct sock *sock, struct epoll_event
> *ev)
> {
>     return ev->events;
> }
> 
> Do I miss something?

I am not trying to port some tools to use user-space implemented stacks or device
drivers/emulators, I am seeking to a general purpose approach.

I think that the example in the section of the README "mount a user-level 
networking stack" explains the situation.

The submodule vunetvdestack uses a namespace to define a networking stack connected
to a VDE network (see https://github.com/rd235/vdeplug4).

The API is clean (as it can be seen at the end of the file vunet_modules/vunetvdestack.c).
All the methods but "socket" are directly mapped to their system call counterparts:

struct vunet_operations vunet_ops = {
  .socket = vdestack_socket,
  .bind = bind,
  .connect = connect,
  .listen = listen,
  .accept4 = accept4,
....
	.epoll_ctl = epoll_ctl,
...
}

(the elegance of the API can be seen also in vunet_modules/vunetreal.c: a 38 lines module
 implementing a gateway to the real networking of the hosting machine)

Unfortunately I cannot use the same clean interface to support user-library implemented
stacks like lwip/lwipv6/picotcp because I cannot generate EPOLL events...

Bizantine workarounds based on data structures exchanged in the data.ptr field of epoll_event
that must be decoded by the hypervisor to retrieve the missing information about the event
can be implemented... but it would be a pity ;-)

The same problem arises in umdev modules: virtual devices should generate the same
EPOLL events of their real couterparts.

I feel that the ability to generate/synthesize EPOLL events could be useful for many projects.
(In my first message I included some URLs of people seeking for this feature, retrieved by
 some queries on a web search engine)

Implementations may vary as well as the kernel API to support such a feature.
As I told, my proposal has a minimal impact on the code, it does not require the definition
of new syscalls, it simply enhances the features of eventfd.

> 
> Eventually you come up with such a lock to protect your tcp or whatever
> state machine.  Or you have a real example where read and write paths
> can work completely independently?

Actually umvu hypervisor uses concurrent tracing of concurrent processes.
We have named this technique "guardian angels": each process/thread running in the
partial virtual machine has a correspondent thread in the hypervisor.
So if a process uses two threads to manage a network connection (say a TCP stream),
the two guardian angels replicate their requests towards the networking module.

So I am looking for a general solution, not to a pattern to port some projects.
(and I cannot use two different approaches for event driven and multi-threaded
 implementations as I have to support both).

If you reached this point...  Thank you for your patience.
I am more than pleased to receive further comments or proposals.

	renzo



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux