Re: [PATCH] epoll: add exclusive wakeups flag

Hi Michael,

On 03/14/2016 07:26 PM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
> 
> On 03/15/2016 11:35 AM, Jason Baron wrote:
>> Hi Michael,
>>
>> On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
>>> Hi Jason,
>>>
>>> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>>>> Hi Jason,
>>>>
>>>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>>>
>>>>>
>>>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>>>
>>> [...]
>>>
>>>>>> Returning to the second sentence in this description:
>>>>>>
>>>>>>               When a wakeup event occurs and multiple epoll file descrip‐
>>>>>>               tors are attached to the same target file using EPOLLEXCLU‐
>>>>>>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>>>>>>               receive  an  event with epoll_wait(2).
>>>>>>
>>>>>> There is a point that is unclear to me: what does "target file" refer to?
>>>>>> Is it an open file description (aka open file table entry) or an inode?
>>>>>> I suspect the former, but it was not clear in your original text.
>>>>>>
>>>>>
>>>>> So from epoll's perspective, the wakeups are associated with a 'wait
>>>>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>>>>> file->poll()) results in adding to the same 'wait queue' then we will
>>>>> get 'exclusive' wakeup behavior.
>>>>>
>>>>> So in general, I think the answer here is that it's associated with the
>>>>> inode (I couldn't say with 100% certainty without really looking at all
>>>>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>>>>> the two scenarios will have the same behavior with respect to
>>>>> EPOLLEXCLUSIVE.
>>>
>>> So, I was actually a little surprised by this, and went away and tested
>>> this point. It appears to me that the two scenarios described below
>>> do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.
>>>
>>>> So, in both scenarios, *one or more* processes will get a wakeup?
>>>> (I'll try to add something to the text to clarify the detail we're 
>>>> discussing.)
>>>>
>>>>> Also, the 'non-exclusive' mode would be subject to the same question of
>>>>> which wait queue the epfd is associated with...
>>>>
>>>> I'm not sure of the point you are trying to make here?
>>>>
>>>> Cheers,
>>>>
>>>> Michael
>>>>
>>>>
>>>>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>>>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>>>>
>>>>>> ===
>>>>>>
>>>>>> Scenario 1:
>>>>>>
>>>>>> We have three processes each of which
>>>>>> 1. Creates an epoll instance
>>>>>> 2. Opens the read end of the FIFO
>>>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>>    EPOLLEXCLUSIVE
>>>>>>
>>>>>> When input becomes available on the FIFO, how many processes
>>>>>> get a wakeup?
>>>
>>> When I test this scenario, all three processes get a wakeup.
>>>
>>>>>> ===
>>>>>>
>>>>>> Scenario 2:
>>>>>>
>>>>>> A parent process opens the read end of a FIFO and then calls
>>>>>> fork() three times to create three children. Each child then:
>>>>>>
>>>>>> 1. Creates an epoll instance
>>>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>>    EPOLLEXCLUSIVE
>>>>>>
>>>>>> When input becomes available on the FIFO, how many processes
>>>>>> get a wakeup?
>>>
>>> When I test this scenario, one process gets a wakeup.
>>>
>>> In other words, "target file" appears to mean open file description
>>> (aka open file table entry), not inode.
>>>
>>> This is actually what I suspected might be the case, but now I am
>>> puzzled. Given what I've discovered and what you suggest are the
>>> semantics, is the implementation correct? (I suspect that it is,
>>> but it is at odds with your statement above.) My test programs are
>>> inline below.
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>
>> Thanks for the test cases. So in your first test case, you are exiting
>> immediately after the epoll_wait() returns. So this is actually causing
>> the next wakeup. 
> 
> Can I just check my understanding of the rationale for the preceding 
> point. The next process is getting woken up, because the previous process
> did not "consume" the event (that is, the input is still available on the 
> FIFO). Right?
> 
>> And then the 2nd thread returns from epoll_wait() and
>> this causes the 3rd wakeup.
> 
> I added the sleep() calls, but still things don't seem to happen
> quite as you suggest. In the first scenario, after the first process
> terminates, *all* of the remaining processes wake from epoll_wait().
> What's happening in this case? (This smells like a possible bug.)

Yes, you are right. When the first process exits and thus closes
the read side of the pipe, it wakes up all the other callers that
are blocked in epoll_wait(). When the file is closed and there are no
more fds referencing it, the pipe_release() routine does a wakeup
specifying both POLLIN and POLLOUT. In this case, all of the epoll
exclusive waiters get a wakeup. The combination of POLLIN and POLLOUT
is not expected to be the typical use case here. Normally, we would
have the threads in event loops, and just POLLIN would be set,
resulting in the exclusive wakeup behavior. So yes, there can be
multiple wakeups in some cases, but the *common* case of only POLLIN
or only POLLOUT being set will yield exclusive wakeups. There is no
guarantee here that only 1 thread wakes up - only that *at least* one
does.
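
To illustrate the event-loop case described above, here is a minimal
sketch of one loop iteration that registers with EPOLLIN | EPOLLEXCLUSIVE
and then actually consumes the input after waking. The helper names
add_exclusive_reader() and wait_and_consume() are hypothetical and are
not part of the patch or the test programs. The sketch assumes the
descriptor is non-blocking when several waiters share it, so that a
waiter that loses the race to read gets EAGAIN instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical helper: register fd for exclusive read wakeups.
 * Returns 0 on success, -1 on error (as epoll_ctl() does). */
static int add_exclusive_reader(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: one iteration of an event loop.
 * Returns the number of bytes consumed, 0 on timeout, EOF, or when
 * there was nothing to read (for example another waiter got there
 * first on a non-blocking fd), and -1 on error. */
static ssize_t wait_and_consume(int epfd, char *buf, size_t len,
                                int timeout_ms)
{
    struct epoll_event rev;
    ssize_t n;
    int nready;

    assert(buf != NULL && len > 0);

    nready = epoll_wait(epfd, &rev, 1, timeout_ms);
    if (nready <= 0)
        return nready;          /* 0: timeout, -1: error */

    if (!(rev.events & EPOLLIN))
        return 0;               /* e.g. only EPOLLHUP or EPOLLERR */

    /* Consume the input so the fd is no longer readable. */
    n = read(rev.data.fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```

Because the waiter reads the data before going back to epoll_wait(),
the descriptor stops being readable, which is the difference from the
test programs, where the process sleeps and exits without reading.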

> 
> In the second scenario (fork()), after the first process terminates
> (without consuming the FIFO input), all of the other processes remain 
> blocked in epoll_wait(). (Note, I extended the test program here
> to allow the number of child processes to be specified as a command-line
> argument.) I think I can make sense of that: it's because the open 
> file descriptor for the read end of the FIFO has been duplicated
> in all of the child processes, and closing the FD in one child
> does not cause the corresponding open file description in other
> processes to be torn down because there are other FDs that still
> refer to it.
> 

Yes, exactly. The final process is going to invoke the pipe_release(),
but at that point there is nobody left to wake up.
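
As a rough illustration of the point about duplicated descriptors, the
sketch below uses a plain pipe and dup() rather than a FIFO and fork().
It is only meant to show that the read side is not torn down until the
last descriptor referring to the open file description is closed. The
helper name readers_gone() is hypothetical, and it relies on the Linux
behavior that poll() on the write end of a pipe reports POLLERR once
no readers remain.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the write end of a pipe reports
 * that no readers remain, 0 if a reader still exists, -1 on error. */
static int readers_gone(int wfd)
{
    struct pollfd pfd = { .fd = wfd, .events = POLLOUT };

    if (poll(&pfd, 1, 0) == -1)
        return -1;
    return (pfd.revents & POLLERR) != 0;
}
```

With two descriptors for the same read end (one from pipe(), one from
dup()), closing the first should leave readers_gone() returning 0;
only after the second is closed should it return 1.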

>> So the wakeups are actually not happening from the write directly, but
>> instead from the readers doing a close(). If you do some sort of sleep
>> after the epoll_wait() you can confirm the behavior. So I believe this
>> is working as expected.
> 
> As noted above, I'm still slightly puzzled.
> Revised test programs pasted below.
> 

Ok, hopefully this makes sense.

Thanks,

-Jason

> Cheers,
> 
> Michael
> 
> ==========
> 
> /* t_EPOLLEXCLUSIVE_multiopen.c
> 
>   Licensed under GNU GPLv2 or later.
> */
> 
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> 
> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>                         } while (0)
> 
> #define usageErr(msg, progName) \
>                         do { fprintf(stderr, "Usage: "); \
>                              fprintf(stderr, msg, progName); \
>                              exit(EXIT_FAILURE); } while (0)
> 
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
> 
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
> 
>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO>\n", argv[0]);
> 
>     epfd = epoll_create(2);
>     if (epfd == -1)
>         errExit("epoll_create");
> 
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
> 
>     ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>     if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>         errExit("epoll_ctl");
> 
>     nready = epoll_wait(epfd, &rev, 1, -1);
>     if (nready == -1)
>         errExit("epoll_wait");
>     printf("epoll_wait() returned %d\n", nready);
> 
>     printf("sleeping\n");
>     sleep(3);
>     printf("Terminating\n");
>     exit(EXIT_SUCCESS);
> }
> 
> ===================
> 
> /* t_EPOLLEXCLUSIVE_fork.c 
>  
>   Licensed under GNU GPLv2 or later.
> */
> 
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> 
> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>                         } while (0)
> 
> #define usageErr(msg, progName) \
>                         do { fprintf(stderr, "Usage: "); \
>                              fprintf(stderr, msg, progName); \
>                              exit(EXIT_FAILURE); } while (0)
> 
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
> 
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
>     int cnum, cmax;
> 
>     if (argc < 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO> [num-children]\n", argv[0]);
> 
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
> 
>     cmax = (argc > 2) ? atoi(argv[2]) : 3;
> 
>     for (cnum = 0; cnum < cmax; cnum++) {
>         switch (fork()) {
>         case -1:
>             errExit("fork");
> 
>         case 0: /* Child */
>             epfd = epoll_create(2);
>             if (epfd == -1)
>                 errExit("epoll_create");
> 
>             ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>             if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>                 errExit("epoll_ctl");
> 
>             nready = epoll_wait(epfd, &rev, 1, -1);
>             if (nready == -1)
>                 errExit("epoll_wait");
>             printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
>             printf("sleeping\n");
>             sleep(3);
>             printf("Child %d terminating\n", cnum);
>             exit(EXIT_SUCCESS);
> 
>         default:
>             break;
>         }
>     }
> 
>     for (cnum = 0; cnum < cmax; cnum++)
>         wait(NULL);
> 
>     exit(EXIT_SUCCESS);
> }
> 
