Re: Indefinitely sleeping task in poll_schedule_timeout()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 06-01-22 10:41:27, Linus Torvalds wrote:
> On Thu, Jan 6, 2022 at 7:45 AM Jan Kara <jack@xxxxxxx> wrote:
> >
> > So maybe something like attached patch (boot tested only so far)? What you
> > do think?
> 
> I don't think the patch is wrong, but it looks like we've actually
> never reacted to bad file descriptors, so I get the strong feeling
> that the patch might break existing code that has worked by accident
> before.

Well, max_select_fd() returns -EBADF in case some file descriptor is wrong.
But yes, once this initial test passes we didn't care about bad fd. So if
the timing used to work out so that fd was never closed before the
max_select_fd() check I agree my patch could break such app.

> And the breakage might end up very subtle - select() randomly now
> returning EBADFD if you have a race with somebody closing the fd that
> previously was benign and silent. Which is going to be hell to debug
> because it's some sporadic user-space race with close and select at
> the same time, and it might easily actually break user space because
> somebody just takes "error from select" to be fatal and asserts it.
> 
> What do_pollfd() does if there's no file associated with the fd is to
> use EPOLLNVAL (an actual negative fd is ignored):
> 
>         mask = EPOLLNVAL;
>         f = fdget(fd);
>         if (!f.file)
>                 goto out;
> 
> and I wonder if we should try something equivalent for select() as a
> slightly less drastic change.
> 
> Of course, select() doesn't actually have EPOLLNVAL, but it does have
> EPOLLERR and EPOLLPRI, which would set all the bits for that fd.
> 
> The patch would be something like this (NOTE NOTE NOTE: this is not
> only whitespace-damaged - it was done with "git diff -w" to not show
> the whitespace change. Look at how it changes the nesting of that "if
> (f.file)" thing, and it needs to change the indentation of all those
> subsequent if-statements that check the mask. I did it this way just
> to show the logic change, a real patch would be much bigger due to the
> whitespace change).
> 
>     diff --git a/fs/select.c b/fs/select.c
>     index 945896d0ac9e..d77f8a614ad4 100644
>     --- a/fs/select.c
>     +++ b/fs/select.c
>     @@ -528,12 +528,14 @@ static int do_select(int n, fd_set_bits *fds,
>                     if (!(bit & all_bits))
>                             continue;
>                     f = fdget(i);
>     +               mask = EPOLLERR | EPOLLPRI;
>                     if (f.file) {
>                             wait_key_set(wait, in, out, bit,
>                                          busy_flag);
>                             mask = vfs_poll(f.file, wait);
> 
>                             fdput(f);
>     +               }
>                     if ((mask & POLLIN_SET) && (in & bit)) {
>                             res_in |= bit;
>                             retval++;
>     @@ -560,8 +562,6 @@ static int do_select(int n, fd_set_bits *fds,
>                      */
>                     } else if (busy_flag & mask)
>                             can_busy_loop = true;
>     -
>     -               }
>             }
>             if (res_in)
>                     *rinp = res_in;
> 

OK, nice, I think this would work and probably has lower chance of breaking
someone.

> That said: while I don't think your patch is wrong, if we do want to
> do that EBADF thing, I think your patch is still very very ugly. You
> are adding this new 'bad_fd' variable for no good reason.
> 
> We already have an error variable: 'table.error'.
> 
> So a *minimal* patch that does what you suggest is to just do something like
> 
>     diff --git a/fs/select.c b/fs/select.c
>     index 945896d0ac9e..7a06d28ec83d 100644
>     --- a/fs/select.c
>     +++ b/fs/select.c
>     @@ -528,7 +528,9 @@ static int do_select(int n, fd_set_bits *fds,
> struct timespec64 *end_time)
>                     if (!(bit & all_bits))
>                             continue;
>                     f = fdget(i);
>     -               if (f.file) {
>     +               if (unlikely(!f.file)) {
>     +                       table.error = -EBADF;
>     +               } else {
>                             wait_key_set(wait, in, out, bit,
>                                          busy_flag);
>                             mask = vfs_poll(f.file, wait);
> 
> (which is again whitespace-damaged, but this time that's just to make
> it legible inline - there's no indentation change).

Yup, nicer.

> Hmm? I'd be ok with either of these (the second of these patches
> _should_ be equivalent to yours), but I think that first one might be
> the one that might cause less potential user-space breakage.

Agreed. I'll code it and give it some testing. 

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux