Andrew, thanks for picking up this patch series in -mm. Please drop it. After discussions with Vegard, I have something better now. Cheers, Michael On 08/16/2016 11:10 PM, Michael Kerrisk (man-pages) wrote: > When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various > limits defined by /proc/sys/fs/pipe-* files are checked to see > if unprivileged users are exceeding limits on memory consumption. > > While documenting and testing the operation of these limits I noticed > that, as currently implemented, these checks can lead to cases where > a user can increase a pipe's capacity and is then unable to decrease > the capacity. The origin of the problem is two-fold: > > (1) When increasing the pipe capacity, the checks against the limits > in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against > existing consumption, and exclude the memory required for the > increased pipe capacity. The new increase in pipe capacity > can then push the total memory used by the user for pipes > (possibly far) over a limit. > > (2) The limit checks are performed even when the new pipe capacity > is less than the existing pipe capacity. This can lead to > problems if a user sets a large pipe capacity, and then the > limits are lowered, with the result that the user will no > longer be able to decrease the pipe capacity. > > The simple solution given by this patch is to perform the checks > only when the pipe capacity is being increased. The patch does not > address the broken check in (1), which allows a user to (one-time) > set a pipe capacity that pushes the user's consumption over the user > pipe limits. A change to fix that check is proposed in a subsequent > patch. I've separated the two fixes because the second fix is a > little more complex, and could possibly (though unlikely) break > existing user-space. The current patch implements the simple fix > that carries little risk and seems obviously correct: allowing an > unprivileged user always to decrease a pipe's capacity. > > The program below can be used to demonstrate the problem, and the > effect of the fix. The program takes one or more command-line > arguments. The first argument specifies the number of pipes > that the program should create. The remaining arguments are, > alternately, pipe capacities that should be set using > fcntl(F_SETPIPE_SZ), and sleep intervals (in seconds) between > the fcntl() operations. (The sleep intervals allow the possibility > to change the limits between fcntl() operations.) > > Running this program on an unpatched kernel, we first set some limits: > > # getconf PAGESIZE > 4096 > # echo 0 > /proc/sys/fs/pipe-user-pages-soft > # echo 1000000000 > /proc/sys/fs/pipe-max-size > # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB > > Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe, > first setting a pipe capacity (10MB), sleeping for a few seconds, > during which time the hard limit is lowered, and then set pipe > capacity to a smaller amount (5MB): > > # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 & > [1] 748 > # Loop 1: set pipe capacity to 10000000 bytes > F_SETPIPE_SZ returned 16777216 > Sleeping 15 seconds > > # echo 1000 > /proc/sys/fs/pipe-user-pages-hard # 4.096 MB > > # Loop 2: set pipe capacity to 5000000 bytes > Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted > > In this case, the user should be able to lower the limit. > > With a kernel that has the patch below, the second fcntl() > succeeds: > > # echo 0 > /proc/sys/fs/pipe-user-pages-soft > # echo 1000000000 > /proc/sys/fs/pipe-max-size > # echo 10000 > /proc/sys/fs/pipe-user-pages-hard > # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 & > [1] 3215 > # Loop 1: set pipe capacity to 10000000 bytes > F_SETPIPE_SZ returned 16777216 > Sleeping 15 seconds > > # echo 1000 > /proc/sys/fs/pipe-user-pages-hard > > # Loop 2: set pipe capacity to 5000000 bytes > F_SETPIPE_SZ returned 8388608 > > 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x--- > > /* test_F_SETPIPE_SZ.c > > (C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later > > Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity > and interactions with limits defined by /proc/sys/fs/pipe-* files. > */ > > int > main(int argc, char *argv[]) > { > int (*pfd)[2]; > int npipes; > int pcap, rcap; > int j, p, s, stime, loop; > > if (argc < 2) { > fprintf(stderr, "Usage: %s num-pipes " > "[pipe-capacity sleep-time]...\n", argv[0]); > exit(EXIT_FAILURE); > } > > npipes = atoi(argv[1]); > > pfd = calloc(npipes, sizeof (int [2])); > if (pfd == NULL) { > perror("calloc"); > exit(EXIT_FAILURE); > } > > for (j = 0; j < npipes; j++) { > if (pipe(pfd[j]) == -1) { > fprintf(stderr, "Loop %d: pipe() failed: ", j); > perror("pipe"); > exit(EXIT_FAILURE); > } > } > > for (j = 2; j < argc; j += 2 ) { > loop = j / 2; > pcap = atoi(argv[j]); > printf(" Loop %d: set pipe capacity to %d bytes\n", loop, pcap); > > for (p = 0; p < npipes; p++) { > s = fcntl(pfd[p][0], F_SETPIPE_SZ, pcap); > if (s == -1) { > fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ " > "failed: ", loop, p); > perror("fcntl"); > exit(EXIT_FAILURE); > } > > if (p == 0) { > printf(" F_SETPIPE_SZ returned %d\n", s); > rcap = s; > } else { > if (s != rcap) { > fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ " > "unexpected return: %d\n", loop, p, s); > exit(EXIT_FAILURE); > } > } > > stime = (j + 1 < argc) ? atoi(argv[j + 1]) : 0; > if (stime > 0) { > printf(" Sleeping %d seconds\n", stime); > sleep(stime); > } > } > } > > exit(EXIT_SUCCESS); > } > > 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x--- > > Cc: Willy Tarreau <w@xxxxxx> > Cc: Vegard Nossum <vegard.nossum@xxxxxxxxxx> > Cc: socketpair@xxxxxxxxx > Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > Cc: Jens Axboe <axboe@xxxxxx> > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > Cc: linux-api@xxxxxxxxxxxxxxx > Cc: linux-kernel@xxxxxxxxxxxxxxx > Signed-off-by: Michael Kerrisk <mtk.manpages@xxxxxxxxx> > --- > fs/pipe.c | 25 +++++++++++++++++-------- > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/fs/pipe.c b/fs/pipe.c > index 4ebe6b2..a98ebca 100644 > --- a/fs/pipe.c > +++ b/fs/pipe.c > @@ -1122,14 +1122,23 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg) > if (!nr_pages) > goto out; > > - if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) { > - ret = -EPERM; > - goto out; > - } else if ((too_many_pipe_buffers_hard(pipe->user) || > - too_many_pipe_buffers_soft(pipe->user)) && > - !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) { > - ret = -EPERM; > - goto out; > + /* > + * If trying to increase the pipe capacity, check that an > + * unprivileged user is not trying to exceed various limits. > + * (Decreasing the pipe capacity is always permitted, even > + * if the user is currently over a limit.) > + */ > + if (nr_pages > pipe->buffers) { > + if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) { > + ret = -EPERM; > + goto out; > + } else if ((too_many_pipe_buffers_hard(pipe->user) || > + too_many_pipe_buffers_soft(pipe->user)) && > + !capable(CAP_SYS_RESOURCE) && > + !capable(CAP_SYS_ADMIN)) { > + ret = -EPERM; > + goto out; > + } > } > ret = pipe_set_size(pipe, nr_pages); > break; > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html