On 11/08/2020 12:37, Philippe Mathieu-Daudé wrote:
> You Cc'ed qemu-devel, so Cc'ing the virtio-rng maintainers.
>
> On 7/15/20 3:32 PM, mwilck@xxxxxxxx wrote:
>> From: Martin Wilck <mwilck@xxxxxxxx>
>>
>> If a program opens /dev/hwrng with O_NONBLOCK and uses poll() and
>> non-blocking read() to retrieve random data, it ends up in a tight
>> loop with poll() always returning POLLIN and read() returning EAGAIN.
>> This repeats forever until some process makes a blocking read() call.
>> The reason is that virtio_read() always returns 0 in non-blocking mode,
>> even if data is available. Worse, it fetches random data from the
>> hypervisor after every non-blocking call, without ever using this data.
>>
>> The following test program illustrates the behavior and can be used
>> for testing and experiments. The problem will only be seen if all
>> tasks use non-blocking access; otherwise the blocking reads will
>> "recharge" the random pool and cause other, non-blocking reads to
>> succeed at least sometimes.
>>
>> /* Whether to use non-blocking mode in a task, problem occurs if CONDITION is 1 */
>> //#define CONDITION (getpid() % 2 != 0)
>>
>> static volatile sig_atomic_t stop;
>> static void handler(int sig __attribute__((unused))) { stop = 1; }
>>
>> static void loop(int fd, int sec)
>> {
>>     struct pollfd pfd = { .fd = fd, .events = POLLIN, };
>>     unsigned long errors = 0, eagains = 0, bytes = 0, succ = 0;
>>     int size, rc, rd;
>>
>>     srandom(getpid());
>>     if (CONDITION && fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) == -1)
>>         perror("fcntl");
>>     size = MINBUFSIZ + random() % (MAXBUFSIZ - MINBUFSIZ + 1);
>>
>>     for(;;) {
>>         char buf[size];
>>
>>         if (stop)
>>             break;
>>         rc = poll(&pfd, 1, sec);
>>         if (rc > 0) {
>>             rd = read(fd, buf, sizeof(buf));
>>             if (rd == -1 && errno == EAGAIN)
>>                 eagains++;
>>             else if (rd == -1)
>>                 errors++;
>>             else {
>>                 succ++;
>>                 bytes += rd;
>>                 write(1, buf, sizeof(buf));
>>             }
>>         } else if (rc == -1) {
>>             if (errno != EINTR)
>>                 perror("poll");
>>             break;
>>         } else
>>             fprintf(stderr, "poll: timeout\n");
>>     }
>>     fprintf(stderr,
>>             "pid %d %sblocking, bufsize %d, %d seconds, %lu bytes read, %lu success, %lu eagain, %lu errors\n",
>>             getpid(), CONDITION ? "non-" : "", size, sec, bytes, succ, eagains, errors);
>> }
>>
>> int main(void)
>> {
>>     int fd;
>>
>>     fork(); fork();
>>     fd = open("/dev/hwrng", O_RDONLY);
>>     if (fd == -1) {
>>         perror("open");
>>         return 1;
>>     };
>>     signal(SIGALRM, handler);
>>     alarm(SECONDS);
>>     loop(fd, SECONDS);
>>     close(fd);
>>     wait(NULL);
>>     return 0;
>> }
>>
>> void loop(int fd)
>> {
>>     struct pollfd pfd0 = { .fd = fd, .events = POLLIN, };
>>     int rc;
>>     unsigned int n;
>>
>>     for (n = LOOPS; n > 0; n--) {
>>         struct pollfd pfd = pfd0;
>>         char buf[SIZE];
>>
>>         rc = poll(&pfd, 1, 1);
>>         if (rc > 0) {
>>             int rd = read(fd, buf, sizeof(buf));
>>
>>             if (rd == -1)
>>                 perror("read");
>>             else
>>                 printf("read %d bytes\n", rd);
>>         } else if (rc == -1)
>>             perror("poll");
>>         else
>>             fprintf(stderr, "timeout\n");
>>     }
>> }
>>
>> int main(void)
>> {
>>     int fd;
>>
>>     fd = open("/dev/hwrng", O_RDONLY|O_NONBLOCK);
>>     if (fd == -1) {
>>         perror("open");
>>         return 1;
>>     };
>>     loop(fd);
>>     close(fd);
>>     return 0;
>> }
>>
>> This can be observed in the real world, e.g. with nested qemu/KVM virtual
>> machines, if both the "outer" and "inner" VMs have a virtio-rng device.
>> If the "inner" VM requests random data, qemu running in the "outer" VM
>> uses this device in a non-blocking manner like the test program above.
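(Note: as quoted, the two test programs omit their includes and macro
definitions. Something along the following lines makes them build; the
concrete values are illustrative guesses, not taken from the original
posting.)

    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* Illustrative values only -- not shown in the original patch mail. */
    #define MINBUFSIZ 1      /* first program: smallest read size */
    #define MAXBUFSIZ 128    /* first program: largest read size */
    #define SECONDS   10     /* first program: runtime per task */
    #define CONDITION 1      /* first program: 1 = every task non-blocking */
    #define LOOPS     1000   /* second program: poll()/read() iterations */
    #define SIZE      16     /* second program: read size per iteration */
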
>>
>> Fix it by returning available data if a previous hypervisor call has
>> completed in the meantime. I tested the patch with the program above,
>> and with rng-tools.
>>
>> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
>> ---
>>  drivers/char/hw_random/virtio-rng.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
>> index 79a6e47b5fbc..984713b35892 100644
>> --- a/drivers/char/hw_random/virtio-rng.c
>> +++ b/drivers/char/hw_random/virtio-rng.c
>> @@ -59,6 +59,20 @@ static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait)
>>      if (vi->hwrng_removed)
>>          return -ENODEV;
>>
>> +    /*
>> +     * If the previous call was non-blocking, we may have got some
>> +     * randomness already.
>> +     */
>> +    if (vi->busy && completion_done(&vi->have_data)) {
>> +        unsigned int len;
>> +
>> +        vi->busy = false;
>> +        len = vi->data_avail > size ? size : vi->data_avail;
>> +        vi->data_avail -= len;

You don't need to modify data_avail: as busy is set to false, the buffer
will be reused, and data_avail is always overwritten by virtqueue_get_buf().
Moreover, even if it were reused, the data would always be taken from the
beginning of the buffer.

>> +        if (len)
>> +            return len;
>> +    }
>> +
>>      if (!vi->busy) {
>>          vi->busy = true;
>>          reinit_completion(&vi->have_data);
>>

Why don't you modify only the wait case? Something like:

    if (!wait && !completion_done(&vi->have_data)) {
        return 0;
    }

then at the end you can do "return min(size, vi->data_avail);".

Thanks,
Laurent
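
(Note: to make the suggestion above concrete, here is a rough, untested
sketch of virtio_read() with that change applied. The parts of the function
not visible in the quoted diff -- register_buffer() and the
wait_for_completion_killable() call -- are filled in from the 5.8-era
driver and should be treated as assumptions; min_t() is used because size
and data_avail have different types.)

    static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait)
    {
        int ret;
        struct virtrng_info *vi = (struct virtrng_info *)rng->priv;

        if (vi->hwrng_removed)
            return -ENODEV;

        if (!vi->busy) {
            vi->busy = true;
            reinit_completion(&vi->have_data);
            register_buffer(vi, buf, size);
        }

        /* Non-blocking caller and no data ready yet: report nothing. */
        if (!wait && !completion_done(&vi->have_data))
            return 0;

        /*
         * Blocking caller, or data already available: wait (this returns
         * immediately if the completion is already done).
         */
        ret = wait_for_completion_killable(&vi->have_data);
        if (ret < 0)
            return ret;

        vi->busy = false;

        /* Never report more than the caller asked for. */
        return min_t(size_t, size, vi->data_avail);
    }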