On Tue, Dec 22, 2020 at 6:04 PM Dmitry Kadashev <dkadashev@xxxxxxxxx> wrote:
>
> On Tue, Dec 22, 2020 at 11:11 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> >
> > On 22/12/2020 03:35, Pavel Begunkov wrote:
> > > On 21/12/2020 11:00, Dmitry Kadashev wrote:
> > > [snip]
> > >>> We do not share rings between processes. Our rings are accessible from different
> > >>> threads (under locks), but nothing fancy.
> > >>>
> > >>>> In other words, if you kill all your io_uring applications, does it
> > >>>> go back to normal?
> > >>>
> > >>> I'm pretty sure it does not, the only fix is to reboot the box. But I'll find an
> > >>> affected box and double check just in case.
> > >
> > > I can't spot any misaccounting, but I wonder if it can be that your memory is
> > > getting fragmented enough to be unable to make an allocation of 16 __contiguous__
> > > pages, i.e. sizeof(sqe) * 1024
> > >
> > > That's how it's allocated internally:
> > >
> > > static void *io_mem_alloc(size_t size)
> > > {
> > >         gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
> > >                                 __GFP_NORETRY;
> > >
> > >         return (void *) __get_free_pages(gfp_flags, get_order(size));
> > > }
> > >
> > > What about smaller rings? Can you check io_uring of what SQ size it can allocate?
> > > That can be a different program, e.g. modify a bit liburing/test/nop.
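For reference, a rough userspace sketch of the "16 contiguous pages" arithmetic quoted above. It assumes 4 KiB pages and the standard 64-byte struct io_uring_sqe; order_for_size() and sqe_array_pages() are made-up helper names that only approximate what the kernel's get_order() computes, not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on typical x86-64 configurations. */
#define ASSUMED_PAGE_SIZE 4096UL
/* Assumption: the standard 64-byte struct io_uring_sqe layout. */
#define ASSUMED_SQE_SIZE 64UL

/*
 * Userspace approximation of get_order(): the smallest order such that
 * (ASSUMED_PAGE_SIZE << order) >= size. Hypothetical helper.
 */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((ASSUMED_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Number of physically contiguous pages requested for the SQE array of a
 * ring with sq_entries entries (always a power of two). Hypothetical helper.
 */
static unsigned long sqe_array_pages(unsigned int sq_entries)
{
	return 1UL << order_for_size((size_t)sq_entries * ASSUMED_SQE_SIZE);
}
```

Under these assumptions a 1024-entry SQ needs 1024 * 64 = 65536 bytes, i.e. an order-4 allocation of 16 contiguous pages, while any SQ of 64 entries or fewer fits in a single page.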
> >
> > Even better to allocate N smaller rings, where N = 1024 / SQ_size
> >
> > #include <stdio.h>
> > #include <liburing.h>
> >
> > static int try_size(int sq_size)
> > {
> >         int ret = 0, i, n = 1024 / sq_size;
> >         /* n is at most 1024 / 2 = 512 for the sizes tried in main() */
> >         static struct io_uring rings[512];
> >
> >         for (i = 0; i < n; ++i) {
> >                 if (io_uring_queue_init(sq_size, &rings[i], 0) < 0) {
> >                         ret = -1;
> >                         break;
> >                 }
> >         }
> >         /* tear down only the rings that were created successfully */
> >         for (i -= 1; i >= 0; i--)
> >                 io_uring_queue_exit(&rings[i]);
> >         return ret;
> > }
> >
> > int main()
> > {
> >         int size;
> >
> >         for (size = 1024; size >= 2; size /= 2) {
> >                 if (!try_size(size)) {
> >                         printf("max size %i\n", size);
> >                         return 0;
> >                 }
> >         }
> >
> >         printf("can't allocate %i\n", size);
> >         return 0;
> > }
>
> Unfortunately I rebooted the box I used for tests yesterday, so I can't
> try this there. Also I was not able to come up with an isolated reproducer for
> this yet.
>
> The good news is I've found a relatively easy way to provoke this on a test VM
> using our software. Our app runs with "admin" user perms (plus some
> capabilities), it bumps RLIMIT_MEMLOCK to infinity on start. I've also created
> a user called 'ioutest' to run the check for ring sizes using a different user.
>
> I've modified the test program slightly, to show the number of rings
> successfully created on each iteration and the actual error message (to debug
> a problem I was having with it, but I've kept this after that).
> Here is the output:
>
> # sudo -u admin bash -c 'ulimit -a' | grep locked
> max locked memory       (kbytes, -l) 1024
>
> # sudo -u ioutest bash -c 'ulimit -a' | grep locked
> max locked memory       (kbytes, -l) 1024
>
> # sudo -u admin ./iou-test1
> Failed after 0 rings with 1024 size: Cannot allocate memory
> Failed after 0 rings with 512 size: Cannot allocate memory
> Failed after 0 rings with 256 size: Cannot allocate memory
> Failed after 0 rings with 128 size: Cannot allocate memory
> Failed after 0 rings with 64 size: Cannot allocate memory
> Failed after 0 rings with 32 size: Cannot allocate memory
> Failed after 0 rings with 16 size: Cannot allocate memory
> Failed after 0 rings with 8 size: Cannot allocate memory
> Failed after 0 rings with 4 size: Cannot allocate memory
> Failed after 0 rings with 2 size: Cannot allocate memory
> can't allocate 1
>
> # sudo -u ioutest ./iou-test1
> max size 1024
>
> # ps ax | grep wq
>     8 ?        I<     0:00 [mm_percpu_wq]
>   121 ?        I<     0:00 [tpm_dev_wq]
>   124 ?        I<     0:00 [devfreq_wq]
> 20593 pts/1    S+     0:00 grep --color=auto wq

This was on kernel 5.6.7, I'm going to try this on 5.10.1 now.

--
Dmitry Kadashev
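A side note on the setup described above: RLIMIT_MEMLOCK is a per-process limit, so the 1024 KiB that `ulimit -a` reports under sudo is the limit of that shell, not of an application that raises its own limit at startup. Below is a minimal sketch of what such a bump typically looks like with plain getrlimit()/setrlimit(). The helper names are made up for illustration, and this is not the actual application's code.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Read the calling process's RLIMIT_MEMLOCK into *out.
 * Returns 0 on success, -1 on error (errno set). Hypothetical helper.
 */
static int get_memlock_limit(struct rlimit *out)
{
	return getrlimit(RLIMIT_MEMLOCK, out);
}

/*
 * Try to raise both the soft and the hard RLIMIT_MEMLOCK to unlimited.
 * Returns 0 on success, -1 on error (errno set). Hypothetical helper.
 */
static int raise_memlock_to_infinity(void)
{
	struct rlimit rl = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	return setrlimit(RLIMIT_MEMLOCK, &rl);
}

/* Print the current soft limit in KiB, or "unlimited". Hypothetical helper. */
static void print_memlock_limit(void)
{
	struct rlimit rl;

	if (get_memlock_limit(&rl) < 0) {
		perror("getrlimit");
		return;
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		printf("max locked memory: unlimited\n");
	else
		printf("max locked memory: %llu KiB\n",
		       (unsigned long long)rl.rlim_cur / 1024);
}
```

Raising the hard limit above its current value needs CAP_SYS_RESOURCE; without it setrlimit() fails with EPERM, so a caller would normally check the return value before relying on the new limit.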