It worked, but there are some issues. With O_DSYNC and even a moderate
submission rate, threads get stuck in some CPU task (99.9% CPU
consumption) and make very slow progress. Did you expect this?
It must be something specific to io_uring; I can't reproduce this
condition by writing from 2048 threads.

On Mon, 17 Aug 2020 at 19:17, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 8/17/20 8:49 AM, Dmitry Shulyak wrote:
> > With 48 threads i am getting 200 mb/s, about the same with 48 separate
> > uring instances.
> > With single uring instance (or with shared pool) - 60 mb/s.
> > fs - ext4, device - ssd.
>
> You could try something like this kernel addition:
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4b102d9ad846..8909a1d37801 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1152,7 +1152,7 @@ static void io_prep_async_work(struct io_kiocb *req)
>         io_req_init_async(req);
>
>         if (req->flags & REQ_F_ISREG) {
> -               if (def->hash_reg_file)
> +               if (def->hash_reg_file && !(req->flags & REQ_F_FORCE_ASYNC))
>                         io_wq_hash_work(&req->work, file_inode(req->file));
>         } else {
>                 if (def->unbound_nonreg_file)
>
> and then set IOSQE_IO_ASYNC on your writes. That'll parallelize them in
> terms of execution.
>
> --
> Jens Axboe
>
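
For reference, a minimal sketch of what "set the async flag on your writes" can look like from userspace. It fills a raw submission queue entry from <linux/io_uring.h> and does not submit anything. Assumptions: kernel headers 5.6 or newer (for IORING_OP_WRITE and the flag), and the helper name prep_async_write is hypothetical, not part of any library. Note that the uapi header spells the flag IOSQE_ASYNC; as far as I can tell that is the flag meant by "IOSQE_IO_ASYNC" in the quoted mail, and it is what the kernel tracks internally as REQ_F_FORCE_ASYNC.

```c
#include <linux/io_uring.h> /* struct io_uring_sqe, IORING_OP_WRITE, IOSQE_ASYNC */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: prepare one SQE for a plain write at a given
 * file offset and ask the kernel to run it from the async worker pool
 * instead of attempting it inline at submission time.
 */
static void prep_async_write(struct io_uring_sqe *sqe, int fd,
                             const void *buf, unsigned len,
                             uint64_t offset, uint64_t user_data)
{
    memset(sqe, 0, sizeof(*sqe));          /* unused fields must be zero */
    sqe->opcode = IORING_OP_WRITE;         /* non-vectored write */
    sqe->fd = fd;
    sqe->addr = (uint64_t)(uintptr_t)buf;  /* userspace buffer address */
    sqe->len = len;                        /* number of bytes to write */
    sqe->off = offset;                     /* file offset */
    sqe->user_data = user_data;            /* echoed back in the CQE */
    sqe->flags |= IOSQE_ASYNC;             /* force async execution */
}
```

With liburing, the equivalent should be io_uring_prep_write() followed by io_uring_sqe_set_flags(sqe, IOSQE_ASYNC). Without the kernel change quoted above, writes to the same regular file are still hashed by inode in io-wq, so the flag alone would not be expected to parallelize them.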