On 3/22/20 12:54 PM, Pavel Begunkov wrote:
> On 22/03/2020 19:24, Pavel Begunkov wrote:
>> On 22/03/2020 19:09, Pavel Begunkov wrote:
>>> On 19/03/2020 21:56, Jens Axboe wrote:
>>>> We always punt async buffered writes to an io-wq helper, as the core
>>>> kernel does not have IOCB_NOWAIT support for that. Most buffered async
>>>> writes complete very quickly, as it's just a copy operation. This means
>>>> that doing multiple locking roundtrips on the shared wqe lock for each
>>>> buffered write is wasteful. Additionally, buffered writes are hashed
>>>> work items, which means that any buffered write to a given file is
>>>> serialized.
>>>>
>>>> When looking for a new work item, build a chain of identically hashed
>>>> work items, and then hand back that batch. Until the batch is done, the
>>>> caller doesn't have to synchronize with the wqe or worker locks again.
>>
>> I have an idea how to do it a bit better. Let me try it.
> 
> The diff below is buggy (oopses somewhere in blk-mq for
> read-write.c:read_poll_link), I'll double check it on a fresh head.

Are you running for-5.7/io_uring merged with master? If not you're
missing:

commit f1d96a8fcbbbb22d4fbc1d69eaaa678bbb0ff6e2
Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date:   Fri Mar 13 22:29:14 2020 +0300

    io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}

which is what I ran into as well last week...

> The idea is to keep same-hashed works contiguous in @work_list, so
> they can be spliced in one go. For each hash bucket, I keep the last
> added work:
> - on enqueue, it adds a work after the last one
> - on dequeue, it splices [first, last], where @first is the current
>   one, because of how we traverse @work_list.
> 
> It costs a bit of extra memory, but
> - it removes the extra looping
> - it also takes all hashed requests, not only the sequentially
>   submitted ones
> 
> e.g. for the following submission sequence, it will take all (hash=0)
> requests:
> 
> REQ(hash=0)
> REQ(hash=1)
> REQ(hash=0)
> REQ()
> REQ(hash=0)
> ...
> 
> Please tell me if you see a hole in the concept. And as said, there is
> still a bug somewhere.

The extra memory isn't a big deal, it's very minor. My main concern
would be fairness, since we'd then be grabbing non-contiguous hashed
chunks, whereas before we did not. That may not be a concern as long as
we ensure that non-hashed (and differently hashed) work can proceed in
parallel. On my end, I deliberately added:

+	/* already have hashed work, let new worker get this */
+	if (ret) {
+		struct io_wqe_acct *acct;
+
+		/* get new worker for unhashed, if none now */
+		acct = io_work_get_acct(wqe, work);
+		if (!atomic_read(&acct->nr_running))
+			io_wqe_wake_worker(wqe, acct);
+		break;
+	}

to try and improve that. I'll run a quick test with yours.

-- 
Jens Axboe
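
[To make the per-bucket splice idea above concrete, here is a minimal
user-space model of it. It is only a sketch: the names used below
(struct work_list, bucket_last, enqueue(), dequeue_batch()) are invented
for illustration and are not the actual io-wq data structures or API.]

/* Model: keep same-hashed works contiguous in the work list by
 * remembering, per hash bucket, the last work queued for that hash.
 * Enqueue inserts new same-hashed work right after that last one;
 * dequeue then splices the whole [first, last] run in one operation.
 */
#include <stdio.h>

#define NR_BUCKETS	16
#define NO_HASH		(-1)

struct work {
	int hash;			/* NO_HASH if not hashed */
	int id;				/* just for printing */
	struct work *next;
};

struct work_list {
	struct work *head, *tail;
	/* last queued work for each hash bucket, NULL if none pending */
	struct work *bucket_last[NR_BUCKETS];
};

static void enqueue(struct work_list *wl, struct work *w)
{
	struct work *last;

	w->next = NULL;
	if (w->hash != NO_HASH && (last = wl->bucket_last[w->hash])) {
		/* keep same-hashed works contiguous: insert after @last */
		w->next = last->next;
		last->next = w;
		if (wl->tail == last)
			wl->tail = w;
	} else {
		/* plain append at the tail */
		if (wl->tail)
			wl->tail->next = w;
		else
			wl->head = w;
		wl->tail = w;
	}
	if (w->hash != NO_HASH)
		wl->bucket_last[w->hash] = w;
}

/*
 * Return either a single unhashed work, or the whole contiguous run of
 * works sharing the head's hash, spliced out in one go. The batch is
 * linked via ->next and NULL-terminated.
 */
static struct work *dequeue_batch(struct work_list *wl)
{
	struct work *first = wl->head, *last;

	if (!first)
		return NULL;

	if (first->hash == NO_HASH) {
		last = first;		/* unhashed: take just this one */
	} else {
		last = wl->bucket_last[first->hash];
		wl->bucket_last[first->hash] = NULL;
	}

	wl->head = last->next;
	if (!wl->head)
		wl->tail = NULL;
	last->next = NULL;		/* detach the batch */
	return first;
}

int main(void)
{
	struct work_list wl = { 0 };
	int hashes[] = { 0, 1, 0, NO_HASH, 0 };	/* the example order above */
	struct work w[5];
	struct work *batch;
	int i;

	for (i = 0; i < 5; i++) {
		w[i].hash = hashes[i];
		w[i].id = i;
		enqueue(&wl, &w[i]);
	}

	while ((batch = dequeue_batch(&wl))) {
		printf("batch:");
		for (; batch; batch = batch->next)
			printf(" id=%d(hash=%d)", batch->id, batch->hash);
		printf("\n");
	}
	return 0;
}

[Run against that example submission order, the first dequeue_batch()
call returns all three hash=0 works in one splice, while the hash=1 and
unhashed works stay queued and can be picked up by other workers.]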