On 3/22/20 12:54 PM, Pavel Begunkov wrote:
> On 22/03/2020 19:24, Pavel Begunkov wrote:
>> On 22/03/2020 19:09, Pavel Begunkov wrote:
>>> On 19/03/2020 21:56, Jens Axboe wrote:
>>>> We always punt async buffered writes to an io-wq helper, as the core
>>>> kernel does not have IOCB_NOWAIT support for that. Most buffered async
>>>> writes complete very quickly, as it's just a copy operation. This means
>>>> that doing multiple locking roundtrips on the shared wqe lock for each
>>>> buffered write is wasteful. Additionally, buffered writes are hashed
>>>> work items, which means that any buffered write to a given file is
>>>> serialized.
>>>>
>>>> When looking for a new work item, build a chain of identically hashed
>>>> work items, and then hand back that batch. Until the batch is done, the
>>>> caller doesn't have to synchronize with the wqe or worker locks again.
>>
>> I have an idea how to do it a bit better. Let me try it.
> 
> The diff below is buggy (oopses somewhere in blk-mq for
> read-write.c:read_poll_link), I'll double check it on a fresh head.

Are you running for-5.7/io_uring merged with master? If not you're
missing:

commit f1d96a8fcbbbb22d4fbc1d69eaaa678bbb0ff6e2
Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date:   Fri Mar 13 22:29:14 2020 +0300

    io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}

which is what I ran into as well last week...

> The idea is to keep same-hashed works contiguous in @work_list, so
> they can be spliced in one go. For each hash bucket, I keep the last
> added work:
> - on enqueue, it adds a work after the last one
> - on dequeue, it splices [first, last], where @first is the current
>   one, because of how we traverse @work_list.
> 
> It costs a bit of extra memory, but
> - it removes the extra looping
> - it also takes all hashed requests, not only the sequentially
>   submitted ones
> 
> e.g. for the following submission sequence, it will take all (hash=0)
> requests:
> 
> REQ(hash=0)
> REQ(hash=1)
> REQ(hash=0)
> REQ()
> REQ(hash=0)
> ...
> 
> Please tell me if you see a hole in the concept. And as said, there is
> still a bug somewhere.

The extra memory isn't a big deal, it's very minor. My main concern
would be fairness, since we'd then be grabbing non-contiguous hashed
chunks, whereas before we did not. That may not be a concern as long as
we ensure that non-hashed (and differently hashed) work can proceed in
parallel. On my end, I deliberately added:

+	/* already have hashed work, let new worker get this */
+	if (ret) {
+		struct io_wqe_acct *acct;
+
+		/* get new worker for unhashed, if none now */
+		acct = io_work_get_acct(wqe, work);
+		if (!atomic_read(&acct->nr_running))
+			io_wqe_wake_worker(wqe, acct);
+		break;
+	}

to try and improve that. I'll run a quick test with yours.

-- 
Jens Axboe
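
[To make the per-bucket splice idea above concrete, here is a minimal
user-space model of it. It is only a sketch: the names used below
(struct work_list, bucket_last, enqueue(), dequeue_batch()) are invented
for illustration and are not the actual io-wq data structures or API.]

/* Model: keep same-hashed works contiguous in the work list by
 * remembering, per hash bucket, the last work queued for that hash.
 * Enqueue inserts new same-hashed work right after that last one;
 * dequeue then splices the whole [first, last] run in one operation.
 */
#include <stdio.h>

#define NR_BUCKETS	16
#define NO_HASH		(-1)

struct work {
	int hash;			/* NO_HASH if not hashed */
	int id;				/* just for printing */
	struct work *next;
};

struct work_list {
	struct work *head, *tail;
	/* last queued work for each hash bucket, NULL if none pending */
	struct work *bucket_last[NR_BUCKETS];
};

static void enqueue(struct work_list *wl, struct work *w)
{
	struct work *last;

	w->next = NULL;
	if (w->hash != NO_HASH && (last = wl->bucket_last[w->hash])) {
		/* keep same-hashed works contiguous: insert after @last */
		w->next = last->next;
		last->next = w;
		if (wl->tail == last)
			wl->tail = w;
	} else {
		/* plain append at the tail */
		if (wl->tail)
			wl->tail->next = w;
		else
			wl->head = w;
		wl->tail = w;
	}
	if (w->hash != NO_HASH)
		wl->bucket_last[w->hash] = w;
}

/*
 * Return either a single unhashed work, or the whole contiguous run of
 * works sharing the head's hash, spliced out in one go. The batch is
 * linked via ->next and NULL-terminated.
 */
static struct work *dequeue_batch(struct work_list *wl)
{
	struct work *first = wl->head, *last;

	if (!first)
		return NULL;

	if (first->hash == NO_HASH) {
		last = first;		/* unhashed: take just this one */
	} else {
		last = wl->bucket_last[first->hash];
		wl->bucket_last[first->hash] = NULL;
	}

	wl->head = last->next;
	if (!wl->head)
		wl->tail = NULL;
	last->next = NULL;		/* detach the batch */
	return first;
}

int main(void)
{
	struct work_list wl = { 0 };
	int hashes[] = { 0, 1, 0, NO_HASH, 0 };	/* the example order above */
	struct work w[5];
	struct work *batch;
	int i;

	for (i = 0; i < 5; i++) {
		w[i].hash = hashes[i];
		w[i].id = i;
		enqueue(&wl, &w[i]);
	}

	while ((batch = dequeue_batch(&wl))) {
		printf("batch:");
		for (; batch; batch = batch->next)
			printf(" id=%d(hash=%d)", batch->id, batch->hash);
		printf("\n");
	}
	return 0;
}

[Run against that example submission order, the first dequeue_batch()
call returns all three hash=0 works in one splice, while the hash=1 and
unhashed works stay queued and can be picked up by other workers.]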