On 2020/1/19 15:58, Matthew Wilcox wrote:
On Sun, Jan 19, 2020 at 02:55:14PM +0800, yukuai (C) wrote:
On 2020/1/19 14:14, Matthew Wilcox wrote:
I don't understand your reasoning here. If another process wants to
access a page of the file which isn't currently in cache, it would have
to first read the page in from storage. If it's under readahead, it
has to wait for the read to finish. Why is the second case worse than
the first? It seems better to me.
Thanks for your response! My worry is that, for example:
We read page 0 and trigger readahead for n pages (0 to n-1), while
another thread reads page n-1.
In the current implementation, if readahead is in the process of reading
pages 0 to n-2, the later operation doesn't need to wait for the former
one to finish. However, the later operation will have to wait if we add
all the pages to the page cache first. That is why I said it might cause
a performance problem.
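A minimal sketch of the worry above (illustrative names, not kernel code): the second thread's behaviour when it faults on page n-1 depends on whether readahead has already inserted that page into the page cache.

```python
# Hypothetical model of the scenario described above. Readahead covers
# pages 0..n-1; a second thread concurrently wants page n-1.

def second_thread_action(page_already_in_cache: bool) -> str:
    """What the second thread does when it looks up page n-1."""
    if page_already_in_cache:
        # Proposed behaviour: readahead added (and locked) page n-1 up
        # front, so the reader blocks until the readahead I/O completes.
        return "wait for readahead I/O"
    # Current behaviour: readahead has not reached page n-1 yet, so the
    # reader allocates the page itself and submits its own small bio.
    return "submit own bio"
```

The question the rest of the thread answers is how often the second branch is actually taken, and what it costs when it is.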
OK, but let's put some numbers on that. Imagine that we're using high
performance spinning rust so we have an access latency of 5ms (200
IOPS), we're accessing 20 consecutive pages which happen to have their
data contiguous on disk. Our CPU is running at 5GHz and takes about
100,000 cycles to submit an I/O, plus 1,000 cycles to add an extra page
to the I/O.
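These figures can be folded into a back-of-the-envelope cost model (a sketch using only the cycle counts stated above):

```python
# Cost model from the figures above: 100,000 cycles to submit an I/O,
# plus 1,000 cycles for each extra page added to it.
SUBMIT_CYCLES = 100_000
PER_PAGE_CYCLES = 1_000

def submit_cost_cycles(extra_pages: int) -> int:
    """Cycles to build and submit one I/O carrying extra_pages pages."""
    return SUBMIT_CYCLES + PER_PAGE_CYCLES * extra_pages

# Building and submitting the 20-page readahead I/O costs about
# 120,000 cycles, dwarfed by the 5ms disk access time that follows.
```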
Current implementation: Allocate 20 pages, place 19 of them in the cache,
fail to place the last one in the cache. The later thread actually gets
to jump the queue and submit its bio first. Its latency will be 100,000
cycles (20us) plus the 5ms access time. But it only has 20,000 cycles
(4us) to hit this race, or it will end up behaving the same way as below.
New implementation: Allocate 20 pages, place them all in the cache,
then take 120,000 cycles to build & submit the I/O, and wait 5ms for
the I/O to complete.
But look how much more likely it is that it'll hit during the window
where we're waiting for the I/O to complete -- 5ms is 1250 times longer
than 4us.
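The comparison of the two windows is simple arithmetic on the numbers quoted above:

```python
# Relative size of the two windows, using the figures in the text:
# the reader can only "jump the queue" during the ~20,000-cycle (4us)
# window before the readahead I/O is submitted, but it can hit the
# already-in-cache case during the whole 5ms the I/O is in flight.
RACE_WINDOW_US = 4
IO_WAIT_US = 5000  # 5ms access time for spinning rust

def window_ratio() -> float:
    """How much longer the I/O wait is than the queue-jumping window."""
    return IO_WAIT_US / RACE_WINDOW_US
```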
If it _does_ get the latency benefit of jumping the queue, the readahead
will create one or two I/Os. If it hit page 18 instead of page 19, we'd
end up doing three I/Os; the first for page 18, then one for pages 0-17,
and one for page 19. And that means the disk is going to be busy for
15ms, delaying the next I/O for up to 10ms. It's actually beneficial in
the long term for the second thread to wait for the readahead to finish.
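The long-term cost of splitting the readahead can be sketched the same way (a model of the three-I/O case described above, not a measurement):

```python
ACCESS_MS = 5  # per-I/O access latency of the disk from the example

def disk_busy_ms(num_ios: int) -> int:
    """Total time the disk is busy serving num_ios back-to-back I/Os."""
    return num_ios * ACCESS_MS

# Hitting page 18 splits the readahead into three I/Os (page 18,
# pages 0-17, page 19): the disk is busy for 15ms, versus 5ms for a
# single 20-page I/O, delaying whatever I/O comes next by up to 10ms.
```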
Thank you very much for your detailed explanation; I was blinded by my
one-sided view. I agree that your patch series is the better solution
to the problem.
Yu Kuai