On Sun, Jan 19, 2020 at 02:55:14PM +0800, yukuai (C) wrote:
> On 2020/1/19 14:14, Matthew Wilcox wrote:
> > I don't understand your reasoning here.  If another process wants to
> > access a page of the file which isn't currently in cache, it would have
> > to first read the page in from storage.  If it's under readahead, it
> > has to wait for the read to finish.  Why is the second case worse than
> > the first?  It seems better to me.
>
> Thanks for your response! My worry is that, for example:
>
> We read page 0, and trigger readahead to read n pages (0 to n-1).  While
> in another thread, we read page n-1.
>
> In the current implementation, if readahead is in the process of reading
> pages 0 to n-2, the later operation doesn't need to wait for the former
> one to finish.  However, the later operation will have to wait if we add
> all pages to the page cache first.  That is why I said it might cause a
> performance problem.

OK, but let's put some numbers on that.  Imagine that we're using high
performance spinning rust so we have an access latency of 5ms (200 IOPS),
and we're accessing 20 consecutive pages which happen to have their data
contiguous on disk.  Our CPU is running at 5GHz and takes about 100,000
cycles to submit an I/O, plus 1,000 cycles to add an extra page to the I/O.

Current implementation: Allocate 20 pages, place 19 of them in the cache,
fail to place the last one in the cache.  The later thread actually gets
to jump the queue and submit its bio first.  Its latency will be 100,000
cycles (20us) plus the 5ms access time.  But it only has 20,000 cycles
(4us) to hit this race, or it will end up behaving the same way as below.

New implementation: Allocate 20 pages, place them all in the cache, then
take 120,000 cycles to build & submit the I/O, and wait 5ms for the I/O
to complete.  But look how much more likely it is that the race will hit
during the window where we're waiting for the I/O to complete -- 5ms is
1250 times longer than 4us.
If it _does_ get the latency benefit of jumping the queue, the readahead
will create one or two I/Os.  If it hit page 18 instead of page 19, we'd
end up doing three I/Os: the first for page 18, then one for pages 0-17,
and one for page 19.  And that means the disk is going to be busy for
15ms, delaying the next I/O by up to 10ms.  It's actually beneficial in
the long term for the second thread to wait for the readahead to finish.

Oh, and the current ->readpages code has a race where if the page tagged
with PageReadahead ends up not being inserted, we'll lose that bit, which
means the readahead will just stop and have to restart (because it will
look to the readahead code like it's not being effective).  That's a far
worse performance problem.