On Wed, Feb 24, 2021 at 01:32:02PM +0000, David Howells wrote:
> Steve French <smfrench@xxxxxxxxx> wrote:
>
> > This (readahead behavior improvements in Linux, on single large file
> > sequential read workloads like cp or grep) gets particularly interesting
> > with SMB3 as multichannel becomes more common.  With one channel having one
> > readahead request pending on the network is suboptimal - but not as bad as
> > when multichannel is negotiated. Interestingly in most cases two network
> > connections to the same server (different TCP sockets, but the same mount,
> > even in cases where there is only one network adapter) can achieve better
> > performance - but still significantly lags Windows (and probably other
> > clients) as in Linux we don't keep multiple I/Os in flight at one time
> > (unless different files are being read at the same time by different
> > threads).
>
> I think it should be relatively straightforward to make the netfs_readahead()
> function generate multiple read requests.  If I wasn't handed sufficient pages
> by the VM upfront to do two or more read requests, I would need to do extra
> expansion.  There are a couple of ways this could be done:

I don't think this is a job for netfs_readahead().  We can get into a
similar situation with SSDs or RAID arrays where ideally we would have
several outstanding readahead requests.

If your drive is connected through a 1Gbps link (eg PCIe gen 1 x1) and
has a seek latency of 10ms, then with one outstanding read, each read
needs to be about 1.25MB in size in order to saturate the bus.  If the
device supports 128 outstanding commands, each read need only be about
10kB.

We need the core readahead code to handle this situation.

My suggestion for doing this is to send off an extra readahead request
every time we hit a !Uptodate page.  It looks something like this
(assuming the app is processing the data fast and always hits the
!Uptodate case) ...

1. hit 0, set readahead size to 64kB, mark 32kB as Readahead,
   send read for 0-64kB
   wait for 0-64kB to complete
2. hit 32kB (Readahead), no reads outstanding
   inc readahead size to 128kB, mark 128kB as Readahead,
   send read for 64-192kB
3. hit 64kB (!Uptodate), one read outstanding
   mark 256kB as Readahead, send read for 192-320kB
   mark 384kB as Readahead, send read for 320-448kB
   wait for 64-192kB to complete
4. hit 128kB (Readahead), two reads outstanding
   inc readahead size to 256kB, mark 576kB as Readahead,
   send read for 448-704kB
5. hit 192kB (!Uptodate), three reads outstanding
   mark 832kB as Readahead, send read for 704-960kB
   mark 1088kB as Readahead, send read for 960-1216kB
   wait for 192-320kB to complete
6. hit 256kB (Readahead), four reads outstanding
   mark 1344kB as Readahead, send read for 1216-1472kB
7. hit 320kB (!Uptodate), five reads outstanding
   mark 1600kB as Readahead, send read for 1472-1728kB
   mark 1856kB as Readahead, send read for 1728-1984kB
   wait for 320-448kB to complete
8. hit 384kB (Readahead), five reads outstanding
   mark 2112kB as Readahead, send read for 1984-2240kB
9. hit 448kB (!Uptodate), six reads outstanding
   mark 2368kB as Readahead, send read for 2240-2496kB
   mark 2624kB as Readahead, send read for 2496-2752kB
   wait for 448-704kB to complete
10. hit 576kB (Readahead), seven reads outstanding
    mark 2880kB as Readahead, send read for 2752-3008kB
...

Once we stop hitting !Uptodate pages, we'll maintain the number of pages
marked as Readahead, and thus keep the number of readahead requests
at the level it determined was necessary to keep the link saturated.

I think we may need to put a parallelism cap in the bdi so that a device
which is just slow, rather than at the end of a long fat pipe, doesn't
get overwhelmed with requests.
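
The sizing argument earlier in this message is a bandwidth-delay product
split across the requests in flight.  A minimal sketch of that arithmetic
in Python, assuming 1Gbps means 10^9 bits per second and decimal kB/MB;
the function name read_size_to_saturate() is made up for illustration and
is not a kernel interface:

```python
def read_size_to_saturate(link_bits_per_sec, latency_sec, outstanding):
    """Bytes each read must carry so that `outstanding` concurrent reads
    cover one full latency period of link capacity."""
    # Bandwidth-delay product: bytes the link can move during one latency.
    bytes_in_flight = link_bits_per_sec / 8 * latency_sec
    return bytes_in_flight / outstanding


if __name__ == "__main__":
    one = read_size_to_saturate(1e9, 10e-3, 1)
    many = read_size_to_saturate(1e9, 10e-3, 128)
    print(f"1 outstanding read:    {one / 1e6:.2f} MB per read")
    print(f"128 outstanding reads: {many / 1e3:.1f} kB per read")
```

The point of the calculation is only that the required read size falls
linearly with the number of requests the device will accept in parallel.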
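
The numbered walkthrough above can be reproduced with a small model.  This
is only a sketch of the proposed heuristic, not kernel code: simulate(),
initial_kb and max_kb are hypothetical names, the 256kB ceiling is inferred
from the walkthrough rather than stated, and the model keeps the same
assumption that the app always reaches the start of each read before it
completes.  It tracks which reads are issued on each hit, not when they
finish:

```python
from collections import deque


def simulate(steps, initial_kb=64, max_kb=256):
    """Return a list of (hit_offset_kb, kind, reads_sent), one per step."""
    size = initial_kb      # current readahead size in kB
    next_off = 0           # offset where the next read will start
    events = deque()       # pages the app will touch, in offset order
    log = []

    def send():
        nonlocal next_off
        start, end = next_off, next_off + size
        # Assumption: the app gets to `start` before this read completes.
        events.append((start, "!Uptodate"))
        # The page in the middle of the read carries the Readahead mark.
        events.append((start + size // 2, "Readahead"))
        next_off = end
        return (start, end)

    # Step 1: the initial miss at offset 0 sends a single read.
    first = send()
    events.popleft()       # offset 0 is the hit that triggered this read
    log.append((0, "miss", [first]))

    while len(log) < steps:
        off, kind = events.popleft()
        if kind == "Readahead":
            # Normal readahead: grow the window (up to the cap), one read.
            size = min(size * 2, max_kb)
            log.append((off, kind, [send()]))
        else:
            # !Uptodate: the usual read plus one extra.
            log.append((off, kind, [send(), send()]))
    return log


if __name__ == "__main__":
    for n, (off, kind, reads) in enumerate(simulate(10), 1):
        sent = ", ".join(f"{s}-{e}kB" for s, e in reads)
        print(f"{n}. hit {off}kB ({kind}): send {sent}")
```

Each Readahead hit replaces the read it belongs to with one new read, while
each !Uptodate hit adds two, which is what lets the number of requests in
flight grow until the app stops catching up with the I/O.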