Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel.  So it would help to be
> able to tell the cloud storage device whether a particular read
> request stems from read-ahead or not.  At the moment, as Matthew
> Wilcox has pointed out, we use the read-ahead code path for
> synchronous buffered reads.  So plumbing this information so it can be
> passed through multiple levels of the mm, fs, and block layers will
> probably be needed.

This shouldn't be _too_ painful.  For example, the NVMe driver already
does the right thing:

        if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
                control |= NVME_RW_LR;

        if (req->cmd_flags & REQ_RAHEAD)
                dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;

(LR is Limited Retry; FREQ_PREFETCH is "Speculative read. The command
is part of a prefetch operation")

The only problem is that the readahead code doesn't tell the filesystem
whether the request is sync or async.  This should be a simple matter
of adding a new 'bool async' to struct readahead_control and then setting
REQ_RAHEAD based on that, rather than on whether the request came in
through readahead() or read_folio() (e.g. see mpage_readahead()).
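
To make the difference concrete, here is a rough userspace sketch of the
two ways of picking the flag.  It is not kernel code: the struct, the
function names and the flag values are all made up for illustration (the
real REQ_* bits live in include/linux/blk_types.h), and only the 'async'
field corresponds to the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; these are NOT the kernel's REQ_* definitions. */
#define SKETCH_REQ_OP_READ 0u
#define SKETCH_REQ_RAHEAD  (1u << 0)

/* Hypothetical stand-in for struct readahead_control. */
struct sketch_readahead_control {
	bool async;	/* proposed field: true if nobody is waiting on the read */
};

/*
 * Entry-point based choice: anything submitted via the readahead path is
 * tagged as readahead, even a synchronous buffered read that a task is
 * blocked on.
 */
static unsigned int sketch_flags_by_entry_point(bool via_readahead)
{
	return SKETCH_REQ_OP_READ | (via_readahead ? SKETCH_REQ_RAHEAD : 0u);
}

/*
 * Proposed choice: tag the request as readahead only when the caller
 * marked it as asynchronous (speculative).
 */
static unsigned int sketch_flags_by_async(const struct sketch_readahead_control *rac)
{
	assert(rac != NULL);
	return SKETCH_REQ_OP_READ | (rac->async ? SKETCH_REQ_RAHEAD : 0u);
}
```

With something along these lines, a synchronous buffered read that goes
through the readahead path would be submitted without the readahead flag,
so a driver like NVMe would no longer mark it as limited-retry or as a
prefetch.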

Another thing to fix is that SCSI doesn't do anything with the REQ_RAHEAD
flag, so I presume T10 has some work to do (maybe they could borrow the
Access Frequency field from NVMe, since that was what the drive vendors
told us they wanted; maybe they changed their minds since).



