Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

Hi Matthew!

On 2023/3/1 12:35, Matthew Wilcox wrote:
On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
For example, most cloud storage devices are doing read-ahead to try to
anticipate read requests from the VM.  This can interfere with the
read-ahead being done by the guest kernel.  So it would help to be able
to tell the cloud storage device whether a particular read request
stems from read-ahead or not.  At the moment, as Matthew Wilcox has
pointed out, we use the read-ahead code path for synchronous
buffered reads.  So plumbing this information so it can be passed through
multiple levels of the mm, fs, and block layers will probably be
needed.

This shouldn't be _too_ painful.  For example, the NVMe driver already
does the right thing:

         if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
                 control |= NVME_RW_LR;

         if (req->cmd_flags & REQ_RAHEAD)
                 dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;

(LR is Limited Retry; FREQ_PREFETCH is "Speculative read. The command
is part of a prefetch operation")

The only problem is that the readahead code doesn't tell the filesystem
whether the request is sync or async.  This should be a simple matter
of adding a new 'bool async' to the readahead_control and then setting
REQ_RAHEAD based on that, rather than on whether the request came in
through readahead() or read_folio() (eg see mpage_readahead()).
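
A minimal userspace sketch of that idea, not kernel code: the struct
below is a hypothetical stand-in for readahead_control carrying the
proposed 'bool async', and the SKETCH_* flag values are placeholders
rather than the kernel's real definitions.  It only shows the read flags
being derived from the new field instead of from the entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for this sketch; NOT the kernel's definitions. */
#define SKETCH_REQ_OP_READ 0x0u
#define SKETCH_REQ_RAHEAD  0x1u

/* Hypothetical stand-in for readahead_control with the proposed field. */
struct readahead_control_sketch {
	bool async;	/* true only when the read is speculative */
};

/*
 * Derive the request flags from rac->async rather than from which
 * entry point (readahead vs. read_folio) the request came through.
 */
static unsigned int read_request_flags(const struct readahead_control_sketch *rac)
{
	unsigned int opf = SKETCH_REQ_OP_READ;

	assert(rac != NULL);
	if (rac->async)
		opf |= SKETCH_REQ_RAHEAD;
	return opf;
}
```

With this shape, a synchronous buffered read that still travels through
the readahead path would leave the readahead flag clear, so a driver
test like the NVMe one above would only fire for genuinely speculative
reads.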

Great!  In addition to that, and somewhat off topic: if we have a
"bool async" now, I think it will immediately have some users (such as
EROFS), since we'd like to do post-processing (such as decompression)
immediately in the same context for sync readahead (triggered by missing
pages) and leave it to another kworker for async readahead (I think
it's almost the same for decryption and verification).

So "bool async" would be quite useful on my side if it could be
passed to the fs side.  I'd like to raise my hand to have it.
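
To illustrate the dispatch described above, here is a rough userspace
sketch, not EROFS code.  Every name in it (deferred_queue, finish_read,
run_deferred, count_decompress) is hypothetical, and the fixed-size
array merely stands in for handing work to a kworker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUEUE_MAX 16

typedef void (*postprocess_fn)(void *data);

/* Stand-in for a workqueue: work parked here runs later, elsewhere. */
struct deferred_queue {
	postprocess_fn fn[SKETCH_QUEUE_MAX];
	void *data[SKETCH_QUEUE_MAX];
	size_t len;
};

/*
 * Sync reads are post-processed immediately in the caller's context,
 * because someone is waiting on the pages.  Async readahead is deferred.
 * Returns true if the work ran inline, false if it was queued.
 */
static bool finish_read(struct deferred_queue *q, bool async,
			postprocess_fn fn, void *data)
{
	assert(q != NULL && fn != NULL);
	/* Sketch-only fallback: run inline when the queue is full. */
	if (!async || q->len == SKETCH_QUEUE_MAX) {
		fn(data);
		return true;
	}
	q->fn[q->len] = fn;
	q->data[q->len] = data;
	q->len++;
	return false;
}

/* Models the worker draining the queue; returns the number of items run. */
static size_t run_deferred(struct deferred_queue *q)
{
	size_t n = q->len;

	for (size_t i = 0; i < n; i++)
		q->fn[i](q->data[i]);
	q->len = 0;
	return n;
}

/* Dummy "decompression" step that just counts how often it ran. */
static void count_decompress(void *data)
{
	(*(int *)data)++;
}
```

The same split would apply to decryption or verification: only the
callback changes, while the sync/async decision stays in one place.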

Thanks,
Gao Xiang


Another thing to fix is that SCSI doesn't do anything with the REQ_RAHEAD
flag, so I presume T10 has some work to do (maybe they could borrow the
Access Frequency field from NVMe, since that was what the drive vendors
told us they wanted; maybe they changed their minds since).


