I was also hoping to get opinions from others this week at the FS/MM summit, but no luck so far On Tue, May 3, 2022 at 4:12 AM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > > Hi steve, > > What's your opinion about the new fix as nfs does? > > On Sun, May 1, 2022 at 6:02 PM ronnie sahlberg <ronniesahlberg@xxxxxxxxx> wrote: > > > > On Sat, 30 Apr 2022 at 06:57, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > > > > > > Hi steve, > > > > > > On 2022/4/28 10:47 PM, Steve French wrote: > > > > Got some additional review comments/questions about this patch. > > > > > > > > In __cifs_writev isn't it likely that the write will be async and now > > > > become synchronous and also could we now have a duplicated write > > > > (flushing the write, then calling write again on that range)? > > > > > > Yes, you are right. > > > But for a direct write, cifs must writes those buffer pages to server > > > before the direct write is send to server. > > > > > > > > > > > For example, the change you added > > > > > > > > + if (CIFS_CACHE_WRITE(CIFS_I(inode)) && > > > > + inode->i_mapping && inode->i_mapping->nrpages != 0) { > > > > + rc = filemap_write_and_wait(inode->i_mapping); > > > > > > > > will write synchronously all dirty pages but then proceed to call the async > > > > > > > > rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from, > > > > cfile, cifs_sb, &ctx->list, ctx); > > > > > > > > a few lines later. Won't this kill performance? > > > > > > Yes, it is kill performance. > > > But for cache=none, i don't think the local dirty pages should left > > > when a new direct write coming. > > > > > > > > > > > What was the reason for this part of the patch? Doesn't the original > > > > code end up in the same place around line 678 in mm/filemap.c > > > > > > > > int err2 = filemap_fdatawait_range(mapping, > > > > lstart, lend); > > > > > > > > called from: > > > > > > > > @@ -2440,7 +2440,7 @@ int cifs_getattr(struct user_namespace > > > > *mnt_userns, const struct path *path, > > > > if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | > > > > STATX_BLOCKS)) && > > > > !CIFS_CACHE_READ(CIFS_I(inode)) && > > > > inode->i_mapping && inode->i_mapping->nrpages != 0) { > > > > - rc = filemap_fdatawait(inode->i_mapping); > > > > + rc = filemap_write_and_wait(inode->i_mapping); > > > > > > This fix is the other case when mounting cifs with nolease, > > > # mount -t cifs -ocache=none,nolease,nobrl,guest //cifsserverip/test > > > /mnt/cifs > > > # ./testcases/bin/fsx-linux -l 500000 -r 4096 -t 4096 -w 4096 -N 10000 > > > /mnt/cifs/junkfile > > > > > > fsx-linux fails too. > > > > > > After rethinking of this problem, i think the core problem is > > > the buffer io between direct io. In cifs_getattr, it flushs dirty pages > > > if !CIFS_CACHE_READ(CIFS_I(inode)), but when oplock is granted, > > > it's always false, so the dirty pages are not send to server. > > > > > > With write oplock granted, it's right of don't send those dirty pages to > > > server, but the following direct ios, should be send to server directly, > > > or gets data from the local dirty pages? > > > > > > Maybe cifs should flush all dirty pages at cifs_getattr as NFS > > > nfs_getattr does, > > > > > > /* Flush out writes to the server in order to update c/mtime. */ > > > if ((request_mask & (STATX_CTIME | STATX_MTIME)) && > > > S_ISREG(inode->i_mode)) > > > filemap_write_and_wait(inode->i_mapping); > > > > > > After modifing cifs_getattr as following without changing > > > __cifs_readv/__cifs_writev, the fsx-linux test pass at > > > -ocache=none,nolease,nobrl,guest and -ocache=none,nobrl,guest. > > > > > > @@ -2438,9 +2438,8 @@ int cifs_getattr(struct user_namespace > > > *mnt_userns, const struct path *path, > > > * has actual ctime, mtime and file length. > > > */ > > > if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | > > > STATX_BLOCKS)) && > > > - !CIFS_CACHE_READ(CIFS_I(inode)) && > > > - inode->i_mapping && inode->i_mapping->nrpages != 0) { > > > - rc = filemap_fdatawait(inode->i_mapping); > > > + S_ISREG(inode->i_mode)) { > > > + rc = filemap_write_and_wait(inode->i_mapping); > > > if (rc) { > > > mapping_set_error(inode->i_mapping, rc); > > > return rc; > > > > Thanks, Kinglong > > So with this patch, that matches what nfs does, we have a less > > intrusive fix that solves the issue. > > Can you send this as a separate patch to the list for review? > > > > > > > > > > thanks, > > > Kinglong Mee > > > > > > > > > > > On Wed, Apr 27, 2022 at 10:10 PM Steve French <smfrench@xxxxxxxxx> wrote: > > > >> > > > >> merged into cifs-2.6.git for-next pending testing > > > >> > > > >> On Sun, Apr 24, 2022 at 10:41 PM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > > > >>> > > > >>> ping... > > > >>> > > > >>> On Mon, Apr 11, 2022 at 3:39 PM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > > > >>>> > > > >>>> Testing with ltp, fsx-linux fail as, > > > >>>> > > > >>>> # mount -t cifs -ocache=none,nobrl,guest //cifsserverip/test /mnt/cifs/ > > > >>>> # dd if=/dev/zero of=/mnt/cifs//junkfile bs=8192 count=19200 conv=block > > > >>>> # ./testcases/bin/fsx-linux -l 500000 -r 4096 -t 4096 -w 4096 -N 10000 /mnt/cifs/junkfile > > > >>>> skipping zero size read > > > >>>> truncating to largest ever: 0x2c000 > > > >>>> READ BAD DATA: offset = 0x1c000, size = 0x9cc0 > > > >>>> OFFSET GOOD BAD RANGE > > > >>>> 0x1c000 0x09d2 000000 0x22ed > > > >>>> operation# (mod 256) for the bad dataunknown, check HOLE and EXTEND ops > > > >>>> LOG DUMP (10 total operations): > > > >>>> 1: 1649662377.404010 SKIPPED (no operation) > > > >>>> 2: 1649662377.413729 WRITE 0x3000 thru 0xdece (0xaecf bytes) HOLE > > > >>>> 3: 1649662377.424961 WRITE 0x19000 thru 0x1b410 (0x2411 bytes) HOLE > > > >>>> 4: 1649662377.435135 TRUNCATE UP from 0x1b411 to 0x2c000 ******WWWW > > > >>>> 5: 1649662377.487010 MAPWRITE 0x5000 thru 0x13077 (0xe078 bytes) > > > >>>> 6: 1649662377.495006 MAPREAD 0x8000 thru 0xe16c (0x616d bytes) > > > >>>> 7: 1649662377.500638 MAPREAD 0x1e000 thru 0x2054d (0x254e bytes) ***RRRR*** > > > >>>> 8: 1649662377.506165 WRITE 0x76000 thru 0x7993f (0x3940 bytes) HOLE > > > >>>> 9: 1649662377.516674 MAPWRITE 0x1a000 thru 0x1e2fe (0x42ff bytes) ******WWWW > > > >>>> 10: 1649662377.535312 READ 0x1c000 thru 0x25cbf (0x9cc0 bytes) ***RRRR*** > > > >>>> Correct content saved for comparison > > > >>>> (maybe hexdump "/mnt/cifs/junkfile" vs "/mnt/cifs/junkfile.fsxgood") > > > >>>> > > > >>>> Those data written at MAPWRITE is not flush to smb server, > > > >>>> but the fallowing read gets data from the backend. > > > >>>> > > > >>>> Signed-off-by: Kinglong Mee <kinglongmee@xxxxxxxxx> > > > >>>> --- > > > >>>> fs/cifs/file.c | 22 ++++++++++++++++++++++ > > > >>>> fs/cifs/inode.c | 2 +- > > > >>>> 2 files changed, 23 insertions(+), 1 deletion(-) > > > >>>> > > > >>>> diff --git a/fs/cifs/file.c b/fs/cifs/file.c > > > >>>> index d511a78383c3..11912474563e 100644 > > > >>>> --- a/fs/cifs/file.c > > > >>>> +++ b/fs/cifs/file.c > > > >>>> @@ -3222,6 +3222,7 @@ static ssize_t __cifs_writev( > > > >>>> struct kiocb *iocb, struct iov_iter *from, bool direct) > > > >>>> { > > > >>>> struct file *file = iocb->ki_filp; > > > >>>> + struct inode *inode = file_inode(iocb->ki_filp); > > > >>>> ssize_t total_written = 0; > > > >>>> struct cifsFileInfo *cfile; > > > >>>> struct cifs_tcon *tcon; > > > >>>> @@ -3249,6 +3250,16 @@ static ssize_t __cifs_writev( > > > >>>> cfile = file->private_data; > > > >>>> tcon = tlink_tcon(cfile->tlink); > > > >>>> > > > >>>> + /* We need to be sure that all dirty pages are written to the server. */ > > > >>>> + if (CIFS_CACHE_WRITE(CIFS_I(inode)) && > > > >>>> + inode->i_mapping && inode->i_mapping->nrpages != 0) { > > > >>>> + rc = filemap_write_and_wait(inode->i_mapping); > > > >>>> + if (rc) { > > > >>>> + mapping_set_error(inode->i_mapping, rc); > > > >>>> + return rc; > > > >>>> + } > > > >>>> + } > > > >>>> + > > > >>>> if (!tcon->ses->server->ops->async_writev) > > > >>>> return -ENOSYS; > > > >>>> > > > >>>> @@ -3961,6 +3972,7 @@ static ssize_t __cifs_readv( > > > >>>> { > > > >>>> size_t len; > > > >>>> struct file *file = iocb->ki_filp; > > > >>>> + struct inode *inode = file_inode(iocb->ki_filp); > > > >>>> struct cifs_sb_info *cifs_sb; > > > >>>> struct cifsFileInfo *cfile; > > > >>>> struct cifs_tcon *tcon; > > > >>>> @@ -3986,6 +3998,16 @@ static ssize_t __cifs_readv( > > > >>>> cfile = file->private_data; > > > >>>> tcon = tlink_tcon(cfile->tlink); > > > >>>> > > > >>>> + /* We need to be sure that all dirty pages are written to the server. */ > > > >>>> + if (CIFS_CACHE_WRITE(CIFS_I(inode)) && > > > >>>> + inode->i_mapping && inode->i_mapping->nrpages != 0) { > > > >>>> + rc = filemap_write_and_wait(inode->i_mapping); > > > >>>> + if (rc) { > > > >>>> + mapping_set_error(inode->i_mapping, rc); > > > >>>> + return rc; > > > >>>> + } > > > >>>> + } > > > >>>> + > > > >>>> if (!tcon->ses->server->ops->async_readv) > > > >>>> return -ENOSYS; > > > >>>> > > > >>>> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c > > > >>>> index 2f9e7d2f81b6..d5c07196a81e 100644 > > > >>>> --- a/fs/cifs/inode.c > > > >>>> +++ b/fs/cifs/inode.c > > > >>>> @@ -2440,7 +2440,7 @@ int cifs_getattr(struct user_namespace *mnt_userns, const struct path *path, > > > >>>> if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | STATX_BLOCKS)) && > > > >>>> !CIFS_CACHE_READ(CIFS_I(inode)) && > > > >>>> inode->i_mapping && inode->i_mapping->nrpages != 0) { > > > >>>> - rc = filemap_fdatawait(inode->i_mapping); > > > >>>> + rc = filemap_write_and_wait(inode->i_mapping); > > > >>>> if (rc) { > > > >>>> mapping_set_error(inode->i_mapping, rc); > > > >>>> return rc; > > > >>>> -- > > > >>>> 2.35.1 > > > >>>> > > > >> > > > >> > > > >> > > > >> -- > > > >> Thanks, > > > >> > > > >> Steve > > > > > > > > > > > > -- Thanks, Steve