On Sat, 30 Apr 2022 at 06:57, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > > Hi steve, > > On 2022/4/28 10:47 PM, Steve French wrote: > > Got some additional review comments/questions about this patch. > > > > In __cifs_writev isn't it likely that the write will be async and now > > become synchronous and also could we now have a duplicated write > > (flushing the write, then calling write again on that range)? > > Yes, you are right. > But for a direct write, cifs must writes those buffer pages to server > before the direct write is send to server. > > > > > For example, the change you added > > > > + if (CIFS_CACHE_WRITE(CIFS_I(inode)) && > > + inode->i_mapping && inode->i_mapping->nrpages != 0) { > > + rc = filemap_write_and_wait(inode->i_mapping); > > > > will write synchronously all dirty pages but then proceed to call the async > > > > rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from, > > cfile, cifs_sb, &ctx->list, ctx); > > > > a few lines later. Won't this kill performance? > > Yes, it is kill performance. > But for cache=none, i don't think the local dirty pages should left > when a new direct write coming. > > > > > What was the reason for this part of the patch? Doesn't the original > > code end up in the same place around line 678 in mm/filemap.c > > > > int err2 = filemap_fdatawait_range(mapping, > > lstart, lend); > > > > called from: > > > > @@ -2440,7 +2440,7 @@ int cifs_getattr(struct user_namespace > > *mnt_userns, const struct path *path, > > if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | > > STATX_BLOCKS)) && > > !CIFS_CACHE_READ(CIFS_I(inode)) && > > inode->i_mapping && inode->i_mapping->nrpages != 0) { > > - rc = filemap_fdatawait(inode->i_mapping); > > + rc = filemap_write_and_wait(inode->i_mapping); > > This fix is the other case when mounting cifs with nolease, > # mount -t cifs -ocache=none,nolease,nobrl,guest //cifsserverip/test > /mnt/cifs > # ./testcases/bin/fsx-linux -l 500000 -r 4096 -t 4096 -w 4096 -N 10000 > /mnt/cifs/junkfile > > fsx-linux fails too. > > After rethinking of this problem, i think the core problem is > the buffer io between direct io. In cifs_getattr, it flushs dirty pages > if !CIFS_CACHE_READ(CIFS_I(inode)), but when oplock is granted, > it's always false, so the dirty pages are not send to server. > > With write oplock granted, it's right of don't send those dirty pages to > server, but the following direct ios, should be send to server directly, > or gets data from the local dirty pages? > > Maybe cifs should flush all dirty pages at cifs_getattr as NFS > nfs_getattr does, > > /* Flush out writes to the server in order to update c/mtime. */ > if ((request_mask & (STATX_CTIME | STATX_MTIME)) && > S_ISREG(inode->i_mode)) > filemap_write_and_wait(inode->i_mapping); > > After modifing cifs_getattr as following without changing > __cifs_readv/__cifs_writev, the fsx-linux test pass at > -ocache=none,nolease,nobrl,guest and -ocache=none,nobrl,guest. > > @@ -2438,9 +2438,8 @@ int cifs_getattr(struct user_namespace > *mnt_userns, const struct path *path, > * has actual ctime, mtime and file length. > */ > if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | > STATX_BLOCKS)) && > - !CIFS_CACHE_READ(CIFS_I(inode)) && > - inode->i_mapping && inode->i_mapping->nrpages != 0) { > - rc = filemap_fdatawait(inode->i_mapping); > + S_ISREG(inode->i_mode)) { > + rc = filemap_write_and_wait(inode->i_mapping); > if (rc) { > mapping_set_error(inode->i_mapping, rc); > return rc; Thanks, Kinglong So with this patch, that matches what nfs does, we have a less intrusive fix that solves the issue. Can you send this as a separate patch to the list for review? > > thanks, > Kinglong Mee > > > > > On Wed, Apr 27, 2022 at 10:10 PM Steve French <smfrench@xxxxxxxxx> wrote: > >> > >> merged into cifs-2.6.git for-next pending testing > >> > >> On Sun, Apr 24, 2022 at 10:41 PM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > >>> > >>> ping... > >>> > >>> On Mon, Apr 11, 2022 at 3:39 PM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote: > >>>> > >>>> Testing with ltp, fsx-linux fail as, > >>>> > >>>> # mount -t cifs -ocache=none,nobrl,guest //cifsserverip/test /mnt/cifs/ > >>>> # dd if=/dev/zero of=/mnt/cifs//junkfile bs=8192 count=19200 conv=block > >>>> # ./testcases/bin/fsx-linux -l 500000 -r 4096 -t 4096 -w 4096 -N 10000 /mnt/cifs/junkfile > >>>> skipping zero size read > >>>> truncating to largest ever: 0x2c000 > >>>> READ BAD DATA: offset = 0x1c000, size = 0x9cc0 > >>>> OFFSET GOOD BAD RANGE > >>>> 0x1c000 0x09d2 000000 0x22ed > >>>> operation# (mod 256) for the bad dataunknown, check HOLE and EXTEND ops > >>>> LOG DUMP (10 total operations): > >>>> 1: 1649662377.404010 SKIPPED (no operation) > >>>> 2: 1649662377.413729 WRITE 0x3000 thru 0xdece (0xaecf bytes) HOLE > >>>> 3: 1649662377.424961 WRITE 0x19000 thru 0x1b410 (0x2411 bytes) HOLE > >>>> 4: 1649662377.435135 TRUNCATE UP from 0x1b411 to 0x2c000 ******WWWW > >>>> 5: 1649662377.487010 MAPWRITE 0x5000 thru 0x13077 (0xe078 bytes) > >>>> 6: 1649662377.495006 MAPREAD 0x8000 thru 0xe16c (0x616d bytes) > >>>> 7: 1649662377.500638 MAPREAD 0x1e000 thru 0x2054d (0x254e bytes) ***RRRR*** > >>>> 8: 1649662377.506165 WRITE 0x76000 thru 0x7993f (0x3940 bytes) HOLE > >>>> 9: 1649662377.516674 MAPWRITE 0x1a000 thru 0x1e2fe (0x42ff bytes) ******WWWW > >>>> 10: 1649662377.535312 READ 0x1c000 thru 0x25cbf (0x9cc0 bytes) ***RRRR*** > >>>> Correct content saved for comparison > >>>> (maybe hexdump "/mnt/cifs/junkfile" vs "/mnt/cifs/junkfile.fsxgood") > >>>> > >>>> Those data written at MAPWRITE is not flush to smb server, > >>>> but the fallowing read gets data from the backend. > >>>> > >>>> Signed-off-by: Kinglong Mee <kinglongmee@xxxxxxxxx> > >>>> --- > >>>> fs/cifs/file.c | 22 ++++++++++++++++++++++ > >>>> fs/cifs/inode.c | 2 +- > >>>> 2 files changed, 23 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/fs/cifs/file.c b/fs/cifs/file.c > >>>> index d511a78383c3..11912474563e 100644 > >>>> --- a/fs/cifs/file.c > >>>> +++ b/fs/cifs/file.c > >>>> @@ -3222,6 +3222,7 @@ static ssize_t __cifs_writev( > >>>> struct kiocb *iocb, struct iov_iter *from, bool direct) > >>>> { > >>>> struct file *file = iocb->ki_filp; > >>>> + struct inode *inode = file_inode(iocb->ki_filp); > >>>> ssize_t total_written = 0; > >>>> struct cifsFileInfo *cfile; > >>>> struct cifs_tcon *tcon; > >>>> @@ -3249,6 +3250,16 @@ static ssize_t __cifs_writev( > >>>> cfile = file->private_data; > >>>> tcon = tlink_tcon(cfile->tlink); > >>>> > >>>> + /* We need to be sure that all dirty pages are written to the server. */ > >>>> + if (CIFS_CACHE_WRITE(CIFS_I(inode)) && > >>>> + inode->i_mapping && inode->i_mapping->nrpages != 0) { > >>>> + rc = filemap_write_and_wait(inode->i_mapping); > >>>> + if (rc) { > >>>> + mapping_set_error(inode->i_mapping, rc); > >>>> + return rc; > >>>> + } > >>>> + } > >>>> + > >>>> if (!tcon->ses->server->ops->async_writev) > >>>> return -ENOSYS; > >>>> > >>>> @@ -3961,6 +3972,7 @@ static ssize_t __cifs_readv( > >>>> { > >>>> size_t len; > >>>> struct file *file = iocb->ki_filp; > >>>> + struct inode *inode = file_inode(iocb->ki_filp); > >>>> struct cifs_sb_info *cifs_sb; > >>>> struct cifsFileInfo *cfile; > >>>> struct cifs_tcon *tcon; > >>>> @@ -3986,6 +3998,16 @@ static ssize_t __cifs_readv( > >>>> cfile = file->private_data; > >>>> tcon = tlink_tcon(cfile->tlink); > >>>> > >>>> + /* We need to be sure that all dirty pages are written to the server. */ > >>>> + if (CIFS_CACHE_WRITE(CIFS_I(inode)) && > >>>> + inode->i_mapping && inode->i_mapping->nrpages != 0) { > >>>> + rc = filemap_write_and_wait(inode->i_mapping); > >>>> + if (rc) { > >>>> + mapping_set_error(inode->i_mapping, rc); > >>>> + return rc; > >>>> + } > >>>> + } > >>>> + > >>>> if (!tcon->ses->server->ops->async_readv) > >>>> return -ENOSYS; > >>>> > >>>> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c > >>>> index 2f9e7d2f81b6..d5c07196a81e 100644 > >>>> --- a/fs/cifs/inode.c > >>>> +++ b/fs/cifs/inode.c > >>>> @@ -2440,7 +2440,7 @@ int cifs_getattr(struct user_namespace *mnt_userns, const struct path *path, > >>>> if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | STATX_BLOCKS)) && > >>>> !CIFS_CACHE_READ(CIFS_I(inode)) && > >>>> inode->i_mapping && inode->i_mapping->nrpages != 0) { > >>>> - rc = filemap_fdatawait(inode->i_mapping); > >>>> + rc = filemap_write_and_wait(inode->i_mapping); > >>>> if (rc) { > >>>> mapping_set_error(inode->i_mapping, rc); > >>>> return rc; > >>>> -- > >>>> 2.35.1 > >>>> > >> > >> > >> > >> -- > >> Thanks, > >> > >> Steve > > > > > >