Re: [PATCH] cifs: flush all dirty pages to server before read/write

On Sat, 30 Apr 2022 at 06:57, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
>
> Hi Steve,
>
> On 2022/4/28 10:47 PM, Steve French wrote:
> > Got some additional review comments/questions about this patch.
> >
> > In __cifs_writev isn't it likely that the write will be async and now
> > become synchronous and also could we now have a duplicated write
> > (flushing the write, then calling write again on that range)?
>
> Yes, you are right.
> But for a direct write, cifs must write the buffered dirty pages to the
> server before the direct write is sent to the server.
>
> >
> > For example, the change you added
> >
> > +       if (CIFS_CACHE_WRITE(CIFS_I(inode)) &&
> > +           inode->i_mapping && inode->i_mapping->nrpages != 0) {
> > +               rc = filemap_write_and_wait(inode->i_mapping);
> >
> > will write synchronously all dirty pages but then proceed to call the async
> >
> >          rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from,
> >                                    cfile, cifs_sb, &ctx->list, ctx);
> >
> > a few lines later.  Won't this kill performance?
>
> Yes, it does hurt performance.
> But for cache=none, I don't think local dirty pages should be left
> behind when a new direct write comes in.
>
> >
> > What was the reason for this part of the patch?  Doesn't the original
> > code end up in the same place around line 678 in mm/filemap.c
> >
> >                          int err2 = filemap_fdatawait_range(mapping,
> >                                                  lstart, lend);
> >
> > called from:
> >
> > @@ -2440,7 +2440,7 @@ int cifs_getattr(struct user_namespace *mnt_userns, const struct path *path,
> >          if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | STATX_BLOCKS)) &&
> >              !CIFS_CACHE_READ(CIFS_I(inode)) &&
> >              inode->i_mapping && inode->i_mapping->nrpages != 0) {
> > -               rc = filemap_fdatawait(inode->i_mapping);
> > +               rc = filemap_write_and_wait(inode->i_mapping);
>
> This change fixes the other case, when cifs is mounted with nolease:
> # mount -t cifs -ocache=none,nolease,nobrl,guest //cifsserverip/test /mnt/cifs
> # ./testcases/bin/fsx-linux -l 500000 -r 4096 -t 4096 -w 4096 -N 10000 /mnt/cifs/junkfile
>
> fsx-linux fails too.
>
> After rethinking this problem, I think the core problem is buffered
> I/O interleaved with direct I/O. cifs_getattr flushes dirty pages only
> if !CIFS_CACHE_READ(CIFS_I(inode)), but when an oplock is granted that
> condition is always false, so the dirty pages are not sent to the server.
>
> With a write oplock granted, it is correct not to send those dirty pages
> to the server, but should the direct I/Os that follow go straight to the
> server, or get their data from the local dirty pages?
>
> Maybe cifs should flush all dirty pages in cifs_getattr, as NFS does in
> nfs_getattr:
>
>          /* Flush out writes to the server in order to update c/mtime.  */
>          if ((request_mask & (STATX_CTIME | STATX_MTIME)) &&
>              S_ISREG(inode->i_mode))
>                  filemap_write_and_wait(inode->i_mapping);
>
> After modifying cifs_getattr as follows, without changing
> __cifs_readv/__cifs_writev, the fsx-linux test passes with both
> -ocache=none,nolease,nobrl,guest and -ocache=none,nobrl,guest.
>
> @@ -2438,9 +2438,8 @@ int cifs_getattr(struct user_namespace *mnt_userns, const struct path *path,
>           * has actual ctime, mtime and file length.
>           */
>          if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | STATX_BLOCKS)) &&
> -           !CIFS_CACHE_READ(CIFS_I(inode)) &&
> -           inode->i_mapping && inode->i_mapping->nrpages != 0) {
> -               rc = filemap_fdatawait(inode->i_mapping);
> +           S_ISREG(inode->i_mode)) {
> +               rc = filemap_write_and_wait(inode->i_mapping);
>                  if (rc) {
>                          mapping_set_error(inode->i_mapping, rc);
>                          return rc;

Thanks, Kinglong.
So this patch, which matches what NFS does, gives us a less
intrusive fix that solves the issue.
Can you send this as a separate patch to the list for review?
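
For anyone who wants to poke at this outside of fsx, operations 9 and 10
in the quoted log (MAPWRITE, then READ over an overlapping range) can be
approximated from user space. This is only a rough sketch and not part of
the patch: the function name mapwrite_then_read and the 0xd2 fill byte are
made up for illustration, and the offsets are copied from the log. On a
cache=none mount I would expect the read to go to the server, so a
mismatch would suggest the dirty mmap pages were not flushed first.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define FILE_LEN 0x2c000L  /* file size after the TRUNCATE UP in the log */
#define MAP_OFF  0x1a000L  /* MAPWRITE offset from the log */
#define MAP_LEN  0x42ffL   /* MAPWRITE length from the log */
#define READ_OFF 0x1c000L  /* READ offset from the log */
#define FILL     0xd2      /* arbitrary pattern (illustrative assumption) */

/*
 * Hypothetical helper: dirty a range through a shared mapping, then read
 * the overlapping part back with pread().
 * Returns 0 if the read sees the mapped write, 1 on mismatch, -1 on error.
 * Note: the file at 'path' is truncated and overwritten.
 */
int mapwrite_then_read(const char *path)
{
	/* Overlap of the MAPWRITE range and the READ start: 0x22ff bytes. */
	static unsigned char buf[MAP_OFF + MAP_LEN - READ_OFF];
	unsigned char *map;
	ssize_t n;
	size_t i;
	int fd, ret = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, FILE_LEN) < 0)
		goto out;

	/* Map the whole file so the offset needs no page alignment care. */
	map = mmap(NULL, FILE_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* This only dirties page cache pages; nothing is flushed here. */
	memset(map + MAP_OFF, FILL, MAP_LEN);
	munmap(map, FILE_LEN);

	/* Read back the part of the written range starting at READ_OFF. */
	n = pread(fd, buf, sizeof(buf), READ_OFF);
	if (n != (ssize_t)sizeof(buf))
		goto out;

	ret = 0;
	for (i = 0; i < sizeof(buf); i++) {
		if (buf[i] != FILL) {
			ret = 1;	/* read did not see the mapped write */
			break;
		}
	}
out:
	close(fd);
	return ret;
}
```

Calling mapwrite_then_read("/mnt/cifs/junkfile") on the cifs mount is the
intended use; on a local filesystem it should simply return 0.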


>
> thanks,
> Kinglong Mee
>
> >
> > On Wed, Apr 27, 2022 at 10:10 PM Steve French <smfrench@xxxxxxxxx> wrote:
> >>
> >> merged into cifs-2.6.git for-next pending testing
> >>
> >> On Sun, Apr 24, 2022 at 10:41 PM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
> >>>
> >>> ping...
> >>>
> >>> On Mon, Apr 11, 2022 at 3:39 PM Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
> >>>>
> >>>> Testing with ltp, fsx-linux fails as follows:
> >>>>
> >>>> # mount -t cifs -ocache=none,nobrl,guest //cifsserverip/test /mnt/cifs/
> >>>> # dd if=/dev/zero of=/mnt/cifs//junkfile bs=8192 count=19200 conv=block
> >>>> # ./testcases/bin/fsx-linux -l 500000 -r 4096 -t 4096 -w 4096 -N 10000 /mnt/cifs/junkfile
> >>>> skipping zero size read
> >>>> truncating to largest ever: 0x2c000
> >>>> READ BAD DATA: offset = 0x1c000, size = 0x9cc0
> >>>> OFFSET  GOOD    BAD     RANGE
> >>>> 0x1c000 0x09d2  000000  0x22ed
> >>>> operation# (mod 256) for the bad dataunknown, check HOLE and EXTEND ops
> >>>> LOG DUMP (10 total operations):
> >>>> 1: 1649662377.404010 SKIPPED (no operation)
> >>>> 2: 1649662377.413729 WRITE    0x3000 thru 0xdece (0xaecf bytes) HOLE
> >>>> 3: 1649662377.424961 WRITE    0x19000 thru 0x1b410 (0x2411 bytes) HOLE
> >>>> 4: 1649662377.435135 TRUNCATE UP        from 0x1b411 to 0x2c000 ******WWWW
> >>>> 5: 1649662377.487010 MAPWRITE 0x5000 thru 0x13077 (0xe078 bytes)
> >>>> 6: 1649662377.495006 MAPREAD  0x8000 thru 0xe16c (0x616d bytes)
> >>>> 7: 1649662377.500638 MAPREAD  0x1e000 thru 0x2054d (0x254e bytes)       ***RRRR***
> >>>> 8: 1649662377.506165 WRITE    0x76000 thru 0x7993f (0x3940 bytes) HOLE
> >>>> 9: 1649662377.516674 MAPWRITE 0x1a000 thru 0x1e2fe (0x42ff bytes)       ******WWWW
> >>>> 10: 1649662377.535312 READ     0x1c000 thru 0x25cbf (0x9cc0 bytes)      ***RRRR***
> >>>> Correct content saved for comparison
> >>>> (maybe hexdump "/mnt/cifs/junkfile" vs "/mnt/cifs/junkfile.fsxgood")
> >>>>
> >>>> The data written by MAPWRITE is not flushed to the SMB server,
> >>>> but the following read gets its data from the backend.
> >>>>
> >>>> Signed-off-by: Kinglong Mee <kinglongmee@xxxxxxxxx>
> >>>> ---
> >>>>   fs/cifs/file.c  | 22 ++++++++++++++++++++++
> >>>>   fs/cifs/inode.c |  2 +-
> >>>>   2 files changed, 23 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> >>>> index d511a78383c3..11912474563e 100644
> >>>> --- a/fs/cifs/file.c
> >>>> +++ b/fs/cifs/file.c
> >>>> @@ -3222,6 +3222,7 @@ static ssize_t __cifs_writev(
> >>>>          struct kiocb *iocb, struct iov_iter *from, bool direct)
> >>>>   {
> >>>>          struct file *file = iocb->ki_filp;
> >>>> +       struct inode *inode = file_inode(iocb->ki_filp);
> >>>>          ssize_t total_written = 0;
> >>>>          struct cifsFileInfo *cfile;
> >>>>          struct cifs_tcon *tcon;
> >>>> @@ -3249,6 +3250,16 @@ static ssize_t __cifs_writev(
> >>>>          cfile = file->private_data;
> >>>>          tcon = tlink_tcon(cfile->tlink);
> >>>>
> >>>> +       /* We need to be sure that all dirty pages are written to the server. */
> >>>> +       if (CIFS_CACHE_WRITE(CIFS_I(inode)) &&
> >>>> +           inode->i_mapping && inode->i_mapping->nrpages != 0) {
> >>>> +               rc = filemap_write_and_wait(inode->i_mapping);
> >>>> +               if (rc) {
> >>>> +                       mapping_set_error(inode->i_mapping, rc);
> >>>> +                       return rc;
> >>>> +               }
> >>>> +       }
> >>>> +
> >>>>          if (!tcon->ses->server->ops->async_writev)
> >>>>                  return -ENOSYS;
> >>>>
> >>>> @@ -3961,6 +3972,7 @@ static ssize_t __cifs_readv(
> >>>>   {
> >>>>          size_t len;
> >>>>          struct file *file = iocb->ki_filp;
> >>>> +       struct inode *inode = file_inode(iocb->ki_filp);
> >>>>          struct cifs_sb_info *cifs_sb;
> >>>>          struct cifsFileInfo *cfile;
> >>>>          struct cifs_tcon *tcon;
> >>>> @@ -3986,6 +3998,16 @@ static ssize_t __cifs_readv(
> >>>>          cfile = file->private_data;
> >>>>          tcon = tlink_tcon(cfile->tlink);
> >>>>
> >>>> +       /* We need to be sure that all dirty pages are written to the server. */
> >>>> +       if (CIFS_CACHE_WRITE(CIFS_I(inode)) &&
> >>>> +           inode->i_mapping && inode->i_mapping->nrpages != 0) {
> >>>> +               rc = filemap_write_and_wait(inode->i_mapping);
> >>>> +               if (rc) {
> >>>> +                       mapping_set_error(inode->i_mapping, rc);
> >>>> +                       return rc;
> >>>> +               }
> >>>> +       }
> >>>> +
> >>>>          if (!tcon->ses->server->ops->async_readv)
> >>>>                  return -ENOSYS;
> >>>>
> >>>> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> >>>> index 2f9e7d2f81b6..d5c07196a81e 100644
> >>>> --- a/fs/cifs/inode.c
> >>>> +++ b/fs/cifs/inode.c
> >>>> @@ -2440,7 +2440,7 @@ int cifs_getattr(struct user_namespace *mnt_userns, const struct path *path,
> >>>>          if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_SIZE | STATX_BLOCKS)) &&
> >>>>              !CIFS_CACHE_READ(CIFS_I(inode)) &&
> >>>>              inode->i_mapping && inode->i_mapping->nrpages != 0) {
> >>>> -               rc = filemap_fdatawait(inode->i_mapping);
> >>>> +               rc = filemap_write_and_wait(inode->i_mapping);
> >>>>                  if (rc) {
> >>>>                          mapping_set_error(inode->i_mapping, rc);
> >>>>                          return rc;
> >>>> --
> >>>> 2.35.1
> >>>>
> >>
> >>
> >>
> >> --
> >> Thanks,
> >>
> >> Steve
> >
> >
> >


