Re: Allow very large writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I also noticed that we can't take advantage of this for the
forcedirectio case since we are passed a user space pointer and the
kernel socket API requires a kvec (which will point to pages in the
kernel address space) so there is an extra buffer allocate and file
copy.  Any ideas the best way to handle that?  It would be trivial for
a quick test to change cifs_user_write from calling CIFSSMBWrite to
CIFSSMBWrite2 (which takes the write buffer as a pointer instead of
copying it) - I would expect a big performance gain.

By the way, I did additional testing over localhost and even there the
large writes show a big gain.

e.g. mounting over localhost cifs->Samba/XFS and doing dd of 350MB
with a block size of 1000000.  Local copy got (no cifs)  46MB/sec
(this is not a high end disk).  For cifs->Samba/XFS

wsize=56K (default) 32.2MB/sec
wsize=64K 34.7 MB/sec
wsize=504000   36.3 MB/sec
wsize=1004000 36.3 MB/sec


On the same machine with cifs mounted over localhost to Samba/ext3
instead of Samba/XFS
wsize=56K (default) 22 MB/sec
wsize=504000 33.5 MB/sec
wsize=1004000 39.5 MB/sec

On Nov 8, 2007 9:50 AM, Steve French <smfrench@xxxxxxxxx> wrote:
> I like your code much better than my quick hack - but shouldn't the
> size of the cifs pagevec depend on wsize? (and it also needs a change
> so that we only allow the very large wsize when the CIFS POSIX
> extensions flag for very large write is negotiated).
>
> The performance improvements from this are significant (almost double
> over GigE) even without splice being finished, and should be huge when
> splice can do socket to file copy.
>
> See:
> http://marc.info/?l=linux-cifs-client&m=119449818718441&w=2
>
>
> On Nov 8, 2007 8:23 AM, Dave Kleikamp <shaggy@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 2007-11-07 at 23:13 -0600, Steve French wrote:
> > > This quick hack works (to allow file writes over CIFS of up to 1MB
> > > (actually I set this to 245 pages at one time), but needs to be
> > > changed so it does not allocate a kvec like structure off the stack -
> > > but I am also concerned that it will look strange casting around the
> > > compiler warnings.   The problem is that the kvec only has pointers to
> > > 14 pages (thus the original 56K cifs write size).  I am creating
> > > another structure that begins the same but has a larger array of
> > > pointers - which obviously will generate warnings although it works.
> > >
> > > Ideas?
> >
> > I don't like defining your own pagevec struct.  This could be
> > troublesome if struct pagevec ever changes.  How's this for a cleanup?
> >
> > diff -Nurp linux-2.6.24-rc2/fs/cifs/cifsglob.h linux/fs/cifs/cifsglob.h
> > --- linux-2.6.24-rc2/fs/cifs/cifsglob.h 2007-11-07 08:13:54.000000000 -0600
> > +++ linux/fs/cifs/cifsglob.h    2007-11-08 08:02:11.000000000 -0600
> > @@ -36,6 +36,8 @@
> >
> >  #define CIFS_MIN_RCV_POOL 4
> >
> > +#define CIFS_PAGEVEC_SIZE 246
> > +
> >  /*
> >   * MAX_REQ is the maximum number of requests that WE will send
> >   * on one socket concurently. It also matches the most common
> > diff -Nurp linux-2.6.24-rc2/fs/cifs/connect.c linux/fs/cifs/connect.c
> > --- linux-2.6.24-rc2/fs/cifs/connect.c  2007-11-07 08:13:54.000000000 -0600
> > +++ linux/fs/cifs/connect.c     2007-11-08 08:04:59.000000000 -0600
> > @@ -2016,7 +2016,7 @@ cifs_mount(struct super_block *sb, struc
> >                 else /* default */
> >                         cifs_sb->rsize = CIFSMaxBufSize;
> >
> > -               if (volume_info.wsize > PAGEVEC_SIZE * PAGE_CACHE_SIZE) {
> > +               if (volume_info.wsize > CIFS_PAGEVEC_SIZE * PAGE_CACHE_SIZE) {
> >                         cERROR(1, ("wsize %d too large, using 4096 instead",
> >                                   volume_info.wsize));
> >                         cifs_sb->wsize = 4096;
> > diff -Nurp linux-2.6.24-rc2/fs/cifs/file.c linux/fs/cifs/file.c
> > --- linux-2.6.24-rc2/fs/cifs/file.c     2007-11-07 08:13:54.000000000 -0600
> > +++ linux/fs/cifs/file.c        2007-11-08 08:17:31.000000000 -0600
> > @@ -1178,7 +1178,8 @@ static int cifs_writepages(struct addres
> >         __u64 offset = 0;
> >         struct cifsFileInfo *open_file;
> >         struct page *page;
> > -       struct pagevec pvec;
> > +       struct pagevec *pvec;
> > +       int pvec_size;
> >         int rc = 0;
> >         int scanned = 0;
> >         int xid;
> > @@ -1198,10 +1199,18 @@ static int cifs_writepages(struct addres
> >                         if (!experimEnabled)
> >                                 return generic_writepages(mapping, wbc);
> >
> > -       iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
> > -       if (iov == NULL)
> > +       pvec_size = sizeof(struct pagevec) +
> > +                   (CIFS_PAGEVEC_SIZE - PAGEVEC_SIZE) * sizeof(void *);
> > +       pvec = kmalloc(pvec_size, GFP_KERNEL);
> > +       if (pvec == NULL)
> >                 return generic_writepages(mapping, wbc);
> >
> > +       iov = kmalloc(CIFS_PAGEVEC_SIZE * sizeof(struct kvec), GFP_KERNEL);
> > +       if (iov == NULL) {
> > +               kfree(pvec);
> > +               return generic_writepages(mapping, wbc);
> > +       }
> > +
> >
> >         /*
> >          * BB: Is this meaningful for a non-block-device file system?
> > @@ -1210,12 +1219,13 @@ static int cifs_writepages(struct addres
> >         if (wbc->nonblocking && bdi_write_congested(bdi)) {
> >                 wbc->encountered_congestion = 1;
> >                 kfree(iov);
> > +               kfree(pvec);
> >                 return 0;
> >         }
> >
> >         xid = GetXid();
> >
> > -       pagevec_init(&pvec, 0);
> > +       pagevec_init(pvec, 0);
> >         if (wbc->range_cyclic) {
> >                 index = mapping->writeback_index; /* Start from prev offset */
> >                 end = -1;
> > @@ -1228,9 +1238,10 @@ static int cifs_writepages(struct addres
> >         }
> >  retry:
> >         while (!done && (index <= end) &&
> > -              (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> > +              (nr_pages = pagevec_lookup_tag(pvec, mapping, &index,
> >                         PAGECACHE_TAG_DIRTY,
> > -                       min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
> > +                       min(end - index,
> > +                           (pgoff_t)CIFS_PAGEVEC_SIZE - 1) + 1))) {
> >                 int first;
> >                 unsigned int i;
> >
> > @@ -1240,7 +1251,7 @@ retry:
> >                 bytes_to_write = 0;
> >
> >                 for (i = 0; i < nr_pages; i++) {
> > -                       page = pvec.pages[i];
> > +                       page = pvec->pages[i];
> >                         /*
> >                          * At this point we hold neither mapping->tree_lock nor
> >                          * lock on the page itself: the page may be truncated or
> > @@ -1343,7 +1354,7 @@ retry:
> >                                 }
> >                         }
> >                         for (i = 0; i < n_iov; i++) {
> > -                               page = pvec.pages[first + i];
> > +                               page = pvec->pages[first + i];
> >                                 /* Should we also set page error on
> >                                 success rc but too little data written? */
> >                                 /* BB investigate retry logic on temporary
> > @@ -1360,7 +1371,7 @@ retry:
> >                                 done = 1;
> >                         index = next;
> >                 }
> > -               pagevec_release(&pvec);
> > +               pagevec_release(pvec);
> >         }
> >         if (!scanned && !done) {
> >                 /*
> > @@ -1376,6 +1387,7 @@ retry:
> >
> >         FreeXid(xid);
> >         kfree(iov);
> > +       kfree(pvec);
> >         return rc;
> >  }
> >
> >
> >
> >
> > --
> > David Kleikamp
> > IBM Linux Technology Center
> >
> >
>
>
>
> --
> Thanks,
>
> Steve
>



-- 
Thanks,

Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux