Re: Writes greater than 64k fails with -ENOSPC

Jeff Layton <jlayton@xxxxxxxxxx> · Wed, 30 Jan 2013 09:37:29 -0500

On Wed, 30 Jan 2013 14:06:19 +0000
Tom Talpey <ttalpey@xxxxxxxxxxxxx> wrote:

> > -----Original Message-----
> > From: linux-cifs-owner@xxxxxxxxxxxxxxx [mailto:linux-cifs-owner@xxxxxxxxxxxxxxx] On Behalf Of Jeff Layton
> > Sent: Tuesday, January 29, 2013 7:29 PM
> > To: Suresh Jayaraman
> > Cc: linux-cifs
> > Subject: Re: Writes greater than 64k fails with -ENOSPC
> > 
> > On Tue, 29 Jan 2013 17:54:21 +0530
> > Suresh Jayaraman <sjayaraman@xxxxxxxx> wrote:
> > 
> > > Hi all,
> > >
> > > I'm looking into a report on a 3.0-based kernel (plus stable fixes)
> > > where writes greater than 64k to a NAS (Hitachi NAS) are failing
> > > (simple dd). The problem was not seen with a 2.6.32-ish kernel.
> > > Also, note that the problem is not seen with other servers such as
> > > Windows 2003 or Windows 8.
> > >
> > > The strace output shows that the close() call fails with -ENOSPC.
> > >
> > > The relevant cFYI snippet:
> > >
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.227274] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/transport.c: For smb_command 47
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.227277] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/transport.c: Sending smb:  total_len 127044
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.345848] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/connect.c: rfc1002 length 0x33
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.393906] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/cifssmb.c: async write at 1015808 8192 bytes
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.393911] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/transport.c: For smb_command 47
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.393914] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/transport.c: Sending smb:  total_len 8260
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.479378] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/connect.c: rfc1002 length 0x33
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.481215] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/connect.c: rfc1002 length 0x33
> > > Jan 23 08:31:45 vsusix02 kernel: [1003552.481260] /usr/src/packages/BUILD/kernel-default-3.0.42/linux-3.0/fs/cifs/file.c: Flush inode ffff88004e1655c0 file ffff8800379d48c0 rc -28
> > >
> > >
> > > The problem seems to be that during close(), we try to flush the
> > > buffers by calling cifs_flush(), which in turn calls
> > > filemap_write_and_wait() to wait for the pages under writeback to
> > > complete. do_writepages() invokes cifs_writepages(), which is
> > > presumably returning -ENOSPC, and that error is propagated back.
> > >
> > > There are no quota restrictions or disk space problems. The tcpdump
> > > output doesn't show any errors during the write or close.
> > >
> > > I'm not sure what could be causing the problem and would appreciate
> > > any clues or debugging suggestions.
> > >
> > > Thanks
> > >
> > 
> > I'd look for "short" write replies. We have this in
> > cifs_writev_callback:
> > 
> >                 if (written < wdata->bytes)
> >                         wdata->result = -ENOSPC;
> 
> If the server does not return an error, then the client can conclude that at least something was written, based on the value of CountOfBytesWritten in the response. Unless this is zero, I think the client substituting ENOSPC may be incorrect.
> 

That's probably the case, but short writes are really tricky to deal
with...
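
For reference, a minimal userspace sketch of the distinction Tom is
drawing: only a reply whose CountOfBytesWritten is zero maps to -ENOSPC,
while a partial count is flagged as a short write to be resent. The
struct write_req type and classify_write_reply() are hypothetical
stand-ins, not the real cifs_writedata or cifs_writev_callback(), and
using -EAGAIN as the short-write marker is an assumption made only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the fields of the kernel's write request that
 * matter here; this is NOT the real struct cifs_writedata.
 */
struct write_req {
	size_t bytes;   /* bytes the client asked the server to write */
	size_t written; /* CountOfBytesWritten from the WRITE_ANDX response */
	int result;     /* 0, -EAGAIN (assumed short-write marker) or -ENOSPC */
};

/*
 * Only a reply that reports zero bytes written becomes -ENOSPC; a partial
 * count is a short write whose remainder would be redirtied and resent.
 */
static void classify_write_reply(struct write_req *req)
{
	assert(req->bytes > 0);          /* a zero-length request makes no sense */
	if (req->written >= req->bytes)
		req->result = 0;         /* everything landed */
	else if (req->written == 0)
		req->result = -ENOSPC;   /* no progress at all */
	else
		req->result = -EAGAIN;   /* short write: resend the tail */
}
```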

Say we're spraying out a bunch of 64k writes, but the server then
replies that only 63k of each was written. Now we have to go back and
redirty the pages that didn't get fully written. That's fine and not
too hard to do, but now you have a bunch of dirty pages sprinkled
around the file, a partially written one at the tail of every 64k
chunk.
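
To make the "sprinkled" pattern concrete, a small sketch that lists the
page indices left dirty when every 64k write comes back with only 63k
written. It assumes the writes start at offset 0 and run back to back;
pages_to_redirty() is a hypothetical helper, not a cifs function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given nchunks back-to-back writes of `chunk` bytes
 * starting at file offset 0, each acknowledged with only `written` bytes,
 * record the index of every page that still holds unwritten data and so
 * would have to be redirtied. Returns how many indices were stored in
 * out[] (at most max_out).
 */
static size_t pages_to_redirty(size_t nchunks, size_t chunk, size_t written,
			       size_t page_size, size_t *out, size_t max_out)
{
	size_t n = 0;

	assert(page_size > 0 && written < chunk);
	for (size_t i = 0; i < nchunks; i++) {
		size_t start = i * chunk + written; /* first unwritten byte */
		size_t end = (i + 1) * chunk - 1;   /* last byte of this write */

		for (size_t pg = start / page_size; pg <= end / page_size; pg++) {
			if (n == max_out)
				return n;
			out[n++] = pg;
		}
	}
	return n;
}
```

With 4 writes of 65536 bytes, 64512 bytes acknowledged for each, and
4096-byte pages, it reports pages 15, 31, 47 and 63: one partially
written page per 64k chunk, each needing its own follow-up WRITE_ANDX.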

So, on the next pass through the dirty page radix tree, we go and try
to issue another WRITE_ANDX call. I'm not sure if we'd have to go back
and restart writepages ourselves at that point for the WB_SYNC_ALL case.
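
One way to picture the resubmission problem: a loop that keeps resending
whatever the server did not accept, and only gives up with -ENOSPC when a
reply makes no progress at all. This is a userspace model, not
writepages. write_all() is hypothetical, and capped_server() is a
hypothetical server that accepts at most 63k per request and has 1 MiB
of space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SERVER_MAX_PER_WRITE 64512u    /* hypothetical: 63k per reply */
#define SERVER_CAPACITY      (1u << 20) /* hypothetical: 1 MiB of space */

/* Hypothetical server model: returns how many of `len` bytes it accepted. */
typedef size_t (*server_write_fn)(size_t offset, size_t len);

static size_t capped_server(size_t offset, size_t len)
{
	if (offset >= SERVER_CAPACITY)
		return 0;                        /* genuinely out of space */
	if (len > SERVER_CAPACITY - offset)
		len = SERVER_CAPACITY - offset;
	return len < SERVER_MAX_PER_WRITE ? len : SERVER_MAX_PER_WRITE;
}

/*
 * Resend the unwritten remainder of [offset, offset + len) until the server
 * has taken all of it. A zero-progress reply becomes -ENOSPC so the loop
 * cannot spin forever. *calls reports how many requests were issued.
 */
static int write_all(server_write_fn srv, size_t offset, size_t len,
		     unsigned *calls)
{
	assert(srv != NULL && calls != NULL);
	*calls = 0;
	while (len > 0) {
		size_t done = srv(offset, len);

		(*calls)++;
		if (done == 0)
			return -ENOSPC;
		if (done > len)
			done = len;  /* never trust a count larger than asked */
		offset += done;
		len -= done;
	}
	return 0;
}
```

A 128k write against this model takes three requests (64512 + 64512 +
2048 bytes) and succeeds; a 200-byte write starting 100 bytes before the
end of the modelled space gets 100 bytes accepted, then a zero-byte
reply, and returns -ENOSPC after two requests.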

Note that this sort of server behavior was the crux of my argument with
Steve a few months ago about defaulting to 64k writes on servers w/o
POSIX extensions. We can be reasonably sure that most servers handle
64k writes OK since that's what Windows does.

The spec is not 100% clear on whether servers are *required* to support
arbitrarily large writes up to the 128k limit. Clearly there are some
that do not, and a larger default is problematic against those servers.
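
A sketch of the mount-time default argued for above, assuming a helper
that knows whether the server advertised POSIX extensions and whether
the user passed an explicit wsize=. pick_wsize(), its constants, and the
posix_max cap used for POSIX-capable servers are hypothetical; they are
not the cifs.ko code or its actual limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for illustration, not the cifs.ko definitions. */
#define SAFE_DEFAULT_WSIZE (64u * 1024)  /* what Windows clients send */
#define MAX_WSIZE_NO_POSIX (128u * 1024) /* the 128k limit discussed above */

/*
 * Hypothetical: pick a write size at mount time. requested == 0 means no
 * wsize= option was given. Without POSIX extensions, default to 64k and
 * never exceed 128k even when asked to; posix_max is an assumed
 * server-specific cap used when POSIX extensions are available.
 */
static size_t pick_wsize(bool posix_ext, size_t posix_max, size_t requested)
{
	size_t limit, wsize;

	assert(!posix_ext || posix_max > 0);
	limit = posix_ext ? posix_max : MAX_WSIZE_NO_POSIX;
	if (requested != 0)
		wsize = requested;   /* explicit wsize= wins, within limits */
	else
		wsize = posix_ext ? posix_max : SAFE_DEFAULT_WSIZE;
	return wsize < limit ? wsize : limit;
}
```

Explicit wsize= values are still honored up to the 128k limit; only the
default changes.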

> > IIRC, this is spelled out in MS-CIFS in the section on WRITE_ANDX responses,
> > but I remember it not being 100% clear...
> 
> File a doc issue! Happy to clarify it if needed.
> 

That might be nice to do once Suresh confirms whether that's the case.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>