Re: Data corruption problem

On Fri, 18 Feb 2011 12:30:04 -0600
Wayne Walker <wwalker@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Feb 10, 2011 at 11:14:59PM -0600, Wayne Walker wrote:
> > First, I'm not certain whether this is samba, the linux cifs driver, or
> > something else.
> > 
> > During testing, one of my QA guys was running an inhouse program that
> > generates pseudo-random, but fully recreatable, data and writes it to
> > a file, the file is named with a name that is essentially the seed to
> > the pseudo- random stream, so, given a filename, it can read the file
> > and verify that the data is correct.
> ... snip ...
> 
> So, my QA guy has repeated the failure - 93 times, only from a linux box, so it appears to definitely be a cifs driver issue.
> 
> What can I do to gather useful info?  tcpdump on both client and server drop too many packets to be useful.
> 

I asked before, but I don't think you ever gave a conclusive answer...

Did the kernel report an error when you called fsync() or close()? I
suspect that it did, but sadly a lot of programs don't bother to check
the return values of those calls (usually because they're not really
able to deal with a failure).
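
As a rough illustration of what checking those calls looks like, here is a
minimal Python sketch. The helper name write_durably is hypothetical and is
not taken from the in-house test program; it only assumes that the writer can
use the standard os module, where a failed write(), fsync() or close()
surfaces as an OSError instead of being silently dropped.

```python
import os


def write_durably(path, data):
    """Write data to path; raise OSError if write, fsync or close fails.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested, so loop
            # until the whole buffer has been handed to the kernel.
            written = os.write(fd, view)
            view = view[written:]
        # Errors from writes the kernel deferred can be reported here.
        os.fsync(fd)
    except OSError:
        # Release the descriptor without masking the original error.
        try:
            os.close(fd)
        except OSError:
            pass
        raise
    # close() can report an error too, so it is not wrapped or ignored.
    os.close(fd)
```

A caller that wants to know whether the data really reached the server would
wrap write_durably() in try/except OSError and log the errno, rather than
assuming that the file is intact.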

>     From a Linux client (hostname: acorn):
>     Feb 17 16:54:30 acorn kernel:  CIFS VFS: Write2 ret -11, wrote 0
>     Feb 17 16:57:10 acorn kernel:  CIFS VFS: No response to cmd 47 mid 46382
>     Feb 17 16:57:10 acorn kernel:  CIFS VFS: Write2 ret -11, wrote 0
>     Feb 17 16:57:16 acorn kernel:  CIFS VFS: Write2 ret -11, wrote 0
>     Feb 17 16:57:31 acorn kernel:  CIFS VFS: No response for cmd 50 mid 46388
>     Feb 17 16:59:52 acorn kernel:  CIFS VFS: No response to cmd 47 mid 64873
>     Feb 17 16:59:52 acorn kernel:  CIFS VFS: Write2 ret -11, wrote 0
>     Feb 17 16:59:53 acorn kernel:  CIFS VFS: Write2 ret -11, wrote 0
>  

Those messages mean that calls to the server were occasionally timing
out (-11 is -EAGAIN). That's not terribly unusual under heavy load.
Until very recently, the kernel treated such a timeout as a hard error
and disconnected the socket.
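
To get a feel for how often these timeouts occur during a test run, the
messages can be tallied from the client's kernel log. The sketch below is
only an illustration: the function name tally_cifs_messages is hypothetical,
and the two patterns are derived solely from the log lines quoted above, so
other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns based only on the messages quoted in this thread (assumption).
WRITE_ERR = re.compile(r"CIFS VFS: Write2 ret (-\d+), wrote (\d+)")
NO_RESPONSE = re.compile(r"CIFS VFS: No response (?:to|for) cmd (\d+) mid (\d+)")


def tally_cifs_messages(lines):
    """Count write errors by return code and unanswered requests by command."""
    counts = Counter()
    for line in lines:
        match = WRITE_ERR.search(line)
        if match:
            counts[("write_error", int(match.group(1)))] += 1
            continue
        match = NO_RESPONSE.search(line)
        if match:
            counts[("no_response", int(match.group(1)))] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to
tally_cifs_messages() returns a Counter keyed by message type and by return
code or command number, which makes it easier to compare runs on different
kernels.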

You may want to test a more recent kernel (like 2.6.38-rc5) to see if
the problems go away with that. Since you mention you're using CentOS,
you could also open a bug at bugzilla.redhat.com and I'll try to look
at it when I get time.

If you have a RH support contract you may also want to open a support
case with this problem which would allow me to give it more priority.

Cheers,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>

