On Tue, Oct 18, 2011 at 3:29 PM, Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
> 2011/10/14 Jeff Layton <jlayton@xxxxxxxxxx>:
>> On Fri, 14 Oct 2011 14:02:54 +0400
>> Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
>>
>>> Today, I caught it once again and didn't notice any reconnects (no cERRORs).
>>>
>>> It surely doesn't depend on Jeff's async read patchset, because I used
>>> my cifs-3.2-current branch.
>>>
>>> My branch consists of Steve's master + lockpatchset + smb2 patches.
>>> On the other hand, previously I caught it with Jeff's branch (without
>>> lockpatchset and smb2 patches). So the problem is in the
>>> existing cifs code now.
>>>
>>> FYI: I checked two files: "buggy" and original, and noticed that the
>>> difference between them is located in one place - positions from
>>> 2014442 to 2014569 - 126 differences with two equal holes.
>>>
>>> So, 2014569 - 2014442 + 1 = 128 wrong bytes. Ideas?
>>>
>>
>> Good to know, thanks. I also tried reproducing this for a while last
>> night and was unable to...
>>
>> I used this script:
>>
>> -------------------------[snip]------------------------------
>> #!/bin/bash
>>
>> origfile=$1
>> destfile=$2
>>
>> origsum=`md5sum $origfile | cut -d' ' -f1`
>> i=0
>>
>> while true; do
>>     echo $i
>>     rm -f $destfile $origfile.tmp
>>
>>     dd if=$origfile of=$destfile bs=100000
>>     if [ $? -ne 0 ]; then
>>         echo "dd1 failed"
>>         exit 1
>>     fi
>>
>>     dd if=$destfile of=$origfile.tmp bs=100000
>>     if [ $? -ne 0 ]; then
>>         echo "dd2 failed"
>>         exit 1
>>     fi
>>
>>     destsum=`md5sum $destfile | cut -d' ' -f1`
>
> As you have already read $destfile into $origfile.tmp, there is no need
> to read it again - you only need to calculate the md5sum of
> $origfile.tmp.
>
>>     if [ "$origsum" != "$destsum" ]; then
>>         echo "md5sums don't match! orig=$origsum dest=$destsum"
>>         stat $origfile
>>         stat $destfile
>>         exit 1
>>     fi
>>
>>     i=`expr $i + 1`
>> done
>>
>> -------------------------[snip]------------------------------
>>
>> I ran the above with the first arg set to a ~615M .iso file on local
>> disk and the second to a file on a cifs mount.
>>
>> I ran it against my win2k8 host for several hours and it never failed.
>> I then tried running it against my Windows 7 home host (running on
>> bare-metal) and it would run for a little while and would eventually
>> fail due to the server returning "out of memory" errors. Some of those
>> would occur on the NEGOTIATE call, so I chalk that up to a Win7 bug.
>>
>> I never saw this mismatch, but I think we can try to infer something
>> from the nature of the failures that Pavel saw...
>>
>> Since the file was apparently being written properly, the write phase
>> seems like it worked correctly. The data all went into the cache, and
>> then got flushed properly to the server.
>>
>> So, it seems likely that the problem is in the read phase of the test.
>> There are several possibilities:
>>
>> 1) we started out doing a cache read, but the cache was invalidated
>> partway through. "Something happened" and one of the reads got mangled.
>>
>> 2) the server sent us a corrupt read for some reason
>>
>> 3) lower level networking problem caused a corrupt read
>>
>> 4) generic memory corruption in the pagecache of some sort
>>
>> ...plus many others...
>>
>> The fact that only 127 bytes were corrupt is very odd. It would be
>> easier to understand if an entire page were bad, or an entire rsize
>> chunk.
>>
>> If you are able to reproduce this again, it might be helpful to see if
>> that's consistent. Try to nail down the nature of the corruption -- see
>> how much is different and where the different parts are. That may
>> help shed light on the problem...
>>
>> In any case, this will probably take some digging -- we should probably
>> open a bug at bugzilla.samba.org and start working on this there.
>> Pavel, would you mind doing that when you have time?
>>
>> Thanks,
>> --
>> Jeff Layton <jlayton@xxxxxxxxxx>
>>
>
> So, after a closer investigation of the problem I figured out that:
>
> 1) It always reproduces after I boot the OS, load the module, mount the
> share and read the existing file.
>
> 2) The network traffic captured by wireshark on the server
> (Windows 7) and on the client differs - I checked it and found the
> same difference in the response packets for the area that differs between
> the orig and orig.tmp files (the response packet from the capture on the
> server was correct and the response packet from the capture on the client
> was corrupted).
>
> 3) The differing area is always bounded by 128 bytes but appears in
> different places.
>
> 4) It doesn't depend on a possibly broken LAN cable - I used two
> different ones with the same results.
>
> So, I don't think that it's a cifs module issue and there is no need to
> open a bug on bugzilla.samba.org. It seems that the problem is with
> the network driver or with the LAN card in my laptop.
>
> Make sense?

Yes ... but it brings up the obvious question ... what happens if
cifs signing is turned on?

--
Thanks,

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
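
For anyone following the suggestion in the thread to nail down how much
differs and where, the comparison can be scripted. This is only a minimal
sketch, assuming cmp and awk are available on the client; summarize_diff is
a hypothetical helper name, not something used in the thread. It reports
1-based offsets, as printed by cmp -l.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): summarize where two files differ.
# Prints the first and last differing byte offsets (1-based, as cmp -l
# reports them), the number of differing bytes, and the span they cover.
summarize_diff() {
    local a=$1 b=$2
    # cmp -l prints one line per differing byte: "<offset> <octal> <octal>".
    # Any "EOF on ..." message for files of unequal length goes to stderr.
    cmp -l "$a" "$b" 2>/dev/null | awk '
        NR == 1 { first = $1 }
        { last = $1; n++ }
        END {
            if (n == 0) { print "identical"; exit }
            print "first=" first " last=" last \
                  " differing=" n " span=" (last - first + 1)
        }'
}
```

Called as summarize_diff "$origfile" "$origfile.tmp" after a failed pass, it
gives the same kind of figures Pavel reported by hand: a span larger than the
count of differing bytes means some bytes inside the damaged area happen to
match the original. Repeating it over several failures would show whether the
damaged area stays within 128 bytes each time.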