On Fri, 30 Dec 2011 20:00:46 +0200 Konstantinos Skarlatos <k.skarlatos@xxxxxxxxx> wrote: > On 30/12/2011 3:11 μμ, Jeff Layton wrote: > > On Fri, 30 Dec 2011 11:04:59 +0200 > > Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote: > > > >> On 29/12/2011 3:54 μμ, Konstantinos Skarlatos wrote: > >>> On Πέμπτη, 29 Δεκέμβριος 2011 3:39:30 μμ, Jeff Layton wrote: > >>>> On Thu, 29 Dec 2011 12:30:18 +0200 > >>>> Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote: > >>>> > >>>>> On 29/12/2011 4:04 πμ, Jeff Layton wrote: > >>>>>> On Thu, 29 Dec 2011 02:08:57 +0200 > >>>>>> Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote: > >>>>>> > >>>>>>> I mount via cifs a windows XP share, df gives me correct sizes, > >>>>>>> but when > >>>>>>> I ls the mount point i get input/output error. > >>>>>>> strace: http://pastebin.com/WXf8M1nu > >>>>>>> > >>>>>>> mount --verbose -t cifs -o username=administrator,password=blahblah > >>>>>>> //192.168.0.11/jobs /mnt/backups/montaz/jobs > >>>>>>> mount.cifs kernel mount options: > >>>>>>> ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=******** > >>>>>>> > >>>>>>> > >>>>>>> df > >>>>>>> //192.168.0.11/jobs 114464 > >>>>>>> 105196 9268 92% /mnt/backups/montaz/jobs > >>>>>>> > >>>>>>> ls /mnt/backups/montaz/jobs/ > >>>>>>> ls: reading directory /mnt/backups/montaz/jobs/: Input/output error > >>>>>>> total 0 > >>>>>>> > >>>>>>> the fun thing is that i can cd to a lower level directory, and ls > >>>>>>> works > >>>>>>> fine there! only the mount point has the problem > >>>>>>> > >>>>>>> ls /mnt/backups/montaz/jobs/test > >>>>>>> total 44K > >>>>>>> drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/ > >>>>>>> ...... > >>>>>>> > >>>>>>> kernel version 3.2rc7 > >>>>>>> > >>>>>>> this seems to be related to : > >>>>>>> https://lkml.org/lkml/2011/8/1/427 > >>>>>>> Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for > >>>>>>> large dirs > >>>>>>> > >>>>>> Hmmm, maybe. What makes you think that it's related? What sort of > >>>>>> server are you seeing this against? > >>>>> Windows XP service pack 2 (greek) > >>>> > >>>> How many files are in the directory? > >>>> > >>> 140 folders and 20 files > >>> > >> Attached is a tcp dump of my session. > > I tried reproducing this here, but wasn't able to. Testing against my > > xp box worked fine. > > > > Most likely, the FIND_FILE responses are falling afoul of the code in > > coalesce_t2 or check2ndT2. Unfortunately that code is pretty > > complicated and I'm not certain what the problem actually is... > > > > One thing that's interesting is that the total data being sent in the > > request is rather large (16336 bytes). I think that's legit, but maybe > > it's exceeding the end of the buffer once we try to coalesce it. > > > > Would it be possible to get the cFYI output from this test? > I did not get a cFYI output from that test, but i redid a > mount-ls-umount and am attaching the tcpdump > Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the > contents of /proc/fs/cifs/DebugData form the same test > > > > Is this a regression? Did it work with earlier kernels and only > > recently start failing? > > > I do not know, and i am a bit afraid to downgrade this machine below 3.0 > due to some changes arch linux has introduced recently. I can always set > up a few virtual machines though, and i can even request permission from > my company to give you shell access if you like. Which kernel versions > would you like me to test? Ok, that tells us a little: -------------------[snip]--------------------- [96268.787078] fs/cifs/cifssmb.c: In FindFirst for [96268.787083] fs/cifs/transport.c: For smb_command 50 [96268.787086] fs/cifs/transport.c: Sending smb: total_len 88 ...FIND_FIRST command is sent [96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104 [96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2, check next response [96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104 [96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2, check next response [96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104 [96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2, check next response [96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0 [96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2, check next response ...all four parts of the response are collected here [96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50 mid=12 state=16 ...but the state at this point is MID_RESPONSE_MALFORMED [96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5 [96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5 [96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid = 737644) rc = -5 ...which makes readdir return -EIO -------------------[snip]--------------------- Based on that, it looks like something in one of these frames caused coalesce_t2() to return an error. I don't see the problem right offhand in the capture, but T2 response handling is pretty complex so it can be hard to see. Would it be possible for you to rebuild your kernel (or just cifs.ko) with this patch? Once you do that, rerun the test with cFYI turned up, and it should help point out what the problem is. Thanks, -- Jeff Layton <jlayton@xxxxxxxxxx>
>From 3ae6fb6404d150c5ac026b7b16cd1c6b17e6f1ad Mon Sep 17 00:00:00 2001 From: Jeff Layton <jlayton@xxxxxxxxxx> Date: Sat, 31 Dec 2011 06:53:26 -0500 Subject: [PATCH] cifs: better instrumentation for coalesce_t2 When coalesce_t2 returns an error, have it throw a cFYI message that explains the reason. Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> --- fs/cifs/connect.c | 29 +++++++++++++++++++++-------- 1 files changed, 21 insertions(+), 8 deletions(-) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 8cd4b52..cbf3b78 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -243,16 +243,21 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB) remaining = total_data_size - total_in_buf; - if (remaining < 0) + if (remaining < 0) { + cFYI(1, "Server sent too much data. total_data_size=%hu " + "total_in_buf=%hu", total_data_size, total_in_buf); return -EPROTO; + } - if (remaining == 0) /* nothing to do, ignore */ + if (remaining == 0) { + /* nothing to do, ignore */ + cFYI(1, "no more data remains"); return 0; + } total_in_buf2 = get_unaligned_le16(&pSMB2->t2_rsp.DataCount); - if (remaining < total_in_buf2) { + if (remaining < total_in_buf2) cFYI(1, "transact2 2nd response contains too much data"); - } /* find end of first SMB data area */ data_area_of_target = (char *)&pSMBt->hdr.Protocol + @@ -267,23 +272,29 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB) /* copy second buffer into end of first buffer */ total_in_buf += total_in_buf2; /* is the result too big for the field? */ - if (total_in_buf > USHRT_MAX) + if (total_in_buf > USHRT_MAX) { + cFYI(1, "coalesced total_in_buf too large (%u)", total_in_buf); return -EPROTO; + } put_unaligned_le16(total_in_buf, &pSMBt->t2_rsp.DataCount); /* fix up the BCC */ byte_count = get_bcc(pTargetSMB); byte_count += total_in_buf2; /* is the result too big for the field? */ - if (byte_count > USHRT_MAX) + if (byte_count > USHRT_MAX) { + cFYI(1, "coalesced BCC too large (%u)", byte_count); return -EPROTO; + } put_bcc(byte_count, pTargetSMB); byte_count = be32_to_cpu(pTargetSMB->smb_buf_length); byte_count += total_in_buf2; /* don't allow buffer to overflow */ - if (byte_count > CIFSMaxBufSize) + if (byte_count > CIFSMaxBufSize) { + cFYI(1, "coalesced BCC exceeds buffer size (%u)", byte_count); return -ENOBUFS; + } pTargetSMB->smb_buf_length = cpu_to_be32(byte_count); memcpy(data_area_of_target, data_area_of_buf2, total_in_buf2); @@ -291,8 +302,10 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB) if (remaining == total_in_buf2) { cFYI(1, "found the last secondary response"); return 0; /* we are done */ - } else /* more responses to go */ + } else {/* more responses to go */ + cFYI(1, "waiting for more secondary responses"); return 1; + } } static void -- 1.7.7.4