Fw: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)

Tim Perry <tim.perry@xxxxxxxxxxxxxxxxxxxxxxxx> · Wed, 19 Dec 2012 11:30:32 -0800 (PST)

Dear Jeff, et al.,

I can reproduce the problem by starting "find . -name \*.ext" and killing it when connected to either of our two Windows 2003 servers. I can *not* reproduce it doing the same thing connected to a Windows 7 box.

$ uname -a
Linux servername 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:49:02 UTC 2012 i686 i686 i386 GNU/Linux
$ cat /proc/version
Linux version 3.2.0-34-generic (buildd@roseapple) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #53-Ubuntu SMP Thu Nov 15 10:49:02 UTC 2012
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.1 LTS
Release:        12.04
Codename:       precise

I tried running find under strace, but hitting ctrl-c killed strace as well (obviously, oops). Interestingly, this did *not* hang the file system. I will try to kill the find command directly (kill -9 perhaps?) and see if I can recreate the error that way.

CONTINUING HERE:
I don't think strace on the find command will help because it isn't making the network connections. CIFS is making the network connections. Maybe I can cause the mount to happen with an strace version of CIFS?  How would I do that?

Anyhow, I opened two terminal windows and proceeded as follows:

In terminal 1:

$ strace find . -name \*adzzz >& ~/straceFind.txt

In terminal 2:
$ ps aux | grep find | grep -v strace
perry     2583 12.6  0.0   4792  1088 pts/5    R+   11:27   0:00 find . -name *adzzz
perry     2585  0.0  0.0   4388   828 pts/2    S+   11:27   0:00 grep find
$ kill -9 2583

File system dies.
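
For anyone who wants to try this without juggling two terminals, the steps above could be scripted roughly as follows. This is only a sketch: the function name repro_kill_find and the delay argument are made up for illustration, it leaves strace out, and it has not been run against the Windows 2003 shares.

```shell
#!/bin/sh
# Hypothetical helper: start find in the background, wait a bit,
# then send it SIGKILL, mirroring the manual two-terminal steps.
# Usage: repro_kill_find <dir> <name-pattern> <delay-seconds>
repro_kill_find() {
    dir=$1
    pattern=$2
    delay=$3

    # Discard find's output; only the kill matters for the repro.
    find "$dir" -name "$pattern" > /dev/null 2>&1 &
    pid=$!

    sleep "$delay"

    if kill -9 "$pid" 2>/dev/null; then
        echo "killed find (pid $pid) after ${delay}s"
    else
        echo "find (pid $pid) had already exited"
    fi

    # Reap the child; its exit status is not interesting here.
    wait "$pid" 2>/dev/null || true
    return 0
}
```

Something like "repro_kill_find /server/all_proj '*adzzz' 5" run on the mounted share, followed by an "ls" of the mount point, should then show whether the mount survived.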

I've attached the straceFind.txt, but it just shows find walking the filesystem tree:
fstatat64(AT_FDCWD, "0010", {st_mode=S_IFDIR|0777, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "0010", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 5
fchdir(5)                               = 0
getdents64(5, /* 14 entries */, 32768)  = 448
getdents64(5, /* 0 entries */, 32768)   = 0
close(5)                                = 0
fstatat64(AT_FDCWD, "_vti_cnf", {st_mode=S_IFDIR|0777, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "_vti_cnf", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 5
fchdir(5)                               = 0
getdents64(5, /* 13 entries */, 32768)  = 416
getdents64(5, /* 0 entries */, 32768)   = 0
close(5)                                = 0
open("..", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW) = 5
fstat64(5, {st_mode=S_IFDIR|0777, st_size=0, ...}) = 0
fchdir(

Ideas?

Tim

----- Original Message -----
From: Jeff Layton <jlayton@xxxxxxxxxx>
To: Tim Perry <tdparmor-sambabugs@xxxxxxxxx>
Cc: "linux-cifs@xxxxxxxxxxxxxxx" <linux-cifs@xxxxxxxxxxxxxxx>
Sent: Wednesday, December 19, 2012 6:03 AM
Subject: Re: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)

On Tue, 18 Dec 2012 15:16:56 -0800 (PST)
Tim Perry <tdparmor-sambabugs@xxxxxxxxx> wrote:

> We have a windows 2003 server with a directory shared. I connect to it from linux (Ubuntu 12.04) using CIFS. Once mounted, I can read/write files, list files, et cetera from the linux box. However, if I run "find . -name \*txt" and hit ctrl-c before the find command finishes then the mount immediately fails (the find command takes about 15 minutes to complete on this particular file system). Not only that, once the mount fails I can't successfully remount the drive without rebooting the linux box. This makes me think a kernel bug is in play.... Occasionally using lsof or fuser I can find a process that is keeping the mount alive, kill it, and remount the filesystem without rebooting. However, this is rare.
> 
> Perhaps I'm just missing some setting to CIFS, but I can't figure it out. I know what you are thinking "Use the google, Luke" which I have tried. However, I get about 5000 hits for "I kan't mount my Windozes on my linux, kan u give m3 teh codez pleaze" which is almost, but not quite my issue.
> 
> Anyhow, /etc/fstab contains:
> //server/All_proj /server/all_proj cifs sec=ntlmv2i,iocharset=utf8,uid=perry,credentials=/home/me/.smbcred,noauto,filemode=0777,dir_mode=0777,_netdev,gid=5000 0 0
> 
> Around the time the mount fails, I see this in /var/log/syslog:
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028311] CIFS VFS: Unexpected SMB signature
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028318] CIFS VFS: Send error in Close = -13
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028558] CIFS VFS: Unexpected SMB signature
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028886] CIFS VFS: Unexpected SMB signature
> Dec 13 10:40:35 chinstrap kernel: [ 2412.029395] CIFS VFS: Unexpected SMB signature
> ....
> 
> I also see the following in /var/log/kern.log
> Dec 13 10:01:29 chinstrap kernel: [   66.567463] CIFS VFS: default security mechanism requested.  The default security mechanism will be upgraded from ntlm to ntlmv2 in kernel release 3.3
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028311] CIFS VFS: Unexpected SMB signature
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028318] CIFS VFS: Send error in Close = -13
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028558] CIFS VFS: Unexpected SMB signature
> Dec 13 10:40:35 chinstrap kernel: [ 2412.028886] CIFS VFS: Unexpected SMB signature
> Dec 13 10:40:35 chinstrap kernel: [ 2412.029395] CIFS VFS: Unexpected SMB signature
> ....
> 
> So....I know there must be somewhere to report this as a bug and beg for help, but I have no idea where? Do I go to the Ubuntu folks at Canonical? To the Linux kernel team? Here?
> 

I have no idea what kernel 12.04 uses. Can you tell us the kernel
version? As far as I know, Ubuntu generally pulls their kernels
directly from upstream ones (or the stable series) so this is probably
a problem for mainline kernels too.

It sounds like signals are somehow causing the sequence numbers for
signatures to get out-of-whack. -13 is -EACCES, so it sounds like the
server is just returning errors and not disconnecting the socket when
this happens. That behavior varies between different Windows versions...
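
As a rough illustration of reading those return codes, the "Send error in ... = -N" lines can be pulled out of the log and annotated with an errno name. This is a sketch only: the function names are invented here, and the table covers just a handful of common Linux errno values rather than the full list.

```shell
#!/bin/sh
# Map a (possibly negative) errno value to its symbolic name.
# Only a few common Linux values are listed; this is not exhaustive.
errno_name() {
    case "${1#-}" in
        2)  echo ENOENT ;;
        4)  echo EINTR ;;
        5)  echo EIO ;;
        11) echo EAGAIN ;;
        13) echo EACCES ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# Read kernel log lines on stdin and print "<op>: <rc> (<name>)"
# for every "CIFS VFS: Send error in <op> = <rc>" line found.
decode_cifs_send_errors() {
    sed -n 's/.*CIFS VFS: Send error in \([A-Za-z0-9_]*\) = \(-[0-9]*\).*/\1 \2/p' |
    while read -r op rc; do
        echo "$op: $rc ($(errno_name "$rc"))"
    done
}
```

For example, "decode_cifs_send_errors < /var/log/kern.log" would list each failing operation next to its errno name.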

The reason you need to reboot to clear this is probably because
something is holding a reference to the connection to the server. When
you try to unmount and remount it, then it ends up reusing the same
socket which is obviously still having problems.
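
One way to check whether anything from the old mount is still around after an unmount attempt is to look at the kernel's mount table. A minimal sketch, with a made-up function name, assuming the usual /proc/mounts layout (device, mount point, filesystem type, ...):

```shell
#!/bin/sh
# List cifs mounts as "<share> on <mountpoint>".
# Reads /proc/mounts by default; another file can be passed for testing.
list_cifs_mounts() {
    awk '$3 == "cifs" { print $1, "on", $2 }' "${1:-/proc/mounts}"
}
```

If the share still shows up there, "fuser -vm <mountpoint>" may show which processes are pinning it. On kernels where the cifs module provides it, /proc/fs/cifs/DebugData should also list the sessions and connections the client still holds open.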

The best thing you could do is to provide as simple a way as possible
to reproduce this problem. You may also want to do some investigation
on your own by stracing the process.

Is it getting hung on a particular syscall before you ever issue your
signal? If not, then does it get hung on a syscall afterward? What
state is the process in at that time?
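
To answer the last question, the process state can be read from /proc while the process is still around. A small sketch (the function name is invented here; it assumes a Linux /proc filesystem):

```shell
#!/bin/sh
# Print the state of a process as reported in /proc/<pid>/status,
# e.g. "R (running)", "S (sleeping)" or "D (disk sleep)".
proc_state() {
    pid=$1
    awk '/^State:/ { $1 = ""; sub(/^[ \t]+/, ""); print }' "/proc/$pid/status"
}
```

A process stuck in "D (disk sleep)" is blocked in uninterruptible sleep inside the kernel, and will not act on a signal, even SIGKILL, until it leaves that state. Running "ps -o pid,stat,wchan:32,cmd -p <pid>" gives similar information plus the kernel function the process is sleeping in, where available.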

-- 
Jeff Layton <jlayton@xxxxxxxxxx>

--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
