Re: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)

Jeff Layton <jlayton@xxxxxxxxxx> · Thu, 20 Dec 2012 09:38:06 -0500

On Wed, 19 Dec 2012 11:30:32 -0800 (PST)
Tim Perry <tim.perry@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> Dear Jeff, et al.,
> 
> 
> I can reproduce the problem by starting "find . -name \*.ext" and killing it when connected to either of our two Windows 2003 Servers. I can *not* reproduce it doing the same thing connected to a Windows 7 box.
> 
> $ uname -a
> Linux servername 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:49:02 UTC 2012 i686 i686 i386 GNU/Linux
> $ cat /proc/version
> Linux version 3.2.0-34-generic (buildd@roseapple) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #53-Ubuntu SMP Thu Nov 15 10:49:02 UTC 2012
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 12.04.1 LTS
> Release:        12.04
> Codename:       precise
> 
> 
> I tried using strace, but hitting ctrl-c killed strace (obviously, oops). Interestingly, this did *not* hang the file system. I will try to kill the find command directly (kill -9 perhaps?) and see if I can recreate the error that way.
> 
> CONTINUING HERE:
> I don't think strace on the find command will help because it isn't making the network connections. CIFS is making the network connections. Maybe I can cause the mount to happen with an strace version of CIFS?  How would I do that?
> 
> Anyhow, I opened two terminal windows and proceeded as follows:
> 
> In terminal 1:
> 
> $ strace find . -name \*adzzz >& ~/straceFind.txt
> 
> 
> In terminal 2:
> $ ps aux | grep find | grep -v strace
> perry     2583 12.6  0.0   4792  1088 pts/5    R+   11:27   0:00 find . -name *adzzz
> perry     2585  0.0  0.0   4388   828 pts/2    S+   11:27   0:00 grep find
> $ kill -9 2583
> 
> File system dies.
> 
> I've attached the straceFind.txt, but it just shows find walking the filesystem tree:
> fstatat64(AT_FDCWD, "0010", {st_mode=S_IFDIR|0777, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
> openat(AT_FDCWD, "0010", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 5
> fchdir(5)                               = 0
> getdents64(5, /* 14 entries */, 32768)  = 448
> getdents64(5, /* 0 entries */, 32768)   = 0
> close(5)                                = 0
> fstatat64(AT_FDCWD, "_vti_cnf", {st_mode=S_IFDIR|0777, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
> openat(AT_FDCWD, "_vti_cnf", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 5
> fchdir(5)                               = 0
> getdents64(5, /* 13 entries */, 32768)  = 416
> getdents64(5, /* 0 entries */, 32768)   = 0
> close(5)                                = 0
> open("..", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW) = 5
> fstat64(5, {st_mode=S_IFDIR|0777, st_size=0, ...}) = 0
> fchdir(
> 
> 
> Ideas?
> 

That kernel is pretty old, so you may want to try a more recent one.

You may want to start by tracing with wireshark -- see what's
happening on the wire before and after the signal is delivered.
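
A capture along these lines might be a reasonable starting point. This
is only a sketch: capture() is a hypothetical helper, and the server
address (192.0.2.10), interface (eth0) and output path are placeholders
that need to be replaced with real values. It assumes tcpdump is
installed and that the mount talks to the server on TCP port 445.

```shell
#!/bin/sh
# Sketch only. capture() is a hypothetical helper; the server address,
# interface and output file are placeholders, not values from this thread.
capture() {
    # $1 = server address, $2 = client interface, $3 = pcap output file
    # Port 445 is SMB over TCP; older setups may use 139 (NetBIOS session).
    set -- tcpdump -i "$2" -s 0 -w "$3" host "$1" and port 445
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"      # print the command instead of running it
    else
        "$@"           # needs root; stop with ^C once the mount has hung
    fi
}

# Print the command that would be run for a placeholder server.
DRY_RUN=1 capture 192.0.2.10 eth0 /tmp/cifs-signal.pcap
```

Start the capture before running find, kill find, wait for the mount to
fail, then stop the capture. The file can then be opened with
"wireshark -r /tmp/cifs-signal.pcap" to compare the traffic on either
side of the signal.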

If it works against win7 then it's likely that win7 disconnects the
socket when the signatures are wrong. With that, we'd reestablish the
connection and things would start working again. I suspect that win2k3
just starts returning an error that we map to -EACCES.

It's possible that we should disconnect the client when the signatures
start looking wrong, but I think we need to understand why signals are
causing this issue in the first place.

There are some places where we do interruptible sleeps (vs. killable
ones). It's possible that SIGINT (which is what ^C generally delivers)
is causing havoc there.
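
One way to check whether the type of signal matters might be to deliver
different signals to the same find run and compare what happens to the
mount afterwards. The following is a rough sketch, not a tested
reproducer: run_with_signal is a hypothetical helper name, and it
assumes timeout(1) from GNU coreutils is available.

```shell
#!/bin/sh
# Sketch only. run_with_signal is a hypothetical helper name; it relies on
# timeout(1) (GNU coreutils assumed) to deliver the signal after a delay.
run_with_signal() {
    sig=$1      # signal name, e.g. INT, TERM or KILL
    delay=$2    # seconds to let the command run before signalling it
    shift 2
    status=0
    timeout -s "$sig" "$delay" "$@" || status=$?
    echo "signal=$sig status=$status"
}
```

From inside the mounted share, something like
"run_with_signal INT 10 find . -name '*.ext'" followed by "ls" would
show whether the mount survives SIGINT. Remounting and repeating with
TERM and KILL would show whether the behaviour differs between signals.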

-- 
Jeff Layton <jlayton@xxxxxxxxxx>