On Wed, 12 Jan 2011 11:49:04 -0500
Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Wed, 12 Jan 2011 10:34:22 +0100
> "Benjamin S." <da_joind@xxxxxxx> wrote:
>
> >
> > dmesg Output after I have tried to suspend my computer:
> >
> > [334447.728980] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.729525] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.729571] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.729979] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.730806] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.730853] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.730918] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.734428] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.734465] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [347809.421490] PM: Syncing filesystems ... done.
> > [347809.647465] Freezing user space processes ... (elapsed 0.01 seconds) done.
> > [347809.663090] Freezing remaining freezable tasks ...
> > [347829.678854] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> > [347829.678873] cifsd S ffff880127f7b1b0 0 1821 2 0x00800000
> > [347829.678883] ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
> > [347829.678890] ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
> > [347829.678897] ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
> > [347829.678904] Call Trace:
> > [347829.678915] [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19
> > [347829.678921] [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445
> > [347829.678928] [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f
> > [347829.678935] [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad
> > [347829.678942] [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f
> > [347829.678947] [<ffffffff811e81c7>] ? release_sock+0x19/0xef
> > [347829.678953] [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a
> > [347829.678961] [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a
> > [347829.678986] [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs]
> > [347829.678991] [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs]
> > [347829.678999] [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
> > [347829.679003] [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf
> > [347829.679007] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
> > [347829.679012] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
> > [347829.679014] [<ffffffff810444a1>] ? kthread+0x7a/0x82
> > [347829.679018] [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10
> > [347829.679020] [<ffffffff81044427>] ? kthread+0x0/0x82
> > [347829.679022] [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10
> > [347829.679036]
> > [347829.679037] Restarting tasks ... done.
> > [347829.679862] video LNXVIDEO:00: Restoring backlight state
> >
> >
> > client :
> > ii  cifs-utils        2:4.5-2         Common Internet File System utilities
> > ii  samba             2:3.4.8~dfsg-2  SMB/CIFS file, print, and login server for Unix
> > ii  samba-common      2:3.4.8~dfsg-2  common files used by both the Samba server and client
> > ii  samba-common-bin  2:3.4.8~dfsg-2  common files used by both the Samba server and client
> >
> > shares are mounted with mount.cifs
> >
> >
> > server:
> > ii  samba             2:3.5.6~dfsg-3  SMB/CIFS file, print, and login server for Unix
> > ii  samba-common      2:3.5.6~dfsg-3  common files used by both the Samba server and client
> > ii  samba-common-bin  2:3.5.6~dfsg-3  common files used by both the Samba server and client
> >
> >
> > I tried to suspend multiple times, but every time I got the same
> > stack trace. Before I tried to suspend I thought the shares are
> > responding slower than they normally do.
> >
>
> Looks like it's stuck down in the TCP connect routines. I suspect that
> it takes longer than 20s for a connect attempt to time out and the task
> is stuck sleeping for longer than that.
>
> The problem is likely similar to this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=11050
>
> There are a set of patches waiting to be merged for 2.6.38 that change
> the timeout and reconnect behavior with CIFS that may paper over the
> problem.
>
> Other than that, I'm not sure what we can do as cifsd is blocked
> waiting for the connection to complete. cifsd unfortunately was
> designed to work similarly to a userspace thread, and can't easily take
> advantage of the socket callback routines to do a non-blocking connect.
>

Benjamin, would you be able to test this patch? It should apply to the
current mainline tree. It builds cleanly, but I haven't tested it yet...

---------------[snip]-----------------

[PATCH] cifs: set socket send and receive timeouts before attempting connect

Benjamin S. reported that he was unable to suspend his machine while
it had a cifs share mounted. The freezer caused this to spew when he
tried it:

-----------------------[snip]------------------
[347809.421490] PM: Syncing filesystems ... done.
[347809.647465] Freezing user space processes ... (elapsed 0.01 seconds) done.
[347809.663090] Freezing remaining freezable tasks ...
[347829.678854] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
[347829.678873] cifsd S ffff880127f7b1b0 0 1821 2 0x00800000
[347829.678883] ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
[347829.678890] ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
[347829.678897] ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
[347829.678904] Call Trace:
[347829.678915] [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19
[347829.678921] [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445
[347829.678928] [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f
[347829.678935] [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad
[347829.678942] [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f
[347829.678947] [<ffffffff811e81c7>] ? release_sock+0x19/0xef
[347829.678953] [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a
[347829.678961] [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a
[347829.678986] [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs]
[347829.678991] [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs]
[347829.678999] [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
[347829.679003] [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf
[347829.679007] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
[347829.679012] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
[347829.679014] [<ffffffff810444a1>] ? kthread+0x7a/0x82
[347829.679018] [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10
[347829.679020] [<ffffffff81044427>] ? kthread+0x0/0x82
[347829.679022] [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10
[347829.679036]
[347829.679037] Restarting tasks ... done.
-----------------------[snip]------------------

We do attempt to perform a try_to_freeze in cifs_reconnect, but the
connection attempt itself seems to be taking longer than 20s to time
out.

The connect timeout is governed by the socket send and receive
timeouts, so we can shorten that period by setting those timeouts
before attempting the connect instead of after.

Reported-by: Benjamin S <da_joind@xxxxxxx>
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
 fs/cifs/connect.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 99a5f18..32c2f55 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2290,14 +2290,6 @@ generic_ip_connect(struct TCP_Server_Info *server)
 	if (rc < 0)
 		return rc;
 
-	rc = socket->ops->connect(socket, saddr, slen, 0);
-	if (rc < 0) {
-		cFYI(1, "Error %d connecting to server", rc);
-		sock_release(socket);
-		server->ssocket = NULL;
-		return rc;
-	}
-
 	/*
 	 * Eventually check for other socket options to change from
 	 * the default. sock_setsockopt not used because it expects
@@ -2326,6 +2318,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
 		 socket->sk->sk_sndbuf,
 		 socket->sk->sk_rcvbuf, socket->sk->sk_rcvtimeo);
 
+	rc = socket->ops->connect(socket, saddr, slen, 0);
+	if (rc < 0) {
+		cFYI(1, "Error %d connecting to server", rc);
+		sock_release(socket);
+		server->ssocket = NULL;
+		return rc;
+	}
+
 	if (sport == htons(RFC1001_PORT))
 		rc = ip_rfc1001_connect(server);
 
-- 
1.7.3.4
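
For anyone who wants to see the ordering effect outside the kernel, here
is a rough userspace sketch of the same idea. It is not part of the patch
and is only an illustration: the helper name connect_with_timeout() and
its parameters are made up for this example. As far as I understand it, a
blocking connect() on Linux waits no longer than the socket's send
timeout, so the timeouts have to be on the socket before connect() is
called if they are to bound the attempt.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): apply the send and receive
 * timeouts first, then attempt a blocking connect to an IPv4 address.
 * Returns a connected fd on success, or -1 with errno set.
 */
int connect_with_timeout(const char *ip, unsigned short port, int secs)
{
	struct sockaddr_in sa;
	struct timeval tv;
	int fd, saved;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
		errno = EINVAL;	/* not a valid dotted-quad address */
		return -1;
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	tv.tv_sec = secs;
	tv.tv_usec = 0;

	/* Timeouts go on the socket BEFORE connect(), as in the patch. */
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		saved = errno;	/* close() may clobber errno */
		close(fd);
		errno = saved;
		return -1;
	}

	return fd;
}
```

Calling connect_with_timeout("192.0.2.1", 445, 7) against an address that
silently drops SYNs should give up after roughly 7 seconds rather than
after the much longer default TCP connect timeout. The address and the
7 second value are placeholders, not values taken from the cifs code.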