Re: Suspend failed - unable to freeze cifsd

On Wed, 12 Jan 2011 11:49:04 -0500
Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Wed, 12 Jan 2011 10:34:22 +0100
> "Benjamin S." <da_joind@xxxxxxx> wrote:
> 
> > 
> > 
> > dmesg Output after I have tried to suspend my computer:
> > 
> > [334447.728980] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.729525] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.729571] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.729979] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.730806] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.730853] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.730918] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.734428] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [334447.734465] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
> > [347809.421490] PM: Syncing filesystems ... done.
> > [347809.647465] Freezing user space processes ... (elapsed 0.01 seconds) done.
> > [347809.663090] Freezing remaining freezable tasks ...
> > [347829.678854] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> > [347829.678873] cifsd         S ffff880127f7b1b0     0  1821      2 0x00800000
> > [347829.678883]  ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
> > [347829.678890]  ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
> > [347829.678897]  ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
> > [347829.678904] Call Trace:
> > [347829.678915]  [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19
> > [347829.678921]  [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445
> > [347829.678928]  [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f
> > [347829.678935]  [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad
> > [347829.678942]  [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f
> > [347829.678947]  [<ffffffff811e81c7>] ? release_sock+0x19/0xef
> > [347829.678953]  [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a
> > [347829.678961]  [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a
> > [347829.678986]  [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs]
> > [347829.678991]  [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs]
> > [347829.678999]  [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
> > [347829.679003]  [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf
> > [347829.679007]  [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
> > [347829.679012]  [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
> > [347829.679014]  [<ffffffff810444a1>] ? kthread+0x7a/0x82
> > [347829.679018]  [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10
> > [347829.679020]  [<ffffffff81044427>] ? kthread+0x0/0x82
> > [347829.679022]  [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10
> > [347829.679036]
> > [347829.679037] Restarting tasks ... done.
> > [347829.679862] video LNXVIDEO:00: Restoring backlight state
> > 
> > 
> > client :
> > ii  cifs-utils        2:4.5-2         Common Internet File System utilities
> > ii  samba             2:3.4.8~dfsg-2  SMB/CIFS file, print, and login server for Unix
> > ii  samba-common      2:3.4.8~dfsg-2  common files used by both the Samba server and client
> > ii  samba-common-bin  2:3.4.8~dfsg-2  common files used by both the Samba server and client
> > 
> > shares are mounted with mount.cifs
> > 
> > 
> > server:
> > ii  samba             2:3.5.6~dfsg-3  SMB/CIFS file, print, and login server for Unix
> > ii  samba-common      2:3.5.6~dfsg-3  common files used by both the Samba server and client
> > ii  samba-common-bin  2:3.5.6~dfsg-3  common files used by both the Samba server and client
> > 
> > 
> > I tried to suspend multiple times, but every time I got the same
> > stack trace. Before I tried to suspend, the shares seemed to be
> > responding more slowly than they normally do.
> > 
> 
> Looks like it's stuck down in the TCP connect routines. I suspect that
> it takes longer than 20s for a connect attempt to time out and the task
> is stuck sleeping for longer than that.
> 
> The problem is likely similar to this bug:
> 
>     https://bugzilla.kernel.org/show_bug.cgi?id=11050
> 
> There is a set of patches waiting to be merged for 2.6.38 that changes
> the timeout and reconnect behavior of CIFS and may paper over the
> problem.
> 
> Other than that, I'm not sure what we can do as cifsd is blocked
> waiting for the connection to complete. cifsd unfortunately was
> designed to work similarly to a userspace thread, and can't easily take
> advantage of the socket callback routines to do a non-blocking connect.
> 

Benjamin, would you be able to test this patch? It should apply to the
current mainline tree. It builds cleanly, but I haven't tested it yet...

---------------[snip]-----------------
[PATCH] cifs: set socket send and receive timeouts before attempting connect

Benjamin S. reported that he was unable to suspend his machine while
it had a cifs share mounted. The freezer caused this to spew when he
tried it:

-----------------------[snip]------------------
[347809.421490] PM: Syncing filesystems ... done.
[347809.647465] Freezing user space processes ... (elapsed 0.01 seconds) done.
[347809.663090] Freezing remaining freezable tasks ...
[347829.678854] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
[347829.678873] cifsd         S ffff880127f7b1b0     0  1821      2 0x00800000
[347829.678883]  ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
[347829.678890]  ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
[347829.678897]  ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
[347829.678904] Call Trace:
[347829.678915]  [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19
[347829.678921]  [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445
[347829.678928]  [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f
[347829.678935]  [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad
[347829.678942]  [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f
[347829.678947]  [<ffffffff811e81c7>] ? release_sock+0x19/0xef
[347829.678953]  [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a
[347829.678961]  [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a
[347829.678986]  [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs]
[347829.678991]  [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs]
[347829.678999]  [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
[347829.679003]  [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf
[347829.679007]  [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
[347829.679012]  [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
[347829.679014]  [<ffffffff810444a1>] ? kthread+0x7a/0x82
[347829.679018]  [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10
[347829.679020]  [<ffffffff81044427>] ? kthread+0x0/0x82
[347829.679022]  [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10
[347829.679036]
[347829.679037] Restarting tasks ... done.
-----------------------[snip]------------------

We do attempt to perform a try_to_freeze in cifs_reconnect, but the
connection attempt itself seems to be taking longer than 20s to time
out. The connect timeout is governed by the socket send and receive
timeouts, so we can shorten that period by setting those timeouts
before attempting the connect instead of after.

Reported-by: Benjamin S <da_joind@xxxxxxx>
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
 fs/cifs/connect.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 99a5f18..32c2f55 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2290,14 +2290,6 @@ generic_ip_connect(struct TCP_Server_Info *server)
 	if (rc < 0)
 		return rc;
 
-	rc = socket->ops->connect(socket, saddr, slen, 0);
-	if (rc < 0) {
-		cFYI(1, "Error %d connecting to server", rc);
-		sock_release(socket);
-		server->ssocket = NULL;
-		return rc;
-	}
-
 	/*
 	 * Eventually check for other socket options to change from
 	 * the default. sock_setsockopt not used because it expects
@@ -2326,6 +2318,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
 		 socket->sk->sk_sndbuf,
 		 socket->sk->sk_rcvbuf, socket->sk->sk_rcvtimeo);
 
+	rc = socket->ops->connect(socket, saddr, slen, 0);
+	if (rc < 0) {
+		cFYI(1, "Error %d connecting to server", rc);
+		sock_release(socket);
+		server->ssocket = NULL;
+		return rc;
+	}
+
 	if (sport == htons(RFC1001_PORT))
 		rc = ip_rfc1001_connect(server);
 
-- 
1.7.3.4
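
For anyone who wants to see the effect of the reordering outside the
kernel, here is a rough userspace analogue. It is only a sketch, not the
cifs code: the helper names set_socket_timeouts() and
connect_with_timeout() are made up for illustration, and it assumes
Linux behavior, where a blocking connect() on a socket with SO_SNDTIMEO
set gives up once that timeout expires instead of waiting for the full
TCP retry cycle. As in the patch, the timeouts go on the socket first
and the connect comes afterwards.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply the same timeout (in whole seconds) to
 * both the send and receive side of a socket. Returns 0 on success,
 * -1 if either setsockopt() call fails.
 */
int set_socket_timeouts(int fd, long secs)
{
	struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };

	assert(secs > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: create a stream socket, set the timeouts, and
 * only then connect, so the connect attempt itself is bounded.
 * Returns the connected fd, or -1 on any failure.
 */
int connect_with_timeout(const struct sockaddr *addr, socklen_t len,
			 long secs)
{
	int fd = socket(addr->sa_family, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* Timeouts first; setting them after connect() would be too late. */
	if (set_socket_timeouts(fd, secs) < 0 ||
	    connect(fd, addr, len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would fill in a struct sockaddr_in for the server and call
connect_with_timeout((struct sockaddr *)&sin, sizeof(sin), 7); the
7 second value is arbitrary here and not taken from the patch.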
