Re: Frequent reconnections / session startups?

If you think the disconnects are due to timeouts accessing files
on offline storage, you can also try mounting with the "hard" mount
option.  The "echo_interval" mount parameter can also be increased to
make it less likely that we give up on an unresponsive server (it
defaults to 60 seconds and can be set to a maximum of 600 seconds,
i.e. "echo_interval=600").
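
As a rough illustration of the option change suggested above, here is a
small Python sketch that takes a comma-separated option string (such as
the one quoted from James's mail below) and swaps "soft" for "hard" and
sets echo_interval.  The function name adjust_cifs_options is made up
for this example, and the resulting string is only a starting point:
check it against your own fstab or mount command before using it.

```python
def adjust_cifs_options(options: str, echo_interval: int = 600) -> str:
    """Return the option string with 'hard' and the given echo_interval.

    Hypothetical helper for illustration only; it edits a string and
    does not touch any mounts.
    """
    # 600 seconds is the maximum mentioned in this thread.
    if not 0 < echo_interval <= 600:
        raise ValueError("echo_interval must be between 1 and 600 seconds")

    kept = []
    for opt in options.split(","):
        # Drop any existing soft/hard and echo_interval settings.
        if opt in ("soft", "hard") or opt.startswith("echo_interval="):
            continue
        if opt:
            kept.append(opt)

    kept.append("hard")
    kept.append(f"echo_interval={echo_interval}")
    return ",".join(kept)


# Option string as reported in the original mail.
current = ("rw,relatime,vers=3.0,sec=ntlmssp,cache=strict,soft,nounix,"
           "serverino,mapposix,rsize=1048576,wsize=1048576,"
           "echo_interval=60,actimeo=1")
print(adjust_cifs_options(current))
```

All other options are left untouched, so the only differences from the
current mount are the two settings discussed here.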

There were many fixes relating to crediting and reconnection that went
in almost a year ago, but they would not be in an older kernel like
4.15 unless Ubuntu backported them.  Fortunately, Ubuntu makes it very
easy to check whether a newer kernel has the fix: you can install a
mainline kernel build on your client as an experiment (see
https://wiki.ubuntu.com/Kernel/MainlineBuilds).

If you don't see the reconnect problem after installing a more recent
mainline kernel as a quick test, that would make it easier to ask
Ubuntu to backport the various reconnect fixes marked for stable that
went in late last year (or you could continue to use the more recent
kernel).

Also note that dynamic tracing is now available in cifs.ko, which
makes it easier to trace reconnect events (or all cifs events, with
"trace-cmd record -e cifs"); this can sometimes help narrow down the
cause.  Reconnect statistics are also updated in /proc/fs/cifs/Stats.

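While comparing kernels, it may also help to repeat the kern.log
breakdown from the original mail (quoted below) so the before and after
numbers are comparable.  A minimal Python sketch follows; the substrings
are taken from the messages James lists, while the function name and
the idea of passing the log path on the command line are assumptions
made for this example.

```python
import sys
from collections import Counter

# Substrings taken from the kern.log messages listed in the original mail.
PATTERNS = [
    "Free previous auth_key.response",
    "validate protocol negotiate failed",
    "Close unmatched open",
    "has not responded in",
    "cifs_mount failed",
]


def tally_cifs_messages(lines):
    """Count how many lines contain each known CIFS message substring."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
                break  # count each line at most once
    return counts


# Only reads a file when a path is given, e.g.:
#   python3 tally.py /var/log/kern.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for pattern, count in tally_cifs_messages(log).most_common():
            print(f"{count:8d}  {pattern}")
```

Running it over the same time window on the old and the test kernel
should show whether the reconnect-related messages drop off.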
On Mon, Aug 26, 2019 at 1:57 AM James Wettenhall
<james.wettenhall@xxxxxxxxxx> wrote:
>
> Hi,
>
> We run a Django / Celery application which makes heavy use of CIFS
> mounts.  We are experiencing frequent reconnections / session startups
> and would like to understand how to avoid hammering the CIFS server
> and/or the authentication server.  We've had multiple reports of
> DoS-like hammering from server admins, causing frequent
> re-authentication attempts and in one case causing core dumps on the
> CIFS server.
>
> Our CIFS client VMs have the following:
>
> OS: Ubuntu 18.04.3
> Kernel: 4.15.0-58-generic
> mount.cifs: 6.8
>
> Current mount options:
> rw,relatime,vers=3.0,sec=ntlmssp,cache=strict,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1
>
> We don't run the CIFS server, but we can request any information
> required to diagnose the issue.
>
> Over the past 10 hours, one of our virtual machine's kernel log has accumulated:
>
> 8453 kern.log messages including "CIFS"
>
> To break that down, we have:
>
> 8305 "Free previous auth_key.response" messages
> 111 "validate protocol negotiate failed: -11" messages
> 26 "Close unmatched open" messages
> 7 "has not responded in 120 seconds" messages
> 4  "cifs_mount failed w/return code = -11" messages
>
> The server is an HSM (Hierarchical Storage Management) system, so it
> can be slow to respond if our application requests a file which is
> only available on tape, not on disk.
>
> The most common operation our application is performing on the
> CIFS-mounted files is calculating MD5 checksums - with many Celery
> worker processes running concurrently.
>
> We would appreciate any advice on how to investigate further.
>
> Thanks,
> James



-- 
Thanks,

Steve


