On Fri, 2017-01-20 at 15:30 -0600, Steve French wrote:
> A couple of quick questions:
> 1) I would expect the "hard" vs "soft" mount option to make no
> difference here, but I am just double-checking.
> 2) How does SMB2 reconnect behave in the same scenario (because we
> prefer SMB3 to be used if the server is non-Samba)?
>
> It looks like a fix is doable; see lines 1464-1465 of fs/cifs/sess.c:
>
>         while (sess_data->func)
>                 sess_data->func(sess_data);
>
> Looking at cifs_reconnect: in the case where the IP address is not
> available, we wait 3 seconds (if we need to retry), and when that
> succeeds we schedule delayed work to issue an "echo" (see
> cifs_reconnect). Then, as we do cifs_reconnect_tcon, we could wait up
> to 10 seconds at a time for the socket to come back. If the socket is
> OK, we do a negotiate protocol, which is not necessarily retried on
> failure (depending on the request, it can return EAGAIN, e.g. for
> read/write/lock/close). If the negprot succeeds, we get to your case,
> where we call cifs_setup_session in fs/cifs/connect.c, which calls
> CIFS_SessSetup (in fs/cifs/sess.c), which looks like it will loop on
> the session setup retry in the cifs case. As you note, that should be
> rate limited (especially in the bad-password case).
>
> I would also like Sachin's feedback, as he did a significant cleanup
> of session establishment for cifs and rewrote this code. I wanted to
> see whether he would prefer to handle the throttling of retries
> differently.

I think the suggestion is perfectly valid and would be a nice addition
to the cifs module. Maybe a better place to add this change would be
in cifs_reconnect_tcon():

cifs_reconnect_tcon()
{
..
	mutex_lock(&ses->session_mutex);
	rc = cifs_negotiate_protocol(0, ses);
	if (rc == 0 && ses->need_reconnect)
		rc = cifs_setup_session(0, ses, nls_codepage);
..
}

where, in case of EACCES, we can set up delayed work that unlocks
ses->session_mutex, scheduled to run after the required interval.

Sachin Prabhu

>
> On Thu, Jan 19, 2017 at 1:48 AM, Valentin Hilbig
> <externer.dl.hilbig@xxxxxxxxxxx> wrote:
> > Hello Linux Kernel CIFS-List,
> >
> > please forgive me for ninja-registering to the list and starting my
> > first post right with the questions. This is done in the hope of
> > saving your time. The long background story is below in case you
> > are interested:
> >
> > Q1) Is it possible to implement caching of failed CIFS/SMB
> > authentication replies on the CIFS client? My wish is to cache those
> > negative replies for just a second (HZ), as 3600 retries per hour to
> > re-establish a lost connection to a CIFS server seems enough: enough
> > to succeed, and enough on semi-permanent failures. I'd like to see
> > this 1000 ms cache as a mount default, as it does not apply to the
> > initial request, only to subsequent retries. But setting it to 0 (no
> > cache) is OK for me, too, as it can then be changed at mount time.
> >
> > Q2) As an extension, I would also like to see something like a
> > maximum retry counter, which declares a CIFS mount dead if we do not
> > succeed after N negative replies. In my case N=40000 (at least about
> > 11 hours with a 1 s cache time) sounds good. However, the rate
> > limiting is much more important than deactivating a rogue CIFS
> > mount. Hence the mount default should be N=0, which means infinite
> > retries (as it is today).
> >
> > Q3) According to
> > https://www.kernel.org/doc/readme/Documentation-filesystems-cifs-README
> > these features do not exist (yet). Are such features planned for the
> > kernel CIFS client module? If not, is there a chance for me to get
> > patches upstream if I provide them? Is there more to think of than
> > just following the style guide (and providing kernel-grade code)?
> > Of course, I will extend the sysctl/proc interface to cover those
> > new mount options in a compatible way (or discuss this with the list
> > before I break existing behavior). However, my patches will be for
> > "our" kernels used here (3.13 and 4.4), so they may need some
> > porting to the latest kernel (I am not sure that I will get
> > permission to take the time to provide patches for the current
> > kernel as well).
> >
> > Sorry if some of these are FAQs, but as gmane.org is currently
> > down/blank, I do not have access to the archive of kernel.cifs.
> >
> > If you have some better ideas, please feel free to criticize me ;)
> >
> > Thanks,
> > -Tino
> > PS: FYI, the full (and long, sorry!) details follow in case you are
> > interested:
> >
> > (Sorry for the missing logs and plain prose; I have no access to the
> > test installation at the moment, because it belongs to another
> > group.)
> >
> > Here at LiMux (Linux for Munich), in certain situations (for example
> > when the user has changed the password in LDAP), we observe that
> > CIFS clients might send 30 or more failing CIFS session setup
> > requests per second(!) to the CIFS server for an existing (old) CIFS
> > mount. Each of these requests tries to (re-)authenticate against
> > AD/LDAP but fails, because the credentials are no longer valid.
> > After a short while, the brute-force protection of the AD kicks in
> > and blocks the AD client (in this case the CIFS server) from
> > accessing AD (for a while). This means that other clients are
> > affected by the faulty CIFS mounts and are prevented from
> > authenticating against the CIFS server.
> >
> > The CIFS server people cannot help, as the CIFS server's vendor (no,
> > not Microsoft) tells us to switch off brute-force protection on the
> > AD side, which is something we do not want to do for obvious
> > reasons. The AD shall continue to block IPs with too many wrong
> > requests.
> > So the only option we have is to do something about the high rate
> > of AD requests with a wrong password coming from CIFS clients.
> >
> > To observe the effect, the following must happen:
> >
> > - There is an old CIFS mount (for example a user's $HOME), which is
> > already successfully mounted and working.
> >
> > - The TCP session to the CIFS server breaks (due to inactivity or
> > some short outage on the network; I used "tcpkill" to simulate
> > that), such that the kernel's CIFS module needs to re-establish a
> > connection to the CIFS server on the next access, which then
> > triggers re-authentication with the stored credentials.
> >
> > - This re-authentication fails due to a password change or a locked
> > account on the AD side. (If it succeeds, there is no problem, as the
> > CIFS mount is then fully functional again. The problem starts when
> > this re-authentication does not work.)
> >
> > - And there must also be some culprit, in my case some user process
> > (we haven't identified it yet but think it's something like
> > Thunderbird), which tries to access the CIFS share in a loop. (I
> > used "while sleep 0.1; do touch /path/to/share/FILE; done" to test
> > it.)
> >
> > Please note that there are too many possible userspace applications
> > out there that could rapidly hammer a defunct CIFS mount, so you
> > won't be able to fix them all. Hence we need a fix at some other
> > level.
> >
> > (BTW, we use version=1 of the protocol, and we require it; upgrading
> > 18k Linux workstations plus infrastructure against politics ain't
> > easy.)
> >
> > The CIFS module just forwards the request(s) to the CIFS server and,
> > as the TCP connection is broken, tries to establish a new one. This
> > triggers authentication, but the authentication fails.
> > So the CIFS client sees a negative reply such as
> > NT_STATUS_ACCOUNT_LOCKED_OUT and returns something like "permission
> > denied" to userspace. So far, so correct; everything works
> > perfectly, as it should!
> >
> > The problem starts when some userspace application loops over the
> > fault, accessing the CIFS share over and over again, several times a
> > second. The CIFS module then continues to do its job, but it does it
> > much too perfectly. Every single userspace access tries to re-open
> > the session to the CIFS server, again and again, which means we see
> > a massive number of authentication requests to the server, all of
> > which are doomed. Even worse, the faster the server and the better
> > the network, the more such failing requests you will see, of course.
> > This triggers the AD brute-force protection even faster.
> >
> > However, if those few CIFS clients that "freak out" were limited to
> > sending only 1 request per second, then AD would not see too many
> > failed requests per time span, so everything would stay operable.
> >
> > But even if this is implemented, it is only half of the story (the
> > important half, but there is more to it):
> >
> > If we had rate limiting in place, the AD and the CIFS server would
> > no longer be affected. But we would still have the user account
> > locked by the failing AD requests. Let's start the case over from
> > the beginning, under the assumption that we have failed-
> > authentication reply caching with a 1 s retry:
> >
> > - The user changes his password (perhaps using Windows, not Linux)
> > but does not log out afterwards (on Linux).
> >
> > - The TCP session of the CIFS mount breaks for some reason.
> >
> > - Some userspace process tries to access this CIFS mount in the
> > looping fashion described above.
> >
> > - The kernel's CIFS module tries to re-establish the connection.
> >
> > - The request fails due to the old credentials. (As above.
> > Windows has the new password, but Linux does not.)
> >
> > - After 5 such failed retries (as seen by the CIFS server), the AD
> > locks the account. Now the Linux client sees
> > NT_STATUS_ACCOUNT_LOCKED_OUT. This takes 5 seconds.
> >
> > - When the user comes back to work the next day and tries to log in,
> > his account is locked, of course.
> >
> > - He calls the Help Desk to get his account unlocked. They do it.
> >
> > - But 5 s later his account is locked again, thanks to 5 retries
> > from the old login on the Linux client.
> >
> > - Wash, rinse, repeat.
> >
> > Eventually the user finds out where he is still logged in and logs
> > out, so that (in our case) the user's automated (but no longer
> > working) CIFS mounts vanish, too. This delays the point at which the
> > user can work normally again, and it usually takes a lot of effort
> > from other people to solve the riddle of where the login hides.
> >
> > This is why I asked Q2, which would allow us to configure the CIFS
> > mount to cease to exist after 11 hours (or so), so that the CIFS
> > client stops trying to re-establish the connection. This means that
> > by the next business day the CIFS mount has very likely been
> > invalidated (it is still mounted, but quiet on the Linux side), so
> > the user can have his account unlocked without trouble.
> >
> > This is a triple-win situation: it not only helps the users and
> > takes from the Help Desk the burden of diagnosing a hard-to-diagnose
> > situation, it also saves the network bandwidth and processing power
> > wasted on all those fruitless authentication requests seen today.
> > Sigh.
> >
> > I agree that all this is not the fault of the CIFS module.
> > However, it is better to start being nice and polite to the
> > infrastructure in case something stupid happens than to continue as
> > usual and thereby waste resources and possibly impact others, even
> > when you are rightfully doing so.
> >
> > (This is a technical list, so I do not introduce myself, because I
> > am not important. All you need to know is that I have known Linux
> > since 0.99 and I am able to hack the kernel, but until now only for
> > my very own needs. BTW, my private GitHub is
> > https://github.com/hilbix/)
> >
> > Thanks for any help or comments,
> >
> > -Tino
> >
> > --
> > Kind regards
> > Valentin Hilbig
> > External contractor
> >
> > IT@M - Information and telecommunications technology service
> > provider of the Landeshauptstadt München
> > Tools and Infrastructure division
> > Municipal Workplaces service area
> > LiMux Workplace service team I23
> > LiMux-Basisclient
> >
> > Room A2.030, Agnes-Pockels-Bogen 21, 80992 München
> >
> > Tel.: +49 89 233-782273
> > E-Mail: externer.dl.hilbig@xxxxxxxxxxx
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html