So is this a server bug? It seems like the client is behaving correctly...

-dros

On Oct 10, 2013, at 5:56 AM, "Mkrtchyan, Tigran" <tigran.mkrtchyan@xxxxxxx> wrote:

> Today we were 'lucky' enough to hit this situation during the daytime.
> Here is what happens:
>
> The client sends an OPEN and gets an open stateid.
> This is followed by LAYOUTGET ... and a READ to the DS.
> At some point, the server returns BAD_STATEID.
> This triggers the client to issue a new OPEN and use the
> new open stateid with the READ request to the DS. As the new
> stateid is not known to the DS, it keeps returning
> BAD_STATEID and this becomes an infinite loop.
>
> Regards,
>    Tigran.
>
> ----- Original Message -----
>> From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
>> To: linux-nfs@xxxxxxxxxxxxxxx
>> Cc: "Andy Adamson" <william.adamson@xxxxxxxxxx>, "Steve Dickson" <steved@xxxxxxxxxx>
>> Sent: Wednesday, October 9, 2013 10:48:32 PM
>> Subject: DoS with NFSv4.1 client
>>
>>
>> Hi,
>>
>> Last night we got a DoS attack from one of the NFS clients.
>> The farm node, which was accessing data with pNFS,
>> went mad and tried to kill the dCache NFS server. As usual,
>> this happened overnight and we were not able to
>> capture network traffic or bump the debug level.
>>
>> The symptoms are:
>>
>> The client starts to bombard the MDS with OPEN requests. As we see
>> state created on the server side, the requests were processed by the
>> server. Nevertheless, for some reason, the client did not like it. Here
>> is the result of mountstats:
>>
>> OPEN:
>>   17087065 ops (99%)  1 retrans (0%)  0 major timeouts
>>   avg bytes sent per op: 356   avg bytes received per op: 455
>>   backlog wait: 0.014707  RTT: 4.535704  total execute time: 4.574094 (milliseconds)
>> CLOSE:
>>   290 ops (0%)  0 retrans (0%)  0 major timeouts
>>   avg bytes sent per op: 247   avg bytes received per op: 173
>>   backlog wait: 308.827586  RTT: 1748.479310  total execute time: 2057.365517 (milliseconds)
>>
>> As you can see, there is quite a big difference between the number of OPEN
>> and CLOSE requests.
>> We can see the same picture on the server side as well:
>>
>> NFSServerV41 Stats:   average±stderr(ns)      min(ns)          max(ns)      Samples
>> DESTROY_SESSION          26056±4511.89          13000            97000           17
>> OPEN                   1197297±   0.00         816000      31924558000     54398533
>> RESTOREFH                    0±   0.00              0      25018778000     54398533
>> SEQUENCE                  1000±   0.00           1000      26066722000     55601046
>> LOOKUP                 4607959±   0.00         375000      26977455000        32118
>> GETDEVICEINFO            13158± 100.88           4000           655000        11378
>> CLOSE                 16236211±   0.00           5000      21021819000        20420
>> LAYOUTGET            271736361±   0.00       10003000      68414723000        21095
>>
>> The last column is the number of requests.
>>
>> This is with RHEL 6.4 as the client. Looking at the code,
>> I can see a loop in nfs4proc.c#nfs4_do_open() which could be
>> the cause of the problem. Nevertheless, I can't
>> find any reason why this loop turned into an 'infinite' one.
>>
>> In the end our server ran out of memory and we returned
>> NFSERR_SERVERFAULT to the client. This triggered the client to
>> re-establish the session, and all open stateids were
>> invalidated and cleaned up.
>>
>> I am still trying to reproduce this behavior (on the client
>> and server), and any hint is welcome.
>>
>> Tigran.
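
For reference, below is a minimal userspace sketch of the loop described above. It is not the kernel's nfs4proc.c#nfs4_do_open() code, and all of the names in it (mds_open(), ds_read(), MAX_OPEN_RETRIES, struct nfs_stateid) are hypothetical. It only illustrates the failure mode: the MDS keeps handing out fresh open stateids, the DS never recognizes them, so an unbounded "re-OPEN on BAD_STATEID" policy spins forever, whereas a retry cap (or a fallback to reading through the MDS) would bound the damage.

/*
 * Simplified illustration of the OPEN/BAD_STATEID recovery loop.
 * Not the actual Linux client code; all identifiers are made up.
 */
#include <stdio.h>

#define NFS4ERR_BAD_STATEID  10025
#define MAX_OPEN_RETRIES     5        /* hypothetical cap */

struct nfs_stateid {
	unsigned int  seqid;
	unsigned char other[12];
};

/* The MDS always hands out a new, valid open stateid. */
static void mds_open(struct nfs_stateid *sid)
{
	static unsigned int counter;

	sid->seqid = ++counter;
}

/* The DS in this scenario never recognizes the new stateid. */
static int ds_read(const struct nfs_stateid *sid)
{
	(void)sid;
	return -NFS4ERR_BAD_STATEID;
}

int main(void)
{
	struct nfs_stateid sid;
	int err, retries = 0;

	mds_open(&sid);
	for (;;) {
		err = ds_read(&sid);
		if (err != -NFS4ERR_BAD_STATEID)
			break;		/* success or some other error */
		if (++retries > MAX_OPEN_RETRIES) {
			fprintf(stderr, "giving up after %d re-OPENs, "
				"falling back to reading via the MDS\n",
				retries - 1);
			break;
		}
		/* This is the step the client repeated ~17 million times. */
		mds_open(&sid);
	}
	return 0;
}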