Sorry - I answered this email thread from my netapp account and didn't cc the lists.

-->Andy

On Oct 10, 2013, at 10:19 AM, "Adamson, Andy" <William.Adamson@xxxxxxxxxx> wrote:

>
> On Oct 10, 2013, at 10:03 AM, "Mkrtchyan, Tigran" <tigran.mkrtchyan@xxxxxxx>
> wrote:
>
>> Not only. As I was able to reproduce it and fix it on the server,
>> we see that at the end the client will send only one CLOSE.
>
> I don't understand. If it is fixed on the server, then the client will
> send an OPEN, get an openstateid - say OS-1, do a LAYOUTGET, and READ to
> the DS using OS-1. The server then returns BAD stateid on the READ.
>
> The client then goes through stateid recovery, which means issuing
> another OPEN to get OS-2, which is then used for the DS READs.
>
> The client then CLOSEs the file using OS-2.
>
> Are you saying that the client does not close using OS-1? Note that is
> impossible, as OS-1 is a BAD stateid....
>
> -->Andy
>
>>
>> Tigran.
>>
>> ----- Original Message -----
>>> From: "Andy Adamson" <William.Adamson@xxxxxxxxxx>
>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
>>> Sent: Thursday, October 10, 2013 3:55:55 PM
>>> Subject: Re: DoS with NFSv4.1 client
>>>
>>> OK - so it's a server bug,
>>>
>>> -->Andy
>>>
>>> On Oct 10, 2013, at 5:56 AM, "Mkrtchyan, Tigran" <tigran.mkrtchyan@xxxxxxx>
>>> wrote:
>>>
>>>>
>>>>
>>>> Today we were 'lucky' to have such a situation at day time.
>>>> Here is what happens:
>>>>
>>>> The client sends an OPEN and gets an open stateid.
>>>> This is followed by LAYOUTGET ... and READ to DS.
>>>> At some point, the server returns BAD_STATEID.
>>>> This triggers the client to issue a new OPEN and use
>>>> the new open stateid with the READ request to DS. As the new
>>>> stateid is not known to the DS, it keeps returning
>>>> BAD_STATEID and this becomes an infinite loop.
>>>>
>>>> Regards,
>>>> Tigran.
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
>>>>> To: linux-nfs@xxxxxxxxxxxxxxx
>>>>> Cc: "Andy Adamson" <william.adamson@xxxxxxxxxx>, "Steve Dickson"
>>>>> <steved@xxxxxxxxxx>
>>>>> Sent: Wednesday, October 9, 2013 10:48:32 PM
>>>>> Subject: DoS with NFSv4.1 client
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> last night we got a DoS attack from one of the NFS clients.
>>>>> The farm node, which was accessing data with pNFS,
>>>>> went mad and tried to kill the dCache NFS server. As usual,
>>>>> this happened overnight and we were not able to
>>>>> capture network traffic or bump the debug level.
>>>>>
>>>>> The symptoms are:
>>>>>
>>>>> The client starts to bombard the MDS with OPEN requests. As we see
>>>>> state created on the server side, the requests were processed by
>>>>> the server. Nevertheless, for some reason, the client did not like
>>>>> it. Here is the result of mountstats:
>>>>>
>>>>> OPEN:
>>>>>     17087065 ops (99%) 1 retrans (0%) 0 major timeouts
>>>>>     avg bytes sent per op: 356 avg bytes received per op: 455
>>>>>     backlog wait: 0.014707 RTT: 4.535704 total execute time: 4.574094 (milliseconds)
>>>>> CLOSE:
>>>>>     290 ops (0%) 0 retrans (0%) 0 major timeouts
>>>>>     avg bytes sent per op: 247 avg bytes received per op: 173
>>>>>     backlog wait: 308.827586 RTT: 1748.479310 total execute time: 2057.365517 (milliseconds)
>>>>>
>>>>>
>>>>> As you can see, there is quite a big difference between the number
>>>>> of OPEN and CLOSE requests.
>>>>> We can see the same picture on the server side as well:
>>>>>
>>>>> NFSServerV41 Stats:  average±stderr(ns)   min(ns)      max(ns)   Samples
>>>>> DESTROY_SESSION          26056± 4511.89     13000        97000        17
>>>>> OPEN                   1197297±    0.00    816000  31924558000  54398533
>>>>> RESTOREFH                    0±    0.00         0  25018778000  54398533
>>>>> SEQUENCE                  1000±    0.00      1000  26066722000  55601046
>>>>> LOOKUP                 4607959±    0.00    375000  26977455000     32118
>>>>> GETDEVICEINFO            13158±  100.88      4000       655000     11378
>>>>> CLOSE                 16236211±    0.00      5000  21021819000     20420
>>>>> LAYOUTGET            271736361±    0.00  10003000  68414723000     21095
>>>>>
>>>>> The last column is the number of requests.
>>>>>
>>>>> This is with RHEL6.4 as the client. By looking at the code,
>>>>> I can see a loop at nfs4proc.c#nfs4_do_open() which can be
>>>>> the cause of the problem. Nevertheless, I can't
>>>>> find any reason why this loop turned into an 'infinite' one.
>>>>>
>>>>> At the end our server ran out of memory and we returned
>>>>> NFSERR_SERVERFAULT to the client. This triggered the client to
>>>>> reestablish the session, and all open stateids were
>>>>> invalidated and cleaned up.
>>>>>
>>>>> I am still trying to reproduce this behavior (on client
>>>>> and server) and any hint is welcome.
>>>>>
>>>>> Tigran.
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
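
For anyone reading the thread later, the loop described above (OPEN, READ to the DS, BAD_STATEID, new OPEN, and so on) can be illustrated with a toy model. This is only a sketch under loud assumptions: `MetadataServer`, `ds_read`, `read_with_recovery`, the `propagate_to_ds` flag, and the `max_opens` cap are hypothetical names made up for illustration and do not correspond to anything in the Linux client or in dCache. The model also simplifies the real sequence by having the DS reject the stateid from the very first READ.

```python
from dataclasses import dataclass, field
from itertools import count

OK = "NFS4_OK"
BAD_STATEID = "NFS4ERR_BAD_STATEID"


@dataclass
class MetadataServer:
    """Toy MDS (hypothetical): every OPEN creates state and a fresh stateid."""
    propagate_to_ds: bool                      # assumption: models the server-side fix
    ds_known: set = field(default_factory=set) # stateids the DS would accept
    opens: int = 0                             # number of OPENs processed
    _ids: count = field(default_factory=lambda: count(1))

    def open(self) -> int:
        self.opens += 1
        stateid = next(self._ids)
        if self.propagate_to_ds:
            # A correct server makes the new stateid known to the DS.
            self.ds_known.add(stateid)
        return stateid


def ds_read(mds: MetadataServer, stateid: int) -> str:
    """Toy DS READ: succeeds only for stateids the DS knows about."""
    return OK if stateid in mds.ds_known else BAD_STATEID


def read_with_recovery(mds: MetadataServer, max_opens: int) -> bool:
    """Client side: on BAD_STATEID, recover by issuing another OPEN.

    The max_opens cap exists only so this sketch terminates; the
    behaviour reported in the thread had no such bound.
    """
    stateid = mds.open()
    while ds_read(mds, stateid) == BAD_STATEID:
        if mds.opens >= max_opens:
            return False  # gave up: each retry left more state on the MDS
        stateid = mds.open()
    return True


if __name__ == "__main__":
    buggy = MetadataServer(propagate_to_ds=False)
    print(read_with_recovery(buggy, max_opens=1000), buggy.opens)

    fixed = MetadataServer(propagate_to_ds=True)
    print(read_with_recovery(fixed, max_opens=1000), fixed.opens)
```

In the buggy configuration every recovery attempt costs one more OPEN on the MDS while the READ never succeeds, which is consistent with the lopsided OPEN and CLOSE counts in the mountstats output. With the stateid made known to the DS, a single OPEN suffices.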