On Fri, 2022-07-01 at 19:58 +0200, Mkrtchyan, Tigran wrote:
> Hi NFS folks,
>
> Recently we got a kind of DDoS from one of our users: 5k jobs were
> aggressively reading a handful of files. Of course we have an
> overload protection; however, such a large number of requests by a
> single user didn't give other users a chance to perform any IO. As
> we extensively use pNFS, such user behavior makes some DSes
> unavailable to other users.
>
> To address this issue, we are looking at some kind of
> per-user-principal rate limiter. All users will get some IO portion,
> and if there are no requests from other users, then a single user
> can get it all. Not an ideal solution, of course, but a good
> starting point.
>
> So, the question is how to tell the aggressive user to back off?
> Delaying the response will block all other requests from the same
> host for other users. Returning NFS4ERR_DELAY will have the same
> effect (this is what we do now). NFSv4.1 session slots are
> client-wide; thus, any increase or decrease per client ID will
> either give more slots to the aggressive user or reduce them for
> all others as well.
>
> Are there any developments in the direction of per-client (cgroups
> or namespaces) timeout/error handling? Are there any
> NFS-client-friendly solutions, better than returning NFS4ERR_DELAY?

Here are a few suggestions:

1) Recall the layout from the offending client.

2) Define QoS policies for the connections using the kernel Traffic
   Control mechanisms (see the tc sketch below).

3) Use mirroring/replication to allow read access to the same files
   through multiple data servers.

4) Use NFS re-exporting in order to reduce the load on the data
   servers.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
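
[For illustration: a minimal user-space sketch of the
per-user-principal token-bucket limiter Tigran describes above.
Nothing here is from knfsd or any real server; the names (struct
principal_bucket, limiter_check) and the numbers are invented for the
example. Each principal earns IO credits at a fixed rate; an op that
finds the bucket empty is answered with NFS4ERR_DELAY, which, as noted
in the thread, still stalls the whole client's session slots, so this
is only a starting point.]

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define NFS4ERR_DELAY 10008     /* error value from RFC 7530/8881 */

    struct principal_bucket {
            char   owner[64];       /* user principal, e.g. krb5 name  */
            double tokens;          /* currently available IO credits  */
            double rate;            /* refill rate, in ops per second  */
            double burst;           /* bucket depth (max tokens)       */
            struct timespec last_refill;
    };

    static double ts_delta(const struct timespec *a,
                           const struct timespec *b)
    {
            return (b->tv_sec - a->tv_sec)
                 + (b->tv_nsec - a->tv_nsec) / 1e9;
    }

    /* Return 0 if the op may proceed, or NFS4ERR_DELAY if this
     * principal has exhausted its share and should back off. */
    static int limiter_check(struct principal_bucket *pb, double cost)
    {
            struct timespec now;

            clock_gettime(CLOCK_MONOTONIC, &now);
            /* refill proportionally to elapsed time, capped at burst */
            pb->tokens += pb->rate * ts_delta(&pb->last_refill, &now);
            if (pb->tokens > pb->burst)
                    pb->tokens = pb->burst;
            pb->last_refill = now;

            if (pb->tokens < cost)
                    return NFS4ERR_DELAY;  /* caller replies with DELAY */
            pb->tokens -= cost;
            return 0;
    }

    int main(void)
    {
            struct principal_bucket pb = { .rate = 100.0, .burst = 200.0 };

            strcpy(pb.owner, "aggressive-user@EXAMPLE.ORG");
            clock_gettime(CLOCK_MONOTONIC, &pb.last_refill);
            pb.tokens = pb.burst;

            /* Simulate 300 back-to-back READs of cost 1 each: roughly
             * the first 200 drain the burst, the rest are delayed. */
            for (int i = 0; i < 300; i++)
                    if (limiter_check(&pb, 1.0) == NFS4ERR_DELAY)
                            printf("op %d -> NFS4ERR_DELAY\n", i);
            return 0;
    }

Because the bucket is keyed on the user principal rather than the
client ID, an idle user's credits keep accruing up to the burst limit,
which is what gives a single user the whole share when nobody else is
asking, per the behavior the original post wants.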
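
[Likewise for suggestion 2, a rough HTB example of what such a QoS
policy could look like. The device name, rates, and client address are
placeholders, not from the thread; the idea is to guarantee most of
the bandwidth to well-behaved traffic and cap the offending client's
share.]

    # root HTB qdisc; unclassified traffic goes to class 1:10
    tc qdisc add dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:1 htb rate 10gbit
    # well-behaved traffic: may use the full link
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 8gbit ceil 10gbit
    # offender class: guaranteed 1gbit, may borrow up to 2gbit
    tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1gbit ceil 2gbit
    # steer egress (READ replies) to the aggressive client into 1:20
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip dst 192.0.2.15/32 flowid 1:20

Note that 192.0.2.15 is a documentation address standing in for the
offending client. TC classifies packets, not users, so this is a
per-host policy; a true per-principal policy would need something
richer, e.g. cgroup-based classification.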