Hi NFS folks,

Recently we got a kind of DDoS from one of our users: 5k jobs were aggressively reading a handful of files. Of course we have overload protection; however, such a large number of requests from a single user didn't give other users a chance to perform any I/O. As we make extensive use of pNFS, this user behavior makes some DSes unavailable to other users.

To address this issue, we are looking at some kind of per-user-principal rate limiter. Every user gets some portion of the I/O capacity, and if there are no requests from other users, a single user can take it all. Not an ideal solution, of course, but a good starting point.

So, the question is: how do we tell the aggressive user to back off? Delaying the response will block all other requests from the same host, including those of other users. Returning NFS4ERR_DELAY has the same effect (this is what we do now). NFSv4.1 session slots are client-wide, so any increase or decrease per client ID will either give more slots to the aggressive user or reduce them for all other users as well.

Are there any developments in the direction of per-client (cgroups or namespaces) timeout/error handling? Is there an NFS-client-friendly solution, better than returning NFS4ERR_DELAY?

Thanks in advance,
   Tigran.
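For what it's worth, the fair-share idea described above (every active principal gets an equal slice of the capacity, and a lone user may take it all) could be sketched roughly as below. This is only an illustrative Python sketch under my own assumptions; `FairShareLimiter`, `try_acquire`, the windowed accounting, and all the numbers are hypothetical, not part of any existing NFS server implementation:

```python
import time
from collections import defaultdict

class FairShareLimiter:
    """Illustrative work-conserving per-principal limiter (hypothetical sketch).

    All principals seen within the current accounting window share
    `total_capacity` equally; if only one principal is active, it may
    consume the whole capacity.
    """

    def __init__(self, total_capacity, window=1.0):
        self.total_capacity = total_capacity
        self.window = window                  # seconds per accounting window
        self.used = defaultdict(int)          # ops consumed per principal
        self.window_start = time.monotonic()

    def try_acquire(self, principal):
        """Return True if the request may proceed, False if it should be
        deferred (e.g. answered with NFS4ERR_DELAY or queued)."""
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # New window: forget who was active and start counting afresh.
            self.used.clear()
            self.window_start = now
        self.used[principal] += 0             # mark this principal as active
        share = self.total_capacity / len(self.used)
        # Note: shares shrink as new principals appear mid-window, so the
        # window total can briefly overshoot total_capacity; that bias
        # favours a late, quiet user over the early aggressive one.
        if self.used[principal] < share:
            self.used[principal] += 1
            return True
        return False
```

With a capacity of 10 ops per window, a single user can take all 10; once a second principal shows up, it is still guaranteed its half-share even if the first user has already burned through everything.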