Dear experts, we are operating a compute cluster with a number of storage systems that serve data over NFS via LACP-bonded 10 Gbit/s links (usually 2 or 4 links).
From time to time, users overload a server and use up all available bandwidth for hours, sometimes days, at a time. I am currently exploring whether we could use tools like nftlb [1] to achieve the following:
(a) For (NFS|any) traffic, have at least two tiers (farms?): one for standard compute nodes and one for interactive log-in nodes/web servers. The former should have a much lower bandwidth priority than the latter. The two groups can easily be distinguished by IP ranges/netmasks.
(b) Sometimes we need to perform bandwidth-intensive/sensitive operations. For example, just today I wanted to move a user's file system from an overly busy box to another one, but sending the ZFS snapshot over via mbuffer takes really long. I would like to prioritize such a connection as well; I know the source and destination IP as well as the target TCP port.
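To make (a) and (b) more concrete, here is a minimal Python sketch of the classification I have in mind. It only illustrates the desired policy and is not an nftlb or nft configuration; all address ranges, the port, and the names (INTERACTIVE_NETS, COMPUTE_NETS, PRIORITY_FLOWS, classify) are made-up placeholders.

```python
import ipaddress

# Hypothetical address ranges, for illustration only.
INTERACTIVE_NETS = [ipaddress.ip_network("10.0.1.0/24")]  # log-in nodes / web servers
COMPUTE_NETS = [ipaddress.ip_network("10.0.16.0/20")]     # standard compute nodes

# Hypothetical one-off transfers to prioritize, as in (b):
# (source IP, destination IP, destination TCP port)
PRIORITY_FLOWS = {("10.0.2.10", "10.0.2.11", 9090)}


def classify(src, dst, dport):
    """Return the desired traffic class for one connection."""
    # (b) An explicitly listed transfer wins over everything else.
    if (src, dst, dport) in PRIORITY_FLOWS:
        return "priority"

    # (a) Otherwise the tier follows from whichever endpoint is a client;
    # both endpoints are checked so the result is the same in either direction.
    endpoints = [ipaddress.ip_address(src), ipaddress.ip_address(dst)]
    if any(addr in net for addr in endpoints for net in INTERACTIVE_NETS):
        return "interactive"
    if any(addr in net for addr in endpoints for net in COMPUTE_NETS):
        return "compute"
    return "default"


if __name__ == "__main__":
    print(classify("10.0.1.5", "10.0.2.10", 2049))
    print(classify("10.0.2.10", "10.0.17.3", 2049))
    print(classify("10.0.2.10", "10.0.2.11", 9090))
```

The intended ordering would be "priority" and "interactive" ahead of "compute", with "compute" getting whatever bandwidth is left over.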
I am still learning to migrate from iptables to nft and stumbled over nftlb along the way. It looks like it supports what I want, but I am not sure yet, and I have not found many documents describing potential set-ups, so I wanted to ask the experts here first.
All our systems are currently based on Debian 10 (buster) and thus kernel 4.19.
Cheers and thanks a lot in advance for any insights/pointers/...!

Carsten

[1] https://github.com/zevenet/nftlb

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185