One of my colleagues attempted to set quotas on a large number (some dozens) of users with the command below, but it caused the MDS to hang and reject client requests.

The offending command was:

    cat recent-users | xargs -P16 -I% setfattr -n ceph.quota.max_bytes -v 8796093022208 /scratch/%

The result was that /scratch, and any other mounts managed by the same MDS, hung on all clients.

The status of ceph-mds while broken was:

    root@cnx-14:~# systemctl status ceph-mds@cnx-14
    ● ceph-mds@cnx-14.service - Ceph metadata server daemon
       Loaded: loaded (/lib/systemd/system/ceph-mds@.service; indirect; vendor preset: enabled)
       Active: active (running) since Thu 2021-05-06 17:16:45 AEST; 1 weeks 3 days ago
     Main PID: 2385 (ceph-mds)
        Tasks: 23
       CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@cnx-14.service
               └─2385 /usr/bin/ceph-mds -f --cluster ceph --id cnx-14 --setuser ceph --setgroup ceph

    May 13 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-13T06:25:01.724+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m
    May 13 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-13T06:25:01.736+1000 7f5444832700 -1 received signal: Hangup from (PID: 229281) UID: 0
    May 14 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-14T06:25:01.992+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m
    May 14 06:25:02 cnx-14 ceph-mds[2385]: 2021-05-14T06:25:02.004+1000 7f5444832700 -1 received signal: Hangup from (PID: 232464) UID: 0
    May 15 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-15T06:25:01.468+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m
    May 15 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-15T06:25:01.480+1000 7f5444832700 -1 received signal: Hangup from (PID: 236005) UID: 0
    May 16 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-16T06:25:01.989+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m
    May 16 06:25:02 cnx-14 ceph-mds[2385]: 2021-05-16T06:25:02.001+1000 7f5444832700 -1 received signal: Hangup from (PID: 239260) UID: 0
    May 17 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-17T06:25:01.813+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m
    May 17 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-17T06:25:01.829+1000 7f5444832700 -1 received signal: Hangup from (PID: 242044) UID: 0

The fix was to run:

    systemctl restart ceph-mds@cnx-14

A non-parallelised run of xargs, with a sleep 1 between each iteration, worked.
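
For anyone wanting to reproduce the serial workaround, it could be sketched roughly as the shell function below. This is an illustration only, not the exact command that was run: the function name set_quotas and the SETFATTR override variable are hypothetical, and the 1-second default pause simply mirrors the run that worked. The value 8796093022208 used in the original command is 8 TiB in bytes.

```shell
#!/bin/sh
# Sketch: set a CephFS quota on each listed directory one at a time,
# pausing between calls instead of running 16 setfattr processes at once.
# Assumptions: set_quotas and SETFATTR are hypothetical names introduced
# for this example; SETFATTR only exists so the command can be stubbed.
set_quotas() {
    list=$1          # file with one user name per line
    base=$2          # parent directory, e.g. /scratch
    bytes=$3         # quota in bytes
    delay=${4:-1}    # seconds to sleep between calls
    failed=0
    while IFS= read -r user; do
        [ -n "$user" ] || continue    # skip blank lines
        if ! "${SETFATTR:-setfattr}" -n ceph.quota.max_bytes \
                -v "$bytes" "$base/$user" < /dev/null; then
            echo "failed to set quota on $base/$user" >&2
            failed=1
        fi
        sleep "$delay"
    done < "$list"
    return "$failed"
}
```

With the file and path from the original command, it would be called as:

    set_quotas recent-users /scratch 8796093022208

Whether a shorter pause, or a small amount of parallelism, would also be safe has not been tested here.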