Octopus MDS hang under heavy setfattr load

One of my colleagues tried to set quotas for a few dozen users with the
command below, which caused the MDS to hang and reject client requests.

The offending command was:

cat recent-users | xargs -P16 -I% setfattr -n ceph.quota.max_bytes -v 8796093022208 /scratch/%

This hung /scratch, and every other mount served by the same MDS, on all
clients.
While it was hung, the status of ceph-mds was:

root@cnx-14:~# systemctl status ceph-mds@cnx-14
● ceph-mds@cnx-14.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; indirect; vendor preset: enabled)
   Active: active (running) since Thu 2021-05-06 17:16:45 AEST; 1 weeks 3 days ago
 Main PID: 2385 (ceph-mds)
    Tasks: 23
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@cnx-14.service
           └─2385 /usr/bin/ceph-mds -f --cluster ceph --id cnx-14 --setuser ceph --setgroup ceph

May 13 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-13T06:25:01.724+1000 7f5444832700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-m
May 13 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-13T06:25:01.736+1000 7f5444832700 -1 received  signal: Hangup from  (PID: 229281) UID: 0
May 14 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-14T06:25:01.992+1000 7f5444832700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-m
May 14 06:25:02 cnx-14 ceph-mds[2385]: 2021-05-14T06:25:02.004+1000 7f5444832700 -1 received  signal: Hangup from  (PID: 232464) UID: 0
May 15 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-15T06:25:01.468+1000 7f5444832700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-m
May 15 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-15T06:25:01.480+1000 7f5444832700 -1 received  signal: Hangup from  (PID: 236005) UID: 0
May 16 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-16T06:25:01.989+1000 7f5444832700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-m
May 16 06:25:02 cnx-14 ceph-mds[2385]: 2021-05-16T06:25:02.001+1000 7f5444832700 -1 received  signal: Hangup from  (PID: 239260) UID: 0
May 17 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-17T06:25:01.813+1000 7f5444832700 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-m
May 17 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-17T06:25:01.829+1000 7f5444832700 -1 received  signal: Hangup from  (PID: 242044) UID: 0

Fix was to run:

systemctl restart ceph-mds@cnx-14

A non-parallelised xargs run with a one-second sleep between iterations
worked.
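
For anyone wanting to reproduce the serial workaround, here is a rough sketch
of one way to do it. It is not the exact command we ran: the function name
set_quotas_serially and its argument order are made up for illustration, and
it uses a plain read loop rather than xargs. Only the setfattr invocation, the
quota value, the recent-users file and the /scratch path come from the
original command.

```shell
# Hypothetical helper: set the same CephFS quota on each directory, one at a time.
#   $1 = file listing one user per line
#   $2 = base directory containing the per-user directories
#   $3 = quota in bytes
#   $4 = pause in seconds between calls (defaults to 1)
set_quotas_serially() {
    list=$1
    base=$2
    bytes=$3
    pause=${4:-1}
    while IFS= read -r user; do
        [ -n "$user" ] || continue    # skip blank lines in the list
        # Same setfattr call as the parallel version, but only one in flight.
        setfattr -n ceph.quota.max_bytes -v "$bytes" "$base/$user" || return 1
        sleep "$pause"                # give the MDS a moment before the next request
    done < "$list"
}
```

With the values from the original command, it would be called as:
set_quotas_serially recent-users /scratch 8796093022208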
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx