Dear cephers, we have a cephfs volume, that will be mounted by many clients with concurrent read/write capability. From time to time, maybe when concurrency goes as high as 100 clients' access, accessing it will become very slow to be useful at all. the cluster has multiple active mds. All disks are hdd. Any ideas to improve this? here is one of mds log during the slow time, others are simillar: {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for \u003e 5.662970 secs\n","stream":"stderr","time":"2023-11-08T07:26:00.121996282Z"} {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 5.662970 seconds old, received at 2023-11-08T07:25:54.458863+0000: peer_request:client.12917739:8654334 currently dispatched\n","stream":"stderr","time":"2023-11-08T07:26:00.122016551Z"} {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for \u003e 11.900602 secs\n","stream":"stderr","time":"2023-11-08T07:29:54.124567293Z"} {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 11.900601 seconds old, received at 2023-11-08T07:29:42.223813+0000: client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting\n","stream":"stderr","time":"2023-11-08T07:29:54.124589613Z"} {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest blocked for \u003e 17.900670 secs\n","stream":"stderr","time":"2023-11-08T07:30:00.124691442Z"} {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 17.900670 seconds old, received at 2023-11-08T07:29:42.223813+0000: client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting\n","stream":"stderr","time":"2023-11-08T07:30:00.124726772Z"} {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 6.649942 seconds old, received at 2023-11-08T07:29:53.474541+0000: client_request(mds.1:305661 rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5 #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently failed to acquire_locks\n","stream":"stderr","time":"2023-11-08T07:30:00.124731626Z"} {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 6.649864 seconds old, received at 2023-11-08T07:29:53.474619+0000: client_request(mds.1:305662 rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5 #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting remote authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124734415Z"} {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 6.649719 seconds old, received at 2023-11-08T07:29:53.474764+0000: client_request(client.27497255:25173 getattr pAsLsXsFs #0x800019f1da7 2023-11-08T07:29:53.473182+0000 caller_uid=0, caller_gid=0{}) currently requesting remote authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124736973Z"} {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 6.648454 seconds old, received at 2023-11-08T07:29:53.476029+0000: client_request(mds.1:305663 rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5 #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting remote authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124739607Z"} {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for \u003e 5.206645 secs\n","stream":"stderr","time":"2023-11-08T07:43:30.133682292Z"} {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 5.206644 seconds old, received at 2023-11-08T07:43:24.926862+0000: client_request(client.27430891:5371608 mkdir #0x700018317cd/13 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133708161Z"} {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 5.206209 seconds old, received at 2023-11-08T07:43:24.927297+0000: client_request(client.27430891:5371609 create #0x2000d8a535a/2d69440b170ada3dd3670b5f3e2ebe4d75eeb274ff2a8238a23162e95601 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133711838Z"} {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest blocked for \u003e 7.332759 secs\n","stream":"stderr","time":"2023-11-08T07:50:18.138195062Z"} {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 7.318159 seconds old, received at 2023-11-08T07:50:10.819829+0000: client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138221837Z"} {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 7.305700 seconds old, received at 2023-11-08T07:50:10.832287+0000: client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138225662Z"} {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 6.977220 seconds old, received at 2023-11-08T07:50:11.160767+0000: client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138228595Z"} {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 7.332758 seconds old, received at 2023-11-08T07:50:10.805229+0000: client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting\n","stream":"stderr","time":"2023-11-08T07:50:18.138231253Z"} {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 7.327715 seconds old, received at 2023-11-08T07:50:10.810272+0000: client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138234251Z"} {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest blocked for \u003e 13.332825 secs\n","stream":"stderr","time":"2023-11-08T07:50:24.138201305Z"} {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 13.318225 seconds old, received at 2023-11-08T07:50:10.819829+0000: client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138221584Z"} {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 13.305766 seconds old, received at 2023-11-08T07:50:10.832287+0000: client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13822536Z"} {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 12.977286 seconds old, received at 2023-11-08T07:50:11.160767+0000: client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138242952Z"} {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 13.332825 seconds old, received at 2023-11-08T07:50:10.805229+0000: client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting\n","stream":"stderr","time":"2023-11-08T07:50:24.138245661Z"} {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 13.327781 seconds old, received at 2023-11-08T07:50:10.810272+0000: client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13824822Z"} {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for \u003e 5.641663 secs\n","stream":"stderr","time":"2023-11-08T07:53:54.140575892Z"} {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700 0 log_channel(cluster) log [WRN] : slow request 5.641662 seconds old, received at 2023-11-08T07:53:48.498707+0000: client_request(client.12072730:30349041 rename #0x70001831e1d/ec603fcfb5f191c00efd5f87d2fb96cbd04a40928b9d1f11744d7ef6d742 #0x7000182f285/e9e5e277 2023-11-08T07:53:48.497133+0000 caller_uid=0, caller_gid=0{}) currently acquired locks\n","stream":"stderr","time":"2023-11-08T07:53:54.140612721Z"} _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx