checked that disk utilization had been normal during the incident. cephfs slow performance could not seemingly be attributed to osd. All HDD, no ssd in fact. I found this thread related: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/ Does it have something to do with mds_max_caps_per_client? Given the volume contains 100K dir/files. Thanks Eugen Block <eblock@xxxxxx> 于2023年11月9日周四 19:02写道: > Do you see a high disk utilization? Are those OSDs hdd-only or at > least have their db on SSDs? I'd say the HDDs are the bottleneck. > There was a recent thread [1] where Zakhar explained nicely how many > IOPS you can expect from a hdd-only cluster. Maybe that helps. > > [1] > > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/FPMCNPYIHBHIJLWVVG2ECI2DSTR6DZIO/ > > Zitat von Ben <ruidong.gao@xxxxxxxxx>: > > > Dear cephers, > > > > we have a cephfs volume, that will be mounted by many clients with > > concurrent read/write capability. From time to time, maybe when > concurrency > > goes as high as 100 clients' access, accessing it will become very slow > to > > be useful at all. > > the cluster has multiple active mds. All disks are hdd. > > Any ideas to improve this? > > > > here is one of mds log during the slow time, others are simillar: > > > > {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; > oldest > > blocked for \u003e 5.662970 > > secs\n","stream":"stderr","time":"2023-11-08T07:26:00.121996282Z"} > > > > {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 5.662970 seconds old, > > received at 2023-11-08T07:25:54.458863+0000: > > peer_request:client.12917739:8654334 currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:26:00.122016551Z"} > > > > {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; > oldest > > blocked for \u003e 11.900602 > > secs\n","stream":"stderr","time":"2023-11-08T07:29:54.124567293Z"} > > > > {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 11.900601 seconds old, > > received at 2023-11-08T07:29:42.223813+0000: > > client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366 > > 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently > > failed to rdlock, > > waiting\n","stream":"stderr","time":"2023-11-08T07:29:54.124589613Z"} > > > > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; > oldest > > blocked for \u003e 17.900670 > > secs\n","stream":"stderr","time":"2023-11-08T07:30:00.124691442Z"} > > > > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 17.900670 seconds old, > > received at 2023-11-08T07:29:42.223813+0000: > > client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366 > > 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently > > failed to rdlock, > > waiting\n","stream":"stderr","time":"2023-11-08T07:30:00.124726772Z"} > > > > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 6.649942 seconds old, > > received at 2023-11-08T07:29:53.474541+0000: client_request(mds.1:305661 > > rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5 > > #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently failed to > > > acquire_locks\n","stream":"stderr","time":"2023-11-08T07:30:00.124731626Z"} > > > > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 6.649864 seconds old, > > received at 2023-11-08T07:29:53.474619+0000: client_request(mds.1:305662 > > rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5 > > #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting > > remote > > authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124734415Z"} > > > > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 6.649719 seconds old, > > received at 2023-11-08T07:29:53.474764+0000: > > client_request(client.27497255:25173 getattr pAsLsXsFs #0x800019f1da7 > > 2023-11-08T07:29:53.473182+0000 caller_uid=0, caller_gid=0{}) currently > > requesting remote > > authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124736973Z"} > > > > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 6.648454 seconds old, > > received at 2023-11-08T07:29:53.476029+0000: client_request(mds.1:305663 > > rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5 > > #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting > > remote > > authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124739607Z"} > > > > {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; > oldest > > blocked for \u003e 5.206645 > > secs\n","stream":"stderr","time":"2023-11-08T07:43:30.133682292Z"} > > > > {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 5.206644 seconds old, > > received at 2023-11-08T07:43:24.926862+0000: > > client_request(client.27430891:5371608 mkdir #0x700018317cd/13 > > 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently > > submit entry: > > > journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133708161Z"} > > > > {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 5.206209 seconds old, > > received at 2023-11-08T07:43:24.927297+0000: > > client_request(client.27430891:5371609 create > > > #0x2000d8a535a/2d69440b170ada3dd3670b5f3e2ebe4d75eeb274ff2a8238a23162e95601 > > 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently > > submit entry: > > > journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133711838Z"} > > > > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; > oldest > > blocked for \u003e 7.332759 > > secs\n","stream":"stderr","time":"2023-11-08T07:50:18.138195062Z"} > > > > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 7.318159 seconds old, > > received at 2023-11-08T07:50:10.819829+0000: > > client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138221837Z"} > > > > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 7.305700 seconds old, > > received at 2023-11-08T07:50:10.832287+0000: > > client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138225662Z"} > > > > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 6.977220 seconds old, > > received at 2023-11-08T07:50:11.160767+0000: > > client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138228595Z"} > > > > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 7.332758 seconds old, > > received at 2023-11-08T07:50:10.805229+0000: > > client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently > > failed to rdlock, > > waiting\n","stream":"stderr","time":"2023-11-08T07:50:18.138231253Z"} > > > > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 7.327715 seconds old, > > received at 2023-11-08T07:50:10.810272+0000: > > client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138234251Z"} > > > > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; > oldest > > blocked for \u003e 13.332825 > > secs\n","stream":"stderr","time":"2023-11-08T07:50:24.138201305Z"} > > > > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 13.318225 seconds old, > > received at 2023-11-08T07:50:10.819829+0000: > > client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138221584Z"} > > > > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 13.305766 seconds old, > > received at 2023-11-08T07:50:10.832287+0000: > > client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13822536Z"} > > > > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 12.977286 seconds old, > > received at 2023-11-08T07:50:11.160767+0000: > > client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138242952Z"} > > > > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 13.332825 seconds old, > > received at 2023-11-08T07:50:10.805229+0000: > > client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently > > failed to rdlock, > > waiting\n","stream":"stderr","time":"2023-11-08T07:50:24.138245661Z"} > > > > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 13.327781 seconds old, > > received at 2023-11-08T07:50:10.810272+0000: > > client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285 > > 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently > > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13824822Z"} > > > > {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; > oldest > > blocked for \u003e 5.641663 > > secs\n","stream":"stderr","time":"2023-11-08T07:53:54.140575892Z"} > > > > {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700 0 > > log_channel(cluster) log [WRN] : slow request 5.641662 seconds old, > > received at 2023-11-08T07:53:48.498707+0000: > > client_request(client.12072730:30349041 rename > > > #0x70001831e1d/ec603fcfb5f191c00efd5f87d2fb96cbd04a40928b9d1f11744d7ef6d742 > > #0x7000182f285/e9e5e277 2023-11-08T07:53:48.497133+0000 caller_uid=0, > > caller_gid=0{}) currently acquired > > locks\n","stream":"stderr","time":"2023-11-08T07:53:54.140612721Z"} > > _______________________________________________ > > ceph-users mailing list -- ceph-users@xxxxxxx > > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx