Re: one cephfs volume becomes very slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Did you check how many caps the clients are using? The thread you refer to contains some instructions.

Zitat von Ben <ruidong.gao@xxxxxxxxx>:

checked that disk utilization had been normal during the incident. cephfs
slow performance could not seemingly be attributed to osd.

All HDD, no ssd in fact.

I found this thread related:

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/

Does it have something to do with mds_max_caps_per_client? Given the volume
contains 100K dir/files.

Thanks

Eugen Block <eblock@xxxxxx> 于2023年11月9日周四 19:02写道:

Do you see a high disk utilization? Are those OSDs hdd-only or at
least have their db on SSDs? I'd say the HDDs are the bottleneck.
There was a recent thread [1] where Zakhar explained nicely how many
IOPS you can expect from a hdd-only cluster. Maybe that helps.

[1]

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/FPMCNPYIHBHIJLWVVG2ECI2DSTR6DZIO/

Zitat von Ben <ruidong.gao@xxxxxxxxx>:

> Dear cephers,
>
> we have a cephfs volume, that will be mounted by many clients with
> concurrent read/write capability. From time to time, maybe when
concurrency
> goes as high as 100 clients' access, accessing it will become very slow
to
> be useful at all.
> the cluster has multiple active mds. All disks are hdd.
> Any ideas to improve this?
>
> here is one of mds log during the slow time, others are simillar:
>
> {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 1 slow requests, 1 included below;
oldest
> blocked for \u003e 5.662970
> secs\n","stream":"stderr","time":"2023-11-08T07:26:00.121996282Z"}
>
> {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 5.662970 seconds old,
> received at 2023-11-08T07:25:54.458863+0000:
> peer_request:client.12917739:8654334 currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:26:00.122016551Z"}
>
> {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 1 slow requests, 1 included below;
oldest
> blocked for \u003e 11.900602
> secs\n","stream":"stderr","time":"2023-11-08T07:29:54.124567293Z"}
>
> {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 11.900601 seconds old,
> received at 2023-11-08T07:29:42.223813+0000:
> client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
> 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock,
> waiting\n","stream":"stderr","time":"2023-11-08T07:29:54.124589613Z"}
>
> {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 5 slow requests, 5 included below;
oldest
> blocked for \u003e 17.900670
> secs\n","stream":"stderr","time":"2023-11-08T07:30:00.124691442Z"}
>
> {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 17.900670 seconds old,
> received at 2023-11-08T07:29:42.223813+0000:
> client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
> 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock,
> waiting\n","stream":"stderr","time":"2023-11-08T07:30:00.124726772Z"}
>
> {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 6.649942 seconds old,
> received at 2023-11-08T07:29:53.474541+0000: client_request(mds.1:305661
> rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
> #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently failed to
>
acquire_locks\n","stream":"stderr","time":"2023-11-08T07:30:00.124731626Z"}
>
> {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 6.649864 seconds old,
> received at 2023-11-08T07:29:53.474619+0000: client_request(mds.1:305662
> rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
> #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
> remote
> authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124734415Z"}
>
> {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 6.649719 seconds old,
> received at 2023-11-08T07:29:53.474764+0000:
> client_request(client.27497255:25173 getattr pAsLsXsFs #0x800019f1da7
> 2023-11-08T07:29:53.473182+0000 caller_uid=0, caller_gid=0{}) currently
> requesting remote
> authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124736973Z"}
>
> {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 6.648454 seconds old,
> received at 2023-11-08T07:29:53.476029+0000: client_request(mds.1:305663
> rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
> #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
> remote
> authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124739607Z"}
>
> {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 2 slow requests, 2 included below;
oldest
> blocked for \u003e 5.206645
> secs\n","stream":"stderr","time":"2023-11-08T07:43:30.133682292Z"}
>
> {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 5.206644 seconds old,
> received at 2023-11-08T07:43:24.926862+0000:
> client_request(client.27430891:5371608 mkdir #0x700018317cd/13
> 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently
> submit entry:
>
journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133708161Z"}
>
> {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 5.206209 seconds old,
> received at 2023-11-08T07:43:24.927297+0000:
> client_request(client.27430891:5371609 create
>
#0x2000d8a535a/2d69440b170ada3dd3670b5f3e2ebe4d75eeb274ff2a8238a23162e95601
> 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently
> submit entry:
>
journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133711838Z"}
>
> {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 5 slow requests, 5 included below;
oldest
> blocked for \u003e 7.332759
> secs\n","stream":"stderr","time":"2023-11-08T07:50:18.138195062Z"}
>
> {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 7.318159 seconds old,
> received at 2023-11-08T07:50:10.819829+0000:
> client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138221837Z"}
>
> {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 7.305700 seconds old,
> received at 2023-11-08T07:50:10.832287+0000:
> client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138225662Z"}
>
> {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 6.977220 seconds old,
> received at 2023-11-08T07:50:11.160767+0000:
> client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138228595Z"}
>
> {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 7.332758 seconds old,
> received at 2023-11-08T07:50:10.805229+0000:
> client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock,
> waiting\n","stream":"stderr","time":"2023-11-08T07:50:18.138231253Z"}
>
> {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 7.327715 seconds old,
> received at 2023-11-08T07:50:10.810272+0000:
> client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138234251Z"}
>
> {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 5 slow requests, 5 included below;
oldest
> blocked for \u003e 13.332825
> secs\n","stream":"stderr","time":"2023-11-08T07:50:24.138201305Z"}
>
> {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 13.318225 seconds old,
> received at 2023-11-08T07:50:10.819829+0000:
> client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138221584Z"}
>
> {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 13.305766 seconds old,
> received at 2023-11-08T07:50:10.832287+0000:
> client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13822536Z"}
>
> {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 12.977286 seconds old,
> received at 2023-11-08T07:50:11.160767+0000:
> client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138242952Z"}
>
> {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 13.332825 seconds old,
> received at 2023-11-08T07:50:10.805229+0000:
> client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock,
> waiting\n","stream":"stderr","time":"2023-11-08T07:50:24.138245661Z"}
>
> {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 13.327781 seconds old,
> received at 2023-11-08T07:50:10.810272+0000:
> client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285
> 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently
> dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13824822Z"}
>
> {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : 1 slow requests, 1 included below;
oldest
> blocked for \u003e 5.641663
> secs\n","stream":"stderr","time":"2023-11-08T07:53:54.140575892Z"}
>
> {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700  0
> log_channel(cluster) log [WRN] : slow request 5.641662 seconds old,
> received at 2023-11-08T07:53:48.498707+0000:
> client_request(client.12072730:30349041 rename
>
#0x70001831e1d/ec603fcfb5f191c00efd5f87d2fb96cbd04a40928b9d1f11744d7ef6d742
> #0x7000182f285/e9e5e277 2023-11-08T07:53:48.497133+0000 caller_uid=0,
> caller_gid=0{}) currently acquired
> locks\n","stream":"stderr","time":"2023-11-08T07:53:54.140612721Z"}
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux