Re: one cephfs volume becomes very slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Do you see a high disk utilization? Are those OSDs hdd-only or at least have their db on SSDs? I'd say the HDDs are the bottleneck. There was a recent thread [1] where Zakhar explained nicely how many IOPS you can expect from a hdd-only cluster. Maybe that helps.

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/FPMCNPYIHBHIJLWVVG2ECI2DSTR6DZIO/

Zitat von Ben <ruidong.gao@xxxxxxxxx>:

Dear cephers,

we have a cephfs volume, that will be mounted by many clients with
concurrent read/write capability. From time to time, maybe when concurrency
goes as high as 100 clients' access, accessing it will become very slow to
be useful at all.
the cluster has multiple active mds. All disks are hdd.
Any ideas to improve this?

here is one of mds log during the slow time, others are simillar:

{"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest
blocked for \u003e 5.662970
secs\n","stream":"stderr","time":"2023-11-08T07:26:00.121996282Z"}

{"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.662970 seconds old,
received at 2023-11-08T07:25:54.458863+0000:
peer_request:client.12917739:8654334 currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:26:00.122016551Z"}

{"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest
blocked for \u003e 11.900602
secs\n","stream":"stderr","time":"2023-11-08T07:29:54.124567293Z"}

{"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 11.900601 seconds old,
received at 2023-11-08T07:29:42.223813+0000:
client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently
failed to rdlock,
waiting\n","stream":"stderr","time":"2023-11-08T07:29:54.124589613Z"}

{"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest
blocked for \u003e 17.900670
secs\n","stream":"stderr","time":"2023-11-08T07:30:00.124691442Z"}

{"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 17.900670 seconds old,
received at 2023-11-08T07:29:42.223813+0000:
client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently
failed to rdlock,
waiting\n","stream":"stderr","time":"2023-11-08T07:30:00.124726772Z"}

{"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.649942 seconds old,
received at 2023-11-08T07:29:53.474541+0000: client_request(mds.1:305661
rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
#0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently failed to
acquire_locks\n","stream":"stderr","time":"2023-11-08T07:30:00.124731626Z"}

{"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.649864 seconds old,
received at 2023-11-08T07:29:53.474619+0000: client_request(mds.1:305662
rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
#0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
remote
authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124734415Z"}

{"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.649719 seconds old,
received at 2023-11-08T07:29:53.474764+0000:
client_request(client.27497255:25173 getattr pAsLsXsFs #0x800019f1da7
2023-11-08T07:29:53.473182+0000 caller_uid=0, caller_gid=0{}) currently
requesting remote
authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124736973Z"}

{"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.648454 seconds old,
received at 2023-11-08T07:29:53.476029+0000: client_request(mds.1:305663
rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
#0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
remote
authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124739607Z"}

{"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest
blocked for \u003e 5.206645
secs\n","stream":"stderr","time":"2023-11-08T07:43:30.133682292Z"}

{"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.206644 seconds old,
received at 2023-11-08T07:43:24.926862+0000:
client_request(client.27430891:5371608 mkdir #0x700018317cd/13
2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently
submit entry:
journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133708161Z"}

{"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.206209 seconds old,
received at 2023-11-08T07:43:24.927297+0000:
client_request(client.27430891:5371609 create
#0x2000d8a535a/2d69440b170ada3dd3670b5f3e2ebe4d75eeb274ff2a8238a23162e95601
2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently
submit entry:
journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133711838Z"}

{"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest
blocked for \u003e 7.332759
secs\n","stream":"stderr","time":"2023-11-08T07:50:18.138195062Z"}

{"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 7.318159 seconds old,
received at 2023-11-08T07:50:10.819829+0000:
client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138221837Z"}

{"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 7.305700 seconds old,
received at 2023-11-08T07:50:10.832287+0000:
client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138225662Z"}

{"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.977220 seconds old,
received at 2023-11-08T07:50:11.160767+0000:
client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138228595Z"}

{"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 7.332758 seconds old,
received at 2023-11-08T07:50:10.805229+0000:
client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently
failed to rdlock,
waiting\n","stream":"stderr","time":"2023-11-08T07:50:18.138231253Z"}

{"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 7.327715 seconds old,
received at 2023-11-08T07:50:10.810272+0000:
client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138234251Z"}

{"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest
blocked for \u003e 13.332825
secs\n","stream":"stderr","time":"2023-11-08T07:50:24.138201305Z"}

{"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 13.318225 seconds old,
received at 2023-11-08T07:50:10.819829+0000:
client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138221584Z"}

{"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 13.305766 seconds old,
received at 2023-11-08T07:50:10.832287+0000:
client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13822536Z"}

{"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 12.977286 seconds old,
received at 2023-11-08T07:50:11.160767+0000:
client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138242952Z"}

{"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 13.332825 seconds old,
received at 2023-11-08T07:50:10.805229+0000:
client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently
failed to rdlock,
waiting\n","stream":"stderr","time":"2023-11-08T07:50:24.138245661Z"}

{"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 13.327781 seconds old,
received at 2023-11-08T07:50:10.810272+0000:
client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285
2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13824822Z"}

{"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest
blocked for \u003e 5.641663
secs\n","stream":"stderr","time":"2023-11-08T07:53:54.140575892Z"}

{"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.641662 seconds old,
received at 2023-11-08T07:53:48.498707+0000:
client_request(client.12072730:30349041 rename
#0x70001831e1d/ec603fcfb5f191c00efd5f87d2fb96cbd04a40928b9d1f11744d7ef6d742
#0x7000182f285/e9e5e277 2023-11-08T07:53:48.497133+0000 caller_uid=0,
caller_gid=0{}) currently acquired
locks\n","stream":"stderr","time":"2023-11-08T07:53:54.140612721Z"}
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux