Re: one cephfs volume becomes very slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



checked that disk utilization had been normal during the incident. cephfs
slow performance could not seemingly be attributed to osd.

All HDD, no ssd in fact.

I found this thread related:

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/

Does it have something to do with mds_max_caps_per_client? Given the volume
contains 100K dir/files.

Thanks

Eugen Block <eblock@xxxxxx> 于2023年11月9日周四 19:02写道:

> Do you see a high disk utilization? Are those OSDs hdd-only or at
> least have their db on SSDs? I'd say the HDDs are the bottleneck.
> There was a recent thread [1] where Zakhar explained nicely how many
> IOPS you can expect from a hdd-only cluster. Maybe that helps.
>
> [1]
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/FPMCNPYIHBHIJLWVVG2ECI2DSTR6DZIO/
>
> Zitat von Ben <ruidong.gao@xxxxxxxxx>:
>
> > Dear cephers,
> >
> > we have a cephfs volume, that will be mounted by many clients with
> > concurrent read/write capability. From time to time, maybe when
> concurrency
> > goes as high as 100 clients' access, accessing it will become very slow
> to
> > be useful at all.
> > the cluster has multiple active mds. All disks are hdd.
> > Any ideas to improve this?
> >
> > here is one of mds log during the slow time, others are simillar:
> >
> > {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 1 slow requests, 1 included below;
> oldest
> > blocked for \u003e 5.662970
> > secs\n","stream":"stderr","time":"2023-11-08T07:26:00.121996282Z"}
> >
> > {"log":"debug 2023-11-08T07:26:00.114+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 5.662970 seconds old,
> > received at 2023-11-08T07:25:54.458863+0000:
> > peer_request:client.12917739:8654334 currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:26:00.122016551Z"}
> >
> > {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 1 slow requests, 1 included below;
> oldest
> > blocked for \u003e 11.900602
> > secs\n","stream":"stderr","time":"2023-11-08T07:29:54.124567293Z"}
> >
> > {"log":"debug 2023-11-08T07:29:54.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 11.900601 seconds old,
> > received at 2023-11-08T07:29:42.223813+0000:
> > client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
> > 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently
> > failed to rdlock,
> > waiting\n","stream":"stderr","time":"2023-11-08T07:29:54.124589613Z"}
> >
> > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 5 slow requests, 5 included below;
> oldest
> > blocked for \u003e 17.900670
> > secs\n","stream":"stderr","time":"2023-11-08T07:30:00.124691442Z"}
> >
> > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 17.900670 seconds old,
> > received at 2023-11-08T07:29:42.223813+0000:
> > client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
> > 2023-11-08T07:29:42.219416+0000 caller_uid=0, caller_gid=0{}) currently
> > failed to rdlock,
> > waiting\n","stream":"stderr","time":"2023-11-08T07:30:00.124726772Z"}
> >
> > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 6.649942 seconds old,
> > received at 2023-11-08T07:29:53.474541+0000: client_request(mds.1:305661
> > rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
> > #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently failed to
> >
> acquire_locks\n","stream":"stderr","time":"2023-11-08T07:30:00.124731626Z"}
> >
> > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 6.649864 seconds old,
> > received at 2023-11-08T07:29:53.474619+0000: client_request(mds.1:305662
> > rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
> > #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
> > remote
> > authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124734415Z"}
> >
> > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 6.649719 seconds old,
> > received at 2023-11-08T07:29:53.474764+0000:
> > client_request(client.27497255:25173 getattr pAsLsXsFs #0x800019f1da7
> > 2023-11-08T07:29:53.473182+0000 caller_uid=0, caller_gid=0{}) currently
> > requesting remote
> > authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124736973Z"}
> >
> > {"log":"debug 2023-11-08T07:30:00.118+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 6.648454 seconds old,
> > received at 2023-11-08T07:29:53.476029+0000: client_request(mds.1:305663
> > rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
> > #0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
> > remote
> > authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124739607Z"}
> >
> > {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 2 slow requests, 2 included below;
> oldest
> > blocked for \u003e 5.206645
> > secs\n","stream":"stderr","time":"2023-11-08T07:43:30.133682292Z"}
> >
> > {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 5.206644 seconds old,
> > received at 2023-11-08T07:43:24.926862+0000:
> > client_request(client.27430891:5371608 mkdir #0x700018317cd/13
> > 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently
> > submit entry:
> >
> journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133708161Z"}
> >
> > {"log":"debug 2023-11-08T07:43:30.127+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 5.206209 seconds old,
> > received at 2023-11-08T07:43:24.927297+0000:
> > client_request(client.27430891:5371609 create
> >
> #0x2000d8a535a/2d69440b170ada3dd3670b5f3e2ebe4d75eeb274ff2a8238a23162e95601
> > 2023-11-08T07:43:24.924423+0000 caller_uid=0, caller_gid=0{}) currently
> > submit entry:
> >
> journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133711838Z"}
> >
> > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 5 slow requests, 5 included below;
> oldest
> > blocked for \u003e 7.332759
> > secs\n","stream":"stderr","time":"2023-11-08T07:50:18.138195062Z"}
> >
> > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 7.318159 seconds old,
> > received at 2023-11-08T07:50:10.819829+0000:
> > client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138221837Z"}
> >
> > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 7.305700 seconds old,
> > received at 2023-11-08T07:50:10.832287+0000:
> > client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138225662Z"}
> >
> > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 6.977220 seconds old,
> > received at 2023-11-08T07:50:11.160767+0000:
> > client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138228595Z"}
> >
> > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 7.332758 seconds old,
> > received at 2023-11-08T07:50:10.805229+0000:
> > client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently
> > failed to rdlock,
> > waiting\n","stream":"stderr","time":"2023-11-08T07:50:18.138231253Z"}
> >
> > {"log":"debug 2023-11-08T07:50:18.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 7.327715 seconds old,
> > received at 2023-11-08T07:50:10.810272+0000:
> > client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:18.138234251Z"}
> >
> > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 5 slow requests, 5 included below;
> oldest
> > blocked for \u003e 13.332825
> > secs\n","stream":"stderr","time":"2023-11-08T07:50:24.138201305Z"}
> >
> > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 13.318225 seconds old,
> > received at 2023-11-08T07:50:10.819829+0000:
> > client_request(client.27041435:5802294 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.815700+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138221584Z"}
> >
> > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 13.305766 seconds old,
> > received at 2023-11-08T07:50:10.832287+0000:
> > client_request(client.27041435:5802298 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.827700+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13822536Z"}
> >
> > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 12.977286 seconds old,
> > received at 2023-11-08T07:50:11.160767+0000:
> > client_request(client.27041435:5802299 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:11.155701+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.138242952Z"}
> >
> > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 13.332825 seconds old,
> > received at 2023-11-08T07:50:10.805229+0000:
> > client_request(client.27041435:5802289 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.799700+0000 caller_uid=0, caller_gid=0{}) currently
> > failed to rdlock,
> > waiting\n","stream":"stderr","time":"2023-11-08T07:50:24.138245661Z"}
> >
> > {"log":"debug 2023-11-08T07:50:24.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 13.327781 seconds old,
> > received at 2023-11-08T07:50:10.810272+0000:
> > client_request(client.27041435:5802290 getattr pAsLsXsFs #0x7000182f285
> > 2023-11-08T07:50:10.803700+0000 caller_uid=0, caller_gid=0{}) currently
> > dispatched\n","stream":"stderr","time":"2023-11-08T07:50:24.13824822Z"}
> >
> > {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : 1 slow requests, 1 included below;
> oldest
> > blocked for \u003e 5.641663
> > secs\n","stream":"stderr","time":"2023-11-08T07:53:54.140575892Z"}
> >
> > {"log":"debug 2023-11-08T07:53:54.134+0000 7f190b014700  0
> > log_channel(cluster) log [WRN] : slow request 5.641662 seconds old,
> > received at 2023-11-08T07:53:48.498707+0000:
> > client_request(client.12072730:30349041 rename
> >
> #0x70001831e1d/ec603fcfb5f191c00efd5f87d2fb96cbd04a40928b9d1f11744d7ef6d742
> > #0x7000182f285/e9e5e277 2023-11-08T07:53:48.497133+0000 caller_uid=0,
> > caller_gid=0{}) currently acquired
> > locks\n","stream":"stderr","time":"2023-11-08T07:53:54.140612721Z"}
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux