Hi Eugen,
On Mon, 2020-08-24 at 14:26 +0000, Eugen Block wrote:
> Hi,
>
> there have been several threads about this topic [1], most likely
> it's the metadata operation during the cleanup that saturates your
> disks.
>
> The recommended settings seem to be:
>
> [osd]
> osd op queue = wpq
> osd op queue cut off = high
Yeah, I've stumbled upon those settings recently.
However, they seem to be the defaults nowadays...

root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue
wpq
root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue_cut_off
high
root@cephosd01:~#

I do appreciate your input anyway.
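
For reference, I'll also double-check what the OSD daemons themselves
are actually running with; a quick sketch of what I have in mind
(osd.0 is just an example daemon ID):

# runtime config of a single OSD, filtered for the queue options
ceph config show osd.0 | grep osd_op_queue
# centralized config values for the whole osd section
ceph config get osd osd_op_queue
ceph config get osd osd_op_queue_cut_off
# if they ever differ, they could be set cluster-wide like this
# (as far as I know, a change to osd_op_queue only takes effect
# after the OSDs are restarted)
ceph config set osd osd_op_queue wpq
ceph config set osd osd_op_queue_cut_off high
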
> This helped us a lot; the number of slow requests has decreased
> significantly.
>
> Regards,
> Eugen
>
>
> [1]
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MK672ROJSW3X56PC2KWOK2GX7ENQP2LS/#FF3FMP5EEMOBCXAYB4ZVFIAAN6U4IRS3
>
>
> Zitat von Momčilo Medić <fedorauser@xxxxxxxxxxxxxxxxx>:
>
> > Hi friends,
> >
> > Since deployment of our Ceph cluster we've been plagued by slow
> > metadata errors.
> > Namely, the cluster goes into HEALTH_WARN with a message similar
> > to this one:
> >
> > 2 MDSs report slow metadata IOs
> > 1 MDSs report slow requests
> > 1 slow ops, oldest one blocked for 32 sec, daemons
> > [osd.22,osd.4] have slow ops.
> >
> > Here is a brief overview of our setup:
> > - 7 OSD nodes with 6 OSD drives each
> > - three of those are also monitors, managers and MDS
> > - there is a single Ceph client (at the moment)
> > - there is only CephFS being used (at the moment)
> > - metadata for CephFS was on HDD, but we moved it as suggested -
> > no improvement
> >
> > We don't expect this to be a RAM issue, as we have 64GiB of
> > memory and it is never fully utilized.
> >
> > It might be a CPU problem, as the issue happens mostly during
> > high loads (load of ~12 on an 8-core Intel Xeon Bronze 3106).
> > However, the load is present on all OSD nodes, not just the MDS
> > ones.
> >
> > Cluster is used for (mostly nightly) backups and has no critical
> > performance requirement.
> > Interestingly, significant load across all nodes appears when
> > running cleanup of outdated backups.
> > This boils down to mostly truncating files and some removal, but
> > it is usually a small number of large files.
> >
> > Below you can find an example of "dump_ops_in_flight" output
> > during the problem (which you may find useful - I couldn't make
> > sense out of it).
> >
> > Should we invest in more powerful CPU hardware (or should we
> > move the MDS roles to more powerful nodes)?
> >
> > Please let me know if I can share any more information to help
> > resolve this.
> >
> > Thanks in advance!
> >
> > Kind regards,
> > Momo.
> >
> > ===
> >
> > {
> >     "ops": [
> >         {
> >             "description": "client_request(client.22661659:706483006 create #0x10000002742/a-random-file 2020-08-23T23:09:33.919740+0200 caller_uid=117, caller_gid=121{})",
> >             "initiated_at": "2020-08-23T23:09:33.926509+0200",
> >             "age": 30.193027896,
> >             "duration": 30.193083934000001,
> >             "type_data": {
> >                 "flag_point": "failed to authpin, subtree is being exported",
> >                 "reqid": "client.22661659:706483006",
> >                 "op_type": "client_request",
> >                 "client_info": {
> >                     "client": "client.22661659",
> >                     "tid": 706483006
> >                 },
> >                 "events": [
> >                     {
> >                         "time": "2020-08-23T23:09:33.926509+0200",
> >                         "event": "initiated"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926510+0200",
> >                         "event": "throttled"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926509+0200",
> >                         "event": "header_read"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926516+0200",
> >                         "event": "all_read"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926540+0200",
> >                         "event": "dispatched"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926595+0200",
> >                         "event": "failed to authpin, subtree is being exported"
> >                     }
> >                 ]
> >             }
> >         }
> >     ],
> >     "num_ops": 1
> > }
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx