Hi Eugen,

On Mon, 2020-08-24 at 14:26 +0000, Eugen Block wrote:
> Hi,
>
> there have been several threads about this topic [1], most likely it's
> the metadata operations during the cleanup that saturate your disks.
>
> The recommended settings seem to be:
>
> [osd]
> osd op queue = wpq
> osd op queue cut off = high

Yeah, I've stumbled upon those settings recently.
However, they seem to be the defaults nowadays...

root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue
wpq
root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue_cut_off
high
root@cephosd01:~#

I do appreciate your input anyway.

> This helped us a lot, the number of slow requests has decreased
> significantly.
>
> Regards,
> Eugen
>
>
> [1]
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MK672ROJSW3X56PC2KWOK2GX7ENQP2LS/#FF3FMP5EEMOBCXAYB4ZVFIAAN6U4IRS3
>
>
> Quoting Momčilo Medić <fedorauser@xxxxxxxxxxxxxxxxx>:
>
> > Hi friends,
> >
> > Since deploying our Ceph cluster we've been plagued by slow metadata
> > errors.
> > Namely, the cluster goes into HEALTH_WARN with a message similar to
> > this one:
> >
> > 2 MDSs report slow metadata IOs
> > 1 MDSs report slow requests
> > 1 slow ops, oldest one blocked for 32 sec, daemons [osd.22,osd.4]
> > have slow ops.
> >
> > Here is a brief overview of our setup:
> > - 7 OSD nodes with 6 OSD drives each
> > - three of those are also monitors, managers and MDS
> > - there is a single Ceph client (at the moment)
> > - there is only CephFS being used (at the moment)
> > - metadata for CephFS was on HDD, but we moved it as suggested - no
> > improvement
> >
> > Our expectation is that this is not a RAM issue, as we have 64GiB of
> > memory and it is never fully utilized.
> >
> > It might be a CPU problem, as the issue happens mostly under high
> > load (a load of ~12 on an 8-core Intel Xeon Bronze 3106).
> > However, the load is present on all OSD nodes, not just the MDS ones.
> >
> > The cluster is used for (mostly nightly) backups and has no critical
> > performance requirements.
> > Interestingly, significant load across all nodes appears when running
> > cleanup of outdated backups.
> > This boils down to mostly truncating files and some removal, but it
> > is usually a small number of large files.
> >
> > Below you can find an example of "dump_ops_in_flight" output during
> > the problem (which you may find useful - I couldn't make sense of
> > it).
> >
> > Should we invest in more powerful CPU hardware (or should we move the
> > MDS roles to more powerful nodes)?
> >
> > Please let me know if I can share any more information to help
> > resolve this.
> >
> > Thanks in advance!
> >
> > Kind regards,
> > Momo.
> >
> > ===
> >
> > {
> >     "ops": [
> >         {
> >             "description": "client_request(client.22661659:706483006
> > create #0x10000002742/a-random-file 2020-08-23T23:09:33.919740+0200
> > caller_uid=117, caller_gid=121{})",
> >             "initiated_at": "2020-08-23T23:09:33.926509+0200",
> >             "age": 30.193027896,
> >             "duration": 30.193083934000001,
> >             "type_data": {
> >                 "flag_point": "failed to authpin, subtree is being exported",
> >                 "reqid": "client.22661659:706483006",
> >                 "op_type": "client_request",
> >                 "client_info": {
> >                     "client": "client.22661659",
> >                     "tid": 706483006
> >                 },
> >                 "events": [
> >                     {
> >                         "time": "2020-08-23T23:09:33.926509+0200",
> >                         "event": "initiated"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926510+0200",
> >                         "event": "throttled"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926509+0200",
> >                         "event": "header_read"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926516+0200",
> >                         "event": "all_read"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926540+0200",
> >                         "event": "dispatched"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926595+0200",
> >                         "event": "failed to authpin, subtree is being exported"
> >                     }
> >                 ]
> >             }
> >         }
> >     ],
> >     "num_ops": 1
> > }
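
For reference, output like the quoted "dump_ops_in_flight" dump is what the daemons' admin sockets return. A minimal sketch of pulling the same data from the daemons named in this thread might look like the lines below; the names mds.cephosd01, osd.22 and osd.4 are only taken from the quoted hostnames and health message, and each command has to run on the host where that daemon actually lives:

# in-flight requests on the active MDS and on one of the slow OSDs:
ceph daemon mds.cephosd01 dump_ops_in_flight
ceph daemon osd.22 dump_ops_in_flight

# recently completed ops (including the slowest), useful once the warning has cleared:
ceph daemon osd.4 dump_historic_ops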
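
Separately, the [osd] options Eugen recommends are OSD-side settings, so a sketch of checking and applying them against the OSD daemons themselves, rather than querying them through an MDS name, could look like this. The daemon name osd.0 is only an example, the centralized config store (ceph config set) is assumed to be in use, and both options generally need an OSD restart before they take effect:

# ask a running OSD what it is actually using (run on the host carrying osd.0):
ceph daemon osd.0 config get osd_op_queue
ceph daemon osd.0 config get osd_op_queue_cut_off

# or set them for all OSDs via the monitors' centralized config store:
ceph config set osd osd_op_queue wpq
ceph config set osd osd_op_queue_cut_off high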