Hi Eugen,
On Mon, 2020-08-24 at 14:26 +0000, Eugen Block wrote:
> Hi,
>
> there have been several threads about this topic [1], most likely
> it's the metadata operation during the cleanup that saturates your
> disks.
>
> The recommended settings seem to be:
>
> [osd]
> osd op queue = wpq
> osd op queue cut off = high
Yeah, I've stumbled upon those settings recently.
However, they seem to be the defaults nowadays...

root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue
wpq
root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue_cut_off
high
root@cephosd01:~#

I do appreciate your input anyway.
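
For reference, I'll also double-check what the OSD daemons themselves
are actually running with; a quick sketch of what I have in mind
(osd.0 is just an example daemon ID):

# runtime config of a single OSD, filtered for the queue options
ceph config show osd.0 | grep osd_op_queue
# centralized config values for the whole osd section
ceph config get osd osd_op_queue
ceph config get osd osd_op_queue_cut_off
# if they ever differ, they could be set cluster-wide like this
# (as far as I know, a change to osd_op_queue only takes effect
# after the OSDs are restarted)
ceph config set osd osd_op_queue wpq
ceph config set osd osd_op_queue_cut_off high
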
> This helped us a lot; the number of slow requests has decreased
> significantly.
>
> Regards,
> Eugen
>
>
> [1]
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MK672ROJSW3X56PC2KWOK2GX7ENQP2LS/#FF3FMP5EEMOBCXAYB4ZVFIAAN6U4IRS3
>
>
> Zitat von Momčilo Medić <fedorauser@xxxxxxxxxxxxxxxxx>:
>
> > Hi friends,
> >
> > Since deployment of our Ceph cluster we've been plagued by slow
> > metadata errors.
> > Namely, the cluster goes into HEALTH_WARN with a message similar
> > to this one:
> >
> > 2 MDSs report slow metadata IOs
> > 1 MDSs report slow requests
> > 1 slow ops, oldest one blocked for 32 sec, daemons
> > [osd.22,osd.4] have slow ops.
> >
> > Here is a brief overview of our setup:
> > - 7 OSD nodes with 6 OSD drives each
> > - three of those are also monitors, managers and MDS
> > - there is a single Ceph client (at the moment)
> > - there is only CephFS being used (at the moment)
> > - metadata for CephFS was on HDD, but we moved it as suggested -
> > no improvement
> >
> > We don't expect this to be a RAM issue, as we have 64GiB of
> > memory and it is never fully utilized.
> >
> > It might be a CPU problem, as the issue happens mostly during
> > high loads (load of ~12 on an 8-core Intel Xeon Bronze 3106).
> > However, the load is present on all OSD nodes, not just the MDS
> > ones.
> >
> > Cluster is used for (mostly nightly) backups and has no critical
> > performance requirement.
> > Interestingly, significant load across all nodes appears when
> > running cleanup of outdated backups.
> > This boils down to mostly truncating files and some removal, but
> > it is usually a small number of large files.
> >
> > Below you can find an example of "dump_ops_in_flight" output
> > during the problem (which you may find useful - I couldn't make
> > sense out of it).
> >
> > Should we invest in more powerful CPU hardware (or should we
> > move the MDS roles to more powerful nodes)?
> >
> > Please let me know if I can share any more information to help
> > resolve this.
> >
> > Thanks in advance!
> >
> > Kind regards,
> > Momo.
> >
> > ===
> >
> > {
> >     "ops": [
> >         {
> >             "description": "client_request(client.22661659:706483006 create #0x10000002742/a-random-file 2020-08-23T23:09:33.919740+0200 caller_uid=117, caller_gid=121{})",
> >             "initiated_at": "2020-08-23T23:09:33.926509+0200",
> >             "age": 30.193027896,
> >             "duration": 30.193083934000001,
> >             "type_data": {
> >                 "flag_point": "failed to authpin, subtree is being exported",
> >                 "reqid": "client.22661659:706483006",
> >                 "op_type": "client_request",
> >                 "client_info": {
> >                     "client": "client.22661659",
> >                     "tid": 706483006
> >                 },
> >                 "events": [
> >                     {
> >                         "time": "2020-08-23T23:09:33.926509+0200",
> >                         "event": "initiated"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926510+0200",
> >                         "event": "throttled"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926509+0200",
> >                         "event": "header_read"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926516+0200",
> >                         "event": "all_read"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926540+0200",
> >                         "event": "dispatched"
> >                     },
> >                     {
> >                         "time": "2020-08-23T23:09:33.926595+0200",
> >                         "event": "failed to authpin, subtree is being exported"
> >                     }
> >                 ]
> >             }
> >         }
> >     ],
> >     "num_ops": 1
> > }
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx