Re: mimic: MDS standby-replay causing blocked ops (MDS bug?)

Frank Schilder <frans@xxxxxx> · Sat, 18 May 2019 16:27:19 +0000

Hi Stefan,

thanks for being so thorough. I am aware of that. We are still in a pilot phase, which is also the reason that I'm still relatively relaxed about the observed issue. I guess you also noticed that our cluster is almost empty too.

I don't have a complete list of storage requirements yet and had to restrict the allocation of PGs to a reasonable minimum, because with mimic I cannot reduce the PG count of a pool. With the current values I see imbalance but still reasonable performance. Once I have more information about which pools I still need to create, I will aim for 100 PGs per OSD. I actually plan to give the cephfs a somewhat higher share for performance reasons. It's on the list.

Thanks again and have a good weekend,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Stefan Kooman <stefan@xxxxxx>
Sent: 18 May 2019 17:41
To: Frank Schilder
Cc: Yan, Zheng; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  mimic: MDS standby-replay causing blocked ops (MDS bug?)

Quoting Frank Schilder (frans@xxxxxx):
>
> [root@ceph-01 ~]# ceph status # before the MDS failed over
>   cluster:
>     id: ###
>     health: HEALTH_WARN
>             1 MDSs report slow requests
>
>   services:
>     mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
>     mgr: ceph-01(active), standbys: ceph-02, ceph-03
>     mds: con-fs-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby
>     osd: 192 osds: 192 up, 192 in
>
>   data:
>     pools:   5 pools, 750 pgs
>     objects: 6.35 M objects, 5.2 TiB
>     usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
>     pgs:     750 active+clean

How many pools do you plan to use? You have 5 pools and only 750 PGs
total? What hardware do you have for OSDs? If cephfs is your biggest
user I would add up to 6150 PGs to your pool(s). Having around 100 PGs
per OSD is healthy. The cluster will also be able to balance way better.
Math: ((100 (PG/OSD) * 192 (# OSDs)) - 750) / 3 = 6150 for 3 replica
pools. You might have a lot of contention going on on your OSDs; they
are probably underperforming.
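
The estimate above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes every pool is 3x replicated, which the status output does not confirm. The last helper rounds down to a power of two, since Ceph generally recommends power-of-two pg_num values.

```python
def additional_pgs(target_pg_per_osd: int, num_osds: int,
                   existing_pgs: int, replicas: int) -> int:
    """Estimate from the message: ((target * OSDs) - existing) / replicas."""
    budget = target_pg_per_osd * num_osds  # total PG slots across all OSDs
    return (budget - existing_pgs) // replicas


def pg_copies_per_osd(pgs: int, num_osds: int, replicas: int) -> float:
    """Average number of PG copies per OSD (assumes all pools share one replica count)."""
    return pgs * replicas / num_osds


def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of two that does not exceed n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


if __name__ == "__main__":
    extra = additional_pgs(target_pg_per_osd=100, num_osds=192,
                           existing_pgs=750, replicas=3)
    print("additional PGs:", extra)
    print("rounded down to a power of two:", largest_power_of_two_at_most(extra))
    print("current PG copies per OSD: %.1f" % pg_copies_per_osd(750, 192, 3))
```

With the numbers from the status output (192 OSDs, 750 PGs) the first function returns the 6150 quoted in the math line. The per-OSD figure is only meaningful if the 3-replica assumption actually holds for all 5 pools.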

Gr. Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com