Re: Slow requests blocked. No rebalancing

Hello,


2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update:
249 PGs pending on creation (PENDING_CREATING_PGS)

This error might indicate that you are hitting the per-OSD PG limit.
Here is some information on it:
https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
might need to increase mon_max_pg_per_osd for the OSDs to start
rebalancing.
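
For example (a sketch only, not verified against your cluster): on
Luminous you could raise the limit in ceph.conf or inject it into the
running monitors, assuming the default of 200 PGs per OSD is what you
are hitting:

    # in ceph.conf, [global] section
    mon_max_pg_per_osd = 400

    # or, without a restart, inject into the running mons
    ceph tell mon.* injectargs '--mon_max_pg_per_osd=400'

The 400 above is only an illustrative value; check the PGS column of
"ceph osd df" to see how many PGs each OSD actually holds and pick a
sensible limit from that.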


On Thu, Sep 20, 2018 at 2:25 PM Jaime Ibar <jaime@xxxxxxxxxxxx> wrote:
>
> Hi all,
>
> we recently upgraded from Jewel 10.2.10 to Luminous 12.2.7 and now we're trying to migrate
> the OSDs to BlueStore following this document[0]. However, when I mark an OSD as out,
> I'm getting warnings similar to these:
>
> 2018-09-20 09:32:46.079630 mon.dri-ceph01 [WRN] Health check failed: 2 slow requests are blocked > 32 sec. Implicated osds 16,28 (REQUEST_SLOW)
> 2018-09-20 09:32:52.841123 mon.dri-ceph01 [WRN] Health check update: 7 slow requests are blocked > 32 sec. Implicated osds 10,16,28,32,59 (REQUEST_SLOW)
> 2018-09-20 09:32:57.842230 mon.dri-ceph01 [WRN] Health check update: 15 slow requests are blocked > 32 sec. Implicated osds 10,16,28,31,32,59,78,80 (REQUEST_SLOW)
>
> 2018-09-20 09:32:58.851142 mon.dri-ceph01 [WRN] Health check update: 244944/40100780 objects misplaced (0.611%) (OBJECT_MISPLACED)
> 2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update: 249 PGs pending on creation (PENDING_CREATING_PGS)
>
> which prevent Ceph from rebalancing; the VMs running on Ceph start hanging and we have to mark the OSD back in.
>
> I tried reweighting the OSD to 0.90 in order to minimize the impact on the cluster, but the warnings are the same.
>
> I tried increasing these settings to
>
> mds cache memory limit = 2147483648
> rocksdb cache size = 2147483648
>
> but with no luck; same warnings.
>
> We also use CephFS for storing files from different projects (no directory fragmentation enabled).
>
> The problem here is that if one OSD dies, all the services will be blocked, as Ceph won't be
> able to start rebalancing.
>
> The cluster is
>
> - 3 mons
>
> - 3 mds (running on the same hosts as the mons), 2 active and 1 standby
>
> - 3 mgr (running on the same hosts as the mons)
>
> - 6 servers, 12 OSDs each.
>
> - 1GB private network
>
>
> Does anyone know how to fix this, or where the problem could be?
>
> Thanks a lot in advance.
>
> Jaime
>
>
> [0] http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/
>
> --
>
> Jaime Ibar
> High Performance & Research Computing, IS Services
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> http://www.tchpc.tcd.ie/ | jaime@xxxxxxxxxxxx
> Tel: +353-1-896-3725
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


