Hi David,
Thanks for the response.
On Wed, Nov 22, 2017 at 12:29 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> All you have to do is figure out why osd.0, osd.1, and osd.2 are down and
> get the daemons running. They have PGs assigned to them, but since they are
> not up and running those PGs are in a down state. You can check the logs
> for them in /var/log/ceph/. Did you have any errors when deploying these
> OSDs?
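Good point. For the record, when I retry this, here is roughly the per-host checklist I'll run against the down OSDs (a sketch assuming systemd-managed OSDs, which is what the jewel packages set up on Ubuntu 16.04; substitute each host's OSD id for 0):

---
# Check whether the ceph-osd daemon for osd.0 is running at all
systemctl status ceph-osd@0

# Recent daemon output captured by journald
journalctl -u ceph-osd@0 --no-pager -n 50

# The OSD's own log under /var/log/ceph/, as you suggested
tail -n 100 /var/log/ceph/ceph-osd.0.log

# If it simply never started, try starting it and re-check the log
systemctl start ceph-osd@0
---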
In the end I reinstalled, this time with only three OSD nodes, and the cluster is now healthy and functional.
I'll try to reproduce the issue and root-cause it when I have some spare time. Currently:
---
    cluster 0bb54801-846d-47ac-b14a-3828d830ff3a
     health HEALTH_OK
     monmap e1: 1 mons at {lol-045=172.16.1.20:6789/0}
            election epoch 3, quorum 0 lol-045
      fsmap e5: 1/1/1 up {0=lol-050=up:active}
     osdmap e26: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v357: 192 pgs, 3 pools, 174 kB data, 21 objects
            104 MB used, 4590 GB / 4590 GB avail
                 192 active+clean
---
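One thing I noticed in hindsight: the broken cluster had the noout flag set, which prevents down OSDs from being marked out, so those stale PGs could never remap. If I reproduce this, I'll also pull per-PG detail before touching anything. A sketch using standard ceph CLI commands (nothing here is specific to my setup):

---
# Explain each health warning and list the affected PGs
ceph health detail

# Dump PGs stuck in the states reported by ceph -s
ceph pg dump_stuck inactive
ceph pg dump_stuck stale

# Clear the noout flag once the down OSDs are resolved
ceph osd unset noout
---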
>
> On Tue, Nov 21, 2017 at 10:25 AM Traiano Welcome <traiano@xxxxxxxxx> wrote:
>>
>> Hi List
>>
>> I've just begun using Ceph and installed a small cluster on Ubuntu
>> 16.04 nodes using the process in this guide:
>>
>>
>> https://www.howtoforge.com/tutorial/how-to-install-a-ceph-cluster-on-ubuntu-16-04/
>>
>> However, once the installation is complete, I see that the newly
>> installed cluster is not healthy and is complaining about PGs stuck
>> inactive:
>>
>> ---
>> root@lol-045:~# ceph -s
>>
>>     cluster 220c92fb-2daa-4860-b511-d65ec88d6060
>>      health HEALTH_ERR
>>             448 pgs are stuck inactive for more than 300 seconds
>>             64 pgs degraded
>>             256 pgs stale
>>             64 pgs stuck degraded
>>             192 pgs stuck inactive
>>             256 pgs stuck stale
>>             256 pgs stuck unclean
>>             64 pgs stuck undersized
>>             64 pgs undersized
>>             noout flag(s) set
>>      monmap e1: 1 mons at {lol-045=17.16.2.20:6789/0}
>>             election epoch 4, quorum 0 lol-045
>>      osdmap e66: 7 osds: 4 up, 4 in; 55 remapped pgs
>>             flags noout,sortbitwise,require_jewel_osds
>>       pgmap v526: 256 pgs, 1 pools, 0 bytes data, 0 objects
>>             134 MB used, 6120 GB / 6121 GB avail
>>                  192 stale+creating
>>                  64 stale+active+undersized+degraded
>>
>> ---
>>
>> Why is this, and how can I troubleshoot and fix it? (I've googled
>> extensively but couldn't find a solution.)
>>
>>
>> My osd tree looks like this:
>>
>> ----
>> ID WEIGHT   TYPE NAME                 UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 10.46080 root default
>> -2  2.98880     host anx-dp02-046
>>  0  1.49440         osd.0                down        0          1.00000
>>  4  1.49440         osd.4                  up  1.00000          1.00000
>> -3  2.98880     host anx-dp02-047
>>  1  1.49440         osd.1                down        0          1.00000
>>  5  1.49440         osd.5                  up  1.00000          1.00000
>> -4  2.98880     host anx-dp02-048
>>  2  1.49440         osd.2                down        0          1.00000
>>  6  1.49440         osd.6                  up  1.00000          1.00000
>> -5  1.49440     host anx-dp02-049
>>  7  1.49440         osd.7                  up  1.00000          1.00000
>> ----
>>
>> Many thanks in advance,
>> Traiano
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com