Re: help

Caspar Smit <casparsmit@xxxxxxxxxxx> · Thu, 29 Aug 2019 15:21:19 +0200

Hi,
This output doesn't show anything 'wrong' with the cluster. It's just still recovering (backfilling) from what looks like one of your OSDs crashing and restarting. The backfilling is taking a while because max_backfills = 1 and you only have 3 OSDs in total, so each PG's backfill has to wait for the previous PG's backfill to complete.
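A rough way to see why recovery crawls: assuming the pools are 3-replica (inferred from the object counts in the output quoted below, not stated in the thread), every PG on a 3-OSD cluster has a copy on every OSD, so all 16 remapped PGs presumably need a backfill slot on the same recovering OSD. The sketch below is a deliberately simplified model, not Ceph's actual reservation logic. It treats `osd_max_backfills` as the number of PGs that can backfill at once and counts how many sequential rounds are needed. The helper `backfill_waves` is hypothetical.

```python
import math


def backfill_waves(remapped_pgs: int, max_backfills: int) -> int:
    """Sequential backfill rounds in a deliberately simplified model.

    Assumption (not Ceph's real reservation logic): every remapped PG needs
    a slot on the same target OSD, which admits at most `max_backfills`
    backfills at a time, and all PGs take roughly equally long.
    """
    if remapped_pgs < 0 or max_backfills < 1:
        raise ValueError("need remapped_pgs >= 0 and max_backfills >= 1")
    return math.ceil(remapped_pgs / max_backfills)


# 16 remapped PGs, as reported by "ceph -s" in this thread.
for limit in (1, 2, 4):
    print(f"osd_max_backfills={limit}: {backfill_waves(16, limit)} rounds")
```

With a limit of 1 the model matches the quoted state (1 PG backfilling, 15 in backfill_wait). On Mimic the limit should be adjustable at runtime with `ceph config set osd osd_max_backfills 2`; a higher value shortens recovery but puts more load on the disks, which matters if the OSDs were already overloaded.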

The real concern is not the current state of the cluster but how you ended up in this state. Most likely the script overloaded the OSDs.

I also advise adding a monitor on each of your other 2 nodes (3 mons in total). Running a single mon is not advised, because it is a single point of failure.
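Monitors only form a quorum when a strict majority of them is up, so a cluster with n monitors survives floor((n - 1) / 2) monitor failures. A minimal sketch of that arithmetic; the function names are illustrative only:

```python
def quorum_size(num_mons: int) -> int:
    """Smallest strict majority of `num_mons` monitors."""
    return num_mons // 2 + 1


def tolerated_failures(num_mons: int) -> int:
    """Monitors that can fail while a majority is still reachable."""
    return num_mons - quorum_size(num_mons)


for n in (1, 2, 3, 5):
    print(f"{n} mon(s): quorum={quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

With only mon01, losing that node leaves no quorum at all; three monitors keep the cluster running through one failure, while a second or fourth monitor adds no extra tolerance, which is why odd counts are usual.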

Furthermore, just let the backfilling complete and HEALTH_OK will return eventually if nothing goes wrong in between.

Kind regards,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend

t: (+31) 299 410 414
e: casparsmit@xxxxxxxxxxx
w: www.supernas.eu

On Thu 29 Aug 2019 at 14:35, Amudhan P <amudhan83@xxxxxxxxx> wrote:
Output from "ceph -s":
  cluster:
    id:     7c138e13-7b98-4309-b591-d4091a1742b4
    health: HEALTH_WARN
            Degraded data redundancy: 1141587/7723191 objects degraded (14.781%), 15 pgs degraded, 16 pgs undersized

  services:
    mon: 1 daemons, quorum mon01
    mgr: mon01(active)
    mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
    osd: 3 osds: 3 up, 3 in; 16 remapped pgs

  data:
    pools:   2 pools, 64 pgs
    objects: 2.57 M objects, 59 GiB
    usage:   190 GiB used, 5.3 TiB / 5.5 TiB avail
    pgs:     1141587/7723191 objects degraded (14.781%)
             48 active+clean
             15 active+undersized+degraded+remapped+backfill_wait
             1  active+undersized+remapped+backfilling

  io:
    recovery: 0 B/s, 10 objects/s
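The health summary can be read straight off the pgs section: a PG counts as degraded or undersized when that flag appears in its state string, and the "X/Y objects degraded" line gives the percentage. A small parsing sketch over the text pasted above; the regex-based parsing and the function names are assumptions for illustration, and the final ratio (total object copies divided by the rounded object count) only hints that the pools are 3-replica.

```python
import re

# "pgs:" section copied from the "ceph -s" output above.
PG_SECTION = """\
    pgs:     1141587/7723191 objects degraded (14.781%)
             48 active+clean
             15 active+undersized+degraded+remapped+backfill_wait
             1  active+undersized+remapped+backfilling
"""


def tally_pg_flags(section: str) -> dict[str, int]:
    """Count PGs per individual state flag (e.g. 'degraded', 'remapped')."""
    counts: dict[str, int] = {}
    for match in re.finditer(r"^\s*(\d+)\s+([a-z_+]+)\s*$", section, re.MULTILINE):
        num_pgs, state = int(match.group(1)), match.group(2)
        for flag in state.split("+"):
            counts[flag] = counts.get(flag, 0) + num_pgs
    return counts


def degraded_ratio(section: str) -> float:
    """Parse 'X/Y objects degraded' and return X / Y as a percentage."""
    match = re.search(r"(\d+)/(\d+) objects degraded", section)
    if match is None:
        raise ValueError("no degraded-objects line found")
    return 100 * int(match.group(1)) / int(match.group(2))


flags = tally_pg_flags(PG_SECTION)
print(flags["degraded"], flags["undersized"], flags["remapped"])  # 15 16 16
print(f"{degraded_ratio(PG_SECTION):.3f}%")  # 14.781%
# Object copies / objects (2.57 M is rounded by ceph -s): roughly the replica count.
print(round(7723191 / 2.57e6, 1))  # 3.0
```

The tallies agree with the HEALTH_WARN line (15 PGs degraded, 16 undersized) and with the 16 remapped PGs reported for the OSDs.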

Output from "ceph osd tree":
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       5.45819 root default
-3       1.81940     host test-node1
 0   hdd 1.81940         osd.0           up  1.00000 1.00000
-5       1.81940     host test-node2
 1   hdd 1.81940         osd.1           up  1.00000 1.00000
-7       1.81940     host test-node3
 2   hdd 1.81940         osd.2           up  1.00000 1.00000

Failure domain is not configured yet. The setup is 3 OSD nodes, each with a single disk, and 1 node with mon, MDS and mgr running.
The cluster was healthy until I ran a script that creates multiple folders.
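On the failure-domain question: in the osd tree above every OSD sits under its own host bucket, and Ceph's default replicated CRUSH rule places each replica on a different host. Assuming the pools use that default rule with size 3 (inferred from the object counts, not confirmed in the thread), each PG needs one copy on each of the three hosts, which leaves no spare host to re-replicate to. A minimal parsing sketch; `osds_by_host` and `spare_hosts` are hypothetical helpers.

```python
# "ceph osd tree" output copied from this thread.
OSD_TREE = """\
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       5.45819 root default
-3       1.81940     host test-node1
 0   hdd 1.81940         osd.0           up  1.00000 1.00000
-5       1.81940     host test-node2
 1   hdd 1.81940         osd.1           up  1.00000 1.00000
-7       1.81940     host test-node3
 2   hdd 1.81940         osd.2           up  1.00000 1.00000
"""


def osds_by_host(tree: str) -> dict[str, list[str]]:
    """Group OSD names under the host bucket listed above them."""
    hosts: dict[str, list[str]] = {}
    current = None
    for line in tree.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if "host" in fields:
            # A host bucket line: the bucket name follows the "host" type.
            current = fields[fields.index("host") + 1]
            hosts[current] = []
            continue
        osd = next((f for f in fields if f.startswith("osd.")), None)
        if osd is not None and current is not None:
            hosts[current].append(osd)
    return hosts


def spare_hosts(num_hosts: int, pool_size: int) -> int:
    """Hosts left over for re-replication when each replica needs its own host."""
    return num_hosts - pool_size


hosts = osds_by_host(OSD_TREE)
print(hosts)
print("spare hosts with size 3:", spare_hosts(len(hosts), 3))  # 0
```

Zero spare hosts means that if one node fails, its PGs stay undersized until that node returns; with this layout that is expected rather than a sign of a further fault. `ceph osd crush rule dump` shows which failure domain the rule actually uses.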

regards
Amudhan

On Thu, Aug 29, 2019 at 5:33 PM Heðin Ejdesgaard Møller <hej@xxxxxxxxx> wrote:
In addition to ceph -s, could you provide the output of

ceph osd tree 

and specify what your failure domain is?

/Heðin

On Thu, 2019-08-29 at 13:55 +0200, Janne Johansson wrote:
> On Thu 29 Aug 2019 at 13:50, Amudhan P <amudhan83@xxxxxxxxx> wrote:
> > Hi,
> >
> > I am using Ceph version 13.2.6 (mimic) on a test setup, trying out CephFS.
> > My Ceph health status is showing a warning.
> >
> > "ceph health"
> > HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects degraded (15.499%)
> >
> > "ceph health detail"
> > HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects degraded (15.500%)
> > PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects degraded (15.500%)
> >     pg 2.0 is stuck undersized for 1076.454929, current state active+undersized+
> >     pg 2.2 is stuck undersized for 1076.456639, current state active+undersized+
>
> How does "ceph -s" look?
> It should have more info on what else is wrong.
>
> --
> May the most significant bit of your life be positive.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
