After waiting about 12 hours the cluster status is now healthy, but why did the backfill take such a long time?
How do I fine-tune this, in case the same kind of error pops up again?
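For reference, recovery speed can be raised temporarily by loosening the backfill limits mentioned below, at the cost of more load on the OSDs during recovery. A sketch, assuming mimic-era option names; the values shown are illustrative, not recommendations:

```shell
# Temporarily allow more concurrent backfills per OSD (default is 1)
# and leave recovery concurrency at its default of 3
ceph tell 'osd.*' injectargs '--osd-max-backfills 2 --osd-recovery-max-active 3'

# Or persist the setting in the cluster configuration (mimic and later)
ceph config set osd osd_max_backfills 2

# Revert to the default once HEALTH_OK returns
ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
```

On a 3-OSD cluster like this one, raising these too far can overload the disks and make things worse, so revert after recovery completes.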
On Thu, Aug 29, 2019 at 6:52 PM Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
Hi,

This output doesn't show anything 'wrong' with the cluster. It's just still recovering (backfilling) from what appears to have been a crash and restart of one of your OSDs.

The backfilling is taking a while because max_backfills = 1 and you only have 3 OSDs in total, so each PG's backfill has to wait for the previous PG's backfill to complete.

The real concern is not the current state of the cluster but how you ended up in this state. The script probably overloaded the OSDs.

I also advise you to add a monitor to each of your other 2 nodes (running 3 mons in total). Running a single mon is not advised.

Furthermore, just let the backfilling complete and HEALTH_OK will return eventually, provided nothing goes wrong in between.

Kind regards,
Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend
t: (+31) 299 410 414
e: casparsmit@xxxxxxxxxxx
w: www.supernas.eu

_______________________________________________

On Thu, Aug 29, 2019 at 14:35, Amudhan P <amudhan83@xxxxxxxxx> wrote:

Output from "ceph -s":

cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
Degraded data redundancy: 1141587/7723191 objects degraded (14.781%), 15 pgs degraded, 16 pgs undersized
services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
mds: cephfs-tst-1/1/1 up {0=mon01=up:active}
osd: 3 osds: 3 up, 3 in; 16 remapped pgs
data:
pools: 2 pools, 64 pgs
objects: 2.57 M objects, 59 GiB
usage: 190 GiB used, 5.3 TiB / 5.5 TiB avail
pgs: 1141587/7723191 objects degraded (14.781%)
48 active+clean
15 active+undersized+degraded+remapped+backfill_wait
1 active+undersized+remapped+backfilling
io:
recovery: 0 B/s, 10 objects/s

Output from "ceph osd tree":
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 5.45819 root default
-3 1.81940 host test-node1
0 hdd 1.81940 osd.0 up 1.00000 1.00000
-5 1.81940 host test-node2
1 hdd 1.81940 osd.1 up 1.00000 1.00000
-7 1.81940 host test-node3
2 hdd 1.81940 osd.2 up 1.00000 1.00000

The failure domain is not configured yet. The setup is 3 OSD nodes, each with a single disk, plus 1 node running mon, mds and mgr. The cluster was healthy until I ran a script that created multiple folders.

regards
Amudhan

_______________________________________________

On Thu, Aug 29, 2019 at 5:33 PM Heðin Ejdesgaard Møller <hej@xxxxxxxxx> wrote:

In addition to ceph -s, could you provide the output of
ceph osd tree
and specify what your failure domain is?
/Heðin
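The failure domain Heðin asks about is the bucket type chosen in the pool's CRUSH rule. A sketch of how to inspect it, assuming the default rule names:

```shell
# Show each pool and which crush_rule it uses
ceph osd pool ls detail

# Dump the CRUSH rules; the failure domain is the bucket type in the
# chooseleaf/choose step (e.g. "host" or "osd")
ceph osd crush rule dump
```

With only one disk per host, a failure domain of "host" and "osd" behave the same here, but it matters as soon as nodes gain more disks.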
On hós, 2019-08-29 at 13:55 +0200, Janne Johansson wrote:
>
>
> Den tors 29 aug. 2019 kl 13:50 skrev Amudhan P <amudhan83@xxxxxxxxx>:
> > Hi,
> >
> > I am using ceph version 13.2.6 (mimic) on a test setup, trying out
> > cephfs.
> > My ceph health status is showing a warning.
> >
> > "ceph health"
> > HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects
> > degraded (15.499%)
> >
> > "ceph health detail"
> > HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects
> > degraded (15.500%)
> > PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects
> > degraded (15.500%)
> > pg 2.0 is stuck undersized for 1076.454929, current state
> > active+undersized+
> > pg 2.2 is stuck undersized for 1076.456639, current state
> > active+undersized+
> >
>
> How does "ceph -s" look?
> It should have more info on what else is wrong.
>
> --
> May the most significant bit of your life be positive.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
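Following up on Janne's suggestion, the stuck PGs listed in "ceph health detail" can also be examined individually. A sketch, using pg 2.0 from the output above:

```shell
# List PGs stuck in the undersized state
ceph pg dump_stuck undersized

# Query one of the reported PGs for its full state, acting set
# and recovery progress
ceph pg 2.0 query
```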