I would recommend sticking with the weight of 9.09560 for the osds, as that is the TiB size of the osds that ceph defaults to, as opposed to the TB size. New osds will have their weights based on the TiB value. What does your `ceph osd df` output look like? Hopefully very healthy.
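
For reference, the default CRUSH weight is just the device capacity expressed in TiB, so a nominal 10 TB drive comes out near 9.1 (the extra digits in 9.09560 presumably reflect the drive's actual usable byte count). A quick sanity check of the arithmetic:

$ python3 -c 'print(10 * 10**12 / 2**40)'   # a nominal 10 TB drive is ~9.0949 TiB
$ ceph osd df   # shows each osd's weight next to its actual size and utilization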
On Tue, Jul 18, 2017, 11:16 PM Roger Brown <rogerpbrown@xxxxxxxxx> wrote:
Resolution confirmed!

$ ceph -s
  cluster:
    id:     eea7b78c-b138-40fc-9f3e-3d77afb770f0
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum desktop,mon1,nuc2
    mgr: desktop(active), standbys: mon1
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   19 pools, 372 pgs
    objects: 54243 objects, 71722 MB
    usage:   129 GB used, 27812 GB / 27941 GB avail
    pgs:     372 active+clean

On Tue, Jul 18, 2017 at 8:47 PM Roger Brown <rogerpbrown@xxxxxxxxx> wrote:

Ah, that was the problem!

So I edited the crushmap (http://docs.ceph.com/docs/master/rados/operations/crush-map/) with a weight of 10.000 for all three 10TB OSD hosts. The instant result was that all the pgs with only 2 OSDs picked up a third OSD while the cluster started rebalancing the data. I trust it will complete with time and I'll be good to go!

New OSD tree:
$ ceph osd tree
ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 30.00000 root default
-5 10.00000     host osd1
 3 10.00000         osd.3      up  1.00000          1.00000
-6 10.00000     host osd2
 4 10.00000         osd.4      up  1.00000          1.00000
-2 10.00000     host osd3
 0 10.00000         osd.0      up  1.00000          1.00000

Kudos to Brad Hubbard for steering me in the right direction!

On Tue, Jul 18, 2017 at 8:27 PM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:

ID WEIGHT  TYPE NAME
-5 1.00000 host osd1
-6 9.09560 host osd2
-2 9.09560 host osd3
The weight allocated to host "osd1" should presumably be the same as
the other two hosts?
Dump your crushmap and take a good look at it, specifically the
weighting of "osd1".
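
A sketch of the usual workflow (file names here are illustrative): decompile the map, inspect or fix the host weights, then recompile and inject it. A single item's weight can also be changed without editing the map at all.

$ ceph osd getcrushmap -o crushmap.bin        # grab the compiled crushmap
$ crushtool -d crushmap.bin -o crushmap.txt   # decompile to text; inspect host weights
$ crushtool -c crushmap.txt -o crushmap.new   # recompile after editing
$ ceph osd setcrushmap -i crushmap.new        # inject the fixed map
$ ceph osd crush reweight osd.3 10.0          # or: fix one item's weight directly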
On Wed, Jul 19, 2017 at 11:48 AM, Roger Brown <rogerpbrown@xxxxxxxxx> wrote:
> I also tried ceph pg query, but it gave no helpful recommendations for any
> of the stuck pgs.
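>
> (For reference, the query is run per pg, e.g. for one of the stuck pgs
> listed further down:)
> $ ceph pg 88.3 query   # dumps peering state, up/acting sets, and recovery info as JSON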
>
>
> On Tue, Jul 18, 2017 at 7:45 PM Roger Brown <rogerpbrown@xxxxxxxxx> wrote:
>>
>> Problem:
>> I have some pgs with only two OSDs instead of 3 like all the other pgs
>> have. This is causing active+undersized+degraded status.
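>>
>> (ceph health detail enumerates each undersized/degraded pg and its acting
>> osds, which is a quick way to see exactly what is affected:)
>> $ ceph health detail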
>>
>> History:
>> 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for a 1TB
>> drive.
>> 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive.
>> 3. Removed the original three 1TB OSD hosts from the osd tree (reweight 0,
>> wait, stop, remove, del osd&host, rm; the command sequence is sketched
>> just after this list).
>> 4. The last OSD to be removed would never return to active+clean after
>> reweight 0. It returned undersized instead, but I went on with removal
>> anyway, leaving me stuck with 5 undersized pgs.
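>>
>> (A sketch of that removal sequence, using osd.1 on host osd1 as an
>> illustrative example:)
>> $ ceph osd crush reweight osd.1 0   # drain; wait until all pgs are active+clean
>> $ systemctl stop ceph-osd@1         # on the osd's host
>> $ ceph osd out 1
>> $ ceph osd crush remove osd.1
>> $ ceph auth del osd.1
>> $ ceph osd rm 1
>> $ ceph osd crush remove osd1        # drop the now-empty host bucket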
>>
>> Things tried that didn't help:
>> * Gave it time to go away on its own.
>> * Replaced the replicated default.rgw.buckets.data pool with an
>> erasure-coded 2+1 version (sketched after this list).
>> * ceph osd lost 1 (and 2)
>> * ceph pg repair (pgs from dump_stuck)
>> * Googled 'ceph pg undersized' and similar searches for help.
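>>
>> (The erasure-code attempt was roughly the following; the old replicated
>> pool first has to be moved aside, and the profile name ec21 is
>> illustrative:)
>> $ ceph osd erasure-code-profile set ec21 k=2 m=1
>> $ ceph osd pool create default.rgw.buckets.data 256 256 erasure ec21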
>>
>> Current status:
>> $ ceph osd tree
>> ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 19.19119 root default
>> -5  1.00000     host osd1
>>  3  1.00000         osd.3      up  1.00000          1.00000
>> -6  9.09560     host osd2
>>  4  9.09560         osd.4      up  1.00000          1.00000
>> -2  9.09560     host osd3
>>  0  9.09560         osd.0      up  1.00000          1.00000
>> $ ceph pg dump_stuck
>> ok
>> PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
>> 88.3 active+undersized+degraded [4,0] 4 [4,0] 4
>> 97.3 active+undersized+degraded [4,0] 4 [4,0] 4
>> 85.6 active+undersized+degraded [4,0] 4 [4,0] 4
>> 87.5 active+undersized+degraded [0,4] 0 [0,4] 0
>> 70.0 active+undersized+degraded [0,4] 0 [0,4] 0
>> $ ceph osd pool ls detail
>> pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool
>> stripe_width 0
>> pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool
>> stripe_width 0
>> pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool
>> stripe_width 0
>> pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool
>> stripe_width 0
>> pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool
>> stripe_width 0
>> pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool
>> stripe_width 0
>> pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool
>> stripe_width 0
>> pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule
>> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool
>> stripe_width 0
>> pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool
>> stripe_width 0
>> pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags
>> hashpspool stripe_width 0
>> pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool
>> stripe_width 0
>> pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool
>> stripe_width 0
>> pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool
>> stripe_width 0
>> pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule
>> 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool
>> stripe_width 0
>> pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2
>> crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags
>> hashpspool stripe_width 0
>> pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
>> pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner
>> 18446744073709551615 flags hashpspool stripe_width 0
>> pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1
>> object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags
>> hashpspool stripe_width 8192
>>
>> I'll keep on googling, but I'm open to advice!
>>
>> Thank you,
>>
>> Roger
>>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
Cheers,
Brad