Yes, the PG should get remapped, but that is not always the case. For discussion on this, check out the tracker issue below. Your particular circumstances may be a little different, but the idea is the same.
http://tracker.ceph.com/issues/3806
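As a first check, it may help to compare where CRUSH wants a PG against where it currently sits. A minimal sketch, using PG 7.a from the health output below:

    # show the CRUSH-computed (up) set and the current (acting) set
    ceph pg map 7.a
    # full peering detail, including why the PG is stuck
    ceph pg 7.a query
    # list everything stuck unclean in one shot
    ceph pg dump_stuck unclean

If the up set still contains the dead OSD, CRUSH has not picked a replacement yet, which matches the tracker issue above.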
On Tue, May 3, 2016 at 9:16 AM, Gaurav Bafna <bafnag@xxxxxxxxx> wrote:
Thanks, Tupper, for replying.
Shouldn't the PGs be remapped to other OSDs?
Yes, removing the OSD from the cluster results in a full recovery.
But that should not be needed, right?
--
Gaurav Bafna
9540631400
On Tue, May 3, 2016 at 6:31 PM, Tupper Cole <tcole@xxxxxxxxxx> wrote:
> The degraded PGs are mapped to the down OSD and have not been remapped to a
> new OSD. Removing the OSD would likely result in a full recovery.
>
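On removing the OSD: marking it out (or removing it entirely) tells CRUSH to stop expecting it, after which the undersized PGs should backfill to new OSDs. A rough sketch, assuming the dead daemon is osd.12 (substitute the real id):

    # mark it out so its data remaps elsewhere
    ceph osd out 12
    # if it is being retired for good, also remove it from CRUSH, auth, and the osdmap
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12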
> As a note, having two monitors (or any even number of monitors) is not
> recommended. If either monitor goes down you will lose quorum. The
> recommended number of monitors for any cluster is at least three.
>
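With two monitors, a quorum requires both of them (a majority of 2 is 2), so losing either mon stalls the whole cluster; three monitors tolerate the loss of one. The current quorum can be inspected with:

    ceph quorum_status --format json-pretty
    ceph mon stat

and, assuming ceph-deploy is managing this cluster, a third monitor could be added along the lines of:

    ceph-deploy mon add <new-mon-host>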
> On Tue, May 3, 2016 at 8:42 AM, Gaurav Bafna <bafnag@xxxxxxxxx> wrote:
>>
>> Hi Cephers,
>>
>> I am running a very small cluster of 3 storage and 2 monitor nodes.
>>
>> After I kill one OSD daemon, the cluster never fully recovers: 9 PGs
>> remain undersized for an unknown reason.
>>
>> After I restart that OSD daemon, the cluster recovers in no time.
>>
>> The size of all pools is 3 and min_size is 2.
>>
>> Can anybody please help?
>>
>> Output of "ceph -s":
>> cluster fac04d85-db48-4564-b821-deebda046261
>> health HEALTH_WARN
>> 9 pgs degraded
>> 9 pgs stuck degraded
>> 9 pgs stuck unclean
>> 9 pgs stuck undersized
>> 9 pgs undersized
>> recovery 3327/195138 objects degraded (1.705%)
>> pool .users pg_num 512 > pgp_num 8
>> monmap e2: 2 mons at
>> {dssmon2=10.140.13.13:6789/0,dssmonleader1=10.140.13.11:6789/0}
>> election epoch 1038, quorum 0,1 dssmonleader1,dssmon2
>> osdmap e857: 69 osds: 68 up, 68 in
>> pgmap v106601: 896 pgs, 9 pools, 435 MB data, 65047 objects
>> 279 GB used, 247 TB / 247 TB avail
>> 3327/195138 objects degraded (1.705%)
>> 887 active+clean
>> 9 active+undersized+degraded
>> client io 395 B/s rd, 0 B/s wr, 0 op/s
>>
>> Output of "ceph health detail":
>>
>> HEALTH_WARN 9 pgs degraded; 9 pgs stuck degraded; 9 pgs stuck unclean;
>> 9 pgs stuck undersized; 9 pgs undersized; recovery 3327/195138 objects
>> degraded (1.705%); pool .users pg_num 512 > pgp_num 8
>> pg 7.a is stuck unclean for 322742.938959, current state
>> active+undersized+degraded, last acting [38,2]
>> pg 5.27 is stuck unclean for 322754.823455, current state
>> active+undersized+degraded, last acting [26,19]
>> pg 5.32 is stuck unclean for 322750.685684, current state
>> active+undersized+degraded, last acting [39,19]
>> pg 6.13 is stuck unclean for 322732.665345, current state
>> active+undersized+degraded, last acting [30,16]
>> pg 5.4e is stuck unclean for 331869.103538, current state
>> active+undersized+degraded, last acting [16,38]
>> pg 5.72 is stuck unclean for 331871.208948, current state
>> active+undersized+degraded, last acting [16,49]
>> pg 4.17 is stuck unclean for 331822.771240, current state
>> active+undersized+degraded, last acting [47,20]
>> pg 5.2c is stuck unclean for 323021.274535, current state
>> active+undersized+degraded, last acting [47,18]
>> pg 5.37 is stuck unclean for 323007.574395, current state
>> active+undersized+degraded, last acting [43,1]
>> pg 7.a is stuck undersized for 322487.284302, current state
>> active+undersized+degraded, last acting [38,2]
>> pg 5.27 is stuck undersized for 322487.287164, current state
>> active+undersized+degraded, last acting [26,19]
>> pg 5.32 is stuck undersized for 322487.285566, current state
>> active+undersized+degraded, last acting [39,19]
>> pg 6.13 is stuck undersized for 322487.287168, current state
>> active+undersized+degraded, last acting [30,16]
>> pg 5.4e is stuck undersized for 331351.476170, current state
>> active+undersized+degraded, last acting [16,38]
>> pg 5.72 is stuck undersized for 331351.475707, current state
>> active+undersized+degraded, last acting [16,49]
>> pg 4.17 is stuck undersized for 322487.280309, current state
>> active+undersized+degraded, last acting [47,20]
>> pg 5.2c is stuck undersized for 322487.286347, current state
>> active+undersized+degraded, last acting [47,18]
>> pg 5.37 is stuck undersized for 322487.280027, current state
>> active+undersized+degraded, last acting [43,1]
>> pg 7.a is stuck degraded for 322487.284340, current state
>> active+undersized+degraded, last acting [38,2]
>> pg 5.27 is stuck degraded for 322487.287202, current state
>> active+undersized+degraded, last acting [26,19]
>> pg 5.32 is stuck degraded for 322487.285604, current state
>> active+undersized+degraded, last acting [39,19]
>> pg 6.13 is stuck degraded for 322487.287207, current state
>> active+undersized+degraded, last acting [30,16]
>> pg 5.4e is stuck degraded for 331351.476209, current state
>> active+undersized+degraded, last acting [16,38]
>> pg 5.72 is stuck degraded for 331351.475746, current state
>> active+undersized+degraded, last acting [16,49]
>> pg 4.17 is stuck degraded for 322487.280348, current state
>> active+undersized+degraded, last acting [47,20]
>> pg 5.2c is stuck degraded for 322487.286386, current state
>> active+undersized+degraded, last acting [47,18]
>> pg 5.37 is stuck degraded for 322487.280066, current state
>> active+undersized+degraded, last acting [43,1]
>> pg 5.72 is active+undersized+degraded, acting [16,49]
>> pg 5.4e is active+undersized+degraded, acting [16,38]
>> pg 5.32 is active+undersized+degraded, acting [39,19]
>> pg 5.37 is active+undersized+degraded, acting [43,1]
>> pg 5.2c is active+undersized+degraded, acting [47,18]
>> pg 5.27 is active+undersized+degraded, acting [26,19]
>> pg 6.13 is active+undersized+degraded, acting [30,16]
>> pg 4.17 is active+undersized+degraded, acting [47,20]
>> pg 7.a is active+undersized+degraded, acting [38,2]
>> recovery 3327/195138 objects degraded (1.705%)
>> pool .users pg_num 512 > pgp_num 8
>>
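Separately, note the warning "pool .users pg_num 512 > pgp_num 8" above. Placement is computed against pgp_num, so with pgp_num still at 8 the extra PGs in .users are placed together with their parent PGs rather than spread independently by CRUSH. If the mismatch is not intentional, it may be worth aligning the two (this triggers data movement):

    ceph osd pool get .users pgp_num
    ceph osd pool set .users pgp_num 512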
>>
>> My CRUSH map is the default.
>>
>> My ceph.conf is:
>>
>> [osd]
>> osd mkfs type=xfs
>> osd recovery threads=2
>> osd disk thread ioprio class=idle
>> osd disk thread ioprio priority=7
>> osd journal=/var/lib/ceph/osd/ceph-$id/journal
>> filestore flusher=False
>> osd op num shards=3
>> debug osd=5
>> osd disk threads=2
>> osd data=
>> osd op num threads per shard=5
>> osd op threads=4
>> keyring=/var/lib/ceph/osd/ceph-$id/keyring
>> osd journal size=4096
>>
>>
>> [global]
>> filestore max sync interval=10
>> auth cluster required=cephx
>> osd pool default min size=3
>> osd pool default size=3
>> public network=10.140.13.0/26
>> objecter inflight op_bytes=1073741824
>> auth service required=cephx
>> filestore min sync interval=1
>> fsid=fac04d85-db48-4564-b821-deebda046261
>> keyring=/etc/ceph/keyring
>> cluster network=10.140.13.0/26
>> auth client required=cephx
>> filestore xattr use omap=True
>> max open files=65536
>> objecter inflight ops=2048
>> osd pool default pg num=512
>> log to syslog = true
>> #err to syslog = true
>>
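One detail worth double-checking: the message above says min_size is 2, but this config sets "osd pool default min size=3". The defaults only apply when a pool is created, so the effective per-pool values are what matter; they can be verified with, for example:

    ceph osd pool get .users size
    ceph osd pool get .users min_size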
>>
>> --
>> Gaurav Bafna
>> 9540631400
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
>
> Thanks,
> Tupper Cole
> Senior Storage Consultant
> Global Storage Consulting, Red Hat
> tcole@xxxxxxxxxx
> phone: + 01 919-720-2612
Thanks,
Tupper Cole
Senior Storage Consultant