Re: pages stuck unclean (but remapped)

With the reweight-by-utilization applied, CRUSH is failing to generate
mappings with enough OSDs, so the system is falling back to keeping
around copies that already exist, even though they aren't located on
the correct CRUSH-mapped OSDs (since there aren't enough mapped OSDs).
Are your OSDs correctly weighted in CRUSH according to their size? If
not, you want to set the CRUSH weights there to match the disk sizes and
return all of the monitor override weights to 1.
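(Roughly, assuming osd.3 were a 3 TB disk and osd.7 a 600 GB disk -- the
ids and sizes here are just made-up examples -- that would look like:

   ceph osd tree                       # compare CRUSH weights and override values to disk sizes
   ceph osd crush reweight osd.3 3.0   # CRUSH weight roughly = capacity in TB
   ceph osd crush reweight osd.7 0.6
   ceph osd reweight 3 1.0             # return the monitor override weight to 1
   ceph osd reweight 7 1.0
)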
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Feb 25, 2014 at 9:19 AM, Gautam Saxena <gsaxena@xxxxxxxxxxx> wrote:
> So the "backfill_tooful" was an old state; it disappeared after I
> reweighted. Yesterday, I even set up the Ceph system's tunables to optimal,
> added one more osd, let it rebalance, and then after rebalancing, I ran a
> "ceph osd reweight-by-utilization 105". After several hours, though, CEPH
> stabilized (that is no more recovery), but the final state is worse than
> before.  So here are my questions (I also included the results of "ceph -s"
> right after these questions):
>
> 1) Why are 153 PGs in "active+remapped" but not going anywhere? Shouldn't
> they be in something more like "active+remapped+wait_backfill" instead?
> 2) Why are 10 PGs "active+remapped+backfilling" when there is no actual
> activity occurring in Ceph? Shouldn't they instead say
> "active+remapped+wait_backfill+backfill_toofull"?
> 3) Why is there a backfill_toofull at all when all my OSDs are well under
> 95% full -- in fact, they are all under 81% full (as determined by the
> "df -h" command)? (One theory I have is that the "toofull" percentage is
> based NOT on the actual physical space on the OSD, but on the *reweighted*
> physical space. Is this theory accurate?)
> 4) When I did a "ceph pg dump", I saw that all 153 PGs that are in
> active+remapped have only 1 OSD in the "up" set but 2 OSDs in the "acting"
> set. I'm confused about the difference between "up" and "acting" -- does
> this scenario mean that if I lose the 1 OSD in the "up" set, I lose data
> for that PG? Or does "acting" mean that the PG's data is still on 2
> OSDs, so I can afford to lose 1 OSD?
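> (A single PG can also be inspected directly to see its up and acting sets
> -- the id 3.5f below is just a made-up example:
>
>    ceph pg map 3.5f     # prints the up set and the acting set for that PG
>    ceph pg 3.5f query   # full details, including recovery state
> )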
>
> --> ceph -s produces:
>
> ================
> [root@ia2 ceph]# ceph -s
>     cluster 14f78538-6085-43f9-ac80-e886ca4de119
>      health HEALTH_WARN 10 pgs backfill; 5 pgs backfill_toofull; 10 pgs
> backfilling; 173 pgs stuck unclean; recovery 44940/5858368 objects degraded
> (0.767%)
>      monmap e9: 3 mons at
> {ia1=192.168.1.11:6789/0,ia2=192.168.1.12:6789/0,ia3=192.168.1.13:6789/0},
> election epoch 500, quorum 0,1,2 ia1,ia2,ia3
>      osdmap e9700: 23 osds: 23 up, 23 in
>       pgmap v2003396: 1500 pgs, 1 pools, 11225 GB data, 2841 kobjects
>             22452 GB used, 23014 GB / 45467 GB avail
>             44940/5858368 objects degraded (0.767%)
>                 1327 active+clean
>                    5 active+remapped+wait_backfill
>                    5 active+remapped+wait_backfill+backfill_toofull
>                  153 active+remapped
>                   10 active+remapped+backfilling
>   client io 4369 kB/s rd, 64377 B/s wr, 26 op/s
> ==========
>
>
>
> On Sun, Feb 23, 2014 at 8:09 PM, Gautam Saxena <gsaxena@xxxxxxxxxxx> wrote:
>>
>> I have 19 PGs that are stuck unclean (see the result of "ceph -s" below).
>> This occurred after I executed "ceph osd reweight-by-utilization 108" to
>> resolve problems with "backfill_toofull" messages, which I believe occurred
>> because my OSD sizes vary significantly (from a low of 600 GB to a high of
>> 3 TB). How can I get Ceph to move these PGs out of stuck unclean? (And why
>> is this occurring anyway?) My best guess of how to fix it (though I don't
>> know why) is that I need to run:
>>
>> ceph osd crush tunables optimal
>>
>> However, my kernel version (on a fully up-to-date CentOS 6.5) is 2.6.32,
>> which is well below the minimum required version of 3.6 stated in the
>> documentation (http://ceph.com/docs/master/rados/operations/crush-map/) --
>> so if I must run "ceph osd crush tunables optimal" to fix this problem, I
>> presume I must upgrade my kernel first, right? Any thoughts, or am I
>> chasing the wrong solution? (I want to avoid a kernel upgrade unless it's
>> needed.)
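>> (For reference, the current tunables and weights can be dumped without
>> changing anything -- e.g.:
>>
>>    ceph osd crush dump   # the "tunables" section shows what is in effect now
>>    ceph osd tree         # CRUSH weights plus the override (reweight) values
>> )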
>>
>> =====================
>>
>> [root@ia2 ceph4]# ceph -s
>>     cluster 14f78538-6085-43f9-ac80-e886ca4de119
>>      health HEALTH_WARN 19 pgs backfilling; 19 pgs stuck unclean; recovery
>> 42959/5511127 objects degraded (0.779%)
>>      monmap e9: 3 mons at
>> {ia1=192.168.1.11:6789/0,ia2=192.168.1.12:6789/0,ia3=192.168.1.13:6789/0},
>> election epoch 496, quorum 0,1,2 ia1,ia2,ia3
>>      osdmap e7931: 23 osds: 23 up, 23 in
>>       pgmap v1904820: 1500 pgs, 1 pools, 10531 GB data, 2670 kobjects
>>             18708 GB used, 26758 GB / 45467 GB avail
>>             42959/5511127 objects degraded (0.779%)
>>                 1481 active+clean
>>                   19 active+remapped+backfilling
>>   client io 1457 B/s wr, 0 op/s
>>
>> [root@ia2 ceph4]# ceph -v
>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>
>> [root@ia2 ceph4]# uname -r
>> 2.6.32-431.3.1.el6.x86_64
>>
>> ====
>
>
>
>
> --
> Gautam Saxena
> President & CEO
> Integrated Analysis Inc.
>
> Making Sense of Data.(tm)
> Biomarker Discovery Software | Bioinformatics Services | Data Warehouse
> Consulting | Data Migration Consulting
> www.i-a-inc.com
> gsaxena@xxxxxxxxxxx
> (301) 760-3077  office
> (240) 479-4272  direct
> (301) 560-3463  fax
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
