On Wed, Jul 3, 2013 at 2:12 AM, Pierre BLONDEAU <pierre.blondeau@xxxxxxxxxx> wrote:
> Hi,
>
> Thank you very much for your answer. Sorry for the late reply, but
> modifying a 67 TB cluster takes a while ;)
>
> It turns out my pg count was far too low:
>
> ceph osd pool get data pg_num
> pg_num: 48
>
> Since I am not yet sure which replication level I will use, I changed
> the number of pgs to 1800:
>
> ceph osd pool set data pg_num 1800
>
> But placement is still uneven, especially on the machine that had the
> full osd. Two osds on that machine are now at their limit, and I cannot
> write to the cluster:
>
> jack
> 67 -> 67% /var/lib/ceph/osd/ceph-6
> 86 -> 86% /var/lib/ceph/osd/ceph-8
> 85 -> 77% /var/lib/ceph/osd/ceph-11
> ? -> 66% /var/lib/ceph/osd/ceph-7
> 47 -> 47% /var/lib/ceph/osd/ceph-10
> 29 -> 29% /var/lib/ceph/osd/ceph-9
>
> joe
> 86 -> 77% /var/lib/ceph/osd/ceph-15
> 67 -> 67% /var/lib/ceph/osd/ceph-13
> 95 -> 96% /var/lib/ceph/osd/ceph-14
> 92 -> 95% /var/lib/ceph/osd/ceph-17
> 86 -> 87% /var/lib/ceph/osd/ceph-12
> 20 -> 20% /var/lib/ceph/osd/ceph-16
>
> william
> 68 -> 86% /var/lib/ceph/osd/ceph-0
> 86 -> 86% /var/lib/ceph/osd/ceph-3
> 67 -> 61% /var/lib/ceph/osd/ceph-4
> 79 -> 71% /var/lib/ceph/osd/ceph-1
> 58 -> 58% /var/lib/ceph/osd/ceph-18
> 64 -> 50% /var/lib/ceph/osd/ceph-2
>
> ceph -w:
>
> 2013-07-03 10:56:06.610928 mon.0 [INF] pgmap v174071: 1928 pgs: 1816
> active+clean, 84 active+remapped+backfill_toofull, 9
> active+degraded+backfill_toofull, 19
> active+degraded+remapped+backfill_toofull; 300 TB data, 45284 GB used,
> 21719 GB / 67004 GB avail; 15EB/s rd, 15EB/s wr, 15Eop/s;
> 9975324/165229620 degraded (6.037%); recovering 15E o/s, 15EB/s
> 2013-07-03 10:56:08.404701 osd.14 [WRN] OSD near full (95%)
> 2013-07-03 10:56:29.729297 osd.17 [WRN] OSD near full (94%)
>
> And I do not understand why OSDs 16 and 19 are hardly used.

Hmm, maybe there's something else going on as well. What's the output of
"ceph osd tree"?
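(As an aside, the pg_num figure discussed above is usually derived from the
common rule of thumb of roughly 100 placement groups per OSD, divided by the
pool's replica count and rounded up to a power of two. A minimal shell sketch
of that arithmetic, assuming the 18 OSDs visible in the listing above and a
hypothetical replica count of 2 — both numbers are illustrative, not taken
from Pierre's actual configuration:)

```shell
#!/bin/sh
# Rule-of-thumb pg_num: (OSDs * 100) / replicas, rounded up to a power of two.
# osds and replicas below are assumptions for illustration.
osds=18
replicas=2
target=$(( (osds * 100) / replicas ))   # 900 for these inputs

# Round up to the next power of two.
pg_num=1
while [ "$pg_num" -lt "$target" ]; do
    pg_num=$(( pg_num * 2 ))
done

echo "$pg_num"   # prints 1024 for these inputs
```

(With 3 replicas the target would be 600, which still rounds up to 1024, so a
pg_num of 1800 is in a plausible range either way, though the usual advice is
to pick a power of two.)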
Are you storing anything in any pool besides "data"? (ceph osd dump will
provide stats on each pool.) And what version are you running? It shouldn't
impact this issue, but the units for recovery are obviously incorrect, and I
think that's been squashed in all our current releases. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com