Re: dealing with the full osd / help reweight

[root@cf01 ceph]# ceph osd pool ls detail
pool 0 'vms' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 67 flags hashpspool stripe_width 0
pool 1 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 117 flags hashpspool stripe_width 0
pool 2 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 118 flags hashpspool stripe_width 0
pool 3 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 119 flags hashpspool stripe_width 0
pool 4 '.rgw.buckets_cache' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 121 flags hashpspool stripe_width 0
pool 5 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 122 flags hashpspool stripe_width 0
pool 6 '.rgw.buckets.extra' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 123 flags hashpspool stripe_width 0
pool 7 '.log' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 124 flags hashpspool stripe_width 0
pool 8 '.intent-log' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 125 flags hashpspool stripe_width 0
pool 9 '.usage' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 126 flags hashpspool stripe_width 0
pool 10 '.users' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 127 flags hashpspool stripe_width 0
pool 11 '.users.email' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 128 flags hashpspool stripe_width 0
pool 12 '.users.swift' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 129 flags hashpspool stripe_width 0
pool 13 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 130 flags hashpspool stripe_width 0
pool 14 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 5717 flags hashpspool stripe_width 0
pool 15 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 135 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 17 'one' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 611 flags hashpspool stripe_width 0
	removed_snaps [1~e,13~1]
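
As a sanity check, summing pg_num over the pools listed above reproduces the cluster-wide PG count (a minimal sketch; the numbers are copied straight from the `ceph osd pool ls detail` output):

```python
# pg_num per pool, taken from the "ceph osd pool ls detail" output above
pg_nums = {
    "vms": 64, ".rgw.root": 8, ".rgw.control": 8, ".rgw.gc": 8,
    ".rgw.buckets_cache": 8, ".rgw.buckets.index": 8, ".rgw.buckets.extra": 8,
    ".log": 8, ".intent-log": 8, ".usage": 8, ".users": 8,
    ".users.email": 8, ".users.swift": 8, ".users.uid": 8,
    ".rgw.buckets": 64, ".rgw": 8, "one": 64,
}

total = sum(pg_nums.values())
print(total)  # 304
```

which matches the 304 PGs mentioned later in the thread.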

I've tried reweight-by-utilization, but after some data shifting the cluster came up with a near-full OSD again.

Am I correct in assuming that a lower weight on an OSD means 'use that OSD less'?
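
(Yes: `ceph osd reweight osd.N <weight>` sets an override weight between 0.0 and 1.0, and a lower value makes CRUSH place proportionally fewer PGs on that OSD. A toy simulation of the effect, using plain weighted random choice rather than CRUSH itself, with hypothetical OSD ids and weights:)

```python
import random

random.seed(1)

# four hypothetical OSDs; osd.3 has a reduced override weight of 0.5
weights = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.5}
osds, w = zip(*weights.items())

# place 10000 "PGs" by weighted random choice (a rough stand-in for CRUSH)
counts = {o: 0 for o in weights}
for _ in range(10000):
    counts[random.choices(osds, weights=w, k=1)[0]] += 1

print(counts)  # osd.3 receives roughly half as many PGs as the others
```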

J

On 03/24/2016 01:43 PM, koukou73gr wrote:
What is your pool size? 304 PGs sounds awfully small for 20 OSDs.
More PGs will help distribute data across the cluster more evenly.

But with a full or near-full OSD on hand, increasing pg_num is a no-no
operation. If you search the list archive, I believe there was a
thread a month or so ago that provided a walkthrough of sorts for dealing
with uneven distribution and a full OSD.
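
(A back-of-the-envelope sketch of why 304 PGs is small for 20 OSDs, using the pool sizes from the listing above and a Poisson approximation — real CRUSH variance differs, but the order of magnitude holds:)

```python
import math

pgs = 304                              # total PGs in the cluster
# PG replicas: pool 'one' is size 3 (64 PGs), all other pools are size 2
placements = 64 * 3 + (pgs - 64) * 2   # = 672 PG replicas cluster-wide
osds = 20

mean = placements / osds               # ~33.6 replicas per OSD on average
stddev = math.sqrt(mean)               # ~5.8 under a Poisson approximation
print(mean, stddev, stddev / mean)     # relative imbalance of roughly 17%
```

With only ~34 replicas per OSD, random variation of ~17% between OSDs is expected; more PGs shrink that spread.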

-K.


On 03/24/2016 01:54 PM, Jacek Jarosiewicz wrote:
Disk usage on the full OSD is shown below. What are the *_TEMP directories
for? How can I tell which PG directories are safe to remove?

[root@cf04 current]# du -hs *
156G    0.14_head
156G    0.21_head
155G    0.32_head
157G    0.3a_head
155G    0.e_head
156G    0.f_head
40K    10.2_head
4.0K    11.3_head
218G    14.13_head
218G    14.15_head
218G    14.1b_head
219G    14.1f_head
9.1G    14.29_head
219G    14.2a_head
75G    14.2d_head
125G    14.2e_head
113G    14.32_head
163G    14.33_head
218G    14.35_head
151G    14.39_head
218G    14.3b_head
103G    14.3d_head
217G    14.3f_head
219G    14.a_head
773M    17.0_head
814M    17.10_head
4.0K    17.10_TEMP
747M    17.19_head
4.0K    17.19_TEMP
669M    17.1b_head
659M    17.1c_head
638M    17.1f_head
681M    17.30_head
4.0K    17.30_TEMP
721M    17.34_head
695M    17.3d_head
726M    17.3e_head
734M    17.3f_head
4.0K    17.3f_TEMP
670M    17.d_head
597M    17.e_head
4.0K    17.e_TEMP
4.0K    1.7_head
34M    5.1_head
34M    5.6_head
4.0K    9.6_head
4.0K    commit_op_seq
30M    meta
0    nosnap
614M    omap



On 03/24/2016 10:11 AM, Jacek Jarosiewicz wrote:
Hi!

I have a problem with the OSDs getting full on our cluster.





--
Jacek Jarosiewicz
IT Systems Administrator

----------------------------------------------------------------------------------------
SUPERMEDIA Sp. z o.o., registered office in Warsaw
ul. Senatorska 13/15, 00-075 Warszawa
District Court for the Capital City of Warsaw, XII Commercial Division of the National Court Register,
KRS no. 0000029537; share capital PLN 44,556,000.00
NIP: 957-05-49-503
Correspondence address: ul. Jubilerska 10, 04-190 Warszawa

----------------------------------------------------------------------------------------
SUPERMEDIA ->   http://www.supermedia.pl
internet access - hosting - colocation - links - telephony
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



