Re: near full osd

Hi Sam,

On 11/12/2013 09:46 PM, Samuel Just wrote:
I think we removed the experimental warning in cuttlefish.  It
probably wouldn't hurt to do it in bobtail particularly if you test it
extensively on a test cluster first.  However, we didn't do extensive
testing on it until cuttlefish.  I would upgrade to cuttlefish
(actually, dumpling or emperor, now) first.  Also, please note that in
any version, pg split causes massive data movement.

thanks for stepping in. We have some 300 pools, each with the default of 32 PGs.
Some pools hold only one or a few 10-20 GiB images, others a couple of ~300 GiB images. We cleaned up one 2 TiB image - that is, mounted it as ext4 via virtio-scsi with the discard option and ran an fstrim - which worked like a charm, so the situation is a bit more relaxed now.
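For reference, the in-guest cleanup is roughly this (a sketch only; the device and mount point are placeholders, and the image needs to be attached via virtio-scsi so the guest sees a TRIM-capable disk):

    # mount the ext4 filesystem with online discard enabled;
    # alternatively mount without it and rely on periodic fstrim runs
    mount -o discard /dev/sda1 /mnt/data

    # punch out all unused blocks so the underlying RBD image can be
    # trimmed down on the Ceph side
    fstrim -v /mnt/data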

BTW: is there an easy way to do the following:
    - we have a near-full OSD
    - find out which PGs are most affected (ceph [pg|osd] dump is our friend)
    - work out which images and pools could be optimized from there?

I know of "ceph osd getmap..." and osdmaptool if I have a concrete RBD object, but I think I'm overlooking the obvious ;)
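A rough way to get there from "ceph pg dump" alone (a sketch only; the column layout differs between releases, here assuming $1 = pgid, $6 = bytes and $14 = acting set, so adjust to your output):

    # list the largest PGs together with the OSDs they currently sit on
    ceph pg dump 2>/dev/null \
        | awk '$1 ~ /^[0-9]+\./ {print $6, $1, $14}' \
        | sort -rn | head -20

    # the part of the pgid before the dot is the pool id; map it back with
    ceph osd lspools

From the pool you can then decide which images are worth trimming or which pools deserve a higher pg_num.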

Those pools we could then re-create with higher pg(p)_num values to spread the data better across our OSDs...

Thnx n regards,

Oliver.

-Sam

On Mon, Nov 11, 2013 at 7:04 AM, Oliver Francke <Oliver.Francke@xxxxxxxx> wrote:
Hi Greg,

we are in a similar situation with a huge imbalance: some of our 28 OSDs are at about 40%, whereas others are "near full" at 84%.
The default pg_num is 8 and we create our pools with 32, but in some pools customers quickly grew their VM disks to 1 TB and more in total - I think this is where the problems come from?!

For other reasons we are still running good ol' bobtail, and in the lab I tried to force the increase via "--allow-experimental-feature" with 0.56.7-3...
It works, but how experimental is it for production use?

Thnx in advance,

Oliver.


On 11/08/2013 06:26 PM, Gregory Farnum wrote:

After you increase the number of PGs, *and* increase the "pgp_num" to do the
rebalancing (this is all described in the docs; do a search), data will move
around and the overloaded OSD will have less stuff on it. If it's actually
marked as full, though, this becomes a bit trickier. Search the list
archives for some instructions; I don't remember the best course to follow.
-Greg
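For reference, the two steps look roughly like this (the pool name "rbd" and the count 128 are placeholders; pgp_num cannot exceed pg_num, so the pg_num bump has to come first):

    # create the additional placement groups in the pool
    ceph osd pool set rbd pg_num 128

    # then let the data actually rebalance onto them
    ceph osd pool set rbd pgp_num 128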

On Friday, November 8, 2013, Kevin Weiler wrote:
Thanks again Gregory!

One more quick question. If I raise the number of PGs for a pool, will this REMOVE any data from the full OSD? Or will I have to take the OSD out and put it back in to realize this benefit? Thanks!


--

Kevin Weiler

IT



IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606 | http://imc-chicago.com/

Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
kevin.weiler@xxxxxxxxxxxxxxx


From: Gregory Farnum <greg@xxxxxxxxxxx>
Date: Friday, November 8, 2013 11:00 AM
To: Kevin Weiler <kevin.weiler@xxxxxxxxxxxxxxx>
Cc: "Aronesty, Erik" <earonesty@xxxxxxxxxxxxxxxxxxxxxx>, Greg Chavez
<greg.chavez@xxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx"
<ceph-users@xxxxxxxxxxxxxx>
Subject: Re:  near full osd

It's not a hard value; you should adjust based on the size of your pools (many of them are quite small when used with RGW, for instance). But in general it is better to have more than fewer, and if you want to check you can look at the sizes of each PG (ceph pg dump) and increase the counts for pools with wide variability.
-Greg
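One way to eyeball that per-PG variability from "ceph pg dump" output (a sketch; the byte column is assumed to be $6 here and shifts between releases):

    # smallest and largest PG per pool -- a wide spread suggests the pool
    # would benefit from more (and therefore smaller) placement groups
    ceph pg dump 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {
        split($1, a, "."); p = a[1]
        if (!(p in max) || $6 > max[p]) max[p] = $6
        if (!(p in min) || $6 < min[p]) min[p] = $6
    } END { for (p in min) print "pool", p, "min", min[p], "max", max[p] }'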

On Friday, November 8, 2013, Kevin Weiler wrote:

Thanks Gregory,

One point that was a bit unclear in the documentation is whether this equation for PGs applies to a single pool or to all pools combined. Meaning, if I calculate 3000 PGs, should each pool have 3000 PGs, or should all the pools ADD UP to 3000 PGs? Thanks!

--

Kevin Weiler

IT


IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606 | http://imc-chicago.com/

Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
kevin.weiler@xxxxxxxxxxxxxxx







On 11/7/13 9:59 PM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:

It sounds like maybe the PG counts on your pools are too low and so
you're just getting a bad balance. If that's the case, you can
increase the PG count with "ceph osd pool set <name> pg_num <higher
value>".

OSDs should get data approximately equal to <node weight>/<sum of node
weights>, so higher weights get more data and all its associated
traffic.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
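As a quick sanity check on the weights themselves (a sketch; the data path shown is the default location, adjust if yours differs):

    # show every OSD's CRUSH weight -- with identical 2 TB disks they should
    # all be the same, and each OSD should end up with roughly
    # <its weight> / <sum of all weights> of the cluster's data
    ceph osd tree

    # compare against actual per-OSD disk usage on each OSD host
    df -h /var/lib/ceph/osd/ceph-*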


On Tue, Nov 5, 2013 at 8:30 AM, Kevin Weiler
<Kevin.Weiler@xxxxxxxxxxxxxxx> wrote:
All of the disks in my cluster are identical and therefore all have the same weight (each drive is 2 TB and the automatically generated weight is 1.82 for each one).

Would the procedure here be to reduce the weight, let it rebalance, and then put the weight back to where it was?


--

Kevin Weiler

IT



IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606 | http://imc-chicago.com/

Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
kevin.weiler@xxxxxxxxxxxxxxx


From: "Aronesty, Erik" <earonesty@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tuesday, November 5, 2013 10:27 AM
To: Greg Chavez <greg.chavez@xxxxxxxxx>, Kevin Weiler
<kevin.weiler@xxxxxxxxxxxxxxx>
Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: RE:  near full osd

If there's an underperforming disk, why on earth would more data be put on it? You'd think it would be less... I would think an overperforming disk should (desirably) cause that case, right?



From: ceph-users-bounces@xxxxxxxxxxxxxx
[mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Greg Chavez
Sent: Tuesday, November 05, 2013 11:20 AM
To: Kevin Weiler
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  near full osd



Kevin, in my experience that usually indicates a bad or underperforming disk, or a too-high weight. Try running "ceph osd crush reweight osd.<##> 1.0". If that doesn't do the trick, you may want to just mark that guy out.
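Spelled out, with osd.12 standing in for the overloaded OSD:

    # lower the CRUSH weight so less data maps to this OSD
    # (raise it again later once things have settled)
    ceph osd crush reweight osd.12 1.0

    # or take it out of data placement entirely and let the cluster rebalance
    ceph osd out 12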



I don't think the CRUSH algorithm guarantees balancing things out in the way you're expecting.



--Greg

On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler
<Kevin.Weiler@xxxxxxxxxxxxxxx>
wrote:

Hi guys,



I have an OSD in my cluster that is near full at 90%, but we're using a little less than half the available storage in the cluster. Shouldn't this be balanced out?



--

________________________________

The information in this e-mail is intended only for the person or entity
to which it is addressed.

It may contain confidential and /or privileged material. If someone other
than the intended recipient should receive this e-mail, he / she shall not
be entitled to read, disseminate, disclose or duplicate it.

If you receive this e-mail unintentionally, please inform us immediately
by "reply" and then delete it from your system. Although this information
has been compiled with great care, neither IMC Financial Markets & Asset
Management nor any of its related entities shall accept any responsibility
for any errors, omissions or other inaccuracies in this information or for
the consequences thereof, nor shall it be bound in any way by the contents
of this e-mail or its attachments. In the event of incomplete or incorrect
transmission, please return the e-mail to the sender and permanently delete
this message and any attachments.

Messages and attachments are scanned for all known viruses. Always scan
attachments before opening them.


--
Software Engineer #42 @ http://inktank.com | http://ceph.com









--

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




