Re: near full osd

I think we removed the experimental warning in cuttlefish.  It
probably wouldn't hurt to do it in bobtail, particularly if you test it
extensively on a test cluster first.  However, we didn't do extensive
testing on it until cuttlefish.  I would upgrade to cuttlefish
(actually, dumpling or emperor, now) first.  Also, please note that in
any version, pg split causes massive data movement.
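
As a rough sketch of what the split itself looks like (the pool name and
target count below are only placeholders, not a recommendation for any
particular cluster):

    ceph osd pool get <poolname> pg_num       # current values
    ceph osd pool get <poolname> pgp_num
    ceph osd pool set <poolname> pg_num 256   # create the new PGs
    ceph osd pool set <poolname> pgp_num 256  # allow them to be placed

Raising pg_num creates the new placement groups; raising pgp_num is what
actually triggers the data movement, so expect heavy backfill traffic
until the cluster settles.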
-Sam

On Mon, Nov 11, 2013 at 7:04 AM, Oliver Francke <Oliver.Francke@xxxxxxxx> wrote:
> Hi Greg,
>
> we are in a similar situation with a huge imbalance: some of our 28
> OSDs are at about 40%, whereas others are "near full" at 84%.
> The default pg_num is 8 and we use 32 as our default, but for some
> pools customers quickly grew their VM disks to 1 TB and more in total,
> and I think this is where the problems come from?!
>
> For other reasons we are still running good ol' bobtail, and in the lab
> I tried to force the pg_num increase via "--allow-experimental-feature"
> with 0.56.7-3...
> It's working, but how experimental is it for production?
>
> Thnx in advance,
>
> Oliver.
>
>
> On 11/08/2013 06:26 PM, Gregory Farnum wrote:
>
> After you increase the number of PGs, *and* increase the "pgp_num" to do the
> rebalancing (this is all described in the docs; do a search), data will move
> around and the overloaded OSD will have less stuff on it. If it's actually
> marked as full, though, this becomes a bit trickier. Search the list
> archives for some instructions; I don't remember the best course to follow.
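>
> One interim workaround for an OSD that is already full is to temporarily
> lower its reweight so a few PGs move elsewhere (the id and value below
> are just placeholders):
>
>     ceph osd reweight <osd-id> 0.8
>
> and then set it back to 1.0 once the PG split has evened things out.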
> -Greg
>
> On Friday, November 8, 2013, Kevin Weiler wrote:
>>
>> Thanks again Gregory!
>>
>> One more quick question. If I raise the number of PGs for a pool, will
>> this REMOVE any data from the full OSD? Or will I have to take the OSD out
>> and put it back in to realize this benefit? Thanks!
>>
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> 60606 | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.weiler@xxxxxxxxxxxxxxx
>>
>>
>> From: Gregory Farnum <greg@xxxxxxxxxxx>
>> Date: Friday, November 8, 2013 11:00 AM
>> To: Kevin Weiler <kevin.weiler@xxxxxxxxxxxxxxx>
>> Cc: "Aronesty, Erik" <earonesty@xxxxxxxxxxxxxxxxxxxxxx>, Greg Chavez
>> <greg.chavez@xxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx"
>> <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  near full osd
>>
>> It's not a hard value; you should adjust based on the size of your pools
>> (many of them are quite small when used with RGW, for instance). But in
>> general it is better to have more PGs rather than fewer, and if you want to
>> check, you can look at the sizes of each PG (ceph pg dump) and increase the
>> counts for pools with wide variability.
>> -Greg
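>>
>> For example, something along these lines gives a per-PG and per-pool
>> picture (nothing pool-specific is assumed here):
>>
>>     ceph pg dump                  # per-PG stats: objects, bytes, state
>>     rados df                      # per-pool object and byte totals
>>     ceph osd dump | grep ^pool    # pg_num / pgp_num for each pool
>>
>> A pool whose PGs are much larger than those of the other pools is the
>> one whose pg_num is worth raising.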
>>
>> On Friday, November 8, 2013, Kevin Weiler wrote:
>>
>> Thanks Gregory,
>>
>> One point that was a bit unclear in the documentation is whether this
>> equation for PGs applies to a single pool or to all pools combined.
>> Meaning, if I calculate 3000 PGs, should each pool have 3000 PGs, or should
>> all the pools ADD UP to 3000 PGs? Thanks!
>>
>> --
>>
>> Kevin Weiler
>>
>> IT
>>
>>
>> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> 60606 | http://imc-chicago.com/
>>
>> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> kevin.weiler@xxxxxxxxxxxxxxx
>>
>>
>>
>>
>>
>>
>>
>> On 11/7/13 9:59 PM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
>>
>> >It sounds like maybe your PG counts on your pools are too low and so
>> >you're just getting a bad balance. If that's the case, you can
>> >increase the PG count with "ceph osd pool set <name> pg_num <higher
>> >value>".
>> >
>> >OSDs should get a share of the data approximately equal to <node
>> >weight>/<sum of node weights>, so higher weights get more data and all
>> >of the associated traffic.
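>> >
>> >As a rough worked example (numbers purely illustrative): with N identical
>> >OSDs each weighted 1.82, every OSD expects 1.82 / (N * 1.82) = 1/N of the
>> >data. With too few PGs the actual share can deviate from that quite a
>> >bit, which is how one OSD ends up near full while the cluster as a whole
>> >is only half used.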
>> >-Greg
>> >Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >
>> >
>> >On Tue, Nov 5, 2013 at 8:30 AM, Kevin Weiler
>> ><Kevin.Weiler@xxxxxxxxxxxxxxx> wrote:
>> >> All of the disks in my cluster are identical and therefore all have the
>> >>same
>> >> weight (each drive is 2TB and the automatically generated weight is
>> >>1.82 for
>> >> each one).
>> >>
>> >> Would the procedure here be to reduce the weight, let it rebalance, and
>> >>then put
>> >> the weight back to where it was?
>> >>
>> >>
>> >> --
>> >>
>> >> Kevin Weiler
>> >>
>> >> IT
>> >>
>> >>
>> >>
>> >> IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
>> >>60606
>> >> | http://imc-chicago.com/
>> >>
>> >> Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail:
>> >> kevin.weiler@xxxxxxxxxxxxxxx
>> >>
>> >>
>> >> From: <Aronesty>, Erik <earonesty@xxxxxxxxxxxxxxxxxxxxxx>
>> >> Date: Tuesday, November 5, 2013 10:27 AM
>> >> To: Greg Chavez <greg.chavez@xxxxxxxxx>, Kevin Weiler
>> >> <kevin.weiler@xxxxxxxxxxxxxxx>
>> >> Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
>> >> Subject: RE:  near full osd
>> >>
>> >> If there's an underperforming disk, why on earth would more data be put
>> >> on it?  You'd think it would be less...  I would think an overperforming
>> >> disk should (desirably) cause that case, right?
>> >>
>> >>
>> >>
>> >> From: ceph-users-bounces@xxxxxxxxxxxxxx
>> >> [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Greg Chavez
>> >> Sent: Tuesday, November 05, 2013 11:20 AM
>> >> To: Kevin Weiler
>> >> Cc: ceph-users@xxxxxxxxxxxxxx
>> >> Subject: Re:  near full osd
>> >>
>> >>
>> >>
>> >> Kevin, in my experience that usually indicates a bad or underperforming
>> >> disk, or a too-high priority.  Try running "ceph osd crush reweight
>> >> osd.<##> 1.0".  If that doesn't do the trick, you may want to just out
>> >> that guy.
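>> >>
>> >> For example (the osd id is a placeholder):
>> >>
>> >>     ceph osd tree            # current weights and up/in status
>> >>     ceph osd out <osd-id>    # start draining the suspect OSD
>> >>     ceph osd in <osd-id>     # bring it back once it checks out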
>> >>
>> >>
>> >>
>> >> I don't think the CRUSH algorithm guarantees balancing things out in
>> >>the way
>> >> you're expecting.
>> >>
>> >>
>> >>
>> >> --Greg
>> >>
>> >> On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler
>> >><Kevin.Weiler@xxxxxxxxxxxxxxx>
>> >> wrote:
>> >>
>> >> Hi guys,
>> >>
>> >>
>> >>
>> >> I have an OSD in my cluster that is near full at 90%, but we're using a
>> >> little less than half the available storage in the cluster. Shouldn't
>> >>this
>> >> be balanced out?
>> >>
>> >>
>> >>
>> >> --
>> >
>>
>>
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
>
>
> --
>
> Oliver Francke
>
> filoo GmbH
> Moltkestraße 25a
> 33330 Gütersloh
> HRB4355 AG Gütersloh
>
> Geschäftsführer: J.Rehpöhler | C.Kunz
>
> Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




