Re: Changing pg_num => RBD VM down !

@Michael Kuriger: when ceph/librbd is operating normally, I know that doubling the pg_num is the safe way. But when it has a problem, I think doubling it could make many, many VMs die (maybe >= 50%?).


On Mon, Mar 16, 2015 at 9:53 PM, Michael Kuriger <mk7193@xxxxxx> wrote:
I always keep my pg number a power of 2.  So I’d go from 2048 to 4096.  I’m not sure if this is the safest way, but it’s worked for me.
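
For reference, the change itself is just two pool settings; a minimal sketch (the pool name is a placeholder, and pgp_num has to follow pg_num before the data actually rebalances):

    ceph osd pool get <pool> pg_num          # check the current value
    ceph osd pool set <pool> pg_num 4096     # double the placement group count
    ceph osd pool set <pool> pgp_num 4096    # allow the new PGs to be placed/rebalanced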

 

yp

 

Michael Kuriger

Sr. Unix Systems Engineer

mk7193@xxxxxx | 818-649-7235


From: Chu Duc Minh <chu.ducminh@xxxxxxxxx>
Date: Monday, March 16, 2015 at 7:49 AM
To: Florent B <florent@xxxxxxxxxxx>
Cc: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Changing pg_num => RBD VM down !

I'm using the latest Giant and have the same issue. When I increase the pg_num of a pool from 2048 to 2148, my VMs are still OK. When I increase it from 2148 to 2400, some VMs die (the qemu-kvm processes die).
My physical servers (which host the VMs) run kernel 3.13 and use librbd.
I think it's a bug in librbd related to the crushmap.
(I set crush_tunables3 on my Ceph cluster; could that be related?)
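
For comparison, the tunables actually in effect can be dumped with the stock command (nothing cluster-specific assumed here):

    ceph osd crush show-tunables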

Do you know a way to safely increase pg_num? (I don't think increasing pg_num by 100 each time is a safe & good way.)
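
For context, the steps I'm doing now look roughly like this (the pool name is a placeholder, and I'm assuming pgp_num should be bumped alongside pg_num):

    ceph osd pool set <pool> pg_num 2148     # small increase over the previous 2048
    # wait for 'ceph -s' to show all PGs active+clean again
    ceph osd pool set <pool> pgp_num 2148
    # ...then repeat with the next small increment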

Regards,

On Mon, Mar 16, 2015 at 8:50 PM, Florent B <florent@xxxxxxxxxxx> wrote:
We are on Giant.

On 03/16/2015 02:03 PM, Azad Aliyar wrote:
>
> May I know your Ceph version? The latest version of Firefly, 0.80.9, has
> patches to avoid excessive data migration during reweighting of OSDs. You
> may need to set a tunable in order to make this patch active.
>
> This is a bugfix release for firefly.  It fixes a performance regression
> in librbd, an important CRUSH misbehavior (see below), and several RGW
> bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
> and libcephfs.
>
> We recommend that all Firefly users upgrade.
>
> For more detailed information, see
>   http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
>
> Adjusting CRUSH maps
> --------------------
>
> * This point release fixes several issues with CRUSH that trigger
>   excessive data migration when adjusting OSD weights.  These are most
>   obvious when a very small weight change (e.g., a change from 0 to
>   .01) triggers a large amount of movement, but the same set of bugs
>   can also lead to excessive (though less noticeable) movement in
>   other cases.
>
>   However, because the bug may already have affected your cluster,
>   fixing it may trigger movement *back* to the more correct location.
>   For this reason, you must manually opt-in to the fixed behavior.
>
>   In order to set the new tunable to correct the behavior::
>
>      ceph osd crush set-tunable straw_calc_version 1
>
>   Note that this change will have no immediate effect.  However, from
>   this point forward, any 'straw' bucket in your CRUSH map that is
>   adjusted will get non-buggy internal weights, and that transition
>   may trigger some rebalancing.
>
>   You can estimate how much rebalancing will eventually be necessary
>   on your cluster with::
>
>      ceph osd getcrushmap -o /tmp/cm
>      crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
>      crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
>      crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
>      crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b
> 2>&1
>      wc -l /tmp/a                          # num total mappings
>      diff -u /tmp/a /tmp/b | grep -c ^+    # num changed mappings
>
>    Divide the number of changed mappings by the total number of lines in
>    /tmp/a to get the fraction of mappings that will move (for example,
>    5,000 changed out of 100,000 total would be 5%).  We've found that
>    most clusters are under 10%.
>
>    You can force all of this rebalancing to happen at once with::
>
>      ceph osd crush reweight-all
>
>    Otherwise, it will happen at some unknown point in the future when
>    CRUSH weights are next adjusted.
>
> Notable Changes
> ---------------
>
> * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
> * crush: fix straw bucket weight calculation, add straw_calc_version
>   tunable (#10095 Sage Weil)
> * crush: fix tree bucket (Rongzu Zhu)
> * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
> * crushtool: add --reweight (Sage Weil)
> * librbd: complete pending operations before losing image (#10299 Jason
>   Dillaman)
> * librbd: fix read caching performance regression (#9854 Jason Dillaman)
> * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
> * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
> * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
> * osd: handle no-op write with snapshot (#10262 Sage Weil)
> * radosgw-admi
>
>
>
>
> On 03/16/2015 12:37 PM, Alexandre DERUMIER wrote:
> >>> VMs are running on the same nodes as the OSDs
> > Are you sure that you didn't hit some kind of out-of-memory condition?
> > PG rebalancing can be memory hungry (it depends on how many OSDs you have).
>
> 2 OSDs per host, and 5 hosts in this cluster.
> hosts h
>
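
As an aside on the out-of-memory question above: a quick, generic way to rule the OOM killer in or out on a hypervisor node (nothing Ceph-specific, just the kernel log) is:

    dmesg -T | grep -i -E 'out of memory|oom-killer'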

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


