I'm using the latest Giant and have the same issue. When I increase the pg_num of a pool from 2048 to 2148, my VMs are still OK. When I increase it from 2148 to 2400, some VMs die (their qemu-kvm processes die).
My physical servers (which host the VMs) run kernel 3.13 and use librbd. I think it's a bug in librbd related to the crushmap.
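Before settling on librbd, it is probably worth ruling out the out-of-memory explanation suggested earlier in the thread. A rough check (assuming you can read the kernel log and syslog on the hosts; log paths differ by distro) is to look for OOM-killer entries around the time the qemu-kvm processes died:

    dmesg | grep -i -E 'out of memory|oom-killer|killed process'
    # Debian/Ubuntu log to /var/log/syslog, RHEL/CentOS to /var/log/messages
    grep -i 'killed process' /var/log/syslog

If qemu-kvm shows up there, the cause is memory pressure during the PG split/rebalance rather than a librbd bug.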
(I set crush_tunables3 on my Ceph cluster; does that make sense?)
Do you know a way to safely increase pg_num? (I don't think increasing pg_num by 100 at a time is a safe and good way.)
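For reference, the incremental approach I mean looks roughly like this (a sketch only; the pool name, step sizes and sleep interval are just examples):

    pool=volumes                       # example pool name
    for pg in 2112 2176 2240 2304 2368 2400; do
        ceph osd pool set $pool pg_num $pg
        # wait for the cluster to settle before splitting placement
        while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
        ceph osd pool set $pool pgp_num $pg
        while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
    done

It works, but it takes a long time and still causes several waves of data movement, which is why I'm asking whether there is a better way.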
On Mon, Mar 16, 2015 at 8:50 PM, Florent B <florent@xxxxxxxxxxx> wrote:
We are on Giant.
On 03/16/2015 02:03 PM, Azad Aliyar wrote:
>
> May I know your Ceph version? The latest Firefly release, 0.80.9, has
> patches to avoid excessive data migration when reweighting OSDs. You
> may need to set a tunable in order to make this fix active.
>
> This is a bugfix release for firefly. It fixes a performance regression
> in librbd, an important CRUSH misbehavior (see below), and several RGW
> bugs. We have also backported support for flock/fcntl locks to ceph-fuse
> and libcephfs.
>
> We recommend that all Firefly users upgrade.
>
> For more detailed information, see
> http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
>
> Adjusting CRUSH maps
> --------------------
>
> * This point release fixes several issues with CRUSH that trigger
> excessive data migration when adjusting OSD weights. These are most
> obvious when a very small weight change (e.g., a change from 0 to
> .01) triggers a large amount of movement, but the same set of bugs
> can also lead to excessive (though less noticeable) movement in
> other cases.
>
> However, because the bug may already have affected your cluster,
> fixing it may trigger movement *back* to the more correct location.
> For this reason, you must manually opt-in to the fixed behavior.
>
> In order to set the new tunable to correct the behavior::
>
> ceph osd crush set-tunable straw_calc_version 1
>
> Note that this change will have no immediate effect. However, from
> this point forward, any 'straw' bucket in your CRUSH map that is
> adjusted will get non-buggy internal weights, and that transition
> may trigger some rebalancing.
>
> You can estimate how much rebalancing will eventually be necessary
> on your cluster with::
>
> ceph osd getcrushmap -o /tmp/cm
> crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
> crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
> crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
> crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
> wc -l /tmp/a # num total mappings
> diff -u /tmp/a /tmp/b | grep -c ^+ # num changed mappings
>
> Divide the number of changed mappings by the total number of mappings
> in /tmp/a to get the fraction that will move. We've found that most
> clusters are under 10%.
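> As a concrete example (a minimal sketch reusing the /tmp/a and /tmp/b
> files produced above), the percentage of changed mappings can be
> computed with::
>
> total=$(wc -l < /tmp/a)
> changed=$(diff -u /tmp/a /tmp/b | grep -c '^+')
> echo "scale=1; 100 * $changed / $total" | bc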
>
> You can force all of this rebalancing to happen at once with::
>
> ceph osd crush reweight-all
>
> Otherwise, it will happen at some unknown point in the future when
> CRUSH weights are next adjusted.
>
> Notable Changes
> ---------------
>
> * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
> * crush: fix straw bucket weight calculation, add straw_calc_version
> tunable (#10095 Sage Weil)
> * crush: fix tree bucket (Rongzu Zhu)
> * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
> * crushtool: add --reweight (Sage Weil)
> * librbd: complete pending operations before closing image (#10299 Jason
> Dillaman)
> * librbd: fix read caching performance regression (#9854 Jason Dillaman)
> * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
> * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
> * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
> * osd: handle no-op write with snapshot (#10262 Sage Weil)
> * radosgw-admi
>
>
>
>
> On 03/16/2015 12:37 PM, Alexandre DERUMIER wrote:
> >>> VMs are running on the same nodes as the OSDs
> > Are you sure you didn't hit some kind of out-of-memory condition?
> > PG rebalancing can be memory hungry (depending on how many OSDs you have).
>
> 2 OSDs per host, and 5 hosts in this cluster.
> hosts h
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com