On 16/03/2015, at 11.14, Florent B <florent@xxxxxxxxxxx> wrote:

> On 03/16/2015 11:03 AM, Alexandre DERUMIER wrote:
>> This is strange, that could be:
>>
>> - qemu crash, maybe a bug in rbd block storage (if you use librbd)
>> - oom-killer on your host (any logs?)
>>
>> What is your qemu version?
>>
>
> We now have version 2.1.3.
>
> Some VMs that stopped were running for a long time, but some others had
> only 4 days of uptime.
>
> And I should point out that not all VMs on that pool crashed, only some
> of them (a large majority), and on the same host, some crashed and
> others did not.
>
> We use Proxmox, so I think it uses librbd?

I had the same issue once when bumping up pg_num: the majority of my Proxmox VMs stopped. I believe this might be due to heavy rebalancing causing timeouts when the VMs try to do I/O operations, thus triggering kernel panics. Next time around I want to increase pg_num in smaller increments and hopefully avoid this (rough sketch below my signature).

I follow the need for more PGs when adding more OSDs, but how come PGs become too few when adding more objects/data to a pool?

/Steffen
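
P.S. For "smaller increments", something like this untested sketch is what I have in mind. The pool name, target, and step size are just examples, and it assumes bash on a node with admin access; adjust to taste:

    #!/bin/bash
    # Grow pg_num/pgp_num in small steps, letting the cluster settle
    # between increments so rebalancing I/O stays manageable.
    POOL=rbd        # example pool name
    TARGET=1024     # example final pg_num
    STEP=64         # example increment per round
    CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        CUR=$((CUR + STEP))
        [ "$CUR" -gt "$TARGET" ] && CUR=$TARGET
        ceph osd pool set "$POOL" pg_num "$CUR"
        ceph osd pool set "$POOL" pgp_num "$CUR"
        sleep 60    # give the new PGs time to start creating/peering
        # wait for backfill/recovery to finish before the next bump
        until ceph health | grep -q HEALTH_OK; do
            sleep 30
        done
    done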