On 16/03/2015, at 11.14, Florent B <florent@xxxxxxxxxxx> wrote:

> On 03/16/2015 11:03 AM, Alexandre DERUMIER wrote:
>> This is strange, that could be:
>>
>> - qemu crash, maybe a bug in rbd block storage (if you use librbd)
>> - oom-killer on your host (any logs?)
>>
>> What is your qemu version?
>>
>
> We now have version 2.1.3.
>
> Some VMs that stopped were running for a long time, but some others had
> only 4 days of uptime.
>
> And I should point out that not all VMs on that pool crashed, only some
> of them (a large majority), and on the same host, some crashed and
> others did not.
>
> We use Proxmox, so I think it uses librbd?

I had the same issue once when bumping up pg_num: the majority of my Proxmox VMs stopped. I believe this might be due to heavy rebalancing causing timeouts when the VMs try to do I/O operations, thus triggering kernel panics. Next time around I want to increase pg_num in smaller increments and hopefully avoid this (rough sketch below my signature).

I follow the need for more PGs when adding more OSDs, but how come PGs become too few when adding more objects/data to a pool?

/Steffen
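
P.S. For "smaller increments", something like this untested sketch is what I have in mind. The pool name, target, and step size are just examples, and it assumes bash on a node with admin access; adjust to taste:

    #!/bin/bash
    # Grow pg_num/pgp_num in small steps, letting the cluster settle
    # between increments so rebalancing I/O stays manageable.
    POOL=rbd        # example pool name
    TARGET=1024     # example final pg_num
    STEP=64         # example increment per round
    CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        CUR=$((CUR + STEP))
        [ "$CUR" -gt "$TARGET" ] && CUR=$TARGET
        ceph osd pool set "$POOL" pg_num "$CUR"
        ceph osd pool set "$POOL" pgp_num "$CUR"
        sleep 60    # give the new PGs time to start creating/peering
        # wait for backfill/recovery to finish before the next bump
        until ceph health | grep -q HEALTH_OK; do
            sleep 30
        done
    done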