Long interruption when increasing placement groups

Hello ceph community,

Last week I was increasing the number of PGs in a pool used for RBD, trying to grow it from 128 to 1024 PGs. I raised pg_num in increments of 32 each time, and after each batch of new placement groups was created I triggered the data rebalance by raising pgp_num to match.
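
For reference, each step looked roughly like this (the pool name "rbd-pool" is just illustrative):

    # create 32 new PGs, then rebalance data onto them
    ceph osd pool set rbd-pool pg_num 160
    ceph osd pool set rbd-pool pgp_num 160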

Everything was fine until the pool reached roughly 400 PGs. Up to 414 PGs, the cluster interrupted client I/O for approximately 10 seconds while creating each batch of 32 PGs, which was acceptable for the SLA we try to meet. Past 414 PGs the interruption grew longer, reaching 40 seconds, our virtual machines saw roughly a minute of downtime, and the ceph log showed hundreds of blocked ops.
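
While the new PGs were being created I was watching the cluster with the usual status commands, something like:

    # stream cluster events, including slow/blocked request warnings
    ceph -w
    # summarize current health and list blocked requests
    ceph health detail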

I would like to understand why the client I/O interruption got longer once the pool had more PGs. I've been unable to figure that out from the documentation or the mailing list archives.

Some info of the cluster:

  • n° OSDs: 24. The cluster started with 6 OSDs.
  • 3 OSD nodes.
  • 3 monitors.
  • version: Jewel 10.2.10
  • OSD backend disks: HDD
  • OSD journal disks: SSD

Let me know if you need further information and thanks in advance.

Kind regards to you all.

-- 
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
 http://www.altavoz.net
Viña del Mar, Valparaiso:
 2 Poniente 355 of 53
 +56 32 276 8060
Providencia, Santiago:
 Antonio Bellet 292 of 701
 +56 2 585 4264 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
