Hello,

On Mon, 4 Aug 2014 11:03:37 +0800 ? wrote:

> Hello, I am running a Ceph cluster (RBD) in a production environment to
> host 200 VMs. Under normal circumstances, Ceph's performance is quite
> good, but when I delete a snapshot or an image, the cluster shows a lot
> of blocked requests (generally more than 1000), then the whole cluster
> slows down and many VMs become very slow. Any idea? Thank you.
>
> the hardware of my cluster----------------------------------
> My cluster has 3 nodes; every node has 10 x 2TB SATA HDDs and 1 x 120GB SSD.
>

I suspect your cluster is pretty close to full capacity when operating
normally, and gets overwhelmed when something very intensive like an image
deletion (which has to touch every last object of the image) comes along.

It would be nice if operations like these had (more and better)
configuration options, as scrub (load) and recovery operations do.

Monitor your cluster with atop on all 3 nodes in parallel and observe the
utilization of your HDDs and SSDs, CPU and network during a period of
normal usage. Compare that to what you see when you delete an image (use
a small one ^o^).

About your cluster: what OS, Ceph version and replication factor?
What CPU, memory and network configuration?

A single 120GB SSD (which model?) as the journal for 10 HDDs will
definitely be the limiting factor when it comes to write speed, but
should hopefully handle the IOPS well enough.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
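A rough way to put numbers on the point about a single journal SSD in front of 10 HDDs: if every OSD write lands in the journal first (the FileStore behaviour assumed here), a node cannot absorb writes faster than the slower of "one SSD" and "ten HDDs", and with replication each client write is stored several times. The disk speeds and the replication factor of 3 below are hypothetical placeholders, not measurements from this cluster; only the node and disk counts come from the post. This is a bandwidth sketch only and says nothing about IOPS.

```python
# Back-of-envelope write ceiling for a cluster where every OSD write
# goes through a shared journal SSD before reaching the data HDDs.
# ALL speeds and the replication factor are hypothetical placeholders.

NODES = 3                 # from the original post
HDDS_PER_NODE = 10        # from the original post
HDD_WRITE_MBPS = 100      # assumed sequential write per 2TB SATA disk
SSD_WRITE_MBPS = 200      # assumed sequential write of the 120GB journal SSD
REPLICATION = 3           # assumed pool size; not stated in the post


def node_write_ceiling(hdds: int, hdd_mbps: float, ssd_mbps: float) -> float:
    """Every byte passes through the single journal SSD and then an HDD,
    so a node can absorb no more than the slower of the two paths."""
    return min(ssd_mbps, hdds * hdd_mbps)


def cluster_client_write_ceiling(nodes: int, hdds: int, hdd_mbps: float,
                                 ssd_mbps: float, replicas: int) -> float:
    """Each client write is stored `replicas` times, so the summed raw
    node bandwidth is divided by the replication factor."""
    return nodes * node_write_ceiling(hdds, hdd_mbps, ssd_mbps) / replicas


if __name__ == "__main__":
    per_node = node_write_ceiling(HDDS_PER_NODE, HDD_WRITE_MBPS, SSD_WRITE_MBPS)
    cluster = cluster_client_write_ceiling(NODES, HDDS_PER_NODE, HDD_WRITE_MBPS,
                                           SSD_WRITE_MBPS, REPLICATION)
    print(f"per-node write ceiling: {per_node:.0f} MB/s")
    print(f"cluster client write ceiling: {cluster:.0f} MB/s")
```

With these placeholder figures the SSD (200 MB/s) caps each node well below what the ten HDDs could sustain together (1000 MB/s), which is the sense in which the journal is the limiting factor for write speed.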