Re: VM disk operation blocked during OSDs failures

Thanks Christian,

I'm using a pool with size 3, min_size 1.

I can see the cluster serving I/O in a degraded state after the OSD is marked down, but the problem we have is in the interval between the OSD failure event and the moment when that OSD is marked down.

In that interval (which can take up to 10 minutes), all I/O operations directed to that OSD are blocked, so every virtual machine using the RBDs provided by the cluster hangs until the failed OSD is finally marked down.

Is this the expected behavior of the cluster during a failure?

Is it possible to make that time shorter so the I/O operations don't get blocked for so long?
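
From what I understand, the length of that window is governed by the OSD heartbeat and monitor reporting settings. A minimal ceph.conf sketch of the options involved (the values shown are illustrative, not tuning recommendations):

[osd]
    # How often an OSD pings its peers (seconds)
    osd heartbeat interval = 6
    # How long a peer may be silent before it is reported as failed (seconds)
    osd heartbeat grace = 20

[mon]
    # How many distinct OSDs must report a peer as failed before the
    # monitors mark it down
    mon osd min down reporters = 2
    # If no peer reports the failure, the monitors mark an OSD down after
    # this many seconds without a status report
    mon osd report timeout = 900

Lowering the grace or reporter thresholds should shorten detection, at the cost of more false positives on a busy or flapping network.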

Thanks,

On 11/04/2016 07:25 PM, Christian Wuerdig wrote:
What are your pool size and min_size settings? An object with fewer than min_size replicas will not receive I/O (http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas). So if size=2 and min_size=1, then an OSD failure means blocked operations to all objects located on the failed OSD until they have been replicated again.
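
For example (assuming a pool named "rbd"; substitute your own pool name), you can inspect and change these settings with:

# Inspect the current replication settings of the pool
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# Example: allow I/O to continue with a single surviving replica
ceph osd pool set rbd min_size 1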

On Sat, Nov 5, 2016 at 9:04 AM, fcid <fcid@xxxxxxxxxxx> wrote:
Dear ceph community,

I'm working on a small ceph deployment for testing purposes, in which I want to test the high availability features of Ceph and how clients are affected during outages in the cluster.

This small cluster is deployed on 3 servers, each running 2 OSDs and 1 monitor, and we are using it to serve RADOS block devices to KVM hypervisors on other hosts. The ceph software was installed using ceph-deploy.

For HA testing we are simulating disk failures by physically detaching OSD disks from the servers and also by cutting power to the servers we want to fail.

I have some doubts regarding the behavior during OSD and disk failures under light workloads.

During disk failures, the cluster takes a long time to promote a secondary OSD to primary, blocking all disk operations of virtual machines using RBD until the cluster map is updated with the failed OSD (which can take up to 10 minutes in our cluster). Is this the expected behavior of the OSD cluster, or should a disk failure be transparent to clients?
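
During a planned test, one way to observe and shorten this window is to watch the cluster and mark the pulled OSD down manually instead of waiting for detection (a sketch only; osd.3 is a hypothetical ID):

# Watch cluster events while detaching the disk
ceph -w

# Identify the affected OSD
ceph osd tree

# Mark it down/out immediately rather than waiting for the
# heartbeat and reporting timeouts
ceph osd down 3
ceph osd out 3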

Thanks in advance, kind regards.

Configuration and version of our ceph cluster:

root@ceph00:~# cat /etc/ceph/ceph.conf
[global]
fsid = 440fce60-3097-4f1c-a489-c170e65d8e09
mon_initial_members = ceph00
mon_host = 192.168.x1.x1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.x.x/x
cluster network = y.y.y.y/y
[osd]
osd mkfs options = -f -i size=2048 -n size=64k
osd mount options xfs = inode64,noatime,logbsize=256k
osd journal size = 20480
filestore merge threshold = 40
filestore split multiple = 8
filestore xattr use omap = true

root@ceph00:~# ceph -v
ceph version 10.2.3

--
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
 http://www.altavoz.net
Viña del Mar, Valparaiso:
 2 Poniente 355 of 53
 +56 32 276 8060
Santiago:
 San Pío X 2460, oficina 304, Providencia
 +56 2 2585 4264

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
 http://www.altavoz.net
Viña del Mar, Valparaiso:
 2 Poniente 355 of 53
 +56 32 276 8060
Santiago:
 San Pío X 2460, oficina 304, Providencia
 +56 2 2585 4264 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
