Re: VM disk operations blocked during OSD failures

Thanks Christian,

I'm using a pool with size 3, min_size 1.

I can see the cluster serving I/O in a degraded state after the OSD is marked down, but the problem we have is in the interval between the OSD failure event and the moment when that OSD is actually marked down.

In that interval (which can take up to 10 minutes in our cluster), all I/O operations directed to that OSD are blocked, so every virtual machine using an RBD backed by the cluster hangs until the failed OSD is finally marked down.

Is this the expected behavior of the cluster during a failure?

Is it possible to make that time shorter so the I/O operations don't get blocked for so long?
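
From what I can tell in the documentation, the time it takes to detect a failure is governed by the OSD heartbeat and monitor reporting settings. Something along these lines in ceph.conf is what I have been looking at (untested on our side; the values shown are roughly the Jewel defaults, and the option names and defaults should be double-checked against the docs):

[osd]
# how often an OSD pings its peers (default is around 6 seconds)
osd heartbeat interval = 6
# how long without a heartbeat reply before an OSD reports a peer as down (default is around 20 seconds)
osd heartbeat grace = 20

[mon]
# how many distinct OSDs must report a peer down before the monitor marks it down
mon osd min down reporters = 2

My understanding is that lowering the heartbeat grace too aggressively can make OSDs flap under load, so I'd rather understand what actually accounts for the ~10 minute delay before tuning anything.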

Thanks,

On 11/04/2016 07:25 PM, Christian Wuerdig wrote:
What are your pool size and min_size settings? An object with fewer than min_size replicas will not receive I/O (http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas). So if size=2 and min_size=1, then an OSD failure means blocked operations to all objects located on the failed OSD until they have been replicated again.
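
You can check (and, if necessary, adjust) those values per pool with something like the following; I'm assuming the default pool name 'rbd' here:

ceph osd pool get rbd size
ceph osd pool get rbd min_size
# only lower min_size if you accept the reduced redundancy while recovery is in progress
ceph osd pool set rbd min_size 1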

On Sat, Nov 5, 2016 at 9:04 AM, fcid <fcid@xxxxxxxxxxx> wrote:
Dear ceph community,

I'm working on a small Ceph deployment for testing purposes, in which I want to test the high-availability features of Ceph and how clients are affected during outages in the cluster.

This small cluster is deployed on 3 servers, each running 2 OSDs and 1 monitor, and we use it to serve RADOS block devices (RBD) to KVM hypervisors on other hosts. The Ceph software was installed using ceph-deploy.

For HA testing we simulate disk failures by physically detaching OSD disks from the servers, and server failures by cutting power to the servers we want to fail.

I have some doubts regarding the behavior during OSD and disk failures under light workloads.

During disk failures, the cluster takes a long time to promote the secondary OSD to primary, which blocks all disk operations of virtual machines using RBD until the cluster map is updated with the failed OSD (this can take up to 10 minutes in our cluster). Is this the expected behavior of the OSD cluster, or should a disk failure be transparent to clients?
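
In case it helps, these are the stock commands we can use to watch the cluster while a disk is pulled (nothing custom, just the standard ceph CLI):

# follow cluster events live while the disk is detached
ceph -w
# list OSDs per host and their up/down status
ceph osd tree
# lists slow/blocked requests while the failed OSD is still marked "up"
ceph health detail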

Thanks in advance, kind regards.

Configuration and version of our ceph cluster:

root@ceph00:~# cat /etc/ceph/ceph.conf
[global]
fsid = 440fce60-3097-4f1c-a489-c170e65d8e09
mon_initial_members = ceph00
mon_host = 192.168.x1.x1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.x.x/x
cluster network = y.y.y.y/y
[osd]
osd mkfs options = -f -i size=2048 -n size=64k
osd mount options xfs = inode64,noatime,logbsize=256k
osd journal size = 20480
filestore merge threshold = 40
filestore split multiple = 8
filestore xattr use omap = true

root@ceph00:~# ceph -v
ceph version 10.2.3

--
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
 http://www.altavoz.net
Viña del Mar, Valparaiso:
 2 Poniente 355 of 53
 +56 32 276 8060
Santiago:
 San Pío X 2460, oficina 304, Providencia
 +56 2 2585 4264



-- 
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
 http://www.altavoz.net
Viña del Mar, Valparaiso:
 2 Poniente 355 of 53
 +56 32 276 8060
Santiago:
 San Pío X 2460, oficina 304, Providencia
 +56 2 2585 4264 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
