Re: VM disk operations blocked during OSD failures


 



On Mon, Nov 7, 2016 at 5:44 AM, fcid <fcid@xxxxxxxxxxx> wrote:
> Thanks Christian,
>
> I'm using a pool with size 3, min_size 1.
>
> I can see the cluster serving I/O in a degraded state after the OSD is
> marked down, but the problem we have is in the interval between the OSD
> failure event and the moment when that OSD is marked down.
>
> In that interval (which can take up to 10 minutes) all the I/O operations

This is your problem — it shouldn't take that long to mark an OSD
down. You might be having issues due to the very small cluster size,
in which case playing with the "mon osd min down reporters" parameter
might be useful. It's been discussed frequently on the list if you
check the archives.
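
As a rough sketch, lowering that threshold in ceph.conf might look like the
following (the value 1 is only an illustration for a 3-host cluster; test it
before relying on it):

```
[mon]
# Number of distinct OSDs that must report a peer down before the
# monitors will mark it down (default 2, I believe).
mon osd min down reporters = 1
```

The same value can be injected into running monitors with
`ceph tell mon.* injectargs '--mon-osd-min-down-reporters 1'`.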

Since you're running Jewel there's also a new "mon osd reporter
subtree level" that defaults to "host" and you might try setting it to
"device", but the logic there is new to me; maybe Xiaoxi can enlighten
us.
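
For reference, setting that in ceph.conf would presumably look like this —
again, an experiment rather than a recommendation, given that the logic is
new:

```
[mon]
# Require down reports from OSDs in distinct CRUSH subtrees of this
# type before marking an OSD down; the Jewel default is "host".
mon osd reporter subtree level = device
```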
-Greg

> directed to that OSD are blocked, thus all the virtual machines using the
> RBDs provided by the cluster hang, until the failed OSD is finally marked
> down.
>
> Is this the expected operation of the cluster during failure?
>
> Is it possible to make that time shorter so the I/O operations don't get
> blocked for so long?
>
> Thanks,
>
> On 11/04/2016 07:25 PM, Christian Wuerdig wrote:
>
> What are your pool size and min_size settings? An object with fewer than
> min_size replicas will not receive I/O
> (http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas).
> So if size=2 and min_size=1, then an OSD failure means blocked operations
> to all objects located on the failed OSD until they have been replicated
> again.
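
To see what a pool is actually configured with, something like the following
should work (the pool name "rbd" is just an example):

```
# Inspect the replication settings of a pool
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# Raise the replica count if desired
ceph osd pool set rbd size 3
```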
>
> On Sat, Nov 5, 2016 at 9:04 AM, fcid <fcid@xxxxxxxxxxx> wrote:
>>
>> Dear ceph community,
>>
>> I'm working on a small Ceph deployment for testing purposes, in which I
>> want to test the high-availability features of Ceph and how clients are
>> affected during outages in the cluster.
>>
>> This small cluster is deployed on 3 servers, each running 2 OSDs and 1
>> monitor, and we use it to serve RADOS block devices to KVM hypervisors
>> on other hosts. The Ceph software was installed using ceph-deploy.
>>
>> For HA testing we simulate disk failures by physically detaching OSD
>> disks from the servers, and server failures by cutting power to the
>> servers we want to fail.
>>
>> I have some doubts regarding the behavior during OSD and disk failures
>> under light workloads.
>>
>> During disk failures, the cluster takes a long time to promote a
>> secondary OSD to primary, thus blocking all disk operations of virtual
>> machines using RBD until the cluster map is updated with the failed OSD
>> (which can take up to 10 minutes in our cluster). Is this the expected
>> behavior of the cluster, or should the failure be transparent to clients
>> when a disk fails?
>>
>> Thanks in advance, kind regards.
>>
>> Configuration and version of our ceph cluster:
>>
>> root@ceph00:~# cat /etc/ceph/ceph.conf
>> [global]
>> fsid = 440fce60-3097-4f1c-a489-c170e65d8e09
>> mon_initial_members = ceph00
>> mon_host = 192.168.x1.x1
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> public network = 192.168.x.x/x
>> cluster network = y.y.y.y/y
>> [osd]
>> osd mkfs options = -f -i size=2048 -n size=64k
>> osd mount options xfs = inode64,noatime,logbsize=256k
>> osd journal size = 20480
>> filestore merge threshold = 40
>> filestore split multiple = 8
>> filestore xattr use omap = true
>>
>> root@ceph00:~# ceph -v
>> ceph version 10.2.3
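
When chasing a delay like this, it can also help to dump the live
failure-detection settings from the running daemons, e.g. (the admin-socket
path and exact option names may vary by release):

```
# Heartbeat and down-reporting settings on a running OSD
ceph daemon osd.0 config show | grep -E 'heartbeat|down_reporter'

# Monitor-side settings, via the admin socket
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph00.asok config show \
    | grep mon_osd
```

If no peer reports arrive at all, the monitors only mark an OSD down after
"mon osd report timeout", which defaults to 900 seconds if I recall
correctly — a delay on the order of the ~10 minutes described above.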
>>
>> --
>> Fernando Cid O.
>> Ingeniero de Operaciones
>> AltaVoz S.A.
>>  http://www.altavoz.net
>> Viña del Mar, Valparaiso:
>>  2 Poniente 355 of 53
>>  +56 32 276 8060
>> Santiago:
>>  San Pío X 2460, oficina 304, Providencia
>>  +56 2 2585 4264
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
>



