Re: periodically delays when one of mons dies

Greg Farnum <gregory.farnum@xxxxxxxxxxxxx> · Thu, 22 Mar 2012 10:34:42 -0700

On Wednesday, March 21, 2012 at 8:30 AM, ruslan usifov wrote:
> Hello
>  
> I'm new to ceph, and perhaps misunderstand some things.
>  
> I have a test configuration with 3 VMware machines (I am testing RBD). My setup
> consists of:
>  
> 3: mons
> 3: osd
>  
>  
> When I kill one mon (to simulate a failure), from time to time (periodically) I get
> short delays when working with the RBD device. Perhaps this happens when the client
> tries the failed mon

That's probably the case. The tools generally pick a random monitor from the list and time out after 15 seconds if it's not responding. If you know a monitor is down, you can specify one of the others to connect to with the -m option.
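
As a rough illustration of choosing a monitor for -m, here is a minimal Python sketch that probes a list of monitor addresses over TCP and returns the first one that accepts a connection. The function name first_reachable_monitor, the addresses in the usage note, and the short probe timeout are assumptions made for this example and are not part of Ceph. Port 6789 is assumed to be the monitor port in use. An accepted TCP connection only shows that something is listening there, not that the monitor is healthy or in quorum.

```python
import socket


def first_reachable_monitor(monitors, timeout=2.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    `monitors` is a list of (host, port) tuples. This is only a reachability
    probe; it does not speak the Ceph monitor protocol.
    """
    for host, port in monitors:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Monitor unreachable within the timeout; try the next one.
            continue
    return None
```

With hypothetical addresses, first_reachable_monitor([("192.168.0.1", 6789), ("192.168.0.2", 6789), ("192.168.0.3", 6789)]) would return the first address that answers, which could then be passed to the tools as -m host:port.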

> , is it possible to switch off this failed mon until it is fully
> restored?

You could take it out of the daemon's config file, but there's no way for new daemons to avoid trying to talk to down monitors that are in their config (unless you explicitly specify the mon to connect to, as I said above). The monitor is the first part of the system they talk to, so there's no way for Ceph itself to propagate information about down mons.

> Perhaps pacemaker would help in this case, plus a failover IP plus something
> like a proxy that knows about the mon configuration. (I tried haproxy, but without
> success: first of all, haproxy doesn't know which mons are alive or failed, so
> delays will still happen; also, ceph doesn't allow this scheme, i.e. the ceph client
> checks the IP address it connects to and what send to hip mon, and in a scheme
> with a proxy these values don't coincide.)

Not quite sure what you're saying here…
-Greg
