Re: cluster unavailable for 20 mins when downed server was reintroduced

Hi,

On Thu, 17 Aug 2017, Gregory Farnum said:
> On Wed, Aug 16, 2017 at 4:04 AM Sean Purdy <s.purdy@xxxxxxxxxxxxxxxx> wrote:
> 
> > On Tue, 15 Aug 2017, Gregory Farnum said:
> > > On Tue, Aug 15, 2017 at 4:23 AM Sean Purdy <s.purdy@xxxxxxxxxxxxxxxx>
> > wrote:
> > > > I have a three node cluster with 6 OSD and 1 mon per node.
> > > >
> > > > I had to turn off one node for rack reasons.  While the node was down, the
> > > > cluster was still running and accepting files via radosgw.  However, when I
> > > > turned the machine back on, radosgw uploads stopped working and things like
> > > > "ceph status" starting timed out.  It took 20 minutes for "ceph status" to
> > > > be OK.

> Did you try running "ceph -s" from more than one location? If you had a
> functioning quorum that should have worked. And any live clients should
> have been able to keep working.

I tried from more than one location, yes.
 

> > Timing went like this:
> >
> > 11:22 node boot
> > 11:22 ceph-mon starts, recovers logs, compaction, first BADAUTHORIZER
> > message
> > 11:22 starting disk activation for 18 partitions (3 per bluestore)
> > 11:23 mgr on other node can't find secret_id
> > 11:43 bluefs mount succeeded on OSDs, ceph-osds go live
> > 11:45 last BADAUTHORIZER message in monitor log
> > 11:45 this host calls and wins a monitor election, mon_down health check
> > clears
> > 11:45 mgr happy
> >
> 
> The timing there on the mounting (how does it take 20 minutes?!?!?) and
> everything working again certainly is suspicious. It's not the direct cause
> of the issue, but there may be something else going on which is causing
> both of them.
> 
> All in all; I'm confused.


I tried again today, taking a node down for an hour.  This might be a different set of questions.


This time, after the node came back up, the OSDs caught up quickly.

But the monitor process on the rebooted node took 25 minutes to come back into quorum.  Is this normal?


2017-08-21 16:10:45.243323 7f3fb62b2700  0 mon.store03@2(synchronizing).data_health(0) update_stats avail 94% total 211 GB, used 914 MB, avail 200 GB
...
2017-08-21 16:38:45.251345 7f3fb62b2700  0 mon.store03@2(peon).data_health(298) update_stats avail 94% total 211 GB, used 1229 MB, avail 199 GB

What is the monitor process doing during all that time?  It didn't seem to be maxing out the network, CPU, or disk.
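
If it's useful, the mon's admin socket reports which phase it is in while it catches up.  Something like the following (just a sketch, assuming the default admin socket setup and the mon id store03 from the log above) would show the transition:

    # Run on the rebooted node: print the mon's state every 10 seconds.
    # It should read "synchronizing" while the store is being pulled
    # across, then "peon" or "leader" once the mon is back in quorum.
    while true; do
        ceph daemon mon.store03 mon_status | grep '"state"'
        sleep 10
    done

As I understand it, mon_sync_max_payload_size (1 MB by default) sets the chunk size a rejoining mon syncs its store in, so that is presumably one of the knobs that affects how long this phase takes.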


During this time, commands like "ceph mon stat" on any node took 6 to 15 seconds to return, which I presume is a function of "mon client hunt interval", but that still seems long.
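
One way to separate that client "hunting" delay from a genuinely slow monitor would be to pin the client to one mon at a time with -m and compare, e.g. (sketch only, using the addresses from the mon map below):

    # Query each mon directly rather than letting the client hunt.
    # The two mons in quorum should answer almost immediately; the
    # one that is still synchronizing will not answer until it rejoins.
    time ceph -m 172.16.0.43:6789 mon stat
    time ceph -m 172.16.0.44:6789 mon stat
    time ceph -m 172.16.0.45:6789 mon stat

If I'm reading the docs right, the default mon_client_hunt_interval is 3 seconds with a backoff multiplier, which would roughly account for the 6-15 seconds I'm seeing when the client happens to try the syncing mon first.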

However, radosgw file transactions seemed to work fine during the entire process.  So it's probably working as designed.


Mon 21 Aug 16:30:06 BST 2017
e5: 3 mons at {store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0}, election epoch 294, leader 0 store01, quorum 0,1 store01,store02

real    0m8.456s
user    0m0.304s
sys     0m0.024s


Thanks for the feedback; I'm still new to this.

Sean Purdy
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


