Re: cosd multi-second stalls cause "wrongly marked me down"

On Mon, 11 Apr 2011, Jim Schutt wrote:
> > > > I guess the other thing that would help to confirm this is to just halve
> > > > the number of OSDs on your machines in a test and see if the problem
> > > > goes away.
> > > I was going to try this first, exactly because it seems like
> > > a definitive test.
> > > 
> > > > > If my analysis above is correct, do you think anything
> > > > > can be gained by running the heartbeat and heartbeat
> > > > > dispatcher threads as SCHED_RR threads?  Since tick() runs
> > > > > heartbeat_check(), that would also need to be SCHED_RR,
> > > > > > or the heartbeats could arrive on time, but not be checked
> > > > > > until it was too late.
> > 
> > Thanks for the ideas. However, I doubt that making the OSD::tick()
> > thread SCHED_RR would really work.
> > 
> > The OSD::tick() code is taking locks all over the place. Since a bunch
> > of other threads besides the tick thread can be holding those locks,
> > this would soon result in priority inversion. Not to mention,
> > heartbeat_messenger has its own thread(s) which actually perform the
> > work of sending the heartbeat messages.
> 
> Yes, I think I understand.

We could set the priority for those threads as well, but I'm not sure that 
really addresses the problem: we may end up with a situation where cosd is 
responding to heartbeats but not doing useful work.  At some point you 
have to consider highly degraded service a failure.
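
For reference, this is roughly what putting a single thread under SCHED_RR
looks like with plain POSIX threads. It is only a minimal sketch, not cosd
code: the helper name try_set_sched_rr is hypothetical, and the choice of the
lowest real-time priority is an assumption made for illustration. The call
normally needs root or CAP_SYS_NICE, so the sketch returns the error code
rather than treating failure as fatal.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cerrno>
#include <cstring>

// Hypothetical helper (not part of cosd): try to move `thread` to the
// SCHED_RR policy at the lowest real-time priority.
// Returns 0 on success, or an errno-style code on failure
// (typically EPERM when the process lacks the needed privilege).
int try_set_sched_rr(pthread_t thread) {
  sched_param param;
  std::memset(&param, 0, sizeof(param));

  // Ask the system for the lowest valid SCHED_RR priority.
  int prio = sched_get_priority_min(SCHED_RR);
  if (prio == -1)
    return errno;
  param.sched_priority = prio;

  // pthread_setschedparam() returns the error code directly
  // instead of setting errno.
  return pthread_setschedparam(thread, SCHED_RR, &param);
}
```

A thread would call try_set_sched_rr(pthread_self()) early in its entry
function. Note that this does nothing about the locking concern raised above:
a SCHED_RR thread that blocks on a mutex held by an ordinary thread still
waits for that thread to be scheduled.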

Let's see if we can fix it without adjusting priorities first!

sage