Re: cosd multi-second stalls cause "wrongly marked me down"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2011-02-18 at 12:07 -0700, Colin McCabe wrote:
> > Anybody else have ideas?  :/
> >
> 
> At the risk of asking a dumb question, does syslog have messages from
> ntpd about date/time adjustments?

No, just a few messages like this:

Feb 18 08:34:43 an12 ntpd[4939]: kernel time sync enabled 0001
Feb 18 09:08:53 an12 ntpd[4939]: kernel time sync enabled 4001
Feb 18 10:34:16 an12 ntpd[4939]: kernel time sync enabled 0001
Feb 18 11:25:30 an12 ntpd[4939]: kernel time sync enabled 4001
Feb 18 11:42:36 an12 ntpd[4939]: kernel time sync enabled 0001
Feb 18 12:33:52 an12 ntpd[4939]: kernel time sync enabled 4001
Feb 18 13:08:00 an12 ntpd[4939]: kernel time sync enabled 0001

> 
> Also, what is the load on the system like? dout will write to a
> logfile using write(2), which of course is a blocking system call. 28
> seconds is a long time for a write to block, but things can get weird
> under heavy I/O.

During my tests my systems have high iowait  (40-80%), but CPU
utilization is relatively low.  However, my ceph logs are going 
to a dedicated disk via a dedicated I/O controller; i.e. my 
ceph data disks are FC and my log disk is a local SATA disk.

-- Jim

> 
> A more general problem is priority inversion through the dout lock.
> For example, some other piece of code could be doing something like this
> 
> dout << my_happy_printer_function() << dendl
> 
> If my_happy_printer_function takes some heavily contended locks, this
> will turn into
> 1. lock dout lock
> 2. lock heavily contended locks
> 3. write(2) to logfile
> 4. unlock ""
> 5. unlock dout lock
> 
> This is one reason why I want to get rid of the dout lock (although
> not the only one.)
> 
> Colin
> 
> 
> On Thu, Feb 17, 2011 at 11:13 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Thu, 17 Feb 2011, Jim Schutt wrote:
> >> Check out the result:
> >>
> >> osd.68.log:256027:2011-02-17 15:44:50.141378 7fd42ad57940 osd68 5 tick
> >> osd.68.log:256028:2011-02-17 15:44:50.141464 7fd42ad57940 osd68 5 tick getting read lock on map_lock
> >> osd.68.log:256029:2011-02-17 15:44:50.141472 7fd42ad57940 osd68 5 tick got read lock on map_lock
> >> osd.68.log:256031:2011-02-17 15:44:50.141612 7fd42ad57940 osd68 5 tick sending mon report
> >> osd.68.log:256032:2011-02-17 15:44:50.141619 7fd42ad57940 osd68 5 tick removing stray pgs
> >> osd.68.log:256033:2011-02-17 15:44:50.141626 7fd42ad57940 osd68 5 tick sending log to logclient
> >> osd.68.log:256034:2011-02-17 15:44:50.141633 7fd42ad57940 osd68 5 tick arming timer for next tick     <==
> >> osd.68.log:256277:2011-02-17 15:45:18.481656 7fd42ad57940 osd68 5 tick checking dispatch queue status <== 28 second gap
> >> osd.68.log:256278:2011-02-17 15:45:18.481669 7fd42ad57940 osd68 5 tick done
> >> osd.68.log:256279:2011-02-17 15:45:18.481691 7fd42ad57940 osd68 5 tick
> 


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux