Re: cosd multi-second stalls cause "wrongly marked me down"

On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > I'm not sure how to track down what's happening here...
> 
> Hmm.  I'm not able to reproduce this here (tho I only have ~15 nodes 
> available at the moment).  Seeing the last bit of the logs on the crashed 
> nodes will help.
> 

So this might be interesting.  In my last email, osd.15.log ended with

2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmsg short write did 195207, still have 91335


It occurred to me you might like to know what thread
7fb3d545c940 was doing when it got that short write:

# grep 7fb3d545c940 osd.15.log | tail
2011-03-03 08:32:33.108190 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 45 0x7fb3c4ad6970 pg_stats(1228 pgs v 6) v1
2011-03-03 08:32:33.114972 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 45 0x7fb3c4ad6970
2011-03-03 08:32:33.115001 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4ad6970
2011-03-03 08:34:01.154979 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer: state = 2 policy.server=0
2011-03-03 08:34:01.154991 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_keepalive
2011-03-03 08:34:01.155010 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_ack 29
2011-03-03 08:34:01.155041 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer encoding 46 0x7fb3c4b9fd90 pg_stats(1228 pgs v 6) v1
2011-03-03 08:34:01.163035 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).writer sending 46 0x7fb3c4b9fd90
2011-03-03 08:34:01.163069 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).write_message 0x7fb3c4b9fd90
2011-03-03 08:35:29.933436 7fb3d545c940 -- 172.17.40.22:6821/27793 >> 172.17.40.34:6789/0 pipe(0x7fb3c4001270 sd=12 pgs=2580 cs=1 l=1).do_sendmsg short write did 195207, still have 91335

I assume this means the short write happened on sending
pg_stats? 172.17.40.34 is where my monitor is running.
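
In case it helps anyone hunting for similar stalls: the grep output above
shows the thread going quiet for roughly 88 seconds after each write_message
(08:32:33.115 to 08:34:01.155, then 08:34:01.163 to 08:35:29.933). Below is a
minimal Python sketch for flagging such gaps automatically. It is not part of
Ceph; the function name stall_gaps and the one-second threshold are my own
choices, and it assumes every log line starts with "date time thread-id" as
in the excerpts above.

```python
import sys
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # matches e.g. 2011-03-03 08:32:33.108190


def stall_gaps(lines, thread_id, min_gap_s=1.0):
    """Return (gap_seconds, previous_line, line) for each pair of consecutive
    log entries from thread_id that are at least min_gap_s apart.

    Assumed line layout (taken from the excerpts, not from Ceph docs):
    <date> <time> <thread-id> <rest of message>
    """
    gaps = []
    prev_ts = None
    prev_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split(None, 3)
        if len(parts) < 3 or parts[2] != thread_id:
            continue  # different thread, or not a log line at all
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1], TS_FORMAT)
        except ValueError:
            continue  # timestamp did not parse; skip the line
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= min_gap_s:
                gaps.append((gap, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps


if __name__ == "__main__":
    # Hypothetical usage: python stall_gaps.py osd.15.log 7fb3d545c940
    log_path, thread = sys.argv[1], sys.argv[2]
    with open(log_path) as f:
        for gap, before, after in stall_gaps(f, thread):
            print("%.3f s stall" % gap)
            print("  before: " + before)
            print("  after:  " + after)
```

Run against the ten lines in the grep output above, this should report two
stalls of about 88 seconds each, both starting at a write_message entry.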

-- Jim
