On Wed, May 7, 2014 at 5:52 PM, Sage Weil <sage@xxxxxxxxxxx> wrote: > [CCing Greg and ceph-devel] > > On Wed, 7 May 2014, Guang Yang wrote: >> Hi Sage, >> Sorry to bother you directly, I am debugging / fixing issue >> http://tracker.ceph.com/issues/8232, during which time I studied the >> messenger component of CEPH. With more understanding of the messenger >> component, I started getting confused by the fix of issue 6992 >> (http://tracker.ceph.com/issues/6992) in terms of how it could help to solve >> the problem (though I fully agree we should stop accepter first and then >> clear all PIPEs). >> >> Looking at the logs posted along with the issue 6992, the failure happened >> at pipe::writter::connect side (it should be a brand new connect instead of >> a re-connect as cs = 0, pgs = 0) after a rebind, and the failed one already >> has the updated local address, it is confused to me how the connection could >> be established? As there is connect_seq check at the remote side which is >> likely to fail for this connection attempt (which is a positive value), >> unless there is some race at remote side updating connet_seq and in_seq. >> >> Am I missing something obviously on this? > > Honestly I haven't looked closely at that old log; I would focus on the > new log. Likewise, I didn't compare the patch to the bug. The patch was definitely fixing a race with incoming connections, but it's possible it's mismatched with the backtrace or something. > > Looking at it now (for the first time, sorry), the last line is > > -2> 2014-05-04 06:16:07.957897 7f71063ee700 2 -- > 10.193.207.180:6884/1037605 >> 10.193.207.183:6958/5001307 pipe(0x1cfa1400 > sd=132 :60749 s=1 pgs=0 cs=0 l=0 c=0x12b4e160). got newly_acked_seq 10 vs > out_seq 0 > > If I'm reading it right, that's an outgoing connection with the same > source as the mark_down_all. If it existed before the mark_down_all, > something is really broken because mark_dwon_all should have set it to > STATE_CLOSED and the connect() function checks the state. Which makes me > think that it was initiated after. *smacks forehead* For some reason I've been thinking it was a broken incoming connection rather than an outgoing one. But of course it's not, so... > My guess is that this is a race in OSD.cc. We do the rebind() stuff, but > only a bit further down do consume_map() which publishes the map to the > OSDService with all kind of complicated handoff. I'm forgetting right now > how this is supposed to work, but my guess is that this is the heart of > the problem: some random PG thread is trying to send to an OSD using the > older map and grabs the older OSDMap ref and opens the connection > *after* we do the rebind() and mark_down_all(). > > Does this sound plausible? Well, I don't think it's supposed to happen — we hold a write on the map_lock during this time period and update the OSDSuperblock epochs before dropping it (and then doing consume_map()). But it's not pre-publishing and it might not even be that complicated a race, but just something wrong in the main thread. Sadly the msgr_crash.log file we have right now doesn't have enough info for me to identify the issue and code inspection doesn't sound like any fun, so a good log still sounds like the best way to identify the issue. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html