Re: [PATCH 2/2] libceph: fix handle_timeout() racing with con_work()/try_write()

"Jim Schutt" <jaschut@xxxxxxxxxx> · Wed, 18 May 2011 08:34:12 -0600

Sage Weil wrote:
On Tue, 17 May 2011, Sage Weil wrote:
On Tue, 17 May 2011, Jim Schutt wrote:
Hi Sage,

Sage Weil wrote:
Hi Jim,
This picks up the second resent message.  But and OPENING is set.. so how is
it that we were in try_write but we didn't pass through the OPENING test in
con_work?

con_work()/try_write() start again on new message.
Note that con_work() did not notice that the connection was marked to
be closed/reopened!  This is where things went bad.
Note the connection state: 8220 = 0x201c, and 0x2000 is OPENING.
[ 2412.643178] libceph:   messenger.c:1969 : con_work ffff8801955c0030
OPENING
[ 2412.643182] libceph:   messenger.c:292  : con_close_socket on
ffff8801955c0030 sock ffff8801825a8c00
[ 2412.643197] libceph:   messenger.c:162  : ceph_state_change
ffff8801955c0030 state = 2076 sk_state = 4
Ok, now we see it, but this should have executed before we ever got to
try_write!

There are a handful of places that drop and re-take con->mutex... I think it
has to be one of those.  They are:

- prepare_connect_authorizer (before calling ->prepare_authorizer callback)
- fail_protocol (before calling ->bad_proto callback)
- process_connect RESETSESSION case (before calling ->reset callback)
- process_message (before calling ->dispatch callback)

I think after retaking the lock in each of these cases we need to test for
the CLOSED or OPENING bits and, if set, bail out so that con_work can
restart.

Does that make sense?  Can you confirm which case of lock droppage inside
con_work is at fault?  (Maybe adding a printk at the top of con_work after
taking the lock too would make it more clear what's going on; currently it
only prints in all the random test_bit cases).
So I've got logs now that show that connection state changes
across a call to process_message.  That seems like it proves
the concept.

I'm thinking something like the below... does that seem reasonable?  I only
did the prepare_authorizer case to start, but the others should be fixed up
too.  Compile tested only.

Is there any reason not to fix all the cases at once, in try_read?
It's less code, but far away from where the mutex dropping occurs,
so we add a comment:
That covers the process_message and process_connect cases, provided you 
also set ret properly, and do the con_close_socket work that the con_work 
checks would have done.  e.g.,

I pushed a patch to the msgr_race branch that catches all four cases (I 
think).  Does the fix make sense given what you saw?

Yes, I think so.

In every case I logged I saw that prepare_write_message was
getting called at the wrong time - I finally am understanding
that happened because reset_connection was able to run when
con->mutex was dropped, and it thus resets con->out_msg at
the wrong time.

I'll test your fix out today and let you know if I see
any more of this.

-- Jim

Thanks!
sage

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html