Re: Questions about Accepter::stop()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 13 Aug 2016, Willem Jan Withagen wrote:
> On 13-8-2016 00:31, Willem Jan Withagen wrote:
> > On 12-8-2016 23:40, Willem Jan Withagen wrote:
> >> On 12-8-2016 22:58, Sage Weil wrote:
> >>> On Fri, 12 Aug 2016, Willem Jan Withagen wrote:
> >>>> Hi,
> >>>>
> >>>> Still working on finding out why my OSD is not comming back up.
> >>>> Looking at the OSD it seems to recover, but it is not reported back to
> >>>> the other OSD and mons.
> >>>>
> >>>> Below some of the code from
> >>>> 	./src/msg/simple/Accepter.cc
> >>>>
> >>>> Turns out that the thread freezes on the join, and the complicating
> >>>> factor is that shoutdown always reports that
> >>>>   accepter.stop shutdown failed:  errno 57 (57) Socket is not connected
> >>>>
> >>>> Then the code goes into the join, and gets stuck in there.
> >>>>
> >>>> So I've execluded that part of the code, and the close section.
> >>>>
> >>>> That seems to work, but I would very much some more opinions on this.
> >>>> Original code was doen by Sage, but John Spray added a bit of exclusion
> >>>> on the join()
> >>>>
> >>>> And even with this change I cannot complete
> >>>> 	cephtool-test-mon.sh
> >>>> But I'm getting a lot further down the test.
> >>>
> >>> This is the thread we need to wake up in Accepter::entry():
> >>>
> >>>     ldout(msgr->cct,20) << "accepter calling poll" << dendl;
> >>>     int r = poll(&pfd, 1, -1);
> >>>     if (r < 0)
> >>>       break;
> >>>     ldout(msgr->cct,20) << "accepter poll got " << r << dendl;
> >>>
> >>>     if (pfd.revents & (POLLERR | POLLNVAL | POLLHUP))
> >>>       break;
> >>>
> >>>     ldout(msgr->cct,10) << "pfd.revents=" << pfd.revents << dendl;
> >>>     if (done) break;
> >>>
> >>> It shutdown(2) isn't the "right" (portable) way to kick the thread blocked 
> >>> on poll(2) on an accept socket, maybe there is some other socket call that 
> >>> is more appropriate?  It just needs to wake up poll so that we either see 
> >>> an error event queued or done == true.
> >>
> >> Yup,  that is what I see in the Linux code.
> >> Poll returns with revent = 16 = POLLHUP.
> >>
> >> Now I'm sort of wondering what I can do with a socket that is already
> >> disconnected.... Somebody has to have disconnected the connection.
> >> And why the poll waiting on it does not report that....
> >> perhaps calling close on it does signal the HUP.
> >>
> >> SHUT_RDWR has a few comments (at least in the FreeBSD manpage) but they
> >> do not seem to fit this case.
> >>
> >> Any idea oh who would have disconnected this socket?
> >>
> >> Back to reading more manual pages. And trying to figure out the state
> >> machine of a socket. :(
> > 
> > Right,
> > 
> > If I start closing the socket, I'm getting revent = 32.
> > Which is POLLINVAL
> > 	Invalid request: fd not open (output only).
> > 
> > Available both on Linux and FreeBSD. On FreeBSD it is always to get that
> > in revents, even if not asked for. 8-;
> > So I guess adding that to the break expression is useful.
> > 
> > And tack the closing somewhere into the stop. And make sure that join()
> > is called.
> 
> Hi Sage,
> 
> Got a pull in #10720 that does work on my end...
> I left the close at the end in so other OSes get the change to close the
> socket when shutdown triggered the poll().

I think the reason we don't use close(2) in the normal case is that it is 
racy becaues the fd value may be reused by another thread immediately 
after we close it, and at close time we can't be certain that the other 
thread is already blocked in poll(2) (it might be about to call it).

say fd == 2.  we start shutdown from T1:
T1: close(fd)
T3: other = open(...)     (other = 2)   (totally unrelated thread, maybe doing file io)
T2: accept(fd)

It seems like there should be a better way?

sage


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux