Re: Questions about Accepter::stop()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 13-8-2016 00:31, Willem Jan Withagen wrote:
> On 12-8-2016 23:40, Willem Jan Withagen wrote:
>> On 12-8-2016 22:58, Sage Weil wrote:
>>> On Fri, 12 Aug 2016, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Still working on finding out why my OSD is not comming back up.
>>>> Looking at the OSD it seems to recover, but it is not reported back to
>>>> the other OSD and mons.
>>>>
>>>> Below some of the code from
>>>> 	./src/msg/simple/Accepter.cc
>>>>
>>>> Turns out that the thread freezes on the join, and the complicating
>>>> factor is that shoutdown always reports that
>>>>   accepter.stop shutdown failed:  errno 57 (57) Socket is not connected
>>>>
>>>> Then the code goes into the join, and gets stuck in there.
>>>>
>>>> So I've execluded that part of the code, and the close section.
>>>>
>>>> That seems to work, but I would very much some more opinions on this.
>>>> Original code was doen by Sage, but John Spray added a bit of exclusion
>>>> on the join()
>>>>
>>>> And even with this change I cannot complete
>>>> 	cephtool-test-mon.sh
>>>> But I'm getting a lot further down the test.
>>>
>>> This is the thread we need to wake up in Accepter::entry():
>>>
>>>     ldout(msgr->cct,20) << "accepter calling poll" << dendl;
>>>     int r = poll(&pfd, 1, -1);
>>>     if (r < 0)
>>>       break;
>>>     ldout(msgr->cct,20) << "accepter poll got " << r << dendl;
>>>
>>>     if (pfd.revents & (POLLERR | POLLNVAL | POLLHUP))
>>>       break;
>>>
>>>     ldout(msgr->cct,10) << "pfd.revents=" << pfd.revents << dendl;
>>>     if (done) break;
>>>
>>> It shutdown(2) isn't the "right" (portable) way to kick the thread blocked 
>>> on poll(2) on an accept socket, maybe there is some other socket call that 
>>> is more appropriate?  It just needs to wake up poll so that we either see 
>>> an error event queued or done == true.
>>
>> Yup,  that is what I see in the Linux code.
>> Poll returns with revent = 16 = POLLHUP.
>>
>> Now I'm sort of wondering what I can do with a socket that is already
>> disconnected.... Somebody has to have disconnected the connection.
>> And why the poll waiting on it does not report that....
>> perhaps calling close on it does signal the HUP.
>>
>> SHUT_RDWR has a few comments (at least in the FreeBSD manpage) but they
>> do not seem to fit this case.
>>
>> Any idea oh who would have disconnected this socket?
>>
>> Back to reading more manual pages. And trying to figure out the state
>> machine of a socket. :(
> 
> Right,
> 
> If I start closing the socket, I'm getting revent = 32.
> Which is POLLINVAL
> 	Invalid request: fd not open (output only).
> 
> Available both on Linux and FreeBSD. On FreeBSD it is always to get that
> in revents, even if not asked for. 8-;
> So I guess adding that to the break expression is useful.
> 
> And tack the closing somewhere into the stop. And make sure that join()
> is called.

Hi Sage,

Got a pull in #10720 that does work on my end...
I left the close at the end in so other OSes get the change to close the
socket when shutdown triggered the poll().

--WjW


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux