Re: Not matching event states in ./msg/async/AsyncConnection.cc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 22-6-2016 16:36, Haomai Wang wrote:
> Oh, sorry. I still realize you are testing on kqueue event backend.
> 
> I submit a pr to fix this. plz help to verify whether it works for you
> since I don't have bsd handy...
> 
> https://github.com/ceph/ceph/pull/9869

Yup,
Saw this flying by.
Got to finish the running 'make check', and then I'll put your stuff in
and let you know.

--WjW

> On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
>> This time also to the list.
>>
>> On 20-6-2016 04:40, Haomai Wang wrote:
>>> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>>>> calls. And I suspect that it is because of the snippet below...
>>>>
>>>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>>>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>>>> kevent code.
>>>>
>>>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
>>>
>>> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
>>
>> 'mmmm,
>>
>> That piece of code is already in the code that I'm testing.
>> So that could be the source of problems. I don't know.
>> Not going to pretend I understand anything of the statmachine for
>> setting up connections.
>>
>>>> ./msg/async/AsyncConnection.cc at line 1014
>>>>     case STATE_CONNECTING_RE:
>>>>       {
>>>>         r = net.reconnect(get_peer_addr(), sd);
>>>>         if (r < 0) {
>>>>           ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>>>> << dendl;
>>>>           goto fail;
>>>>         } else if (r > 0) {
>>>>           ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>>>> inprogress" << dendl;
>>>>           center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>>>>           break;
>>>>         }
>>>>
>>>>       lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>>>>         << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>>>>         << dendl;
>>>>         center->delete_file_event(sd, EVENT_WRITABLE);
>>>>         state = STATE_CONNECTING_WAIT_BANNER;
>>>>         break;
>>>>       }
>>
>> If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
>> this file handle is:
>> 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
>> started fd=13 mask=1 original mask is 0
>> 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
>> fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
>> 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
>> end fd=13 mask=1 original mask is 1
>> 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
>> started fd=13 mask=2 original mask is 1
>> 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
>> event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
>> 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
>> delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10
>>
>> And then we hit assert(BUG)
>>
>> Looking at the e_poll output from Linux, it looks like:
>> 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
>> event started fd=13 mask=1 original mask is 0
>> 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
>> event end fd=13 mask=1 original mask is 1
>> 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
>> event started fd=13 mask=2 original mask is 1
>> 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
>> event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2
>>
>> The access pattern is the same, but that does not generate an error.
>>
>> I think the error message actually means that there is no event in the
>> Keventqueue, that actually matches. And that is because EVENT_WRITABLE
>> is asked to be removed, but the event actually only has EVENT_READ
>> activated. So the EV_DELETE fails.
>>
>> I have not tried this trace on Linux to see what is the result there.
>> Perhaps Linux e_poll stuff does not really care if you want a
>> non-existent filter?
>> But the assert is rather detrimental on FreeBSD code :(
>>
>> But the other strange part for me is that the code tries to remove
>> events that are really not inserted....
>> Wether that is a problem in the current code? I do not know, but it
>> certainly strange.
>>
>> --WjW
>>
>>

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux