RE: [ceph-users] Help needed porting Ceph to RSockets

"Hefty, Sean" <sean.hefty@xxxxxxxxx> · Wed, 14 Aug 2013 17:04:10 +0000

> The first question I would have is: why is the rpoll() split into
> these two pieces? There must have been some reason to do a busy
> loop on some local state information rather than just call the
> real poll() directly.

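As Scott mentioned in his email, this is done for performance reasons. rpoll() first busy-polls the rsocket state in user space and only then falls back to the real poll(), because the cost of always dropping into the kernel is too high for HPC.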
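
To make the two pieces concrete, here is a minimal sketch of a spin-then-block poll. It is written in Python against an ordinary file descriptor, not against rsockets; the names two_phase_poll, is_ready, and spin_seconds are made up for illustration and are not part of librdmacm.

```python
import select
import time


def two_phase_poll(fd, is_ready, spin_seconds=0.00001, timeout_ms=1000):
    """Spin on a user-space predicate first, then fall back to poll().

    is_ready is a hypothetical callable standing in for a check of
    local state; it is an assumption of this sketch.
    """
    deadline = time.monotonic() + spin_seconds

    # Phase 1: check the user-space predicate in a tight loop,
    # without blocking in the kernel.
    while True:
        if is_ready():
            return "spin"
        if time.monotonic() >= deadline:
            break

    # Phase 2: nothing showed up while spinning, so block in the
    # kernel until the fd is readable or the timeout expires.
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return "kernel" if events else "timeout"
```

The spin budget is the tuning knob in a scheme like this: a longer spin avoids more system calls at the price of burning CPU while idle.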

> I am looking at a multithreaded application here, and I believe that
> the race is between thread A calling the rpoll() for POLLIN event and
> thread B calling the shutdown(SHUT_RDWR) for reading and writing of
> the (r)socket almost immediately afterwards.

Ah - this is likely the issue.  I did not assume that rshutdown() would be called simultaneously with rpoll().  I need to think about how to solve this, so that rpoll() unblocks.
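
For reference, one common way to wake a thread that is blocked in poll() from another thread is to poll on an extra "wakeup" fd as well. The sketch below shows that general technique in Python with plain pipes. It is only an illustration of the idea, not a proposal for how rsockets should be changed; the WakeablePoller class and its methods are hypothetical.

```python
import os
import select


class WakeablePoller:
    """Poll one fd for input, with a second fd used only to wake the poller."""

    def __init__(self, fd):
        self._fd = fd
        # Self-pipe: shutdown() writes to one end, poll() watches the other.
        self._wake_r, self._wake_w = os.pipe()

    def poll(self, timeout_ms=None):
        poller = select.poll()
        poller.register(self._fd, select.POLLIN)
        poller.register(self._wake_r, select.POLLIN)
        ready = {fd for fd, _ in poller.poll(timeout_ms)}
        # The wakeup byte is deliberately left in the pipe, so every
        # later poll() also reports the shutdown.
        if self._wake_r in ready:
            return "shutdown"
        if self._fd in ready:
            return "readable"
        return "timeout"

    def shutdown(self):
        # Called from another thread; makes a blocked poll() return.
        os.write(self._wake_w, b"\0")
```

Because the byte stays in the pipe, the result is the same whether shutdown() runs just before or just after the other thread enters poll(), which is the ordering problem described above.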

> I think that the shutdown itself does not cause a POLLHUP event to be
> generated from the kernel to interrupt the real poll().
> (BTW: which kernel module implements the poll() for rsockets?
> Is that ib_uverbs.ko with ib_uverbs_poll_cq()?)

The POLLHUP event in rsockets is generated purely in software to indicate that such an 'event' occurred: rpoll() reports it when it detects that the rsocket state is disconnected.
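
As a rough illustration of a software-generated POLLHUP, the sketch below derives the returned event mask from locally tracked state instead of from a kernel notification. The State enum and soft_revents function are hypothetical and do not mirror the actual rsockets internals.

```python
import select
from enum import Enum


class State(Enum):
    CONNECTED = 1
    DISCONNECTED = 2


def soft_revents(state, rx_ready, requested):
    """Compute poll-style revents purely from user-space state."""
    revents = 0
    # POLLHUP is reported whenever the tracked state is disconnected,
    # whether or not the caller asked for it (as with poll() itself).
    if state is State.DISCONNECTED:
        revents |= select.POLLHUP
    # POLLIN is reported only if requested and data is available locally.
    if rx_ready and (requested & select.POLLIN):
        revents |= select.POLLIN
    return revents
```

In this model the hang-up only becomes visible when the polling code runs the check; a thread already blocked elsewhere does not see it.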

I believe that the real poll() call traps into ib_uverbs_event_poll() in the kernel.  The fd associated with the poll call corresponds to a 'completion channel', which is used to report events that occur on a CQ.  Connection-related events don't actually go to that fd; only completions for data transfers do.

- Sean