Re: [ceph-users] Help needed porting Ceph to RSockets

On Aug 14, 2013, at 3:21 AM, Andreas Bluemle <andreas.bluemle@xxxxxxxxxxx> wrote:

> Hi,
> 
> First, some information about the environment I am working in:
> 
> - CentOS 6.4 with custom kernel 3.8.13
> - librdmacm / librspreload from git, tag 1.0.17
> - application started with librspreload in the LD_PRELOAD environment
> 
> Currently, I have increased the spin time by raising the default
> value of polling_time in the source code.
> 
> I guess that the correct way to do this is via
> configuration in /etc/rdma/rsocket/polling_time?
> 
> Concerning rpoll() itself, some more comments/questions are
> embedded below.
> 
> On Tue, 13 Aug 2013 21:44:42 +0000
> "Hefty, Sean" <sean.hefty@xxxxxxxxx> wrote:
> 
>>>> I found a workaround for my (our) problem: in the librdmacm
>>>> code, rsocket.c, there is a global constant polling_time, which
>>>> is set to 10 microseconds at the moment.
>>>> 
>>>> I raised this to 10000, and all of a sudden things work nicely.
>>> 
>>> I am adding the linux-rdma list to CC so Sean might see this.
>>> 
>>> If I understand what you are describing, the caller to rpoll()
>>> spins for up to 10 ms (10,000 us) before calling the real poll().
>>> 
>>> What is the purpose of the real poll() call? Is it simply a means
>>> to block the caller and avoid spinning? Or does it actually expect
>>> to detect an event?
>> 
>> When the real poll() is called, an event is expected on an fd
>> associated with the CQ's completion channel. 
> 
> The first question I would have is: why is rpoll() split into
> these two pieces? There must have been some reason to busy-loop
> on some local state information rather than just call the
> real poll() directly.
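
To make the two-phase structure concrete, here is a minimal sketch of
an rpoll()-style wait: spin on the rsocket's completion state for up
to polling_time microseconds, then fall back to the real poll() on the
completion channel's fd. rs_poll_check() and channel_fd are simplified
stand-ins for illustration, not the actual rsocket.c identifiers:

#include <poll.h>
#include <sys/time.h>

static int polling_time = 10;   /* microseconds, the default Andreas raised */

extern int rs_poll_check(void); /* stand-in: non-blocking check of cached
                                 * completion state */
extern int channel_fd;          /* stand-in: fd of the CQ's completion
                                 * channel */

int sketch_rpoll(int timeout_ms)
{
        struct pollfd pfd = { .fd = channel_fd, .events = POLLIN };
        struct timeval start, now;

        gettimeofday(&start, NULL);

        /* Phase 1: busy-poll, hoping a completion arrives "for free". */
        do {
                if (rs_poll_check())
                        return 1;       /* event found without blocking */
                gettimeofday(&now, NULL);
        } while ((now.tv_sec - start.tv_sec) * 1000000 +
                 (now.tv_usec - start.tv_usec) < polling_time);

        /* Phase 2: give up the CPU and block in the real poll(). */
        return poll(&pfd, 1, timeout_ms);
}

On the configuration question: librdmacm reads per-option defaults from
files under /etc/rdma/rsocket/, so writing the desired number of
microseconds into /etc/rdma/rsocket/polling_time should have the same
effect as patching the default in the source, without a rebuild.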

Sean can answer specifically, but this is a typical HPC technique. The worst thing you can do is handle an event and then block when the next event is already available: that adds 1-3 us of latency, which is unacceptable in HPC. In HPC, we poll. If we worry about power, we poll until we get no more events and then poll a little more before blocking. Determining the "little more" is the fun part. ;-)
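
A minimal sketch of that spin-a-little-more pattern, with hypothetical
get_event()/wait_for_event() helpers standing in for the real
completion-queue calls; the spin budget restarts on every event, so
the loop only blocks once the link has been quiet for a full grace
period:

#include <stdbool.h>
#include <time.h>

#define GRACE_NSEC (50 * 1000LL)        /* the "little more": 50 us, tunable */

extern bool get_event(void);            /* hypothetical non-blocking check */
extern void wait_for_event(void);       /* hypothetical blocking wait */

static long long elapsed_nsec(const struct timespec *s,
                              const struct timespec *n)
{
        return (n->tv_sec - s->tv_sec) * 1000000000LL +
               (n->tv_nsec - s->tv_nsec);
}

void event_loop(void)
{
        struct timespec quiet_since, now;

        clock_gettime(CLOCK_MONOTONIC, &quiet_since);
        for (;;) {
                if (get_event()) {
                        /* Handled an event: activity earns more spinning,
                         * so restart the quiet timer. */
                        clock_gettime(CLOCK_MONOTONIC, &quiet_since);
                        continue;
                }
                clock_gettime(CLOCK_MONOTONIC, &now);
                if (elapsed_nsec(&quiet_since, &now) > GRACE_NSEC) {
                        /* Quiet long enough: save power and block. */
                        wait_for_event();
                        clock_gettime(CLOCK_MONOTONIC, &quiet_since);
                }
        }
}

The tradeoff is exactly the one described above: a longer grace period
buys lower latency on bursty traffic at the cost of burning a core
while the link is idle.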

Scott--