Re: Further thoughts on uAPI

Doug Ledford <dledford@xxxxxxxxxx> · Tue, 26 Apr 2016 12:38:00 -0400

On 4/26/2016 10:58 AM, Jason Gunthorpe wrote:
> On Tue, Apr 26, 2016 at 10:19:37AM -0400, Doug Ledford wrote:
>> For certain operations that have lots of optional items (work requests
>> for one, work completions for another)
> 
> FWIW, I think we had a general consensus to take a different approach.
> 
> Basically, the 'common' uAPI does not care about micro-performance.
> 
> Drivers have to implement hardware-specific driver calls to micro-optimize
> their own high speed paths, and that would be done specifically with a
> single hardware in mind.
> 
> This is already done by the majority of drivers for wc/wr processing
> (IIRC, only qib calls to the kernel for this)
> 
> If we do provide a common wr/wc API then it can just be designed
> inefficiently around the netlink attribute architecture, uncaring
> about performance because nothing should use it. I'd prefer not to
> implement it at all...

We're talking about two different things.  I had the actual user space
API on my mind when I wrote what I wrote (aka, libibverbs).  If we are
going to talk about the verbs 2.0 kernel interface, then it makes sense
to me to keep the user space API firmly in mind too.  Although it would
be great if the user space verbs never changed a bit, that isn't
entirely possible.  The timestamp changes that are still waiting are an
example.  Currently, I'm not real happy with how the extension mechanism
in libibverbs has played out.  The intent was good, the reality is
clunky IMO.

> This same basic idea flows over to other parts, eg if a driver has
> special support for a specific work load (say fast creation of IB UD
> AHs) then it can have a high speed driver-specific call to do that
> work completely micro-optimized using data formed *exactly* the way
> the hardware needs.
> 
>> base struct plus the length of all optional structs, and the order of
>> the optional structs matches their bit order from lowest to highest in
>> the magic element.  It's not quite as free form as the patches for
>> timestamp support were, but still allows the structs some flexibility in
>> what is included and what isn't.
> 
> Mellanox has a patch series that tries to do exactly this for the wc
> in libibverbs - it is quite ugly, and the the benchmarks showed worse
> performance compared to the current technique.

Right, but it completely rewrote the struct from scratch for each WC.
That's different than what I posted, which was more along the lines of
functional groupings.  It reduces the number of conditionals while still
reducing the overall struct size for anything other than "we have every
option turned on" case.

There are only a few options for how to expose these things to user
space (using the example I gave as a further talking point):

1) Grow struct ibv_wc for every new option.  This totally prevents any
ability to either remove items or reorder items in this struct.  It also
bloats the size of the user visible struct for the common case.

2) Do away with direct variable access and go to indirect variable
access via accessor functions.  This will work, and has the advantage
that accessor functions can work directly with the hardware specific
structs, thereby eliminating a copy from the hardware struct to the user
visible API struct.  It has the disadvantage that every item you wish to
access will require an indirect function call from a table.

3) Do like the Mellanox patches did and completely rewrite the
completion struct for each completion.  Changing ordering and everything
else based on what's there.  As you pointed out, this had performance
issues.

4) What I wrote, which was intended to be a compromise between #1 and #3
to hopefully help with performance issues.

> For the reasons above I would prefer to stick entirely with the
> netlink attribute format or very similar as the main mechanism.

I don't intend to expose anything netlink to libibverbs users ;-)

Like I said, we're talking about different things.

Attachment:
signature.asc

Description: OpenPGP digital signature