Re: [RFC][PATCH] ipc: Remove IPCMNI

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 29 Mar 2018 12:32:54 -0700

On Thu, Mar 29, 2018 at 08:07:44PM +0200, Manfred Spraul wrote:
> Hello Mathew,
> 
> On 03/29/2018 12:56 PM, Matthew Wilcox wrote:
> > On Thu, Mar 29, 2018 at 10:47:45AM +0200, Manfred Spraul wrote:
> > > > > > > > This can be implemented trivially with the current code
> > > > > > > > using idr_alloc_cyclic.
> > > Is there a performance impact?
> > > Right now, the idr tree is only large if there are lots of objects.
> > > What happens if we have only 1 object, with id=INT_MAX-1?
> > The radix tree uses a branching factor of 64 entries (6 bits) per level.
> > The maximum ID is 31 bits (positive signed 32-bit integer).  So the
> > worst case for a single object is 6 pointer dereferences to find the
> > object anywhere in the range (INT_MAX/2 - INT_MAX].  That will read 12
> > cachelines.  If we were to constrain ourselves to a maximum of INT_MAX/2
> > (30 bits), we'd reduce that to 5 pointer dereferences and 10 cachelines.
> I'm concerned about the up to 6 branches.
> But this is just guessing, we need a test with a realistic workload.

Yes, and once there's a realistic workload, I'll be happy to prioritise
adapting the data structure to reduce the pointer chases.

FWIW, the plan is this:

There's currently an unused 32-bit field (on 64-bit machines) which I plan
to make the 'base' field.  So at each step down the tree, one subtracts
that field from the index in order to decide which slot to look at next.
Something like this:

index=0x3000'0000
(root) -> order=24
          offset=48 -> order=18
                       offset=0 -> order=12
                                   offset=0 -> order=6
                                               offset=0 -> order=0
                                                           offset=0 -> data

compresses to a single node:
(root) -> order=0
          base=3000'0000
          offset=0 -> data

If one has one entry at 5 and another entry at 0x3000'0000, the tree
looks like this (three nodes):

(root) -> order=24
          base=0
          offset=0  -> order=0
                       base=0
                       offset=5 -> entry1
          offset=48 -> order=0
                       base=0
                       offset=0 -> entry2

The trick is making sure that looking up offset 0x300'1000 returns NULL
and not entry2, but I can make that work.

An alternative is to go to something a little more libJudy and have
mutating internal nodes in the tree that can represent this kind of
situation in a more compact form.  There's a tradeoff to be made between
simplicity of implementation, cost of insertion, cost of lookup and
memory consumption.  I don't know where the right balance point is yet.