From: David Miller <davem@xxxxxxxxxxxxx>
Date: Fri, 21 Jul 2017 03:50:05 +0100 (WEST)

> Having to allocate a full trap frame just to TLB flush one page or an
> MM is a serious regression.
>
> Next, allocating a whole new data structure and clearing it out on
> every new address space creation is going to be a significant new cost
> as well.

So, just thinking out loud:

1) You can retain the cross call TLB flush assembler by passing in the
   appropriate context value for each individual cpu from the cross
   call dispatcher.

2) If you have a constant upper bound on the number of context domains,
   you can simply inline them into the existing mmu_context structure.
   This avoids the memory allocation on every mm creation.

   You can also make the context domain salting extremely cheap,
   perhaps something like "(cpuid >> x) & y".  No, you won't map cores
   to context domains as precisely as the code does now, but you will
   make up for it in code simplicity and in the overall cost these
   changes add in the common cases.

   I suggest "(cpuid >> x) & y" and a very small number of context
   domains (which determines 'y') because we don't need something
   perfect, we need something that divides the problem by an order of
   magnitude.  (A rough sketch of this is appended below.)

The hash of locks caught my eye as well.  I don't think you need it,
and we steer clear of hashed spinlock tables in the Linux kernel
because they never scale properly.

Instead, I think you can use something like RCU to provide the
necessary synchronization.  You would first make sure X isn't
referenced on the local cpu any more, and then do call_rcu() to perform
the actual clearing of the bitmap, which allows X to be allocated
again.  (A sketch of that pattern is also appended below.)

Just some ideas...
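
[Editor's note: for points 1) and 2), here is a minimal sketch in
kernel-style C of what the inlined context domains and the
"(cpuid >> x) & y" salting could look like.  All names here
(CTX_DOMAIN_SHIFT, CTX_NR_DOMAINS, ctx_val, ctx_domain_of_cpu,
ctx_val_for_cpu) are hypothetical illustrations, not actual sparc64
symbols, and the structure is reduced to only the new field.]

#define CTX_DOMAIN_SHIFT	2			/* the 'x' above */
#define CTX_NR_DOMAINS		8			/* power of two; fixes 'y' */
#define CTX_DOMAIN_MASK		(CTX_NR_DOMAINS - 1)

typedef struct {
	/* One context value per domain, inlined so that creating an mm
	 * needs no extra allocation. */
	unsigned long		ctx_val[CTX_NR_DOMAINS];
	/* ... the existing mmu_context fields would follow ... */
} mm_context_t;

/* Cheap cpu -> context domain salting: "(cpuid >> x) & y". */
static inline unsigned int ctx_domain_of_cpu(unsigned int cpuid)
{
	return (cpuid >> CTX_DOMAIN_SHIFT) & CTX_DOMAIN_MASK;
}

/*
 * The cross call dispatcher (point 1) would look up the context value
 * for each target cpu and hand it to the existing TLB flush assembler
 * unchanged, instead of having the assembler derive it.
 */
static inline unsigned long ctx_val_for_cpu(const mm_context_t *ctx,
					    unsigned int cpuid)
{
	return ctx->ctx_val[ctx_domain_of_cpu(cpuid)];
}

[With CTX_NR_DOMAINS a small power of two, the salting is a shift and a
mask, and the per-cpu context value can be passed down the cross call
path with no trap frame or per-mm allocation involved.]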
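
[Editor's note: for the RCU suggestion, a sketch of the call_rcu()
pattern, again with made-up names (ctx_in_use, MAX_CTX_NR,
struct ctx_retire): the caller first makes sure the context is no
longer referenced on the local cpu, and the bitmap bit is only cleared,
making the context number allocatable again, after an RCU grace period
has elapsed.]

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/slab.h>

#define MAX_CTX_NR	8192
static DECLARE_BITMAP(ctx_in_use, MAX_CTX_NR);

struct ctx_retire {
	struct rcu_head	rcu;
	unsigned long	ctx_nr;
};

static void ctx_retire_rcu(struct rcu_head *head)
{
	struct ctx_retire *cr = container_of(head, struct ctx_retire, rcu);

	/* A grace period has elapsed: nobody can still be using this
	 * context number, so it may be handed out again. */
	clear_bit(cr->ctx_nr, ctx_in_use);
	kfree(cr);
}

static void ctx_retire(unsigned long ctx_nr)
{
	struct ctx_retire *cr = kmalloc(sizeof(*cr), GFP_ATOMIC);

	if (!cr)
		return;		/* leak the context number rather than race */

	cr->ctx_nr = ctx_nr;
	/* The caller has already made sure ctx_nr is no longer
	 * referenced on the local cpu; defer the bitmap clear. */
	call_rcu(&cr->rcu, ctx_retire_rcu);
}

[This replaces the hashed spinlock table with a deferred-free style
pattern: readers of the context need only an RCU read-side critical
section, and the single bitmap update happens out of line in the RCU
callback.]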