On Wed, May 07, 2008 at 10:36:44AM -0700, Brad Boyer wrote:
On Wed, May 07, 2008 at 01:53:26PM +0200, Richard Zidlicky wrote:
For the multi-CPU case I think it is difficult to avoid a special
per CPU page holding the current TLS pointer. Am I missing something?
I have to admit I didn't consider true SMP. I suppose it is best if we
do allow for that in the design even if the current kernel doesn't
really have the support to do SMP on m68k.
the coldfire variant has it, not sure if it is used anywhere.
This page could in principle be shared by all processes per CPU so it is
mostly irrelevant whether it can be swapped out. Either way it seems to
involve some interesting VM/MM magic.
I wonder if the TLS pointer is the only thing that would be variable between
processes?
I'm not familiar enough with the low level m68k mm code to be sure, but
could we lazily populate the mmu entry for these pages and only make
sure they are accurate after a reference?
this should work.
This relies on it being
very easy to invalidate mappings during a context switch while not
always putting back all the previous mappings for the process being
woken up.
On a process switch it might be doable for free; on a thread context switch
it is a lot more complicated and obviously implies some TLB flush.
However I can't even find where it could be hooked into the m68k code.
If you looked at how other architectures do it you probably have
a better idea than me.
If that is the case, we could just check on the page fault
and update it if needed. We could keep track in the kernel of which
thread is really setup in the page. I know something similar is used
for FPU state saving on some architectures.
I think we have to do something during context switch just to prevent
the possibility of leaking information and allowing interference as
mentioned in another message in the thread.
interference could be prevented, the shared page I mentioned in my other
mail should be obviously writable only by trusted code. The leaking issue
is indeed worrying - we would leak PID, TLS pointer and stack bounds of the
last thread that caused an update to that page.
However it is not necessary to have the page shared, that was originally
only my idea to avoid problems with a large number of unswappable pages.
On second thought I see better solutions.
So assume nothing is done on context switch, there is one tls_info struct
per process storing TLS pointer and stack bounds of last thread which updated
that info. A thread needing TLS lookup could easily determine if that info
requires update merely by checking sp against the values. Because nothing is
done on context switch the page can be swapped out at will.
This would keep cost of context switch zero at the cost of some additional
overhead for TLS lookups.
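
A minimal sketch of that check, assuming a hypothetical layout for tls_info
(the field and function names below are made up for illustration, not
existing kernel or libc interfaces). It ignores the case where the thread is
preempted between the bounds check and the read of the pointer, which a real
implementation would have to deal with.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one instance per process, in ordinary pageable memory. */
struct tls_info {
    void     *tls_ptr;   /* TLS pointer of the thread that last updated this */
    uintptr_t stack_lo;  /* lowest address of that thread's stack (inclusive) */
    uintptr_t stack_hi;  /* one past the highest stack address (exclusive) */
};

/* True if sp lies inside the stack of the thread that last updated info. */
static bool tls_info_is_current(const struct tls_info *info, uintptr_t sp)
{
    return sp >= info->stack_lo && sp < info->stack_hi;
}

/*
 * Fast path of a TLS lookup: a couple of compares and a load.
 * Returns NULL when the cached entry belongs to another thread, in which
 * case the caller has to refresh tls_info before using it.
 */
static void *tls_lookup_fast(const struct tls_info *info, uintptr_t sp)
{
    return tls_info_is_current(info, sp) ? info->tls_ptr : NULL;
}
```
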
Unless I am missing something updating the TLS info should in principle be
doable reasonably fast in userspace, something like
tls_info=hash_lookup_tls(stack_pointer>>PAGE_SHIFT)
The hash table entries could be populated at thread create time since at this
point all information such as stack extent of the new thread is available
without further kernel help.
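
Roughly, the table could look like the sketch below. This is only an
illustration under several assumptions: 4 KiB pages, a fixed-size open
addressing table, no locking and no removal of entries on thread exit.
tls_register_stack, tls_entry and TLS_BUCKETS are hypothetical names; only
hash_lookup_tls is taken from the pseudocode above.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12    /* assumption: 4 KiB pages */
#define TLS_BUCKETS 1024  /* hypothetical table size, must be a power of two */

struct tls_entry {
    uintptr_t page;     /* stack page number (address >> PAGE_SHIFT) */
    void     *tls_ptr;  /* TLS pointer of the thread owning that page */
    int       used;     /* non-zero once the slot is occupied */
};

static struct tls_entry tls_table[TLS_BUCKETS];

static size_t tls_hash(uintptr_t page)
{
    return (size_t)(page & (TLS_BUCKETS - 1));
}

/*
 * Called at thread creation time: registers every page of the stack
 * [stack_lo, stack_hi), which must be non-empty.
 * Returns 0 on success, -1 if the table is full.
 */
static int tls_register_stack(uintptr_t stack_lo, uintptr_t stack_hi,
                              void *tls_ptr)
{
    uintptr_t last = (stack_hi - 1) >> PAGE_SHIFT;

    for (uintptr_t page = stack_lo >> PAGE_SHIFT; page <= last; page++) {
        size_t i = tls_hash(page);
        size_t probes = 0;

        /* Linear probing: skip slots taken by other pages. */
        while (tls_table[i].used && tls_table[i].page != page) {
            if (++probes == TLS_BUCKETS)
                return -1;
            i = (i + 1) & (TLS_BUCKETS - 1);
        }
        tls_table[i].page = page;
        tls_table[i].tls_ptr = tls_ptr;
        tls_table[i].used = 1;
    }
    return 0;
}

/* Looks up the TLS pointer for a stack page number; NULL if unknown. */
static void *hash_lookup_tls(uintptr_t page)
{
    size_t i = tls_hash(page);

    for (size_t probes = 0; probes < TLS_BUCKETS && tls_table[i].used; probes++) {
        if (tls_table[i].page == page)
            return tls_table[i].tls_ptr;
        i = (i + 1) & (TLS_BUCKETS - 1);
    }
    return NULL;
}
```

Registering one entry per stack page keeps the lookup to a shift and a probe,
at the price of a larger table for threads with big stacks.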
If some thread did really fancy things like allocating stacks on the heap it
could still be handled reasonably cheaply. It would break horribly if the thread
freed its malloced stack and another thread then malloced the same area as
its stack.
To summarise, this would be zero cost per context switch, a few machine code
instructions per TLS lookup if the tls_info is currently valid and a hash table
lookup when it needs update.
Can a kernel based solution be cheaper?
Or should there be a hybrid solution to keep the zero cost context switch but
avoid the exotic stack malloc problem?
For the multi-CPU case the problem is more complicated - can't use the tls_info
without having a special per-CPU page (private or not..). In principle the
hash table lookup would still work without special pages though the speed hit
would be more significant as the full hash table lookup is now needed for
every TLS lookup.
Then again even with kernel support I do not see how to avoid a per-CPU special
page.
Quite a few possibilities and tradeoffs, I hope it gets some discussion?
For me it is rather difficult to judge the cost of the MM magic that would
have to be done on context switch.
Richard