On 10/06/2014 11:05 AM, David Daney wrote:
> On 10/03/2014 08:17 PM, Leonid Yegoshin wrote:
>> Historically, during FPU emulation MIPS runs the live BD-slot
>> instruction on the stack. This was needed because it was the only way
>> to correctly handle branch exceptions with unknown COP2 instructions in
>> the BD slot. Now there is an eXecuteInhibit feature, and it is desirable
>> to protect the stack from execution for security reasons.
>> This patch moves FPU emulation from the stack area to a VDSO-located
>> page, which is write-protected for application access. The VDSO page
>> itself is now per-thread, and its addresses and offsets are stored in
>> thread_info.
>> A small stack of emulation blocks is supported, because nested traps are
>> possible when MIPS32/64 R6 emulation is mixed with FPU emulation.
> Can you explain how this per-thread mapping works?
> I am especially interested in what happens when a thread other than
> the one using the special mapping issues flush_tlb_mm() and
> invalidates the TLBs on all CPUs. How does the TLB entry for the
> special mapping survive this?
This patch works as long as install_special_mapping() does not populate
the PTE itself but only installs a page fault handler. That is the only
hidden dependency on common Linux code.
MIPS code allocates a page (a copy of the standard 'VDSO' page), links it
to thread_info, and handles all allocation, deallocation and thread
creation via arch hooks. It does so only for threads that have a memory
map, not for kernel threads. It also does all of this only if the CPU has
the RI/XI capability (the hardware execute-inhibit feature); otherwise it
works as it does today.
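A minimal user-space sketch of the allocation decision described above. All names (struct sketch_thread, sketch_vdso_thread_init(), the cpu_has_rixi flag) are hypothetical stand-ins, not the patch's actual hooks, and malloc() plays the role of the kernel page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-ins for kernel structures; not the real MIPS types. */
struct sketch_mm { int users; };

struct sketch_thread {
    struct sketch_mm *mm;      /* NULL for kernel threads */
    unsigned char *vdso_page;  /* per-thread copy of the VDSO page, or NULL */
};

/* The shared 'VDSO' page that every per-thread page is copied from. */
static unsigned char sketch_vdso_template[SKETCH_PAGE_SIZE];

/* Hypothetical thread-creation hook; returns false only if allocation fails. */
static bool sketch_vdso_thread_init(struct sketch_thread *t, bool cpu_has_rixi)
{
    assert(t != NULL);
    t->vdso_page = NULL;
    /* Kernel threads have no memory map; CPUs without RI/XI keep today's behavior. */
    if (t->mm == NULL || !cpu_has_rixi)
        return true;
    t->vdso_page = malloc(SKETCH_PAGE_SIZE);
    if (t->vdso_page == NULL)
        return false;
    memcpy(t->vdso_page, sketch_vdso_template, SKETCH_PAGE_SIZE);
    return true;
}

/* Hypothetical thread-exit hook: release the per-thread copy, if any. */
static void sketch_vdso_thread_exit(struct sketch_thread *t)
{
    free(t->vdso_page);
    t->vdso_page = NULL;
}
```

In this sketch a thread without a private copy simply falls back to the existing behavior, so callers never need to distinguish the two cases beyond checking vdso_page.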
It still attaches the standard 'VDSO' page to the memory map for
accounting purposes, so /proc/.../maps shows the [vdso] page. However,
the new per-thread page actually shadows it.
When a TLB refill happens, it loads an empty PTE, and the subsequent TLBL
(TLB load page fault) exception reaches MIPS C code, which recognizes the
'VDSO' address, asks install_vdso_tlb() to fill the TLB directly, and
records the ASID it used in the memory map for this CPU.
At process (read: thread) reschedule, there is a check, done by comparing
ASIDs, of whether some previous thread of the same memory map loaded this
TLB entry on this CPU. If that happened and the ASIDs are the same,
local_flush_tlb_page() is called to eliminate that entry, because it has
the same ASID but may map a different per-thread page.
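A sketch of the reschedule check, assuming a hypothetical per-mm array vdso_asid[] filled by the fault path; the counter in sketch_local_flush_vdso_page() stands in for local_flush_tlb_page(). As described, it flushes whenever the recorded ASID matches, even if the incoming thread is the one that made the fill.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 8

struct sketch_mm {
    unsigned asid[SKETCH_NR_CPUS];       /* current ASID of this mm on each CPU */
    unsigned vdso_asid[SKETCH_NR_CPUS];  /* ASID of the last vdso TLB fill (0 = none here) */
};

/* Counts flushes so the behavior can be observed; stands in for local_flush_tlb_page(). */
static int sketch_flushes;

static void sketch_local_flush_vdso_page(int cpu, struct sketch_mm *mm)
{
    (void)cpu;
    (void)mm;
    sketch_flushes++;
}

/* Hypothetical switch-in hook, called when a thread of mm is scheduled on cpu. */
static void sketch_vdso_switch_in(int cpu, struct sketch_mm *mm)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    /* Same ASID as the fill: the TLB may still hold another thread's page. */
    if (mm->vdso_asid[cpu] != 0 && mm->vdso_asid[cpu] == mm->asid[cpu]) {
        sketch_local_flush_vdso_page(cpu, mm);
        mm->vdso_asid[cpu] = 0;
    }
}
```

If the ASID has changed since the fill (for example after an ASID rollover), the old entry can no longer match, so this sketch does not flush it.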
Because the PTE stays 0x00..00 and never changes, this sequence starts
again whenever the TLB entry is evicted for any reason (flush_tlb_mm(),
some other flush, or hardware or software TLB replacement), but only if
the page is demanded again. In other words, the entry does not survive
flush_tlb_mm(); it is recreated on the next access.
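The following sketch illustrates why nothing needs to survive a flush: because the PTE is permanently zero, every miss goes back through the fault path and reloads the current thread's page. The single-CPU TLB and all function names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TLB_SIZE 4

struct sketch_tlb_entry { bool valid; uintptr_t vpage; void *page; };

static struct sketch_tlb_entry sketch_tlb[SKETCH_TLB_SIZE];
static int sketch_vdso_faults;  /* how many times the vdso fill ran */

/* The PTE for the vdso address: deliberately left as 0 (no page). */
static const uintptr_t sketch_vdso_pte = 0;

static void *sketch_lookup(uintptr_t vpage)
{
    for (size_t i = 0; i < SKETCH_TLB_SIZE; i++)
        if (sketch_tlb[i].valid && sketch_tlb[i].vpage == vpage)
            return sketch_tlb[i].page;
    return NULL;
}

/* On a miss, the refill finds the empty PTE and faults; the vdso fault
 * handler then installs the current thread's page (slot 0, for brevity). */
static void *sketch_access_vdso(uintptr_t vdso_base, void *thread_page)
{
    void *p = sketch_lookup(vdso_base);
    if (p != NULL)
        return p;
    assert(sketch_vdso_pte == 0);  /* nothing in the page tables to refill from */
    sketch_vdso_faults++;
    sketch_tlb[0] = (struct sketch_tlb_entry){ true, vdso_base, thread_page };
    return thread_page;
}

/* flush_tlb_mm() stand-in: every entry disappears, the vdso one included. */
static void sketch_flush_tlb_mm(void)
{
    for (size_t i = 0; i < SKETCH_TLB_SIZE; i++)
        sketch_tlb[i].valid = false;
}
```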
Now, the emulation part: a small stack of emulation blocks starts at the
top of the page. Each time an FPU instruction from a BD slot is emulated,
the instruction is written into a block through the page's kernel VA,
while the thread's EPC is set to the user VA of that block. In case of
cache aliasing, the cache is flushed via different addresses (the D-cache
via the kernel VA and the I-cache via the user VA), and new functions are
needed to avoid the large performance loss of flush_cache_page(). Without
cache aliasing, the regular flush_cache_sigtramp() is used, because on
some systems it can be much faster (via SYNCI).
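A minimal sketch of pushing one emulation block, assuming a hypothetical fixed block size of 16 bytes and hypothetical flush helpers (sketch_flush_dcache_kva(), sketch_flush_icache_uva(), sketch_flush_cache_sigtramp()) that only mark where the real cache operations would go. It shows the key point: the block is written through the kernel VA, the same offset in the user VA becomes the new EPC, and the flush strategy depends on whether the cache aliases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_BLOCK_SIZE 16u  /* assumed size of one emulation block */

struct sketch_emu_page {
    unsigned char *kva;  /* kernel virtual address of the per-thread page */
    uintptr_t uva;       /* user virtual address of the same page */
    unsigned top;        /* offset of the lowest live block; PAGE_SIZE = empty */
};

enum sketch_flush { SKETCH_FLUSH_SPLIT, SKETCH_FLUSH_SIGTRAMP };

/* Which flush the last push used, recorded for illustration only. */
static enum sketch_flush sketch_last_flush;

/* Hypothetical placeholders for the real cache maintenance operations. */
static void sketch_flush_dcache_kva(const void *kva) { (void)kva; }
static void sketch_flush_icache_uva(uintptr_t uva) { (void)uva; }
static void sketch_flush_cache_sigtramp(uintptr_t uva) { (void)uva; }

/* Push a block holding insn; returns the user VA to load into EPC, or 0 if full. */
static uintptr_t sketch_emu_push(struct sketch_emu_page *p, uint32_t insn, bool cache_aliases)
{
    assert(p->top <= SKETCH_PAGE_SIZE);
    if (p->top < SKETCH_BLOCK_SIZE)
        return 0;
    p->top -= SKETCH_BLOCK_SIZE;
    unsigned char *kblk = p->kva + p->top;  /* written through the kernel mapping */
    uintptr_t ublk = p->uva + p->top;       /* executed through the user mapping */
    memcpy(kblk, &insn, sizeof insn);
    if (cache_aliases) {
        /* The two VAs may index different cache lines: flush each side by its own VA. */
        sketch_flush_dcache_kva(kblk);
        sketch_flush_icache_uva(ublk);
        sketch_last_flush = SKETCH_FLUSH_SPLIT;
    } else {
        sketch_flush_cache_sigtramp(ublk);
        sketch_last_flush = SKETCH_FLUSH_SIGTRAMP;
    }
    return ublk;
}
```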
The stack of emulation blocks is needed because I am working on a
MIPS32/64 R6 kernel, which must emulate some removed MIPS R2
instructions. Reentry into emulation may happen in some rare cases,
because FPU emulation and MIPS R2 emulation are separate subsystems.
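Because a nested trap can occur while an outer emulation block is still live, the blocks behave as a LIFO stack. The sketch below, with a hypothetical struct sketch_emu_stack and an assumed 16-byte block size, shows an outer FPU emulation and an inner R2 emulation unwinding in reverse order.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_BLOCK_SIZE 16u  /* assumed block size */
#define SKETCH_MAX_BLOCKS (SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE)

/* Hypothetical per-thread bookkeeping for nested emulation. */
struct sketch_emu_stack {
    uintptr_t uva;                           /* user VA of the per-thread page */
    unsigned depth;                          /* number of live blocks */
    uintptr_t saved_epc[SKETCH_MAX_BLOCKS];  /* where to resume after each block */
};

/* Enter emulation: returns the user VA of the new block, or 0 if the page is full. */
static uintptr_t sketch_emu_enter(struct sketch_emu_stack *s, uintptr_t resume_epc)
{
    if (s->depth == SKETCH_MAX_BLOCKS)
        return 0;
    s->saved_epc[s->depth] = resume_epc;
    s->depth++;
    /* Blocks grow down from the top of the page. */
    return s->uva + SKETCH_PAGE_SIZE - s->depth * SKETCH_BLOCK_SIZE;
}

/* Leave the innermost emulation; returns the EPC to resume at. */
static uintptr_t sketch_emu_leave(struct sketch_emu_stack *s)
{
    assert(s->depth > 0);
    s->depth--;
    return s->saved_epc[s->depth];
}
```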
Note: After Peter Zijlstra's remark about performance, I am thinking
about adding a check for the case where the same single thread is
rescheduled on the same CPU, and not flushing the TLB in that case. It
just requires yet another array of process IDs or 'VDSO' pages, one
element per CPU, and I am weighing that cost against the scheduling
interval. Today the array is at most 8 elements on MIPS, but that may
change in the future. Another possibility is a special TLB flush
function that compares the TLB entry address with the page address and
skips evicting the entry if the addresses match.
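A sketch of the first idea, assuming a hypothetical per-CPU owner array kept next to the recorded ASIDs; a thread pointer stands in for the process ID or 'VDSO' page pointer, and 8 CPUs is an arbitrary size for the sketch. The flush is skipped only when the same thread returns to the CPU where its page is still loaded.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 8  /* arbitrary size for this sketch */

struct sketch_mm {
    unsigned asid[SKETCH_NR_CPUS];          /* current ASID of this mm on each CPU */
    unsigned vdso_asid[SKETCH_NR_CPUS];     /* ASID of the last vdso fill (0 = none here) */
    const void *vdso_owner[SKETCH_NR_CPUS]; /* thread whose page was loaded on each CPU */
};

static int sketch_flushes;  /* stands in for calls to local_flush_tlb_page() */

/* Hypothetical switch-in hook with the same-thread shortcut. */
static void sketch_vdso_switch_in(int cpu, struct sketch_mm *mm, const void *thread)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    if (mm->vdso_asid[cpu] == 0 || mm->vdso_asid[cpu] != mm->asid[cpu])
        return;  /* no live per-thread entry under this ASID */
    if (mm->vdso_owner[cpu] == thread)
        return;  /* same thread back on the same CPU: its entry is still correct */
    sketch_flushes++;
    mm->vdso_asid[cpu] = 0;
    mm->vdso_owner[cpu] = NULL;
}
```

In this sketch the extra cost is one pointer compare per context switch and one array of per-CPU entries per memory map; the fault path would also need to set vdso_owner[cpu] when it fills the entry.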
- Leonid.