On 2018-10-16 1:34 AM, Helge Deller wrote:
On 15.10.2018 23:11, James Bottomley wrote:
On Sun, 2018-10-14 at 20:34 +0200, Helge Deller wrote:
This patch adds the necessary code to patch a running SMP kernel
at runtime to improve performance when running on a single CPU.
The current implementation offers two patching variants:
- Unwanted assembler statements like locking functions are overwritten
with NOPs. When multiple instructions shall be skipped, one branch
instruction is used instead of multiple nop instructions.
This seems like a good idea because our spinlocks are particularly
heavyweight.
- Some pdtlb and pitlb instructions are patched to become pdtlb,l and
pitlb,l which only flushes the CPU-local tlb entries instead of
broadcasting the flush to other CPUs in the system and thus may
improve performance.
I really don't think this matters: on a UP system, pdtlb,l and pdtlb
are the same instruction because the CPU already knows it has no
internal CPU bus to broadcast the purge over so it in effect executes a
pdtlb,l regardless.
I'd be happy to drop this part again.
But is that also true on an SMP system that has been booted with maxcpus=1?
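
For illustration, the first patching variant described above (a NOP for a
single instruction, one branch when several instructions are skipped) could
be sketched roughly as follows. This is a toy model, not code from the patch:
the names toy_patch_out() and toy_branch_over() and both opcode values are
hypothetical and are not real PA-RISC encodings. A real kernel would also
have to make the modified words visible to instruction fetch, which is not
modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings for a toy ISA; these are NOT real PA-RISC opcodes. */
#define TOY_NOP        0x00000000u
#define TOY_BRANCH_OP  0xB0000000u  /* low 28 bits: number of words to skip */

/* Encode a toy "branch forward over the next `skip` instruction words". */
uint32_t toy_branch_over(uint32_t skip)
{
    return TOY_BRANCH_OP | skip;
}

/*
 * Disable `len` consecutive instruction words starting at insn[0].
 *
 * One word:     overwrite it with a NOP.
 * Several words: overwrite only the first word with a branch that jumps
 *                past the remaining len - 1 words, so a single instruction
 *                is executed instead of len NOPs. The skipped words are
 *                left as they were because they are never reached.
 */
void toy_patch_out(uint32_t *insn, size_t len)
{
    assert(len > 0 && len <= 0x0FFFFFFFu);

    if (len == 1)
        insn[0] = TOY_NOP;
    else
        insn[0] = toy_branch_over((uint32_t)(len - 1));
}
```

Calling toy_patch_out(code, 3) on a buffer of instruction words rewrites only
code[0]; the trade-off is one taken branch versus three executed NOPs.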
I would like to see what happens on panama. Panama is an rp3410.
Currently, it takes approximately 4042 cycles to flush one page
(4096 bytes). This is way more than the number of cycles that I see
on my rp3440. My c3750 takes 450 cycles per page with the patch.
It could be that pdtlb,l and pdtlb are equivalent on the c3750.
Is there something wrong with SMP on panama?
Oct 4 02:27:56 panama kernel: [ 0.061736] smp: Bringing up secondary CPUs ...
Oct 4 02:27:56 panama kernel: [ 0.061897] smp: Brought up 3 nodes, 1 CPU
I know replacing "sync and normal store" with an ordered store in the
spin lock release makes a significant difference in the above timing.
I plan to send a patch tonight.
Dave
--
John David Anglin dave.anglin@xxxxxxxx