[PATCH] (with benchmarks) binary patching of paravirt_ops call sites

Hi all,

	Sorry for the delay.  This implements binary patching of call sites for
interrupt-related paravirt ops, since no doubt Andi wasn't the only one
to believe the paravirt_ops function-call approach is slow.

	The benchmarks were run on a UP 3GHz Pentium 4 with 512MB of RAM,
comparing three kernels: 2.6.17-rc4 ("normal"), 2.6.17-rc4 with
CONFIG_PARAVIRT=y ("paravirt"), and 2.6.17-rc4 with CONFIG_PARAVIRT=y
plus this patch ("paravirt-patch").  Bracketed figures are the percentage
difference from normal.  Summary: with binary patching, the difference
from CONFIG_PARAVIRT=n is in the noise (with the possible exception of
lmbench's exec).

Full results can be found in subdirs of
http://kernel.org/pub/linux/kernel/people/rusty/Paravirt/stats

The patch sites are 10-12 bytes long; we can shave more bytes off by
telling GCC we clobber regs.  The Xen SMP cli() code might actually need
more bytes than this, but I haven't measured it yet.

Kernel compile system time:
  normal 40.73
  paravirt 41.0633 [.818%]
  paravirt-patch 39.99 [-1.816%]
Kernel compile wall time:
  normal 84.1506
  paravirt 84.7539 [.716%]
  paravirt-patch 84.4994 [.414%]
Tbench:
  normal 73.473
  paravirt 73.7777 [.414%]
  paravirt-patch 73.8841 [.559%]
Dbench:
  normal 264.418
  paravirt 262.445 [-.746%]
  paravirt-patch 262.26 [-.816%]
Ubench:
  normal 2.95643
  paravirt 2.89111 [-2.209%]
  paravirt-patch 2.89052 [-2.229%]
Lmbench null syscall:
  normal 0.37
  paravirt 0.38 [2.702%]
  paravirt-patch 0.37 [0%]
Lmbench null I/O:
  normal 0.452
  paravirt 0.46 [1.769%]
  paravirt-patch 0.452 [0%]
Lmbench fork:
  normal 73.14
  paravirt 73.725 [.799%]
  paravirt-patch 72.86 [-.382%]
Lmbench exec:
  normal 348.8
  paravirt 365.75 [4.859%]
  paravirt-patch 361.4 [3.612%]
Lmbench sh:
  normal 6000
  paravirt 6072.25 [1.204%]
  paravirt-patch 6027.6 [.460%]
Lmbench 2p/0k context switch:
  normal 1.29
  paravirt 1.395 [8.139%]
  paravirt-patch 1.288 [-.155%]
Lmbench pipe latency:
  normal 5.026
  paravirt 5.20875 [3.636%]
  paravirt-patch 4.9856 [-.803%]
Lmbench UNIX socket latency:
  normal 7.86
  paravirt 8.5375 [8.619%]
  paravirt-patch 7.864 [.050%]
Lmbench UDP latency:
  normal 10.8
  paravirt 12.275 [13.657%]
  paravirt-patch 11.06 [2.407%]
Lmbench TCP latency:
  normal 13.28
  paravirt 14.45 [8.810%]
  paravirt-patch 13.34 [.451%]
Lmbench TCP connection latency:
  normal 45.48
  paravirt 51.1 [12.357%]
  paravirt-patch 45.52 [.087%]
Lmbench pipe bandwidth:
  normal 2522.2
  paravirt 2335.5 [-7.402%]
  paravirt-patch 2401 [-4.805%]
Lmbench UNIX socket bandwidth:
  normal 2935
  paravirt 2617 [-10.834%]
  paravirt-patch 2788.2 [-5.001%]
Lmbench TCP socket bandwidth:
  normal 627.8
  paravirt 670 [6.721%]
  paravirt-patch 635.2 [1.178%]

===
Name: Binary patch over critical fastpaths
Depends: Paravirt/function-call-abstraction.patch.gz
Status: Tested on 2.6.17-rc4
Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

It turns out that the most called ops, by several orders of magnitude,
are the interrupt manipulation ops.  These are obvious candidates for
patching, so mark them up and create infrastructure for it.

The method is that the ops structure has a patch function, which is
called for each site that needs patching: it returns the number of bytes
of replacement instructions it wrote (the rest of the site is NOP-padded).

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/alternative.c working-2.6.17-rc4-bench/arch/i386/kernel/alternative.c
--- working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/alternative.c	2006-05-16 10:50:48.000000000 +1000
+++ working-2.6.17-rc4-bench/arch/i386/kernel/alternative.c	2006-05-23 13:10:26.000000000 +1000
@@ -3,6 +3,7 @@
 #include <linux/list.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
+#include <asm/paravirt.h>
 
 #define DEBUG 0
 #if DEBUG
@@ -283,6 +284,35 @@ void alternatives_smp_switch(int smp)
 	spin_unlock_irqrestore(&smp_alt, flags);
 }
 
+#ifdef CONFIG_PARAVIRT
+void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
+{
+	unsigned char **noptable = find_nop_table();
+	struct paravirt_patch *p;
+	int diff, i, k;
+
+	for (p = start; p < end; p++) {
+		unsigned int used;
+		used = paravirt_ops.patch(p->instrtype, p->instr, p->len);
+		/* Pad the rest with nops */
+		diff = p->len - used;
+		for (i = used; diff > 0; diff -= k, i += k) {
+			k = diff;
+			if (k > ASM_NOP_MAX)
+				k = ASM_NOP_MAX;
+			memcpy(p->instr + i, noptable[k], k);
+		}
+	}
+}
+extern struct paravirt_patch __start_parainstructions[],
+	__stop_parainstructions[];
+#else
+void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
+{
+}
+extern char __start_parainstructions[], __stop_parainstructions[];
+#endif	/* CONFIG_PARAVIRT */
+
 void __init alternative_instructions(void)
 {
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
@@ -318,4 +354,6 @@ void __init alternative_instructions(voi
 					    _text, _etext);
 		alternatives_smp_switch(0);
 	}
+
+	apply_paravirt(__start_parainstructions, __stop_parainstructions);
 }
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/module.c working-2.6.17-rc4-bench/arch/i386/kernel/module.c
--- working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/module.c	2006-05-16 10:50:48.000000000 +1000
+++ working-2.6.17-rc4-bench/arch/i386/kernel/module.c	2006-05-22 15:21:55.000000000 +1000
@@ -108,7 +108,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    const Elf_Shdr *sechdrs,
 		    struct module *me)
 {
-	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL;
+	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
+		*para = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { 
@@ -118,6 +119,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 			alt = s;
 		if (!strcmp(".smp_locks", secstrings + s->sh_name))
 			locks= s;
+		if (!strcmp(".parainstructions", secstrings + s->sh_name))
+			para = s;
 	}
 
 	if (alt) {
@@ -132,6 +135,10 @@ int module_finalize(const Elf_Ehdr *hdr,
 					    lseg, lseg + locks->sh_size,
 					    tseg, tseg + text->sh_size);
 	}
+	if (para) {
+		void *pseg = (void *)para->sh_addr;
+		apply_paravirt(pseg, pseg + para->sh_size);
+	}
 	return 0;
 }
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/paravirt.c working-2.6.17-rc4-bench/arch/i386/kernel/paravirt.c
--- working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/paravirt.c	2006-05-23 14:01:46.000000000 +1000
+++ working-2.6.17-rc4-bench/arch/i386/kernel/paravirt.c	2006-05-22 15:21:55.000000000 +1000
@@ -336,8 +336,35 @@ static void nopara_set_iopl_mask(unsigne
 extern void nopara_iret(void);
 extern void nopara_irq_enable_sysexit(void);
 
+/* Simple instruction patching code. */
+static struct native_insns
+{
+	unsigned int len;
+	const char *insns;
+} native_insns[] = {
+	[PARAVIRT_IRQ_DISABLE] = { 1, "\xFA" /* cli */ },
+	[PARAVIRT_IRQ_ENABLE] = { 1, "\xFB" /* sti */ },
+	[PARAVIRT_RESTORE_FLAGS] = { 2, "\x50\x9D" /* push %eax; popf */ },
+	[PARAVIRT_SAVE_FLAGS] = { 2, "\x9c\x58" /* pushf; pop %eax */ },
+};
+
+static unsigned nopara_patch(unsigned int type, void *firstinsn, unsigned len)
+{
+	/* Don't touch it if we don't have a replacement */
+	if (type >= ARRAY_SIZE(native_insns) || !native_insns[type].insns)
+		return len;
+
+	/* Similarly if we can't fit replacement. */
+	if (len < native_insns[type].len)
+		return len;
+
+	memcpy(firstinsn, native_insns[type].insns, native_insns[type].len);
+	return native_insns[type].len;
+}
+
 struct paravirt_ops paravirt_ops = {
 	.kernel_rpl = 0,
+	.patch = nopara_patch,
 	.cpuid = nopara_cpuid,
 	.get_debugreg = nopara_get_debugreg,
 	.set_debugreg = nopara_set_debugreg,
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/vmlinux.lds.S working-2.6.17-rc4-bench/arch/i386/kernel/vmlinux.lds.S
--- working-2.6.17-rc4-function-call-abstraction/arch/i386/kernel/vmlinux.lds.S	2006-05-16 10:50:48.000000000 +1000
+++ working-2.6.17-rc4-bench/arch/i386/kernel/vmlinux.lds.S	2006-05-23 12:55:11.000000000 +1000
@@ -128,6 +128,12 @@ SECTIONS
   .altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
 	*(.altinstr_replacement)
   }
+  . = ALIGN(4);
+  __start_parainstructions = .;
+  .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
+	*(.parainstructions)
+  }
+  __stop_parainstructions = .;
   /* .exit.text is discard at runtime, not link time, to deal with references
      from .altinstructions and .eh_frame */
   .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.6.17-rc4-function-call-abstraction/include/asm-i386/alternative.h working-2.6.17-rc4-bench/include/asm-i386/alternative.h
--- working-2.6.17-rc4-function-call-abstraction/include/asm-i386/alternative.h	2006-05-16 10:51:38.000000000 +1000
+++ working-2.6.17-rc4-bench/include/asm-i386/alternative.h	2006-05-22 15:21:55.000000000 +1000
@@ -20,7 +20,8 @@ extern void alternatives_smp_module_add(
 					void *text, void *text_end);
 extern void alternatives_smp_module_del(struct module *mod);
 extern void alternatives_smp_switch(int smp);
-
+struct paravirt_patch;
+void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end);
 #endif
 
 /*
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.6.17-rc4-function-call-abstraction/include/asm-i386/paravirt.h working-2.6.17-rc4-bench/include/asm-i386/paravirt.h
--- working-2.6.17-rc4-function-call-abstraction/include/asm-i386/paravirt.h	2006-05-23 14:01:46.000000000 +1000
+++ working-2.6.17-rc4-bench/include/asm-i386/paravirt.h	2006-05-23 12:59:48.000000000 +1000
@@ -3,11 +3,18 @@
 /* Various instructions on x86 need to be replaced for
  * para-virtualization: those hooks are defined here. */
 #include <linux/config.h>
+#include <linux/stringify.h>
 
 #ifndef CONFIG_PARAVIRT
 #include <asm/no_paravirt.h>
 #else
 
+/* These are the most common ops, so we want to be able to patch callers. */
+#define PARAVIRT_IRQ_DISABLE 0
+#define PARAVIRT_IRQ_ENABLE 1
+#define PARAVIRT_RESTORE_FLAGS 2
+#define PARAVIRT_SAVE_FLAGS 3
+
 #ifndef __ASSEMBLY__
 struct thread_struct;
 struct Xgt_desc_struct;
@@ -15,6 +22,8 @@ struct paravirt_ops
 {
 	unsigned int kernel_rpl;
 
+	unsigned (*patch)(unsigned int type, void *firstinsn, unsigned len);
+
 	void (*cpuid)(unsigned int *eax, unsigned int *ebx,
 		      unsigned int *ecx, unsigned int *edx);
 	
@@ -108,10 +117,6 @@ static inline void sync_core(void)
 #define read_cr4_safe(x) paravirt_ops.read_cr4_safe()
 #define write_cr4(x) paravirt_ops.write_cr4(x)
 
-#define __local_save_flags() paravirt_ops.save_fl()
-#define __local_irq_restore(f) paravirt_ops.restore_fl(f)
-#define local_irq_disable() paravirt_ops.irq_disable()
-#define local_irq_enable() paravirt_ops.irq_enable()
 #define safe_halt() paravirt_ops.safe_halt()
 #define halt() paravirt_ops.halt()
 #define wbinvd() paravirt_ops.wbinvd()
@@ -181,11 +186,85 @@ static inline void sync_core(void)
 #define write_gdt_entry(dt, entry, a, b) (paravirt_ops.write_gdt_entry((dt), (entry), ((u64)a) << 32 | b))
 #define write_idt_entry(dt, entry, a, b) (paravirt_ops.write_idt_entry((dt), (entry), ((u64)a) << 32 | b))
 #define set_iopl_mask(mask) (paravirt_ops.set_iopl_mask(mask))
+
+/* These all sit in the .parainstructions section to tell us what to patch. */
+struct paravirt_patch {
+	u8 *instr; 		/* original instructions */
+	u8 instrtype;		/* type of this instruction */
+	u8 len;			/* length of original instruction */
+	u16 pad;
+};
+
+#define paravirt_alt(insn_string, typenum)	\
+	"771:\n\t" insn_string "\n" "772:\n"	\
+	".section .parainstructions,\"a\"\n"	\
+	"  .align 4\n"				\
+	"  .long 771b\n"			\
+	"  .byte " __stringify(typenum) "\n"	\
+	"  .byte 772b-771b\n"			\
+	".previous"
+
+static inline unsigned long __local_save_flags(void)
+{
+	register unsigned long f asm("eax");
+	/* Asm is to place value in eax. */
+	__asm__ __volatile__(paravirt_alt("pushl %%ecx; pushl %%edx\n\t"
+					  "call *%1\n\t"
+					  "popl %%edx; popl %%ecx",
+					  PARAVIRT_SAVE_FLAGS)
+			     : "=r"(f): "m"(paravirt_ops.save_fl) : "memory");
+	return f;
+}
+
+static inline void __local_irq_restore(unsigned long f)
+{
+	/* We guarantee that f is in eax on entry to asm. */
+	register unsigned long eax asm("eax") = f;
+	__asm__ __volatile__(paravirt_alt("pushl %%ecx; pushl %%edx\n\t"
+					  "pushl %1; call *%0\n\t"
+					  "popl %1; popl %%edx; popl %%ecx",
+					  PARAVIRT_RESTORE_FLAGS)
+			     : : "m" (paravirt_ops.restore_fl), "r"(eax)
+			     : "memory");
+}
+
+static inline void local_irq_disable(void)
+{
+	__asm__ __volatile__(paravirt_alt("pushl %%ecx; pushl %%edx\n\t"
+					  "pushl %%eax; call *%0\n\t"
+					  "popl %%eax; popl %%edx; popl %%ecx",
+					  PARAVIRT_IRQ_DISABLE)
+			     : : "m" (paravirt_ops.irq_disable) : "memory");
+}
+
+static inline void local_irq_enable(void)
+{
+	__asm__ __volatile__(paravirt_alt("pushl %%ecx; pushl %%edx\n\t"
+					  "pushl %%eax; call *%0\n\t"
+					  "popl %%eax; popl %%edx; popl %%ecx",
+					  PARAVIRT_IRQ_ENABLE)
+			     : : "m" (paravirt_ops.irq_enable) : "memory");
+}
+
+#define CLI_STRING	paravirt_alt("pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_disable; popl %edx; popl %ecx; popl %eax", PARAVIRT_IRQ_DISABLE)
+#define STI_STRING	paravirt_alt("pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_enable; popl %edx; popl %ecx; popl %eax", PARAVIRT_IRQ_ENABLE)
+
 #else  /* ... __ASSEMBLY__ */
 
+#define PARA_PATCH(ptype, ops)			\
+771:;						\
+	ops;					\
+772:;						\
+	.section .parainstructions,"a";		\
+	 .align 4;				\
+	 .long 771b;				\
+	 .byte ptype;				\
+	 .byte 772b-771b;			\
+	.previous
+
 #define IRET		jmp *paravirt_ops+PARAVIRT_iret
-#define CLI		pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_disable; popl %edx; popl %ecx; popl %eax
-#define STI		pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_enable; popl %edx; popl %ecx; popl %eax
+#define CLI		PARA_PATCH(PARAVIRT_IRQ_DISABLE,pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_disable; popl %edx; popl %ecx; popl %eax)
+#define STI		PARA_PATCH(PARAVIRT_IRQ_ENABLE,pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_enable; popl %edx; popl %ecx; popl %eax)
 #define STI_SYSEXIT	jmp *paravirt_ops+PARAVIRT_irq_enable_sysexit
 #define GET_CR0		pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_read_cr0; popl %edx; popl %ecx
 
@@ -204,9 +283,6 @@ static inline void sync_core(void)
 	popl  %eax
 #endif	/* __ASSEMBLY__ */
 
-#define CLI_STRING	"pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_disable; popl %edx; popl %ecx; popl %eax"
-#define STI_STRING	"pushl %eax; pushl %ecx; pushl %edx; call *paravirt_ops+PARAVIRT_irq_enable; popl %edx; popl %ecx; popl %eax"
-
 #endif	/* PARAVIRT */
 
 #endif	/* __ASM_PARAVIRT_H */
-- 
 ccontrol: http://ccontrol.ozlabs.org

