[PATCH] module: Fix performance regression on modules with large symbol tables

Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
table for loaded modules) introduced an optimization to shrink the size of
the resident string table.  Part of this involves calling bitmap_weight()
on the strmap bitmap once for each core symbol.  strmap contains one bit
for each byte of the module's strtab.

For kernel modules with a large number of symbols, the addition of the
bitmap_weight() operation to each iteration of the add_kallsyms() loop
resulted in a significant "insmod" performance regression from 2.6.31
to 2.6.32.  bitmap_weight() is expensive when the bitmap is large: it
counts every set bit below the requested position, so each call costs
time proportional to the symbol's offset within .strtab.
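
Here is a rough estimate of that cost for the test module described below. It rests on two assumptions: the 15,000 core symbols' st_name offsets are spread evenly over the 250,227-byte .strtab, so each bitmap_weight() call scans about half the bitmap, and longs are 32 bits wide. This is an illustration, not a measurement.

```latex
\text{bits scanned} \approx N_{\text{core}} \cdot \frac{S}{2}
  = 15{,}000 \times \frac{250{,}227}{2} \approx 1.88 \times 10^{9},
\qquad
\text{words} \approx \frac{1.88 \times 10^{9}}{32} \approx 5.9 \times 10^{7}
```

The total grows with the product of the core symbol count and the string table size, so a module with twice as many symbols and twice as much string data does roughly four times the work in this loop.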

The proposed alternative optimizes the common case in this loop
(is_core_symbol() == true, and the symbol name is not a duplicate), while
penalizing the exceptional case of a duplicate symbol.

My test was run on a 600 MHz MIPS processor, using a kernel module with
15,000 "core" symbols and 10,000 symbols in .init.text.  .strtab takes up
250,227 bytes.

Original code: insmod takes 3.39 seconds
Patched code: insmod takes 0.07 seconds

Signed-off-by: Kevin Cernekee <cernekee@xxxxxxxxx>
---

Since the new code performs an exhaustive string compare search when it
encounters duplicate symbols inside a module (i.e. multiple symtab entries
referring to the same strtab index), I did some extra checking on my
Linux PC to see how common this is:

For modules other than nvidia, there were 35 duplicate symbols out of
9,956 total LKM symbols (0.4%).  This is with KALLSYMS and KALLSYMS_ALL
enabled.  Many were ".LCx" literal constants, and others were random
duplications of trace_kmalloc(), cache_put(), do_vfs_lock(), etc.  These
are probably caused by combining multiple *.o files into a single *.ko
file.

The nvidia module has 29,296 total entries, and 3,045 duplicates (10%).
There were 597 instances of each of: _nv009058rm, _nv009059rm,
_nv009060rm, and _nv009061rm.

To make sure the degenerate case of nvidia.ko was still covered, I ran
additional tests with qemu-system-arm (ARM Versatile) on Linus' head of
tree:

Latest kernel (commit 15831714), 25,000 symbol test (as above): 4.5s

Latest kernel with 2,400 (16%) of my module's core symbols turned into
duplicates through hex editing: 4.4s

Patched kernel, 25,000 symbol test: 0.1s

Patched kernel, with 2,400 duplicate symbols: 0.8s

So even a module with a large number of duplicate symbols loads more
quickly with my patch than without it.
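
The work done by the new slow path can be estimated by counting strcmp() calls. The inner loop in the patch runs over every entry emitted so far and does not stop at the first match. If the 2,400 duplicates (D) are spread evenly through the 15,000 core symbols, each one scans about 7,500 entries on average. This is a rough illustration under that assumption:

```latex
\text{strcmp calls} \approx D \cdot \frac{N_{\text{core}}}{2}
  = 2{,}400 \times \frac{15{,}000}{2} = 1.8 \times 10^{7}
```

This cost grows with the product of the duplicate count and the core symbol count, and it is zero for a module without duplicates. The bitmap_weight() cost it replaces grows with the product of the symbol count and the size of .strtab.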


 kernel/module.c |   26 ++++++++++++++++++--------
 1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 93342d9..7f5dcbf 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2221,7 +2221,7 @@ static void layout_symtab(struct module *mod, struct load_info *info)
 
 static void add_kallsyms(struct module *mod, const struct load_info *info)
 {
-	unsigned int i, ndst;
+	unsigned int i, j, stridx = 1, ndst;
 	const Elf_Sym *src;
 	Elf_Sym *dst;
 	char *s;
@@ -2237,22 +2237,32 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
 		mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);
 
 	mod->core_symtab = dst = mod->module_core + info->symoffs;
+	mod->core_strtab = s = mod->module_core + info->stroffs;
 	src = mod->symtab;
 	*dst = *src;
+	*s++ = 0;
 	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
 		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
 			continue;
 		dst[ndst] = *src;
-		dst[ndst].st_name = bitmap_weight(info->strmap,
-						  dst[ndst].st_name);
+		if (unlikely(!test_bit(src->st_name, info->strmap))) {
+			dst[ndst].st_name = 0;
+			for (j = 1; j < ndst; j++)
+				if (!strcmp(&mod->strtab[src->st_name],
+					    &mod->core_strtab[dst[j].st_name]))
+					dst[ndst].st_name = dst[j].st_name;
+		} else {
+			dst[ndst].st_name = stridx;
+			j = src->st_name;
+			clear_bit(j, info->strmap);
+			do {
+				*s = mod->strtab[j++];
+				stridx++;
+			} while (*s++);
+		}
 		++ndst;
 	}
 	mod->core_num_syms = ndst;
-
-	mod->core_strtab = s = mod->module_core + info->stroffs;
-	for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
-		if (test_bit(i, info->strmap))
-			*++s = mod->strtab[i];
 }
 #else
 static inline void layout_symtab(struct module *mod, struct load_info *info)
-- 
1.7.6.3
