<rcpilot2010@xxxxxxxxx> wrote:
>>>> I have my own implementation of the socket APIs.
>>>>
>>>> I call sock_unregister(AF_INET) and then sock_register(&inet_family_ops),
>>>> which replaces the kernel-resident socket calls with my own socket calls.
>>>> My code is loaded as a kernel module.
>>>>
>>>> My question: is the Linux kernel able to call its own socket calls more
>>>> efficiently (less overhead, fewer CPU cycles) than mine? The code is
>>>> running on the Intel x86_64 arch.
>>>>
>>>> Any pointer is appreciated.
>>>
>>>
>>> IMHO, strictly from a module vs. core kernel code perspective, there is
>>> no speed difference.
>>>
>>> The reason is, once the module is loaded, it is the kernel itself. It
>>> is not separated into an isolated segment or anything like that. Yes,
>>> a module is loaded into the vmalloc-ed memory area, but that's it.
>>
>> Is it possible that, since my socket API code is located in a
>> different part of memory, calling it needs a long jump
>> (which results in a d-cache miss)?
>>
>> Here are the statistics from the perf stat tool:
>>
>> Linux code:
>> 112,880,163,053 instructions:HG # 0.72 insns per cycle
>>
>> My code:
>> 32,074,097,170 instructions:HG # 0.34 insns per cycle
>
> Hi...
>
> please Cc: to kernelnewbies as well.

Sorry about this, I thought I did.

> as for d-cache miss, I think that's not due to long jump, but
> accessing something not cache aligned.

Do you mean cache-line aligned, or alignment of the address to 4 or 8 bytes?
Does data alignment really have that much effect on performance on Intel?

> Or maybe you can think of something that can be prefetched. That way, you
> utilize the processor pipeline better, but use it wisely.

How about TLB misses? As I understand it, the main kernel is stored in a
permanently mapped area, while a module is stored in a dynamically mapped
area.
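To make the alignment and prefetch question concrete, here is a minimal
user-space sketch in C. It is only an illustration, not code from either
socket implementation: the struct names, the lines_spanned() and sum_rx()
helpers, and the 64-byte line size are all assumptions (64 bytes is typical
for x86_64, but check your CPU). It shows that 8-byte natural alignment does
not stop an object from straddling two cache lines, and how a prefetch hint
can be issued a few elements ahead. In kernel code the rough equivalents
would be ____cacheline_aligned and prefetch().

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical per-socket counters with natural (8-byte) alignment only. */
struct stats_plain {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

/* Same fields, but every instance starts on its own cache line. */
struct stats_aligned {
    alignas(CACHE_LINE) uint64_t rx_packets;
    uint64_t tx_packets;
};

static_assert(sizeof(struct stats_plain) == 16, "two 8-byte counters");
static_assert(sizeof(struct stats_aligned) == CACHE_LINE, "padded to one line");
static_assert(alignof(struct stats_aligned) == CACHE_LINE, "line aligned");

/* Number of cache lines touched by an object of len bytes at addr. */
static size_t lines_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    return (addr + len - 1) / CACHE_LINE - addr / CACHE_LINE + 1;
}

/* Sum rx_packets, hinting the CPU to fetch an element four slots ahead.
 * __builtin_prefetch is a GCC/Clang builtin; the distance of 4 is a guess
 * that would need tuning with real measurements. */
static uint64_t sum_rx(const struct stats_aligned *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 4 < n)
            __builtin_prefetch(&v[i + 4], 0, 1); /* read, low locality */
        total += v[i].rx_packets;
    }
    return total;
}
```

For example, a 16-byte stats_plain placed at offset 56 is perfectly 8-byte
aligned, yet lines_spanned(56, 16) reports two cache lines, so one access
can cost two misses. Whether any of this explains the IPC difference above
can only be settled by measuring cache and TLB miss counters directly.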