On 02/24/2014 11:42 AM, David Laight wrote: ...
> I'm sure it shouldn't be that expensive, you are implying that it spent about 70% of the time doing crc32.
In this scenario, here is the perf log I get, showing where cycles are being spent on my machine:

    65.95%  netperf  [kernel.kallsyms]  [k] __crc32c_le
     3.79%  netperf  [kernel.kallsyms]  [k] memcpy
     2.38%  netperf  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
     0.62%  netperf  [sctp]             [k] sctp_datamsg_from_user
     0.62%  netperf  [sctp]             [k] sctp_sendmsg
     0.55%  netperf  [kernel.kallsyms]  [k] __slab_free
     0.52%  netperf  [sctp]             [k] sctp_outq_flush
     0.50%  netperf  [kernel.kallsyms]  [k] kfree
     0.49%  netperf  [kernel.kallsyms]  [k] cmpxchg_double_slab.isra.52
     0.48%  netperf  [kernel.kallsyms]  [k] kmem_cache_alloc
     0.43%  netperf  [kernel.kallsyms]  [k] __slab_alloc
     0.42%  netperf  [kernel.kallsyms]  [k] __copy_skb_header
     0.41%  netperf  [kernel.kallsyms]  [k] __alloc_skb
> The loop should be dominated by the per-byte lookup in a 256 word table. With 4k data the table will soon be in the data cache. Unless it is (stupidly) generating the table on each call, or trying to use a crc32 instruction, faulting, and emulating it, I wouldn't really have expected more than a few % improvement.
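
For reference, the kind of loop described above (one lookup per byte in a 256-word table that is built once, not on each call) can be sketched roughly as follows. This is a standalone userspace illustration, not the kernel's __crc32c_le; the names crc32c_table, crc32c_init_table, crc32c_update, crc32c and crc32c_selftest are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256-entry lookup table: 256 * 4 bytes = 1 KiB, small enough to stay cached. */
static uint32_t crc32c_table[256];

/* Build the table once, using the reflected CRC-32C (Castagnoli) polynomial. */
static void crc32c_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78u : 0u);
        crc32c_table[i] = crc;
    }
}

/* Byte-at-a-time update: one table lookup, one shift, two XORs per byte. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    while (len--)
        crc = crc32c_table[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
    return crc;
}

/* Convenience wrapper applying the usual initial value and final inversion. */
static uint32_t crc32c(const unsigned char *buf, size_t len)
{
    return crc32c_update(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/* Self-check against the standard CRC-32C check value for "123456789". */
static void crc32c_selftest(void)
{
    crc32c_init_table();
    assert(crc32c((const unsigned char *)"123456789", 9) == 0xE3069283u);
}
```

Each byte's lookup depends on the CRC produced by the previous byte, so the loop is a serial dependency chain even when the table is hot in the data cache.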