From: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
Date: Mon, 5 Jun 2017 00:00:07 -0400

> True, we could save one more load, by patching tick_get_tick() but
> that would save us only 3-5 instructions because with the changes done
> in this patchset the extra load comes from the same cacheline as the
> other two variables: offset and quotient. So overall, we still have 3
> loads as before, but they are much faster compared to what we have
> now, where every load is from a different cacheline.

But if you take things a step further, you can hide the other load
costs in the time it takes the %stick register read to complete.

So, for example, if we subsequently patch also sched_clock() in
assembler it becomes:

	load	timer_ticks_per_nsec_quotient, reg1
	load	timer_ticks_offset, reg2
	rd	%stick, reg3
	...

All 3 values will be in the cpu by the time the %stick read
completes.

That is the fastest possible implementation of sched_clock().
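
For illustration only, the same ordering can be sketched in C: issue the
two loads first, then read the timer, so the load latency overlaps with
the timer read. This is a rough sketch and not the patch under
discussion. The struct layout, the read_stick() stand-in, the
NSEC_SHIFT value and the scaling formula are all assumptions made for
the example; only the names quotient and offset come from the thread.

```c
#include <stdint.h>

/* Assumed fixed-point shift for this sketch; not taken from the thread. */
#define NSEC_SHIFT 10

/* Hypothetical layout: both values sit together so they can share a
 * cacheline. */
struct tick_params {
	uint64_t timer_ticks_per_nsec_quotient;
	uint64_t timer_ticks_offset;
};

/* Stand-in for "rd %stick, reg"; a real implementation would read the
 * hardware register instead of a variable. */
static uint64_t fake_stick;

static uint64_t read_stick(void)
{
	return fake_stick;
}

static uint64_t sched_clock_sketch(const struct tick_params *p)
{
	/* Issue both loads before touching the timer. */
	uint64_t quotient = p->timer_ticks_per_nsec_quotient;
	uint64_t offset = p->timer_ticks_offset;

	/* Compiler barrier (GCC/Clang syntax): keeps the compiler from
	 * sinking the two loads below the timer read. */
	__asm__ __volatile__("" ::: "memory");

	uint64_t ticks = read_stick();

	/* Assumed scaling: ticks to nanoseconds in fixed point, minus a
	 * boot-time offset. */
	return ((ticks * quotient) >> NSEC_SHIFT) - offset;
}
```

The barrier only constrains the compiler. Whether the loads actually
complete under the timer read depends on the cpu, which is why the
assembler version above gives tighter control over the instruction
order.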