Hi All

On Sat, Nov 29, 2008 at 4:12 AM, Michael Blizek
<michi1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi!
>
> On 14:40 Fri 28 Nov, yogeshwar sonawane wrote:
>> Hi all,
>>
>> 1) I have driver code providing ioctl, read, write, mmap etc. entry points.
>> Now, I want to find the execution time of each entry point for comparison.
>>
>> How can this be done?
>> For example:
>>
>> take time_stamp1
>> call entry point
>> take time_stamp2
>>
>> Can code like the above, in a user application, measure the execution time?
>> Are timing functions in user space close to real time?
>>
>> 2) Which timing functions should be used to get a time that is close to
>> correct, in user and kernel space?
>> The main purpose is to calculate a timing interval.
>
> I would call the syscalls multiple times. This way, there is no requirement
> for an ultra-high-resolution clock. gettimeofday() will probably be enough.
> But there are some other pitfalls:
>
> - The time the ioctl, ... takes depends on whether the required data is
>   already in the CPU cache or not. If you want to measure how long it takes
>   with a cold cache, you can try to trash the cache between runs. But then
>   you cannot easily call gettimeofday() before and after a loop of the
>   ioctl(). You can call gettimeofday(), ioctl(), gettimeofday(), trash the
>   CPU cache, then do it again.
> - gettimeofday() itself takes some time. This means that even if the
>   resolution is high enough, or if you average over a lot of samples, this
>   time always adds to the result. If you trash the CPU cache, you may also
>   alter the time gettimeofday() takes. You may want to offset this by
>   calling gettimeofday() once and ignoring the result, to get its data into
>   the cache. You can also try to offset the gettimeofday() overhead by
>   measuring the time gettimeofday() itself takes and subtracting that from
>   the result.
> - Do not run any other processes, to make sure that the benchmark process
>   gets the CPU.
> - Something else may come up.
Michael makes some good points. And since I believe "timing" is a very broad
subject, I'd like to share my thoughts too. I have done some timing for
personal benchmarking, and here are my conclusions:

1. Run your timing test in runlevel 1. This way, you won't be bogged down
   by daemons running in the background, which could interfere.
2. I suggest using the TSC (Time Stamp Counter) if you use x86 (32- or
   64-bit). I am not saying gettimeofday() won't meet your needs, but I
   think reading the TSC register is the fastest code path you can use to
   measure timing.
3. Beware of preemption: your timing code could be preempted anywhere. I'd
   suggest disabling full kernel-level preemption and enabling voluntary
   preemption instead. Or better, disable kernel-level preemption entirely.
   This way, a kernel code path runs uninterrupted by any other
   non-interrupt-handler code path.
4. HZ can bring hassles too. Try to use the lowest HZ possible. This fits
   nicely if you use the TSC, since gettimeofday() relies on jiffies
   counting, and that indirectly relies on the timer tick.

All in all, as with other statistical work, use as many samples as you can
and show the reader the standard deviation, etc.

regards,

Mulyadi.

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ