Hi Peter,

The post:

http://aufather.wordpress.com/2010/09/08/high-performance-time-measuremen-in-linux

is a very good reference. The author is clever, and it answers my questions.
Thanks! :-)

I also found an old-school document from 1997 by Intel, which was proud of
its brand-new Pentium II. See:

http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf

Section 3 describes how to deal with out-of-order execution and the L1 cache.

Thanks!

Peter

On Mon, Oct 3, 2011 at 9:57 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
> On Tue, Oct 4, 2011 at 2:57 AM, Peter Senna Tschudin
> <peter.senna@xxxxxxxxx> wrote:
>> Hi Peter,
>>
>> Thanks for the reply. I've realized that I have no need to transform
>> the arbitrary number into something like seconds, because I'm only
>> interested in comparing the numbers.
>>
>> Is it safe to say that if I do not make the division by
>> CPU_THOUSAND_HZ I have the number of clock cycles that were "spent"
>> between the calls to getticks() (including some for getticks() itself)?
>>
>> Please see below.
>>
>> Thank you!
>>
>> Peter
>>
>> On Mon, Oct 3, 2011 at 1:17 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>> Why not put a sleep(1) here, like this:
>>>
>>>> ticks tickBegin, tickEnd;
>>>> tickBegin = getticks();
>>>>
>>>
>>> sleep(1);
>>>
>>>>
>>>> tickEnd = getticks();
>>>> double time = (tickEnd-tickBegin)/CPU_THOUSAND_HZ;
>>>>
>>> Then you know that it is reading the TSC values for 1 sec. By
>>> running the same program on different systems you will get different
>>> "time" values, and then you divide by that value for THAT system, so
>>> that eventually running the same program on different systems will give
>>> you the same difference of ticks, which in our present case is "1".
>>> After this "normalization", you can run your system with any timing
>>> difference, and the maximum achievable resolution is of course 1 sec. Is
>>> that what you wanted?
>>
>> That sounds like a great idea, but:
>> - may dynamic clock rate and multiple CPU cores mess with your proposal?
>> - How precise is sleep about sleeping for 1 second?
>> - I hope that the out-of-order execution mechanism of the CPU gets
>> frustrated with your proposal and runs the instructions in the order
>> we're expecting (tickBegin -> sleep -> tickEnd). How can we be sure that
>> the instructions were run in the correct order?
>>
>>>
>>> BTW, modern OSes do not use the TSC any more, but yes, your assembly can
>>> still access and read the TSC. The OS usually reads from the HPET (which
>>> is how sleep(1) calculates the time differences), and to read the HPET
>>> here is a link:
>>>
>>> http://www.fftw.org/cycle.h
>>
>> Looking at cycle.h I found this familiar code (starts on line 216):
>>
>> /*----------------------------------------------------------------*/
>> /*
>>  * X86-64 cycle counter
>>  */
>>
>> static __inline__ ticks getticks(void)
>> {
>>     unsigned a, d;
>>     asm volatile("rdtsc" : "=a" (a), "=d" (d));
>>     return ((ticks)a) | (((ticks)d) << 32);
>> }
>>
>
> Oh no, you are right, I just re-quoted the link from Wiki, which says
> 'code to read the high-resolution timer on many CPUs and compilers'
> ......ok, RDTSC is nevertheless a high-resolution timer as well.....
>
>> The code found in cycle.h is so similar to the one I was using that I
>> guess both were written by the same author. I got the code
>> I'm using from the paper at:
>> http://people.virginia.edu/~chg5w/page3/assets/MeasuringUnix.pdf
>>
>>>
>>> And query the OS via:
>>>
>>> cat /sys/devices/system/clocksource/clocksource0/*
>>> hpet acpi_pm
>>> hpet
>>>
>
> The above is from my x86 Ubuntu 10.04 laptop.
>
>>> and you can see from the above that "tsc" is missing from my system.
>>> (linux kernel is 2.6.35-22)
>>>
>>> For the TSC, I am not sure what the highest resolution you can reach is,
>>> but in a modern SoC chip with a 600 MHz core speed (speaking of PowerPC
>>> http://en.wikipedia.org/wiki/PowerPC_e500), the fastest execution is
>>> 600 million instructions per sec, assuming one instruction
>>> per clock.
>>> With this kind of speed, the TSC is very bad for measuring
>>> time differences.
>>
>> This is my mistake. I did not tell you that my tests will run only on
>> the x86 arch.
>>
>
> Sorry to you too....I forgot to mention that my hpet output is from
> the x86 arch. Anyway, the TSC is nevertheless a valid timer; after some
> research, I found its resolution is as good as the HPET's:
>
> http://aufather.wordpress.com/2010/09/08/high-performance-time-measuremen-in-linux/
>
> But it did highlight lots of risks with the TSC.
>
> And reading further:
>
> http://stackoverflow.com/questions/3388134/rdtsc-accuracy-across-cpu-cores
>
> http://stackoverflow.com/questions/3835111/whats-the-most-accurate-way-of-measuring-elapsed-time-in-a-modern-pc
>
> http://the-b.org/Linux_timers
>
> beware of something called "constant TSC" or 'invariant tsc', and of the
> overflow time (given for all the different timers except the TSC in the
> link above): if your duration is longer than that, the timer will have
> wrapped around and will give you inaccurate figures.
>
>>>
>>> On Mon, Oct 3, 2011 at 9:27 AM, Peter Senna Tschudin
>>> <peter.senna@xxxxxxxxx> wrote:
>>>> Dear list members,
>>>>
>>>> I'm following:
>>>>
>>>> http://people.virginia.edu/~chg5w/page3/assets/MeasuringUnix.pdf
>>>>
>>>> And I'm trying to measure the execution time of simple operations
>>>> with RDTSC.
>>>>
>>>> See the code below:
>>>>
>>>> #include <stdio.h>
>>>> #define CPU_THOUSAND_HZ 800000
>>>> typedef unsigned long long ticks;
>>>> static __inline__ ticks getticks(void) {
>>>>     unsigned a, d;
>>>>     asm("cpuid");
>>>>     asm volatile("rdtsc" : "=a" (a), "=d" (d));
>>>>     return (((ticks)a) | (((ticks)d) << 32));
>>>> }
>>>>
>>>> int main(void) {
>>>>     ticks tickBegin, tickEnd;
>>>>     tickBegin = getticks();
>>>>
>>>>     // code to time
>>>>
>>>>     tickEnd = getticks();
>>>>     double time = (tickEnd-tickBegin)/CPU_THOUSAND_HZ;
>>>>
>>>>     printf ("%e\n", time);
>>>>     return 0;
>>>> }
>>>>
>>>> How can the C code detect the correct value for CPU_THOUSAND_HZ?
>>>> The problems I see are:
>>>> - It is needed to collect the information for the CPU that will run
>>>> the process. On Core i7 processors, different cores can run at
>>>> different clock speeds at the same time.
>>>> - If the clock changes during the execution of the process, what should
>>>> it do? When is the best time for collecting the clock speed?
>>>>
>>>> The authors of the paper are not sure about the effects of
>>>> "asm("cpuid");". Does it ensure that the entire process will run on the
>>>> same CPU, and will it serialize it, avoiding out-of-order execution by
>>>> the CPU?
>>>>
>>>> Thank you very much! :-)
>>>>
>>>> Peter
>>>>
>>>>
>>>> --
>>>> Peter Senna Tschudin
>>>> peter.senna@xxxxxxxxx
>>>> gpg id: 48274C36
>>>>
>>>> _______________________________________________
>>>> Kernelnewbies mailing list
>>>> Kernelnewbies@xxxxxxxxxxxxxxxxx
>>>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>>>
>>>
>>> --
>>> Regards,
>>> Peter Teoh
>>>
>>
>> --
>> Peter Senna Tschudin
>> peter.senna@xxxxxxxxx
>> gpg id: 48274C36
>>
>
> --
> Regards,
> Peter Teoh
>

--
Peter Senna Tschudin
peter.senna@xxxxxxxxx
gpg id: 48274C36

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
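
For reference, here is a minimal sketch of one way the two open questions in
this thread could be approached in C: ordering the TSC read, and finding a
value to use in place of the hard-coded CPU_THOUSAND_HZ. It is not the code
from the paper or from cycle.h. The names calibrate_ticks_per_sec() and
ticks_to_seconds() are made up for this illustration. It swaps the bare
asm("cpuid") for LFENCE before RDTSC, on the assumption that LFENCE holds
back the RDTSC until earlier instructions have completed (documented for
Intel CPUs; treat it as an assumption elsewhere). The calibration measures
the real elapsed time with CLOCK_MONOTONIC rather than trusting the sleep to
last exactly as long as requested, and it is only meaningful on a CPU with a
constant/invariant TSC. It does not pin the process to one CPU. On non-x86
builds the sketch falls back to nanoseconds from CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

typedef unsigned long long ticks;

/* Read a tick counter. On x86: LFENCE, then RDTSC (assumed ordering, see
 * the text above). Elsewhere: nanoseconds from CLOCK_MONOTONIC. */
static inline ticks getticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned a, d;
    /* The "memory" clobber also stops the compiler from moving memory
     * accesses across the read. */
    __asm__ __volatile__("lfence\n\trdtsc" : "=a"(a), "=d"(d) : : "memory");
    return ((ticks)a) | (((ticks)d) << 32);
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (ticks)ts.tv_sec * 1000000000ULL + (ticks)ts.tv_nsec;
#endif
}

/* Hypothetical helper: estimate ticks per second by comparing the tick
 * counter against CLOCK_MONOTONIC across a short sleep. The sleep length
 * itself is not trusted; only the measured elapsed time is used. */
static double calibrate_ticks_per_sec(void)
{
    const struct timespec nap = { 0, 50L * 1000 * 1000 }; /* about 50 ms */
    struct timespec t0, t1;
    ticks c0, c1;
    double secs;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    c0 = getticks();
    nanosleep(&nap, NULL);
    c1 = getticks();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    assert(secs > 0.0);
    return (double)(c1 - c0) / secs;
}

/* Hypothetical helper: convert a tick difference to seconds using a
 * previously calibrated rate. */
static double ticks_to_seconds(ticks delta, double ticks_per_sec)
{
    return (double)delta / ticks_per_sec;
}
```

A caller would run calibrate_ticks_per_sec() once at startup and then report
ticks_to_seconds(tickEnd - tickBegin, rate). If the numbers are only being
compared with each other, as noted earlier in the thread, the raw tick
difference is enough and no calibration is needed.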