Re: How to use the KNC Vectorregisters with GCC? Race condition with ICC & KNC?

On Tue, Jan 21, 2014 at 2:23 AM, Stephan Walter
<stephan.walter@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I am new to the GCC mailing list, so I hope this is the right place to ask.
>
> As the subject shows, I work with KNC. I have developed a kernel module for
> a NIC and now want to use the 512-bit registers of KNC for some memory-copy
> jobs.
>
> I have experience using GCC to compile the KNC Linux kernel and kernel
> modules, so that part is no problem at the moment. Everything works fine.
>
> Before I started writing inline assembler with the 512-bit registers, I
> wrote some minimal examples.
>
> On a normal i5-3470 everything works fine with GCC, and on KNC everything
> works as well. The problem is that when I try to use the 512-bit registers,
> GCC does not seem to know the register names or the instructions.
>
> I think the instructions are not a problem, because I have the instruction
> manual, but I have no idea how to solve the register problem.
>
> So I tried ICC with -mmic. The source compiles, but when I measure the
> clock cycles with rdtsc, the first two checks work and the third and fourth
> do not.
> I tried to track the problem down with gdb, but when I compile with -g the
> error no longer occurs. A printf, sleep(1) or usleep(1) also makes the
> problem go away. So I think there is a race condition on the write of the
> value to memory, because 1 or even 100 nops have no effect.
>
> My inline assembler knowledge is rudimentary, so I don't know whether I
> have a problem with the clobber registers and so on, or whether there is a
> bug in GCC or ICC.
>
> Because -g with ICC makes the problem disappear, I cannot debug it. So I
> hope somebody is able to help me.
>
> I would prefer to use GCC together with the 512-bit registers; if there is
> a bug in my inline assembler, a solution or hint would also be welcome.
>
> Here is my code:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <inttypes.h>
>
>
> int rdtsc_count(void){
> int count;
> /* rdtsc writes EDX:EAX: return the low 32 bits from EAX, declare EDX clobbered */
> __asm__ __volatile__(   "rdtsc                  \n\t"
>                          :"=a"(count)
>                          :
>                          :"%edx", "memory"
>                         );
>
> return count;
> }
>
>
> int main(int argc, char *argv[]){
>
> int starta=0, startb=0, stopa=0, stopb=0;
> int buffer_size=32;
> int i, waddr;
> uint64_t *buffer, *packet_buffer, *packet_buffer_ref;
> uint32_t buflen=atoi(argv[1]);
>
>
> /////////////setup
> buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
> packet_buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
> packet_buffer_ref= (uint64_t*) malloc (buffer_size*sizeof(uint64_t));//REF
>
> waddr=0;
>
> //printf("Address of packet_buffer %x", waddr);
> printf("Original data\n");
> for(i=0; i<buffer_size; i++){
>         buffer[i]=i+i*i;
>         packet_buffer[i]=0;
>         packet_buffer_ref[i]=0;
>         printf("%" PRIx64 "\t", buffer[i]);
> }
> printf("\n");
>
> printf("packet_buffer start\n");
> for(i=0; i<buffer_size; i++){
>         printf("%" PRIx64 "\t", packet_buffer[i]);
> }
> printf("\n");
>
> ////////////end_setup
>
> if(buflen==0 || buflen>120){
>         printf("buflen too big or too small\n");
>         return 0;
> }
>
>
> ########################################
> starta=rdtsc_count();
> memcpy(&(packet_buffer_ref[waddr+1]), buffer, sizeof(uint64_t)*(buflen));//REF
> stopa=rdtsc_count();
> printf("memcpy took\t%d\tclocks\n", stopa-starta);
> ########################################
> ##Here everything is fine
> ########################################
>
> ########################################
> startb=rdtsc_count();
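> /* volatile keeps the copy between the two rdtsc_count() calls; the
>    destination pointer is only read, so it is an input operand */
> __asm__ __volatile__( "movq   %0,             %%rdi;          \n\t"
>                         "movq   %1,             %%rsi;          \n\t"
>                         "movl   %2,             %%ecx;          \n\t"
>                         "addq   $8,             %%rdi;          \n\t"
> //                      "shl    $3,             %%ecx;          \n\t"
>                         "1:     movsq;                          \n\t"
>                         "loop 1b;                               \n\t"
>                         :
>                         :"r"(packet_buffer), "r"(buffer), "r"(buflen)
>                         :"%rsi", "%rdi", "%rcx", "memory", "cc"
>                         );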
>
> stopb=rdtsc_count();
>
> #########################################
> ## If I use one of these calls, everything is fine.
> //usleep(1);
> //printf("stopa %d\n", stopa);
> //printf("fdsagfa\n");
> #########################################
> printf("asm movsq took\t%d\tclocks\n", stopb-startb);
>
> ########################################
> ## Here I have the problem. It looks like stopb or startb is still 0 when I
> ## call no function between rdtsc_count() and the output.
> ########################################
>
>

I'm not sure this is what is causing your problem, but rdtsc is not a
serializing instruction: it can execute out of order relative to the
surrounding instructions, so instructions issued before rdtsc need not
have completed when the counter is read.  Executing cpuid immediately
before rdtsc forces them to complete first, since cpuid is serializing.
I believe rdtscp also waits for earlier instructions to finish on its
own.
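
As a rough illustration of the two options above, here is a minimal
sketch for GCC-style inline asm on x86-64.  The helper names
tsc_serialized() and tsc_rdtscp() are made up for this example and are
not from the original code.  Unlike the original rdtsc_count(), both
return the full 64-bit counter.  Whether the target CPU (KNC included)
supports rdtscp is an assumption that should be checked before using
the second helper.

```c
#include <stdint.h>

/* Hypothetical helper: run cpuid (serializing) and then read the TSC. */
static inline uint64_t tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"  /* select cpuid leaf 0 */
        "cpuid\n\t"              /* earlier instructions complete before this retires */
        "rdtsc"                  /* counter is returned in EDX:EAX */
        : "=a"(lo), "=d"(hi)
        :
        : "%rbx", "%rcx", "memory");  /* cpuid also overwrites EBX and ECX */
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: rdtscp variant, assuming the CPU supports rdtscp. */
static inline uint64_t tsc_rdtscp(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "rdtscp"                 /* EDX:EAX = counter, ECX = IA32_TSC_AUX */
        : "=a"(lo), "=d"(hi)
        :
        : "%rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

A measurement would then read the counter into two uint64_t variables
around the code under test and print the difference with PRIu64.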

  Brian



