Re: How to use the KNC Vectorregisters with GCC? Race condition with ICC & KNC?

On 21.01.2014 20:53, Brian Budge wrote:
On Tue, Jan 21, 2014 at 2:23 AM, Stephan Walter
<stephan.walter@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,

I am new to the gcc mailing list, so I hope I am in the right place.

As the subject suggests, I work with KNC. I have developed a kernel module
for a NIC and now want to use the 512-bit registers of KNC for some memory
copy jobs.

I have experience using GCC to compile the KNC Linux and kernel modules, so
that is no problem at the moment. Everything works fine.

Before starting to write inline assembler with the 512-bit registers, I
wrote some minimal examples.

On a normal i5-3470 everything works fine with gcc. On KNC everything works
as well. The problem is that when I try to use the 512-bit registers, it
looks like GCC doesn't know the register names and instructions.

Solving the problem with the instructions should be manageable, I think,
because I have the instruction manual, but I have no idea how to solve the
register problem.

So I tried ICC with -mmic. The source compiles, but when I measure the clock
cycles with rdtsc, the first two checks work, but the 3rd and 4th do not.
I tried to track down the problem with gdb, but when I use -g the error no
longer occurs. The problem also disappears when I use a printf, sleep(1) or
usleep(1). So I think there is a race condition with the write of the value
into memory, because 1 or even 100 nops have no effect.

My inline assembler knowledge is rudimentary, so I don't know whether I have
made a mistake with the clobber registers and so on, or whether there is a
bug in gcc or icc.

Since -g with icc makes the problem go away, I cannot debug it. So I hope
somebody is able to help me.

My preference is to use gcc together with the 512-bit registers. If there is
a bug in my inline assembler, a solution or hint would also be welcome.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>


int rdtsc_count(void){
int count;
__asm__ __volatile__(   "rdtsc;                 \n\t"
                         "movl   %%eax, %0;      \n\t"
                          :"=m"(count)//, "=r"(brd), "=r"(crd), "=r"(drd)
                          :
                          :"%eax", "memory"//, "cc"//, "%ebx", "%ecx"
                         );

return count;
}


int main(int argc, char *argv[]){

int starta=0, startb=0, stopa=0, stopb=0;
int buffer_size=32;
uint64_t* buffer;
uint32_t buflen=atoi(argv[1]);


/////////////setup
buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
packet_buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
packet_buffer_ref= (uint64_t*) malloc (buffer_size*sizeof(uint64_t));//REF

waddr=0;

//printf("Adresse von packet_buffer %x", waddr);
printf("Orginaldaten\n");
for(i=0; i<buffer_size; i++){
         buffer[i]=i+i*i;
         packet_buffer[i]=0;
         packet_buffer_ref[i]=0;
         printf("%x\t", buffer[i]);
};
printf("\n");

printf("packet_buffer start\n");
for(i=0; i<buffer_size; i++){
         printf("%x\t", packet_buffer[i]);
};
printf("\n");

////////////end_setup

if(buflen==0 | buflen>120){
         printf("buflen too big or too small\n");
         return 0;
}


########################################
starta=rdtsc_count();
memcpy(&(packet_buffer_ref[waddr+1]), buffer,
sizeof(uint64_t)*(buflen));//REF
stopa=rdtsc_count();
printf("memcpy took\t%d\tclocks\n", stopa-starta);
########################################
##Here everything is fine
########################################

########################################
startb=rdtsc_count();
__asm__ (             "movq   %1,             %%rsi;          \n\t"
                         "movq   %0,             %%rdi;          \n\t"
                         "movl   %2,             %%ecx;          \n\t"
                         "addq   $8,             %%rdi;          \n\t"
//                      "shl    $3,             %%ecx;          \n\t"
         "Schleife:       movsq;                                 \n\t"
                         "loop Schleife;                         \n\t"
                         :"=m"(packet_buffer)
                         :"r"(buffer), "r"(buflen)
                         :"%rsi", "%rdi", "%rcx", "memory"
                         );

stopb=rdtsc_count();

######################################### If I use one of these functions,
everything is fine.
//usleep(1);
//printf("stopa %d\n", stopa);
//printf("fdsagfa\n");
#########################################
printf("asm movsq took\t%d\tclocks\n", stopb-startb);

########################################
##Here I have the problem. It looks like stopb or startb is still 0 when I
call no function between rdtsc_count() and the output
########################################
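
For comparison, here is a minimal sketch of the same quadword copy written
with operand constraints instead of hand-picked registers. It assumes x86-64
and GCC-style extended asm, and has not been tested on KNC or with ICC. The
name copy_qwords is hypothetical. In the block above, packet_buffer is
declared as a write-only output ("=m") although the asm reads it, and the asm
is not marked volatile; whether that explains the behaviour seen here is
not certain, but the sketch avoids both.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `count` 64-bit words from src to dst using
 * "rep movsq". The "+D", "+S" and "+c" constraints bind the operands to
 * RDI, RSI and RCX and tell the compiler that all three are both read
 * and modified by the instruction. The "memory" clobber tells it that
 * memory is read and written behind its back. */
static inline void copy_qwords(uint64_t *dst, const uint64_t *src, size_t count)
{
    __asm__ __volatile__("rep movsq"
                         : "+D"(dst), "+S"(src), "+c"(count)
                         :
                         : "memory");
}
```

With this form the call corresponding to the block above would be
copy_qwords(&packet_buffer[1], buffer, buflen). Unlike the explicit loop
instruction, rep copies nothing when the count is zero.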



I'm unsure if this is what is causing your problem, but rdtsc can be
executed out of order with respect to other instructions, and so
instructions issued prior to rdtsc need not be complete before the
measurement is made.  I've seen that the cpuid instruction forces this to
be the case.  I believe also that using rdtscp will prevent the reordering
on its own.

   Brian

KNC is an in-order CPU design, so there should be no instruction reordering. The only remaining possibility is superscalar execution, but then I don't know how to make sure that an instruction has already completed.
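
As an illustration of the cpuid approach mentioned above, here is a minimal
sketch of a timestamp read that executes cpuid before rdtsc. It assumes
x86-64 and GCC-style extended asm and has not been tested on KNC; the
function name read_tsc_serialized is hypothetical. rdtsc writes its result
to EDX:EAX, so both registers are declared as outputs here, whereas
rdtsc_count() in the original post lists only %eax. Whether that omission
is the cause of the wrong values is not certain.

```c
#include <stdint.h>

/* Hypothetical helper: read the full 64-bit time stamp counter.
 * cpuid is a serializing instruction, so earlier instructions have
 * completed before rdtsc runs. cpuid overwrites EAX, EBX, ECX and EDX:
 * EAX and EDX are outputs of the asm, EBX and ECX are listed as
 * clobbers (by their 64-bit names, which assumes x86-64). */
static inline uint64_t read_tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("xorl %%eax, %%eax\n\t"  /* select cpuid leaf 0 */
                         "cpuid\n\t"              /* serialize */
                         "rdtsc"                  /* result in EDX:EAX */
                         : "=a"(lo), "=d"(hi)
                         :
                         : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Returning a 64-bit value also avoids truncating the counter to an int, so a
difference such as stop - start stays meaningful when the low 32 bits wrap.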



