Hi,
I am new to the gcc mailing list, so I hope this is the right place.
As the subject shows, I work with KNC. I have developed a kernel module
for a NIC and now want to use the 512-bit registers of KNC for some
memory-copy jobs.
I have experience using GCC to compile the KNC Linux kernel and kernel
modules, so that part is no problem at the moment. Everything works fine.
Before I started writing inline assembler with the 512-bit registers, I
wrote some minimal examples.
On a normal i5-3470 everything works fine with gcc, and the same examples
also work on KNC. The problem is that when I try to use the 512-bit
registers, GCC does not seem to know the register names or the
instructions.
I think the instructions are not a real problem, because I have the
instruction manual, but I have no idea how to solve the register
problem.
So I tried ICC with -mmic. The source compiles, but when I measure the
clock cycles with rdtsc, the first two readings work and the third and
fourth do not.
I tried to track this down with gdb, but when I compile with -g the error
no longer occurs. It also disappears when I add a printf, sleep(1) or
usleep(1). So I suspect a race condition when the value is written to
memory, because 1 or even 100 nops have no effect.
My inline assembler knowledge is rudimentary, so I don't know whether I
have made a mistake with the clobber registers and so on, or whether there
is a bug in gcc or icc.
Because compiling with -g under icc makes the problem disappear, I cannot
debug it. So I hope somebody is able to help me.
My preference is to use gcc together with the 512-bit registers; if the
bug is in my inline assembler, a solution or hint would also be welcome.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
int rdtsc_count(void){
int count;
__asm__ __volatile__( "rdtsc; \n\t"
"movl %%eax, %0; \n\t"
:"=m"(count)//, "=r"(brd), "=r"(crd), "=r"(drd)
:
:"%eax", "memory"//, "cc"//, "%ebx", "%ecx"
);
return count;
}
int main(int argc, char *argv[]){
int starta=0, startb=0, stopa=0, stopb=0;
int buffer_size=32;
int i, waddr;
uint64_t *buffer, *packet_buffer, *packet_buffer_ref;
uint32_t buflen=atoi(argv[1]);
/////////////setup
buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
packet_buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
packet_buffer_ref= (uint64_t*) malloc (buffer_size*sizeof(uint64_t));//REF
waddr=0;
//printf("Address of packet_buffer %x", waddr);
printf("Original data\n");
for(i=0; i<buffer_size; i++){
buffer[i]=i+i*i;
packet_buffer[i]=0;
packet_buffer_ref[i]=0;
printf("%" PRIx64 "\t", buffer[i]);
};
printf("\n");
printf("packet_buffer start\n");
for(i=0; i<buffer_size; i++){
printf("%" PRIx64 "\t", packet_buffer[i]);
};
printf("\n");
////////////end_setup
if(buflen==0 || buflen>120){
printf("buflen too big or too small\n");
return 0;
}
/* ######################################## */
starta=rdtsc_count();
memcpy(&(packet_buffer_ref[waddr+1]), buffer,
sizeof(uint64_t)*(buflen));//REF
stopa=rdtsc_count();
printf("memcpy took\t%d\tclocks\n", stopa-starta);
/* ########################################
 * Up to here everything is fine.
 * ######################################## */
startb=rdtsc_count();
__asm__ ( "movq %1, %%rsi; \n\t"
"movq %0, %%rdi; \n\t"
"movl %2, %%ecx; \n\t"
"addq $8, %%rdi; \n\t"
// "shl $3, %%ecx; \n\t"
"Schleife: movsq; \n\t"
"loop Schleife; \n\t"
:"=m"(packet_buffer)
:"r"(buffer), "r"(buflen)
:"%rsi", "%rdi", "%rcx", "memory"
);
stopb=rdtsc_count();
/* ######################################### */
/* If I call any one of these functions here, everything is fine. */
//usleep(1);
//printf("stopa %d\n", stopa);
//printf("fdsagfa\n");
/* ######################################### */
printf("asm movsq took\t%d\tclocks\n", stopb-startb);
/* ########################################
 * Here I have the problem. It looks like stopb or startb is still 0 when I
 * call no function between rdtsc_count() and the output.
 * ######################################## */