Aloha!

Some days ago I did some testing with the {un,}likely() macros in
Linux (defined using __builtin_expect()), and discovered some
unexpected results.

I compiled the test.c source (see below) with GCC v4.1.2, v4.2.2 and
v4.3.0 20071130, passing one of -DVER_A, -DVER_B or -DVER_C to gcc,
and ran the resulting program like this (see below for the script
named numbers):

  ./test 2097152 $(./numbers 1024 1)

From that I made the table below. Each user ticks value (see the
output of ./test) is the average over 5 runs of the program.

  user ticks:   -DVER_A   -DVER_B   -DVER_C
  -------------------------------------------
  gcc-4.1.2      1609.8    2468.8    1511.8
  gcc-4.2.2      1296.6    1812.2    1239.4
  gcc-4.3.0      1468.2    1468.8    1297.0

I was expecting test.c compiled with -DVER_A or -DVER_B to produce
almost the same user ticks values, but as the table above shows, the
-DVER_B version is slower than the -DVER_A version (except when test.c
is compiled with gcc v4.3.0).

This is because I was expecting the following implication to hold
true, using the unlikely() macro from test.c:

  unlikely(expr_A || expr_B)  =>  unlikely(expr_A) || unlikely(expr_B)

Would this maybe indicate an (un)known bug or feature of the GCC
compiler v4.1.2 and v4.2.2, which was then corrected in GCC v4.3.0 (1)?

best regards,
-- 
kjetil

PS! The CPU is an Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz

1) When compiling with the inner if-test moved into a function and
   file of its own, I noticed that the *.s files were identical for
   -DVER_A and -DVER_B.

_____________________________

Below is the C source file (test.c). The if-test inside the inner
for-loop is based on a snippet of the rw_copy_check_uvector() function
in the fs/read_write.c file in Linux v2.6.23.9.
Compile with: gcc -O3 -DVER_A -o test test.c

#include <stdio.h>
#include <stdlib.h>
#include <sys/times.h>

#define MAX_INTS_ARR 16384
#define unlikely(x) __builtin_expect(!!(x), 0)

int main(int argc, char ** argv)
{
    int ints_arr[MAX_INTS_ARR];
    unsigned long time_diff;
    signed long long int ret=0, num=0;
    unsigned long j, j_max=5;
    unsigned long k, ints_max=0;
    struct tms times_start, times_finish;

    if ( argc <= 2 ) {
        printf("usage: %s repeat-num num_1 num_2 ... num_n\n", argv[0]);
        return -1;
    } else {
        j_max = strtoul(argv[1], NULL, 10);
        while ( argc-- > 2 && ints_max < MAX_INTS_ARR ) {
            ints_arr[ints_max++] = atoi(argv[argc]);
        }
    }

    times(&times_start);

    for (j = 0; j < j_max; j++) {
        for (k = 0; k < ints_max; k++) {
            num = ints_arr[k];
#if defined(VER_A)
            if ( unlikely(num < 0) || unlikely(ret + num < ret) ) {
#elif defined(VER_B)
            if ( unlikely(num < 0 || ret + num < ret) ) {
#else /* VER_C */
            if ( num < 0 || ret + num < ret ) {
#endif
                ret = 42;
                goto out;
            }
            ret += num;
        }
    }

out:
    times(&times_finish);
    time_diff = times_finish.tms_utime - times_start.tms_utime;

    printf("%s j_max:%lu ints_max:%lu user ticks:%lu\n",
#if defined(VER_A)
           "version A",
#elif defined(VER_B)
           "version B",
#else /* VER_C */
           "version C",
#endif
           j_max, ints_max, time_diff);

    return (int)ret;
}

_____________________________

Below is the small script, named numbers, used to generate some input
arguments for the test program above:

#!/bin/bash
for ((i = 0; $i < $1; i++)); do
    printf "$2 ";
done
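
_____________________________

Regarding footnote 1 above: the following is only a minimal sketch of
what moving the inner if-test into a function and file of its own
might look like. The file name check.c and the function name
check_num() are assumptions made for illustration; they are not taken
from the original test. The types mirror ret and num in test.c.

```c
/* check.c (hypothetical file name): the inner if-test from test.c,
 * isolated in a function of its own so the generated assembly can be
 * compared between -DVER_A, -DVER_B and the default (VER_C). */

#define unlikely(x) __builtin_expect(!!(x), 0)

/* check_num() is a hypothetical helper name.
 * Returns 1 when the if-test from test.c would take the error path
 * (ret = 42; goto out;), and 0 when the loop would continue. */
int check_num(signed long long int ret, signed long long int num)
{
#if defined(VER_A)
    if ( unlikely(num < 0) || unlikely(ret + num < ret) )
#elif defined(VER_B)
    if ( unlikely(num < 0 || ret + num < ret) )
#else /* VER_C */
    if ( num < 0 || ret + num < ret )
#endif
        return 1;

    return 0;
}
```

One way to repeat the comparison would be to generate the assembly
for two versions and diff the results, for example:

  gcc -O3 -S -DVER_A -o check_a.s check.c
  gcc -O3 -S -DVER_B -o check_b.s check.c
  diff check_a.s check_b.s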