Hey-
	There's a bug in how numactl builds the cpu mask when executing the --physcpubind option. The call to cpumask in main passes the variable ncpus as the second parameter, and main then treats it as a number of cpus when calling numa_sched_setaffinity, as is evidenced by the fact that it attempts to convert it to a byte count again with the CPU_BYTES macro. The problem is that cpumask already stores the size of the mask in bytes in ncpus (rounded up to the nearest long, since that's how it's allocated). This just happens to work for most systems, but we wind up passing too short a size to numa_sched_setaffinity if you try to bind to processors greater than 64.

Consider the following situation (assuming an x86_64 system):

sysconf(NR_CPUS) == 128
ncpus on return from cpumask = 128 / 4 = 32
CPU_BYTES(32) = 32 / 8 = 4

So 4 is the value passed as cpusetsize (which is expressed in bytes) to sched_setaffinity. Even though enough space is allocated in cpumask to set affinity on all 128 processors, sched_setaffinity won't be able to see any part of the mask beyond the first 32 bits because of this bug.

The patch below corrects this by clarifying the meaning of that variable and passing it consistently as a number of bytes. I've tested it and observed it to work well on a 64 way system.
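For illustration only, here is a minimal standalone sketch of the double conversion described above. The helpers demo_mask_bytes, demo_cpu_covered and demo_report are hypothetical stand-ins written for this example; they are not the numactl macros, so the numbers they produce come from their own rounding and need not match the figures quoted above. With an 8-byte unsigned long, a 128-cpu mask comes out as 16 bytes and the doubly converted length as 8 bytes, so cpus 64 and up fall outside what the kernel would be told about.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for illustration; not the numactl macro. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed to hold `ncpus` bits, rounded up to a whole unsigned long. */
static size_t demo_mask_bytes(size_t ncpus)
{
	size_t longs = (ncpus + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
	return longs * sizeof(unsigned long);
}

/* A cpu is covered only if its bit lies inside the first `len` bytes. */
static int demo_cpu_covered(size_t cpu, size_t len)
{
	return cpu < len * CHAR_BIT;
}

/* Print the correct byte length next to the doubly converted one. */
static void demo_report(size_t ncpus)
{
	size_t bufsz = demo_mask_bytes(ncpus);	/* already a byte count */
	size_t wrong = demo_mask_bytes(bufsz);	/* bytes misread as a cpu count */

	assert(wrong <= bufsz);
	printf("cpus=%zu bufsz=%zu bytes, doubly converted=%zu bytes\n",
	       ncpus, bufsz, wrong);
	printf("highest cpu covered: %d with bufsz, %d with the short length\n",
	       demo_cpu_covered(ncpus - 1, bufsz),
	       demo_cpu_covered(ncpus - 1, wrong));
}
```

Calling demo_report(128) from a main of your own prints both lengths side by side. The point is only that a value which is already a byte count must not be run through a cpus-to-bytes conversion a second time.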

Thanks & Regards
Neil

Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>

 numactl.c |    6 +++---
 util.c    |    4 ++--
 util.h    |    2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff -up numactl-1.0.2/numactl.c.orig numactl-1.0.2/numactl.c
--- numactl-1.0.2/numactl.c.orig	2007-09-21 06:23:51.000000000 -0400
+++ numactl-1.0.2/numactl.c	2008-04-24 15:22:44.000000000 -0400
@@ -355,14 +355,14 @@ int main(int ac, char **av)
 			break;
 		case 'C': /* --physcpubind */
 		{
-			int ncpus;
+			int bufsz;
 			unsigned long *cpubuf;
 			dontshm("-C/--physcpubind");
-			cpubuf = cpumask(optarg, &ncpus);
+			cpubuf = cpumask(optarg, &bufsz);
 			errno = 0;
 			check_cpubind(do_shm);
 			did_cpubind = 1;
-			numa_sched_setaffinity(0, CPU_BYTES(ncpus), cpubuf);
+			numa_sched_setaffinity(0, bufsz, cpubuf);
 			checkerror("sched_setaffinity");
 			free(cpubuf);
 			break;
diff -up numactl-1.0.2/util.h.orig numactl-1.0.2/util.h
--- numactl-1.0.2/util.h.orig	2007-08-16 10:36:23.000000000 -0400
+++ numactl-1.0.2/util.h	2008-04-24 15:22:44.000000000 -0400
@@ -1,7 +1,7 @@
 extern void printmask(char *name, nodemask_t *mask);
 extern void printcpumask(char *name, unsigned long *mask, int len);
 extern nodemask_t nodemask(char *s);
-extern unsigned long *cpumask(char *s, int *ncpus);
+extern unsigned long *cpumask(char *s, int *bufsz);
 extern int read_sysctl(char *name);
 extern void complain(char *fmt, ...);
 extern void nerror(char *fmt, ...);
diff -up numactl-1.0.2/util.c.orig numactl-1.0.2/util.c
--- numactl-1.0.2/util.c.orig	2008-04-24 15:23:02.000000000 -0400
+++ numactl-1.0.2/util.c	2008-04-24 15:23:17.000000000 -0400
@@ -52,7 +52,7 @@ void printmask(char *name, nodemask_t *m
 int numcpus;
 
 /* caller must free buffer */
-unsigned long *cpumask(char *s, int *ncpus)
+unsigned long *cpumask(char *s, int *bufsz)
 {
 	int invert = 0;
 	char *end;
@@ -110,7 +110,7 @@ unsigned long *cpumask(char *s, int *ncp
 				set_bit(i, cpubuf);
 		}
 	}
-	*ncpus = cpubufsize;
+	*bufsz = cpubufsize;
 	return cpubuf;
 }
 
-- 
/****************************************************
 * Neil Horman <nhorman@xxxxxxxxxxxxx>
 * Software Engineer, Red Hat
 ****************************************************/