The patch titled

     sLeAZY FPU feature: x86_64 support

has been added to the -mm tree.  Its filename is

     sleazy-fpu-feature-x86_64-support.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: sLeAZY FPU feature: x86_64 support
From: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>

Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily.  This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time).  However, for very frequent FPU users you take
an extra trap on every context switch.

The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch.  If the app indeed uses the FPU, the
trap is avoided.  (The chance of the 6th time slice using the FPU after the
previous 5 have done so is obviously quite high.)

After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again).  The reason for this is to give apps
that do longer bursts of FPU use the lazy behavior back after some time.
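
For readers who want to see the heuristic in isolation, here is a minimal
user-space sketch of the counter logic.  It is not kernel code: the sim_*
names, the used_fpu flag (standing in for TS_USEDFPU) and the
FPU_EAGER_THRESHOLD macro are assumptions made for illustration.  Only the
"fpu_counter > 5" test, the increment on restore, the reset in the
not-used branch and the unsigned char type are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical name; mirrors the "fpu_counter > 5" test in the patch. */
#define FPU_EAGER_THRESHOLD 5

/* Hypothetical stand-in for the two pieces of per-task state involved. */
struct sim_task {
	unsigned char fpu_counter;	/* wraps back to 0 after 255 */
	bool used_fpu;			/* models the TS_USEDFPU status bit */
};

/* Models math_state_restore(): FPU state becomes live, counter is bumped. */
static void sim_math_state_restore(struct sim_task *t)
{
	t->used_fpu = true;
	t->fpu_counter++;
}

/* Models the end of __switch_to(): restore eagerly above the threshold. */
static void sim_switch_in(struct sim_task *t)
{
	if (t->fpu_counter > FPU_EAGER_THRESHOLD)
		sim_math_state_restore(t);
}

/* Models unlazy_fpu(): save live state, or reset the counter if unused. */
static void sim_switch_out(struct sim_task *t)
{
	if (t->used_fpu)
		t->used_fpu = false;	/* state saved, flag cleared */
	else
		t->fpu_counter = 0;
}

/* Runs one timeslice; returns true if the lazy-restore trap was taken. */
static bool sim_timeslice(struct sim_task *t, bool uses_fpu)
{
	bool trapped = false;

	sim_switch_in(t);
	if (uses_fpu && !t->used_fpu) {
		/* First FPU use with no live state: the trap path. */
		sim_math_state_restore(t);
		trapped = true;
	}
	sim_switch_out(t);
	return trapped;
}

/* Counts traps for a task that uses the FPU in every one of n slices. */
static int sim_count_traps(int slices)
{
	struct sim_task t = { 0, false };
	int traps = 0;
	int i;

	for (i = 0; i < slices; i++)
		traps += sim_timeslice(&t, true);
	return traps;
}
```

In this model a task that uses the FPU in every timeslice takes six traps
and then none until the counter wraps, so sim_count_traps(256) is 6.  The
test being "> 5" rather than ">= 5" means the seventh consecutive slice is
the first one to skip the trap.  Note also that once the eager path is
active the counter keeps incrementing whether or not the task touches the
FPU, which is why the wrap at 256 is what returns the task to lazy mode.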

Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 arch/x86_64/kernel/process.c |   10 ++++++++++
 arch/x86_64/kernel/traps.c   |    1 +
 include/asm-x86_64/i387.h    |    5 ++++-
 include/linux/sched.h        |    9 +++++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/process.c~sleazy-fpu-feature-x86_64-support arch/x86_64/kernel/process.c
--- a/arch/x86_64/kernel/process.c~sleazy-fpu-feature-x86_64-support
+++ a/arch/x86_64/kernel/process.c
@@ -515,6 +515,10 @@ __switch_to(struct task_struct *prev_p,
 	int cpu = smp_processor_id();
 	struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
+	/* we're going to use this soon, after a few expensive things */
+	if (next_p->fpu_counter>5)
+		prefetch(&next->i387.fxsave);
+
 	/*
 	 * Reload esp0, LDT and the page table pointer:
 	 */
@@ -618,6 +622,12 @@ __switch_to(struct task_struct *prev_p,
 		}
 	}
 
+	/* If the task has used fpu the last 5 timeslices, just do a full
+	 * restore of the math state immediately to avoid the trap; the
+	 * chances of needing FPU soon are obviously high now
+	 */
+	if (next_p->fpu_counter>5)
+		math_state_restore();
 	return prev_p;
 }
 
diff -puN arch/x86_64/kernel/traps.c~sleazy-fpu-feature-x86_64-support arch/x86_64/kernel/traps.c
--- a/arch/x86_64/kernel/traps.c~sleazy-fpu-feature-x86_64-support
+++ a/arch/x86_64/kernel/traps.c
@@ -1060,6 +1060,7 @@ asmlinkage void math_state_restore(void)
 		init_fpu(me);
 	restore_fpu_checking(&me->thread.i387.fxsave);
 	task_thread_info(me)->status |= TS_USEDFPU;
+	me->fpu_counter++;
 }
 
 void __init trap_init(void)
diff -puN include/asm-x86_64/i387.h~sleazy-fpu-feature-x86_64-support include/asm-x86_64/i387.h
--- a/include/asm-x86_64/i387.h~sleazy-fpu-feature-x86_64-support
+++ a/include/asm-x86_64/i387.h
@@ -24,6 +24,7 @@ extern unsigned int mxcsr_feature_mask;
 extern void mxcsr_feature_mask_init(void);
 extern void init_fpu(struct task_struct *child);
 extern int save_i387(struct _fpstate __user *buf);
+extern asmlinkage void math_state_restore(void);
 
 /*
  * FPU lazy state save handling...
@@ -31,7 +32,9 @@ extern int save_i387(struct _fpstate __u
 
 #define unlazy_fpu(tsk) do { \
 	if (task_thread_info(tsk)->status & TS_USEDFPU) \
-		save_init_fpu(tsk); \
+		save_init_fpu(tsk); \
+	else \
+		tsk->fpu_counter = 0; \
 } while (0)
 
 /* Ignore delayed exceptions from user space */
diff -puN include/linux/sched.h~sleazy-fpu-feature-x86_64-support include/linux/sched.h
--- a/include/linux/sched.h~sleazy-fpu-feature-x86_64-support
+++ a/include/linux/sched.h
@@ -1027,6 +1027,15 @@ struct task_struct {
 	spinlock_t delays_lock;
 	struct task_delay_info *delays;
 #endif
+	/*
+	 * fpu_counter contains the number of consecutive context switches
+	 * that the FPU is used. If this is over a threshold, the lazy fpu
+	 * saving becomes unlazy to save the trap. This is an unsigned char
+	 * so that after 256 times the counter wraps and the behavior turns
+	 * lazy again; this to deal with bursty apps that only use FPU for
+	 * a short time
+	 */
+	unsigned char fpu_counter;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
_

Patches currently in -mm which might be from arjan@xxxxxxxxxxxxxxx are

origin.patch
lock-validator-fix-ns83820c-irq-flags-bug.patch
sleazy-fpu-feature-x86_64-support.patch
lockdep-floppyc-irq-release-fix.patch
lockdep-add-is_module_address.patch
lockdep-add-per_cpu_offset.patch
lockdep-better-lock-debugging.patch
lockdep-mutex-section-binutils-workaround.patch
lockdep-locking-init-debugging-improvement.patch
lockdep-beautify-x86_64-stacktraces.patch
lockdep-x86_64-document-stack-frame-internals.patch
lockdep-stacktrace-subsystem-core.patch
lockdep-stacktrace-subsystem-i386-support.patch
lockdep-stacktrace-subsystem-x86_64-support.patch
lockdep-irqtrace-subsystem-core.patch
lockdep-irqtrace-cleanup-of-include-asm-i386-irqflagsh.patch
lockdep-irqtrace-cleanup-of-include-asm-x86_64-irqflagsh.patch
lockdep-locking-api-self-tests.patch
lockdep-core.patch
lockdep-design-docs.patch
lockdep-procfs.patch
lockdep-prove-rwsem-locking-correctness.patch
lockdep-prove-spinlock-rwlock-locking-correctness.patch
lockdep-prove-mutex-locking-correctness.patch
lockdep-kconfig.patch
lockdep-print-all-lock-classes-on-sysrq-d.patch
lockdep-x86_64-early-init.patch
lockdep-x86-smp-alternatives-workaround.patch
lockdep-do-not-recurse-in-printk.patch
lockdep-fix-rt_hash_lock_sz.patch
lockdep-annotate-direct-io.patch
lockdep-annotate-serial.patch
lockdep-annotate-dcache.patch
lockdep-annotate-i_mutex.patch
lockdep-annotate-futex.patch
lockdep-annotate-genirq.patch
lockdep-annotate-waitqueues.patch
lockdep-annotate-mm.patch
lockdep-annotate-serio.patch
lockdep-annotate-skb_queue_head_init.patch
lockdep-annotate-timer-base-locks.patch
lockdep-annotate-scheduler-runqueue-locks.patch
lockdep-annotate-hrtimer-base-locks.patch
lockdep-annotate-sock_lock_init.patch
lockdep-annotate-af_unix-locking.patch
lockdep-annotate-bh_lock_sock.patch
lockdep-annotate-mmap_sem.patch
lockdep-annotate-sunrpc-code.patch
lockdep-annotate-the-quota-code.patch
lockdep-annotate-usbfs.patch
lockdep-annotate-sound-core-seq-seq_portsc.patch
lockdep-annotate-sound-core-seq-seq_devicec.patch
lockdep-annotate-8390c-disable_irq.patch
lockdep-annotate-3c59xc-disable_irq.patch
lockdep-annotate-forcedethc-disable_irq.patch
lockdep-annotate-enable_in_hardirq.patch
lockdep-annotate-s_lock.patch
lockdep-annotate-sb-s_umount.patch
lockdep-annotate-slab-code.patch
lockdep-annotate-blkdev-nesting.patch
lockdep-annotate-vlan-net-device-as-being-a-special-class.patch
lockdep-annotate-on-stack-completions-mmc.patch
lockdep-annotate-sk_locks.patch
lockdep-annotate-hostap-netdev-xmit_lock.patch
bcm43xx-netlink-deadlock-fix.patch
sleazy-fpu-feature-i386-support.patch
delay-accounting-taskstats-interface-send-tgid-once-locking.patch
make-more-file_operation-structs-static.patch
make-more-file_operation-structs-static-fix.patch