On 06/03/2015 04:21, Ralf Baechle wrote:
> On Tue, Jun 02, 2015 at 06:21:33PM -0400, Joshua Kinard wrote:
>
>> From: Joshua Kinard <kumba@xxxxxxxxxx>
>>
>> The R12000 added a new feature to enhance branch prediction called
>> "global history". Per the Vr10000 Series User Manual (U10278EJ4V0UM),
>> Coprocessor 0, Diagnostic Register (22):
>>
>> """
>> If bit 26 is set, branch prediction uses all eight bits of the global
>> history register. If bit 26 is not set, then bits 25:23 specify a count
>> of the number of bits of global history to be used. Thus if bits 26:23
>> are all zero, global history is disabled.
>>
>> The global history contains a record of the taken/not-taken status of
>> recently executed branches, and when used is XOR'ed with the PC of a
>> branch being predicted to produce a hashed value for indexing the BPT.
>> Some programs with small "working set of conditional branches" benefit
>> significantly from the use of such hashing, some see slight performance
>> degradation.
>> """
>>
>> This patch enables global history on R12000 CPUs and up by setting bit
>> 26 in the branch prediction diagnostic register (CP0 $22) to '1'. Bits
>> 25:23 are left alone so that all eight bits of the global history
>> register are available for branch prediction.
>
> Will apply but could you also submit a patch to set cpu_has_bp_ghist to
> 0/1 as applicable in all cpu-feature-overrides.h?

I can, though at that point the R10000 Kconfig item needs to be split to
differentiate between the R10000 and the R12000/R14000/R16000. I sent in a
patch to do that a few weeks ago, but it was rejected. Can you outline your
specific issues with it? I'll re-submit it, and then the 'cpu_has_bp_ghist'
define can be '0' for R10000s and '1' for R12K-R16K. That would also set
things up for the potential discovery of R14K/R16K-specific bits that may
be useful but aren't known/understood yet.
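The manual's rule for bits 26 and 25:23 can be expressed as a small C sketch. It is a userspace illustration, not the kernel patch: it works on a plain 64-bit value instead of reading or writing CP0 $22, and the macro and function names (`DIAG_GHIST_ALL`, `ghist_bits_used()`, `diag_set_ghist()`) are hypothetical, not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the CP0 $22 global history fields per the manual. */
#define DIAG_GHIST_ALL        (UINT64_C(1) << 26)  /* bit 26: use all 8 ghist bits */
#define DIAG_GHIST_CNT_SHIFT  23                    /* bits 25:23: ghist bit count  */
#define DIAG_GHIST_CNT_MASK   (UINT64_C(7) << DIAG_GHIST_CNT_SHIFT)

/*
 * Number of global history bits used for BPT indexing.
 * Bit 26 set -> all 8 bits; otherwise bits 25:23 give the count,
 * so bits 26:23 all zero means global history is disabled (returns 0).
 */
unsigned ghist_bits_used(uint64_t diag)
{
	if (diag & DIAG_GHIST_ALL)
		return 8;
	return (unsigned)((diag & DIAG_GHIST_CNT_MASK) >> DIAG_GHIST_CNT_SHIFT);
}

/*
 * Return diag reconfigured to use nbits (0..8) bits of global history.
 * nbits == 8 only sets bit 26 and leaves bits 25:23 untouched, as the
 * patch does; smaller counts clear bit 26 and rewrite bits 25:23.
 */
uint64_t diag_set_ghist(uint64_t diag, unsigned nbits)
{
	assert(nbits <= 8);
	if (nbits == 8)
		return diag | DIAG_GHIST_ALL;
	diag &= ~(DIAG_GHIST_ALL | DIAG_GHIST_CNT_MASK);
	return diag | ((uint64_t)nbits << DIAG_GHIST_CNT_SHIFT);
}
```

For example, the CPU0 reading taken before the patch, 0x000400001030c000, decodes to 0 bits of global history, and `diag_set_ghist()` with a count of 8 turns it into 0x000400001430c000. Since bits 25:23 are ignored whenever bit 26 is set, a Kconfig or runtime knob for the count would only matter with bit 26 clear.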
> Also the manual suggests this CPU feature may not always be beneficial
> for performance so I'm wondering if we should add a way to modify it
> at runtime.

I thought about this, too. It would also allow R12K+ options to control
the Disable Branch Target Address Cache bit (BTAC, Bit 27) and the Disable
Branch Return Cache bit (Bit 22). For global history, I just set Bit 26 so
that all of the global history bits are available, but even this could
become a Kconfig item to control Bits 25:23. That would probably require
some benchmarking to measure the effects, but the entry in the manual
suggests that the benefits outweigh the penalties in the end.

> I'm curious, have you checked the default setting of the global history
> on kernel entry?

Yup, it's disabled by default:

[ 0.000000] DEBUG: CPU0: c0_diag #1: 0x000400001030c000
[ 0.000000] DEBUG: CPU0: c0_diag #2: 0x0004000014148000
[ 7.798066] DEBUG: CPU1: c0_diag #1: 0x00000000103c8000
[ 7.798092] DEBUG: CPU1: c0_diag #2: 0x0000000014144000

I B G -BRC- -----------BP---------- T S B H H D | | | M S L I T I I B | | | o t I B d A S S R | | | M d a d O 0 M 0 x C T T C V W H P e t ** x 0 p
xxxxxxxxxxxx xxxx xxxxxxxxxxxxxxxx xxxx x x xxx x x x x x xx xx xx xxxxxxxxx x xx
---------------------------------------------------------------------------------
000000000000 0100 0000000000000000 0001 0 0 000 0 1 1 0 0 00 11 00 000000000 0 00 CPU0 Before
000000000000 0100 0000000000000000 0001 0 1 000 0 0 1 0 1 00 10 00 000000000 0 00 CPU0 After
000000000000 0000 0000000000000000 0001 0 0 000 0 1 1 1 1 00 10 00 000000000 0 00 CPU1 Before
000000000000 0000 0000000000000000 0001 0 1 000 0 0 1 0 1 00 01 00 000000000 0 00 CPU1 After
---------------------------------------------------------------------------------
12           4    16               4    1 1 3   1 1 1 1 1 2  2  2  9         1 2

** R12000 and up: Upper-two bits of BP-Idx.

--J
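The binary rows in the table are the c0_diag readings from the DEBUG lines, split MSB-first at the field widths on the bottom row. A small C sketch of that decoding is below; it is plain userspace code, and the helper names (`diag_format()`, `diag_field()`, `diag_row_matches()`) are hypothetical, not part of the patch or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field widths from bit 63 down to bit 0, as in the table's bottom row. */
static const unsigned diag_field_widths[] = {
	12, 4, 16, 4, 1, 1, 3, 1, 1, 1, 1, 1, 2, 2, 2, 9, 1, 2
};
#define NFIELDS (sizeof(diag_field_widths) / sizeof(diag_field_widths[0]))

/* Extract bits hi..lo (inclusive) of diag, e.g. diag_field(d, 26, 26). */
uint64_t diag_field(uint64_t diag, int hi, int lo)
{
	int width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~UINT64_C(0)
				      : ((UINT64_C(1) << width) - 1);
	return (diag >> lo) & mask;
}

/*
 * Render diag in binary, MSB first, with one space between field groups
 * so it lines up with the table rows. Needs 64 digits + 17 spaces + NUL.
 */
void diag_format(uint64_t diag, char *out, size_t outlen)
{
	int bit = 63;	/* next bit to emit */
	size_t pos = 0;

	assert(outlen >= 64 + NFIELDS);
	for (size_t f = 0; f < NFIELDS; f++) {
		if (f > 0)
			out[pos++] = ' ';
		for (unsigned i = 0; i < diag_field_widths[f]; i++, bit--)
			out[pos++] = ((diag >> bit) & 1) ? '1' : '0';
	}
	out[pos] = '\0';
}

/* Nonzero if diag renders to exactly the given table row. */
int diag_row_matches(uint64_t diag, const char *row)
{
	char buf[64 + NFIELDS];

	diag_format(diag, buf, sizeof(buf));
	return strcmp(buf, row) == 0;
}
```

For example, formatting 0x0004000014148000 gives the "CPU0 After" row, and `diag_field(0x0004000014148000, 26, 26)` returns 1, i.e. global history enabled with all eight bits.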