* Russell King (rmk+lkml@xxxxxxxxxxxxxxxx) wrote: > On Tue, Nov 11, 2008 at 01:28:00PM -0500, Mathieu Desnoyers wrote: > > Let's see what it gives once implemented. Only compile-tested. Assumes > > pxa, sa110 and mn10300 are all UP-only. Correct smp_rmb() are used for > > arm versatile. > > Versatile is also UP only. > > The following are results from PXA built with gcc 3.4.3: > > 1. two additional registers used in sched_clock() > 2. 8 additional bytes of code (which are needless if gcc was more inteligent) > > both of these I put down to inefficiencies in gcc's register allocation. > > 3. worse instruction scheduling - two inter-dependent loads next to each > other causing a pipeline stall > > Actual reading of variables/hardware is unaffected by this patch. > > Old code: > > c: e59f3050 ldr r3, [pc, #80] ; load address of oscr2ns_scale > 10: e59fc050 ldr ip, [pc, #80] ; load address of __m_cnt_hi > 14: e5932000 ldr r2, [r3] ; read oscr2ns_scale > 18: e59f304c ldr r3, [pc, #76] ; load address of OSCR > 1c: e59c1000 ldr r1, [ip] ; read __m_cnt_hi > 20: e1a07002 mov r7, r2 > 24: e3a08000 mov r8, #0 ; 0x0 > 28: e5933000 ldr r3, [r3] ; read OSCR register > ... > 58: e1820b04 orr r0, r2, r4, lsl #22 > 5c: e1a01524 lsr r1, r4, #10 > 60: e89da9f0 ldm sp, {r4, r5, r6, r7, r8, fp, sp, pc} > > > New code: > > c: e59f0058 ldr r0, [pc, #88] ; load address of oscr2ns_scale > 10: e5901000 ldr r1, [r0] ; read oscr2ns_scale <= pipeline stall > 14: e59f3054 ldr r3, [pc, #84] ; load address of __m_cnt_hi > 18: e1a08001 mov r8, r1 > 1c: e5932000 ldr r2, [r3] ; read __m_cnt_hi > 20: e59f304c ldr r3, [pc, #76] ; load address of OSCR > 24: e1a09002 mov r9, r2 > 28: e3a0a000 mov sl, #0 ; 0x0 > 2c: e5933000 ldr r3, [r3] ; read OSCR > ... > 58: e1825b04 orr r5, r2, r4, lsl #22 > 5c: e1a06524 lsr r6, r4, #10 > 60: e1a01006 mov r1, r6 > 64: e1a00005 mov r0, r5 > 68: e89daff0 ldm sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc} > > Versatile: > > 1. 12 additional bytes of code > 2. same number of registers > 3. worse instruction scheduling causing pipeline stall > > Actual reading of variables/hardware is unaffected by this patch. > > So, we have two platforms where this patch makes things visibly worse > with no material benefit. > I think the added barrier() are causing these pipeline stalls. They don't allow the compiler to read variables such as oscr2ns_scale before the barrier because gcc cannot assume it won't be modified. However, to insure that OSCR read is done after __m_cnt_hi read, this barrier seems required to be safe against gcc optimizations. Have you compared my patch to Nicolas'patch, which adds a smp_rmb() in the macro or to a vanilla tree ? I wonder if reading those values sooner would help gcc ? Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html