Re: [PATCH] MIPS: tracing: Optimize the implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Thu, 2010-03-11 at 10:49 -0800, David Daney wrote:
[...]
> > +/*
> > + * If the Instruction Pointer is in module space (0xc0000000), return ture;
> 
> s/ture/true/
> 

yeah.

> > + * otherwise, it is in kernel space (0x80000000), return false.
> > + */
> > +#define in_module(ip) (unlikely((ip)&  0x40000000))
> > +
> 
> This isn't universally true, but it does hold for most configurations I 
> think.

Although I'm not sure who is the exception, we always need an universal
solution, what about this:

Compare module with kernel:

module:

        <saving registers>

        lui     v1, hi16_mcount                <--- ip
        addiu   v1, v1, lo16_mcount
        move    at, ra
        jalr    v1
         nop

kernel:

        <saving registers>

         move    at, ra
         jal     _mcount                       <--- ip

The above _ip_ is the address have been recorded into the __mcount_loc
section of the kernel by scripts/recordmcount.pl, as we can see, for
kernel, the *(ip - 4) is "move at, ra": 03e0082d, a certain instruction,
but for module, there is no possibility(?) of existing a "move at, ra"
at *(ip -4) but a register saving operation("s {d,w} rs, offset(sp)",
prefixed by 0xffb0 for 64bit and 0xafb0 for 32bit. ), and reversly, for
kernel, there is no such instruction there.

And consider the new option -mmcount-ra-address of gcc, some more
instructions will be inserted between "move at, ra" and the calling site
to mcount, so, *(ip-4) will not always be "move at, ra", then we need to
check if there is a "s {d,w} rs, offset(sp)" there, if yes, it is in
module, otherwise, it should be in kernel.

#define S_RS_SP          0xafb00000      /* s{d,w} rs, offset(sp) */

static inline int in_module(ip)
{
	insn = *(ip - 4); /* need to use safe_load_code instead, what about big
endian? */

	return ((insn & S_RS_SP) == S_RS_SP)
}

> 
> [...]
> 
> > +	/*
> > +	 * We have compiled modules with -mlong-calls, but compiled kernel
> > +	 * without it, therefore, need to cope with them respectively.
> > +	 *
> > +	 * For module:
> > +	 *
> > +	 *	lui	v1, hi16_mcount		-->  b	1f
> > +	 *	addiu	v1, v1, lo16_mcount
> > +	 *	move	at, ra
> > +	 *	jalr	v1
> > +	 *	 nop
> > +	 *					1f: (ip + 16)
> 
> 
> Have you thought about just overwriting the jalr here instead of 
> branching around it?  In any event, I don't think you can count on a 
> fixed size code sequence for calling _mcount.  We are passing the 
> address of the save location of RA to _mcount too.  The size of the code 
> will depend on the size of the functions stack frame *and* weather or 
> not it is a leaf function.  Although in the kernel we are unlikely to 
> see functions with large stack frames.

So even with "b 1f", we need to use the right offset, the original
version for module with -mmcount-ra-address should have bugs here for
the offset should be 16 + 8 or 4 (two instructions for leaf function,
one instruction for non-leaf function).

but for we only recorded the position of "lui v1, hi16_mcount" in the
__mcount_loc section, so we need to search the position of the real
calling site of mcount(jalr v1), this will goes to what you have
suggested below.

> 
> 
> > +	 * For kernel:
> > +	 *
> > +	 *	move	at, ra
> > +	 *	jal	_mcount			-->  nop
> > +	 *
> > +	 */
> > +	new = in_module(ip) ? INSN_B_1F : INSN_NOP;
> 
> 
> What would happen if you read the code to find the first JAL or JALR, 
> and then overwrote it with a NOP instead of relying on the function 
> address to figure out which type of prolog it has?
> 
> The reason I suggest this is that sometimes we place the entire kernel 
> in CSSEG.  When this is done, everything has the same (short) _mcount 
> calling sequence.

Right, then, we can search the JAL or JALR, for kernel, will get it
immediatly, for module, will only several instructions, we can do this
searching in ftrace_make_nop and ftrace_make_call at run-time, but just
found we can use the following function to do it in ftrace_init(), looks
good.

static inline int is_call_mcount(unsigned int insn)
{
	return ((insn & JAL) == JAL) || (insn == JALR_V1);
}

static inline unsinged long mcount_callsite(unsigned long addr)
{
	unsigned int insn;

	insn = *(unsigned int *)addr; /*need safe_load_code*/
	if (is_call_mcount(insn))
		return addr;

	do {
		addr += 4;	/* what about big endian? */
		insn = *(unsigned int *)addr; /*need safe_load_code*/
	} while (!is_call_mcount(insn));

	return addr;
}

static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
        return mcount_callsite(addr);
}

With the above support, we only need this new ftrace_make_nop:

*(unsigned int *)ip = INSN_NOP;

(But for module, this may need more overhead than "b 1f". )

and ftrace_make_call:

*(unsigned int *)ip = in_module(ip) : INSN_JALR_V1 : insn_jal_mcount;

(And here, for module, we need more time to determine which space we
are.)

Any more suggestion?

Thanks & Regards,
	Wu Zhangjin



[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux