Re: [SOLVED] pt_regs structure for sys_clone()

eax <tcpip@xxxxxxx> · Mon, 29 May 2006 15:18:43 +0200

Hi all.
I finally found the solution to the problem I described earlier and
wanted to share it with the whole world and with all the poor people searching
for it like I did for the past month :)
In fact the solution had been there all along, but I didn't see it.
Basically it's true that all the register values at the moment of entering
kernel mode via the int 0x80 instruction are saved on the kernel mode
stack. You can follow all the instructions in entry.S starting from
ENTRY(system_call). Right before the instruction call
*sys_call_table(,%eax,4) is executed, which jumps to the system call
function, the stack looks as follows:

orig_eax (the system call number, pushed first)
es
ds
eax
ebp
edi
esi
edx
ecx
ebx <- esp
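
This listing lines up with the field order of struct pt_regs. As a minimal user-space sketch (not kernel code), the following mirrors what I believe is the 2.6-era i386 layout. The names pt_regs_i386 and print_saved_register_offsets are made up for illustration, and uint32_t stands in for the kernel's 32-bit long/int so the offsets come out the same on any host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the i386 struct pt_regs (assumed 2.6-era layout). */
struct pt_regs_i386 {
    uint32_t ebx;      /* 0x00: pushed last, so it sits at 0(%esp) */
    uint32_t ecx;      /* 0x04 */
    uint32_t edx;      /* 0x08 */
    uint32_t esi;      /* 0x0C */
    uint32_t edi;      /* 0x10 */
    uint32_t ebp;      /* 0x14 */
    uint32_t eax;      /* 0x18 */
    uint32_t xds;      /* 0x1C */
    uint32_t xes;      /* 0x20 */
    uint32_t orig_eax; /* 0x24: system call number, pushed before the rest */
    uint32_t eip;      /* 0x28: this and the fields below are pushed by the CPU */
    uint32_t xcs;      /* 0x2C */
    uint32_t eflags;   /* 0x30 */
    uint32_t esp;      /* 0x34 */
    uint32_t xss;      /* 0x38 */
};

/* Print where a few saved registers live relative to the kernel %esp. */
static void print_saved_register_offsets(void)
{
    /* 15 fields of 4 bytes each, no padding expected */
    assert(sizeof(struct pt_regs_i386) == 15 * sizeof(uint32_t));

    printf("ebx      at 0x%02zx(%%esp)\n", offsetof(struct pt_regs_i386, ebx));
    printf("eax      at 0x%02zx(%%esp)\n", offsetof(struct pt_regs_i386, eax));
    printf("orig_eax at 0x%02zx(%%esp)\n", offsetof(struct pt_regs_i386, orig_eax));
    printf("eip      at 0x%02zx(%%esp)\n", offsetof(struct pt_regs_i386, eip));
}
```

Calling print_saved_register_offsets() from a small main() prints the offsets, which can then be compared against the comment block at the top of entry.S.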

Executing the call instruction pushes the rip (return instruction pointer)
onto the stack automatically, and the prologue of the called function then
pushes the ebp (old base pointer for the stack, assuming the kernel is built
with frame pointers), so that now the stack looks like:

orig_eax
es
ds
eax
ebp
edi
esi
edx
ecx
ebx
rip
old_ebp <- esp, ebp
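
To make the push sequence concrete, here is a toy user-space model of a descending stack. It is only a sketch: toy_stack, toy_push and toy_enter_syscall are hypothetical names, the register values are whatever the caller passes in, and real kernels built without frame pointers would skip the last push.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STACK_WORDS 32

/* Toy model of a descending 32-bit stack: sp indexes the current top. */
struct toy_stack {
    uint32_t words[TOY_STACK_WORDS];
    int sp;
};

static void toy_init(struct toy_stack *s)
{
    s->sp = TOY_STACK_WORDS;   /* empty: sp is one past the highest slot */
}

static void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->sp > 0);         /* running out of room is a bug in the model */
    s->words[--s->sp] = value; /* a push moves sp to the next lower slot */
}

/*
 * saved[] holds the ten words in push order:
 * orig_eax, es, ds, eax, ebp, edi, esi, edx, ecx, ebx.
 * Returns the sp index as it was right before the call.
 */
static int toy_enter_syscall(struct toy_stack *s, const uint32_t saved[10],
                             uint32_t return_addr, uint32_t old_ebp)
{
    int sp_before_call;
    int i;

    for (i = 0; i < 10; i++)
        toy_push(s, saved[i]);
    sp_before_call = s->sp;     /* ebx sits here, i.e. at 0(%esp) */

    toy_push(s, return_addr);   /* done by the call instruction */
    toy_push(s, old_ebp);       /* done by the callee's prologue */
    return sp_before_call;
}
```

After toy_enter_syscall() returns, s->sp is two slots below the returned index, which is exactly the shift between the two listings.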

Now there is a trick to thinking about the pt_regs structure. With
call *sys_call_table(,%eax,4) the code jumps to my own system call:

asmlinkage int my_test_func(struct pt_regs regs) {
	/* doing something futile here */
	return 0;
}

When a function is called (an asmlinkage function on i386, that is), its
parameters are always on the stack just above the rip address (pushed in
reverse order, but that doesn't matter since we only have one parameter in
this function). At this point the question for me was: where the hell does
the kernel (for example in the sys_clone() function) get the pt_regs struct
from? The answer is quite simple now. At the assembly level the function
doesn't care where pt_regs comes from; it simply expects that the memory
just above the rip _is_ its first parameter, and since the struct is passed
by value, that memory is our struct. That seems to be the whole magic in
these lines.
This means that by accessing regs within my_test_func() we access the same
address that was 0(%esp) before my_test_func() was called, which fits the
comments in entry.S where the order of the saved registers on the kernel
mode stack is explained. Accessing (%esp) with the offsets described in
entry.S from within my_test_func() does not work, since %esp within
my_test_func() points to a different address than it did before the call
(because the rip and the old ebp are now on the stack too, which weren't
there before).
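
The offset mismatch described above can be reproduced in user space. This is only a sketch under assumptions: saved_regs is a hypothetical, shortened stand-in for struct pt_regs (first ten slots only), the stack is a plain array as seen from inside the callee, and the register values are invented markers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OLD_EBP  0xdeadbeefu   /* made-up marker values */
#define RET_ADDR 0xc0100000u

/* Hypothetical, shortened mirror of the saved-register frame. */
struct saved_regs {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax, xds, xes, orig_eax;
};

/* Interpret the ten words starting at `slot` as a register frame.
 * memcpy is used instead of a pointer cast to avoid aliasing problems. */
static struct saved_regs frame_at(const uint32_t *slot)
{
    struct saved_regs r;
    memcpy(&r, slot, sizeof r);
    return r;
}

/* Fill the stack as seen inside the callee, lowest address first:
 * [0] old ebp, [1] return address, [2..11] saved registers (ebx first). */
static void build_callee_view(uint32_t stack[12])
{
    int i;

    stack[0] = OLD_EBP;
    stack[1] = RET_ADDR;
    for (i = 0; i < 10; i++)
        stack[2 + i] = 0x1001u + (uint32_t)i;  /* ebx = 0x1001, ecx = 0x1002, ... */
}
```

With this setup, frame_at(stack + 2) yields the real saved registers (this is where the compiler looks for the regs parameter), while frame_at(stack) applies the entry.S offsets to the callee's own %esp and reports the old ebp as "ebx" and the return address as "ecx".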

That's it.
I don't know whether everything I wrote is right, but it seems to be, since
I am now able to create a lightweight process from within my own system call
by invoking sys_clone() and getting all the saved registers for sys_clone()
the way I just described. If I'm wrong, it would be very nice if
someone could say so here. I'm really interested in getting all this
to work :)

Thanks to Mulyadi and Fernando ;)

Regards,
E.X.

On Thu, 2006-05-25 at 20:25 +0200, eax wrote:
> Hi Fernando. What I need is a way to fill in the pt_regs structure
> itself with the same values that are in the pt_regs structure of the
> currently running process. Let's say I have my own system call
> sys_do_something(). When I'm in kernel mode and want to call
> sys_clone() from within this new function, I must pass my_regs as the
> parameter to sys_clone().
> So how do I find out all the values pt_regs must contain? For example,
> what could I do in the following code?
> 
> int sys_do_something(void) {
>         struct pt_regs my_regs;
>         int ret;
>         /* my_regs.eax = ? */
>         /* my_regs. ... = ? */
>         /* my_regs.ebp = ? */
>         /* and so on before calling */
>         ret = sys_clone(my_regs);
>         return ret;
> }
> 
> I do not mean only the system call's parameters but all the values that
> make it possible for the newly created thread to be switched to
> the next time the scheduler activates it. So far I get problems such as
> "iret exception" or "bad eip" or "no vm86_info: bad". That's why I
> suppose that the values in the my_regs I give to sys_clone aren't valid,
> which leads to problems when the context switch is made to this newly
> created thread.
> 
> Generally the problem is that I do not know how to access all the
> values that were saved for the currently running process before doing
> my system call, and as far as I know it's not possible to access the
> content of this structure from the task_struct structure of the current
> process.
> Therefore it would be nice to have something like: my_ptregs =
> get_current_pt_regs(). I hope it's clearer now what my problem is.
> 
> Thank you for your help.
> 
> Regards,
> E.X.
> 
> 
> On Thu, 2006-05-25 at 18:21 +0200, Fernando Apesteguía wrote:
> > AFAIK, you can get all the syscall parameters from pt_regs as follows:
> > 
> > int sys_close(struct pt_regs regs);
> > 
> > and inside sys_close:
> > 
> > regs.ebx or regs.eax
> > 
> > Parameters are passed in CPU registers. If you want to capture that
> > information, you should redirect the original address in the syscall
> > table to point to a custom function and then call the original
> > function from your own one.
> > 
> > A cleaner solution for this is to use the kprobes framework to keep track
> > of the syscall parameters.
> > 
> > The parameters are passed in an ordered way (eax, ebx, ecx and so on).
> > The first parameter is always the syscall number (__NR_* constants
> > defined in unistd.h). Some syscalls need more parameters. If this is
> > the case, one of the parameters (sorry I forgot which one...) holds a
> > pointer to a memory address where the parameters are.
> > 
> > 
> > 
> > Best regards
> > 
> > 
> > On 5/25/06, eax <tcpip@xxxxxxx> wrote:
> >         Hi all.
> >         Does anyone know how to get all the information a pt_regs
> >         structure (as parameter for the sys_clone() system call) must
> >         contain? I know that this structure contains the state of the
> >         user space process just before doing the system call, but how
> >         do I fill it in kernel mode? I suppose there exists a function
> >         or a macro that returns such a struct for the current process,
> >         but all my efforts searching for something like that have been
> >         without success so far. Calling sys_clone() from within kernel
> >         mode makes this necessary, I think.
> >         
> >         In addition I'm interested in something like a specification
> >         for Linux system calls in general. For example, how should I
> >         know how sys_clone expects the child stack to be set up and so
> >         on? Is there such a document? After all, the glibc developers
> >         succeeded in implementing all this, so how did they know how
> >         to do it? :)
> >         
> >         Thank you all in advance.
> >         
> >         Regards,
> >         E.X.