Re: scheduler is a process and driver context?

Frederic Weisbecker <fweisbec@xxxxxxxxx> · Thu, 17 Jun 2010 22:18:01 +0200

On Thu, Jun 17, 2010 at 07:30:53PM +0300, Sudipta GHOSH wrote:
> Hi,
> 
> Memory management , scheduling is part of core kernel.
> 
> Is it a process or special code resides in RAM?
> 
> As I see init process has PID 0, so the kernel code is a process or
> special code.
> 
> when there is an interrupt, device driver executes some code, in which context?
> 
> How data from userland to kernel space is transferred (user process to driver)
> 
> Thanks,
> -S

So, the kernel is not a process as a whole. Better consider it as a
bunch of code that runs at a specific privilege level (ring 0 on x86)
and that can be executed in many contexts.

The kernel code may be executed in the following contexts:

- task
- irq
- exception

= task =

There are two ways for the kernel to execute a path in a
task context.
The first is the syscall path. When you are in userspace and
you want to execute a syscall, you issue a specific arch instruction
that makes you enter the kernel (on 32-bit x86, this path starts in
arch/x86/kernel/entry_32.S, search ENTRY(system_call)).

There you will execute some kernel code that services the userspace
request (open a file, read, etc.). Say firefox enters the kernel
to open a file: things are executed in the context of the firefox
task. There are some differences from userspace though, for example
you run on a kernel stack there.

So, for example, most of a driver's code is reached through syscalls
and therefore runs in task context.

The other way for kernel code to run in task context is through
a kernel thread. Those are internal tasks that do specific jobs.
For example there is the idle task, which is the fallback task when
there is nothing else to do (no other task wants the cpu).
The workqueues are another example: they execute work items
that need to be done asynchronously. For example when an irq handler
needs to do something that might sleep, it queues a work item there.

Now concerning the scheduler, most parts of it are executed
in task contexts.

If you do a context switch for example, task x -> task y, the
first half of the context switch is executed by task x, the second
half by task y.

When you fork, you execute in the parent context (the child will
get its turn on a later context switch).

When you wake up a task, you execute in the context of the waker
(the wakee will again get its turn on a later context switch).

There are some more particular cases. For example, parts of the SMP
load balancing are done from the idle task: when there is nothing
left to do on a cpu, idle will execute some scheduler code to pull
tasks from other cpus.

= irq =

Irqs can interrupt any task context, in fact they can interrupt any
context that doesn't have irqs masked. And irqs are a specific context:
they use an irq specific stack, etc...

So the small part of a driver that executes in irq context is its
irq handler.

Softirqs are a special case. They are somehow artificially created
interrupts. They can be executed in two different contexts. Either at
the end of an irq (a hardirq), in which case they use the irq stack
(but maybe some archs use a softirq specific stack, I don't know). Or
in the context of a task (the ksoftirqd task). This only happens if
servicing softirqs takes too much time at the end of a hardirq, and
then we want to defer the rest of the softirq work a bit. For that we
kick the ksoftirqd task, which takes over the rest.

Parts of the scheduler load balancing are also done from softirq.

= exception =

Exceptions happen when userspace or kernelspace executes something that
traps (page fault, breakpoint, ...). That too uses a specific stack
(at least on x86-64) but it executes in the context that triggered the
exception.

> Memory management , scheduling is part of core kernel.
>
> Is it a process or special code resides in RAM?

So it depends, as outlined above, the scheduler code can be
executed in different contexts.

Memory management is about the same story: the memory structures
of a child are allocated on fork (from the parent), or on exec
(from the task that calls exec). Later on, page faults are serviced
from exceptions (usually in the context of a task). But in fact
memory management also has its own threads. kswapd can be kicked
to reclaim and swap out memory when needed. The writeback also has
its own threads, etc...

So don't consider memory management or the scheduler as
tasks or irqs (although parts of them may use irqs or specific tasks),
rather consider them as "libraries": this is what they are, except
for some standalone parts of them.

In fact this is the same for the whole kernel: it is mostly a
big library, but also with some standalone parts.

> How data from userland to kernel space is transferred (user process to driver)

When the kernel accesses userspace data, this is in the context of a
task (mostly, it can also be from irqs), but the userspace memory of
this task is pageable. And the userspace pointer can be a bad address.
We use copy_from_user() and copy_to_user() to handle that.

- if the pointer points to a page that is in memory, it's fine
- if the memory pointed to is swapped out, there is a page fault, and
  the page is retrieved, it's fine
- if it's a bad pointer, there will be a page fault, but it won't crash
  because copy_from/to_user have told the kernel they can handle this
  page fault (by setting a specific fixup for it), and they do so by
  reporting an error to their caller.

If this is done from irq context, we can't sleep, so we can't wait
for a page to be brought back from swap. In this case we are only
able to fetch user memory if it is already resident in memory.
