Re: [RFC v4 02/25] um lkl: architecture skeleton for Linux kernel library

Hajime Tazaki <thehajime@xxxxxxxxx> · Thu, 02 Apr 2020 15:44:44 +0900

Hello,

# dropped an address from Cc as it's not reachable.

On Wed, 01 Apr 2020 05:16:58 +0900,
Johannes Berg wrote:

> > > For the step 1, we put LKL as one of UMMODE in order to make less effort to
> > > integrate (make ARCH=um UMMODE=library).  The modification to existing UML
> > > code is trying to be minimized.
> 
> > The current step (1 in the milestone) tries to cover this goal:
> > splitting ARCH=um into UMMODE_KERNEL and UMMODE_LIB.
> 
> So maybe we're doing this backwards?
> 
> I mean ... you're trying to minimize the UML code changes, while I'm
> sort of arguing for maximizing them, to achieve a cleaner split.

I see the point.

> In a sense, I think if this is to happen, then we're in it for the long
> haul. Meaning that we don't actually need all of this working
> immediately.
> 
> So I think conceptually we should answer the questions that I raised
> below first (basically a kind of "can it be done?" question), and then
> work towards that goal? IMHO.

okay, agree.

> > > > 1) We give up, and get ARCH=lkl, sharing just (some of) the drivers
> > > >    while moving them into the right drivers/somewhere/ place. Even that
> > > >    looks somewhat awkward looking at the later patches in this set, but
> > > >    seems like that at *least* should be done.
> > > 
> > > Yeah, this would be a goal.
> > > UML and LKL are quite different but they should share at least their userspace
> > > drivers.
> > > I also don't mind if we don't share every driver at the beginning but
> > > it should be
> > > a feasible goal for the future.
> > 
> > Sharing drivers code is also included in this patchset, step 2 in the
> > milestone.
> > 
> > I was thinking that implementing os_*() functions with lkl_host_ops
> > would be the further goal (e.g., step 3 or 4).
> 
> Personally, I think this is backwards. That step is the actually
> *interesting* part, because if this turns out not to be possible, then
> we should pick option (1) instead of trying to do option (2), failing,
> and leaving the code a mess (at least personally I think that after this
> patchset, the code is kinda a mess with all the ifdefs, duplication,
> etc.) Yes, I know you're in this for the long haul, but still - it'd be
> a shame to have to do that.
> 
> So in a sense, I myself would actually prefer to have an LKL _without_
> drivers, but integrated well with UML, over the one that you have now.

LKL without drivers might be nothing, but let's see if this will end a
clean, and minimal viable patchset.

> > > > 2) Ideally, instead, we actually unify: LKL grows support for userspace
> > > >    processes using UML infrastructure, the "in-kernel" IRQ mechanisms
> > > >    are unified, UML stuff moves into lkl-ops, and the UML binary
> > > >    basically becomes a user of the LKL library to start everything up.
> > > >    There may be some bits remaining that are just not interesting (e.g.
> > > >    some drivers you don't care about would continue to make direct calls
> > > >    to the user side instead of lkl-ops, and then they're just not
> > > >    compatible with lkl, only with the uml special case of lkl), but then
> > > >    things are clean.
> > > 
> > > A few months ago I though this is doable but now I'm not so sure anymore.
> > 
> > For the part of (2) which Johannes pointed out (I mean the part "UML
> > stuff moves into lkl-ops"), I become to think that implementing os_*()
> > functions using lkl_host_ops would be also interesting if those
> > re-implementation makes the glue code eliminated.
> > 
> > I'll work on that.
> 
> Don't go too fast :-)
> 
> I really think that this only makes sense if we can also share much of
> the other code, e.g. the interrupt processing, thread model, etc. If we
> just share the lkl ops underneath, and then end up implementing two IRQ
> models and all on top of those, IMHO we've won nothing.
> 
> So I (at least) really see it as a choice between these two options:
> 
> 1) add LKL as arch/lkl/ and share the drivers, but not the arch code
> 
> 2) really unify LKL and UML, and have them share the arch code, and make
>    UML a special case of LKL, but not in the sense that it has vastly
>    different arch code (like special interrupt handling, etc.)

1) is not an option; we discussed this before not to have similar
archs in different directories.

# Richard, correct me if I'm wrong.

2) may be doable, the followings would be the list of things what we need:
- unify the driver codes
- unify interrupt handling, need to be x86/linux-independent
- unify thread model (i.e., struct thread_info)
- unify code scheduling  (e.g., __switch_to())
- unify memory management (mmu v.s. nommu)
- unify host interface (os_*() v.s. lkl_host_ops)
- support multiple syscall handlings (ptrace-based interception,
  direct func call, etc)
- (may still miss something..)

Those are actually the list of differences between UML and LKL.
And each item is inter-related: interrupt handling depends on how the
code/thread is executed, thread implementation interacts with memory
management, etc.

Thus for instance, unifying IRQ mechanism may involve several points
of the above list to be reorganized/reimplemented UML code to support
the library mode.  I thought this makes broader changes to UML, which
I was trying to avoid in v4 patch.

> Now, you allude to the fact that UML is pretty much x86 only, and
> perhaps that's a point where we can do (2) but only support userspace
> programs on x86, or such?

x86/linux only (LKL also has x86/win32 support).

> I don't know where the host architecture
> actually comes in much in UML, and where that may or may not be the case
> in LKL.
> 
> > For the other part of (2), I agree that your definition of the
> > unification will be the best and final goal of this integration.
> 
> Fair, but the problem is that we have to decide *now* whether it's
> actually possible or not. If not, then IMHO it's a bad choice to even
> put LKL under arch/um/.
> 
> > But, especially the support for UML userspace processes in LKL is not
> > easy as far as I can see
> 
> OK, I'll bite - why not? I mean, what's different enough in LKL and UML
> to make this "not easy"?
> 
> I'm not trying to paint you into a corner here with that, I'm just
> trying to understand the innards of LKL better. I have a _bit_ of a
> grasp of the UML internals by now, but of course not LKL.
> 
> So where do they differ? Conceptually, they seem very similar, but the
> details actually are different.
> 

The differences (and also conflicts) between UML and LKL are pretty
much the unification list of above 2).

The current LKL design assumes that there is only a single
process/application (but can have multiple threads) in a single LKL
kernel instance.

This assumption makes the design simple in several places of the
kernel:

- no memory protection across multiple users/processes needed
- no address space separation between user- and kernel-code

as a result, userspace code can directly call syscalls as function
calls.

This part is needed to expand more to support multiple processes
running/spawned (as UML does), which I mentioned the current LKL
doesn't have.

> But I have the same question on e.g. the IRQ model. I mean, OK, I
> understand that LKL started from a different base and all, but is it
> actually *necessary* for LKL to have a different IRQ model? Or is that
> "just" intertia?

As for the interrupt handling, the currently LKL's interrupt is
triggered by the function call (lkl_trigger_irq or equivalent) while
UML's one is triggered by fd notification (epoll).
# LKL actually uses fd events but encapsulates into the lkl_host_ops.

So now we have three options for interrupt handling:
- use UML IRQ model in LKL (and UML)
- use LKL IRQ model in UML (and LKL)
- use two IRQ models (the current v4 patch)

I gave up to take the 1st approach as it drops host-independent
characteristics of LKL, thus took 3rd approach.  So to answer your
question, yes, it was needed to have a different IRQ model.

But maybe I should try the 2nd approach as well to avoid duplication.
I will conduct another experiment if this sounds the right direction.

> > Or the title of the cover letter is somehow overstatement: instead,
> > "Minimum integration of UML with LKL" or something like this would be
> > better ?
> 
> Heh, well, doesn't really matter?
>
> But again, there are a few different aspects here:
>  - what's technically feasible
>  - what this patchset achieves
>  - where we want to be in the end
> 
> I think right now these are diverging enough that we can't even answer
> that last question, much less find the road to get there.

I think the option 2) is the final goal.

> > Since the patchset of LKL is relatively huge, I was trying to make a
> > minimum patchset for the first step.  Because having #ifdefs and glue
> > code makes existing arch/um code untouched, I took this way to make
> > the patchset as small as possible.
> > 
> > But if this is not the case, I will look for the other way and touch
> > arch/um code to fit the current LKL implementation.
> > 
> > What do you think ?
> 
> I think that'd be fine, if indeed that's what we want to do.
> 
> I really think we're beating around the bush, and need to first figure
> out the technical differences between UML and LKL and decide between the
> options (1) and (2) above. Maybe there's a compromise there somewhere,
> where some small bits of code still _are_ different, but IMHO having two
> (IRQ, thread, memory) models, two host interfaces (lkl-ops vs. os_*
> functions), even two include/asm/ source trees (and so on) is not
> appropriate.
> 
> This may take some patches, and some experimentation. I'd leave drivers
> out of this initially - you should be able to test LKL with something
> simpler, right? The API surface is basically the syscall interface as
> functions, so you can start the library and call something simple
> initially? Though I guess you need some driver for the IRQ model to make
> sense, etc.

okay.

> And like I said before, that decision will frame everything else. I
> really don't think we can make significant progress here without having
> decided whether this is possible.
>
> Perhaps UML *can* become a "special case" of LKL, with a special API
> function (that's not part of the syscall surface) to "boot(cmdline)" or
> something. But if it can't, and has to remain as separated as the two
> are today, I would argue we're better off just not calling them the same
> architecture.

I agree with this if the unification has all completed.

Thanks,
-- Hajime