Re: module locking

Jon Masters <jonathan@xxxxxxxxxxxxxx> · Mon, 14 Dec 2009 06:30:19 -0500

On Mon, 2009-12-14 at 11:24 +0100, Kay Sievers wrote:
> On Mon, Dec 14, 2009 at 02:12, Jon Masters <jonathan@xxxxxxxxxxxxxx> wrote:
> > On Sat, 2009-12-12 at 04:29 -0500, Jon Masters wrote:
> >
> >> FYI I am diagnosing an unlikely locking issue on a RHEL system right now
> >> that triggers when you have a read-only filesystem and can't use file
> >> locks. I know we have since ripped out a lot of that code upstream,
> >> but I might need to address this differently. I'll follow up with my
> >> findings.
> >
> > I have written a sysV shared memory locking mechanism that uses an shm
> > segment in the case that the filesystem is mounted read-only. I'll
> > finish the work on the RHEL bug and then forward port it, and post. We
> > can probably re-introduce this because it works in a read-only context
> > and can't be abused by a user to prevent module loading because the IPC
> > shm segment is user-specific. Only downside is (optional, I guess)
> > dependency on sysV shm. But I don't think we honestly have anyone
> > wanting to use m-i-t on a system without such things compiled in.
> 
> Why would we need anything like that? Shouldn't the load syscall
> serialize all that properly?

It doesn't quite, though (in the 2.6.18 kernel I'm focusing on right now;
upstream has made some slight changes to the module loader code since then).
We only run under stop machine during the actual link and unlink, so
there's a possibility for a second modprobe to come along while another
is running. The existing code upstream was ripped out because there is a
possibility for the flocking to be abused by a user opening the module
file and because it's not normally a big deal if a second modprobe comes
along - worst case the module is already loaded by another instance and
we will fail harmlessly to load it a second time. Totally no problem.
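
To illustrate the abuse case, here is a minimal sketch. It assumes
flock(2)-style advisory locks (the locking call the old m-i-t code
actually used may differ), and try_lock() and demo() are names made up
for this example. It shows that anyone who can open a file read-only
can hold an exclusive lock on it, which blocks a second locker:

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path read-only and try an exclusive flock without blocking.

    Returns the fd on success (the lock is held until the fd is closed),
    or None if somebody else already holds the lock.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo():
    """Return (holder_got_lock, loader_was_blocked) for a scratch file."""
    with tempfile.NamedTemporaryFile() as f:
        holder = try_lock(f.name)  # stands in for an arbitrary user
        loader = try_lock(f.name)  # stands in for modprobe
        result = (holder is not None, loader is None)
        if loader is not None:
            os.close(loader)
        if holder is not None:
            os.close(holder)
        return result
```

A blocking lock in place of LOCK_NB would simply hang the second caller
for as long as the first keeps its descriptor open.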

But you know how these Enterprise distros are ;) There's a harmless
warning generated in the system kernel log if you have multiple
modprobes running at the same time, all trying to load a module with
deps that will fail because the hardware isn't available, and you're on
a read-only root filesystem where the previous locking code (still in
there) can't work(!) :P The old code even accounts for this by
spinning in the procfs reading code, looking for the "Loading" and
"Unloading" state change. The warning is harmless, but anyway.

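The procfs spinning can be sketched roughly as follows. This is not the
m-i-t code, just an illustration that assumes the usual /proc/modules
line layout (name, size, refcount, deps, state, address);
module_state(), wait_until_settled(), and the delay and max_tries
parameters are hypothetical:

```python
import time


def module_state(proc_modules_text, name):
    """Return the state field for module `name`, or None if not listed."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Assumed layout: name size refcount deps state address
        if len(fields) >= 5 and fields[0] == name:
            return fields[4]
    return None


def wait_until_settled(name, read_text, delay=0.1, max_tries=50):
    """Poll until `name` is no longer in a transitional state.

    `read_text` is any callable returning the current /proc/modules
    contents. Returns the last state seen (None if the module is gone).
    """
    state = None
    for _ in range(max_tries):
        state = module_state(read_text(), name)
        if state not in ("Loading", "Unloading"):
            return state
        time.sleep(delay)
    return state
```

On a live system read_text could be something like
lambda: open("/proc/modules").read(); passing it in as a callable keeps
the polling logic testable without a real kernel.
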
We probably don't care in upstream, as a very contrived case that causes
no harm isn't enough to justify re-introducing that locking. But I'm
sharing mostly for your interest at this point.

> The only thing that should happen is needless work, that the failing
> syscall tells us to discard -- which is probably more reliable

Right. Except in the case that you are loading a useless module (with a
dep that will fail) in the first place. In that case there still seems
to be a rare race (again, this is the older 3.3 on an old distro, and
both the kernel and upstream have changed, so I am going to get back to
seeing how we're affected now). We can manage to convince ourselves that
one of the deps is loaded: it loads, but fails during init because the
hardware isn't available. We run the scheduler after we link it in and
before calling init, so another modprobe might run in between. That
belief lasts long enough for us to attempt to insert the module itself,
which then fails to find one of the symbols it depends on.

Rusty did some fixes to symbol resolution that I'm also looking at, and
this might turn out to be fixed there. Anyway, what happens is all
harmless, but I thought I'd mention that I'm thinking about locking and
whether we should revive something for contrived issues.

Jon.
