Re: + revert-ipc-dont-allocate-a-copy-larger-than-max.patch added to -mm tree

Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> · Tue, 26 Mar 2013 22:33:35 -0400

On Tue, 2013-03-26 at 18:51 -0700, Andrew Morton wrote:
> On Tue, 26 Mar 2013 21:44:12 -0400 Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
> 
> > On Tue, 2013-03-26 at 12:43 -0700, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> > > The patch titled
> > >      Subject: revert "ipc: don't allocate a copy larger than max"
> > > has been added to the -mm tree.  Its filename is
> > >      revert-ipc-dont-allocate-a-copy-larger-than-max.patch
> > > 
> > > Before you just go and hit "reply", please:
> > >    a) Consider who else should be cc'ed
> > >    b) Prefer to cc a suitable mailing list as well
> > >    c) Ideally: find the original patch on the mailing list and do a
> > >       reply-to-all to that, adding suitable additional cc's
> > > 
> > > *** Remember to use Documentation/SubmitChecklist when testing your code ***
> > > 
> > > The -mm tree is included into linux-next and is updated
> > > there every 3-4 working days
> > > 
> > > ------------------------------------------------------
> > > From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Subject: revert "ipc: don't allocate a copy larger than max"
> > > 
> > > Revert 88b9e456b164.  Dave has confirmed that this was causing oopses
> > > during trinity testing.
> > 
> > No, he didn't.
> 
> hm, so perhaps he didn't run the test for long enough?
> 
> > Here's a copy of Dave Jones's original report [1] on this very same bug
> > in linux-next on Feb 19, __6 days before__ I even submitted the series
> > that fixes this bug.
> > 
> > Note that the faulting instruction is __identical__ to Dave's most
> > recent report on 3.9-rc4:
> 
> Well bah.  Why wasn't I told this (sufficiently clearly for it to sink in)?
> 
> > My recommendation is to either:
> > 1) apply my entire 'ipc MSG_COPY fixes' series
> > --or--
> > 2) revert the entire ipc MSG_COPY implementation that introduced this
> > bug to begin with.
> 
> urgh.  I really don't want to merge a pile of patches, one of which we
> think fixes a bug which we don't understand for reasons we don't
> understand.
> 
> Not only does this generally suck, but it also creates a nightmare for
> maintainers of 3.8.x kernels - what patch should they merge to fix that
> bug?

(I didn't think MSG_COPY was in 3.8 because the first time we were
hitting this was in linux-next in Feb).

> I'm about to disappear for 1.5 weeks.  Stanislav, someone, please let's
> get this sorted out!

I just figured out how the queue is being corrupted and why my series
fixes it.

With MSG_COPY set, the queue scan can exit with the local variable 'msg'
pointing to a real msg if the msg_counter never reaches the copy_number.

The remaining execution looks like this:

	if (!IS_ERR(msg)) {
		....
		if (msgflg & MSG_COPY)
			goto out_unlock;
		....

out_unlock:
			msg_unlock(msq);
			break;
		}
	}
	if (IS_ERR(msg))
		....

	bufsz = msg_handler();
	free_msg(msg);			<<---- msg never unlinked

Since the msg should not have been found (because it failed the match
criteria), the if (!IS_ERR(msg)) clause should never have executed.

That's why my refactor fixes resolve this; because msg is not
inadvertently treated as a found msg.

But let's be honest; the real bug here is the poor structure of this
function that even made this possible. The deepest nesting executes a
goto to a label in the middle of an if clause. Yuck! No wonder this
thing's fragile.

So my recommendation still stands. The series that fixes this has been
getting tested in linux-next for a month. Fixing this some other way is
just asking for more trouble.

But why not just revert MSG_COPY altogether for 3.9?

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html