Re: scheduling while atomic followed by oops upon conntrackd -c execution

Kerin Millar <kerframil@xxxxxxxxx> · Mon, 05 Mar 2012 17:19:49 +0000

Hi Pablo,

On 04/03/2012 11:01, Pablo Neira Ayuso wrote:
> Hi Kerin,
>
> On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
>> Hi,
>>
>> On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
>>> I just posted another patch to the ML that is a relative fix to
>>> Jozsef's patch. You have to apply that as well.
>>
>> I've now tested 3.3-rc5 with the addition of the above mentioned
>> follow-on patch. The behaviour during conntrackd -c execution is
>> clearly much improved - in so far as it doesn't generate much noise
>> - but the crash that follows remains. Here's a netconsole capture:-
>>
>> http://paste.pocoo.org/raw/560439/
>
> Great to know :-).

I apologize, but I think I may have led you astray on the nf_nat issue.
I now believe that, at the time of my original report, the nf_nat
module wasn't loaded before conntrackd was started, although it was
definitely available. For all tests that followed, however, I am
entirely certain that the nf_nat module was loaded in advance. The
upshot is that my claim that things had improved may have been
premature; to be sure, I need to test under both circumstances, that
is, both with and without the module loaded in advance.

Following my own advice then, I first tried going through my test case 
*without* loading nf_nat in advance. Alas, conntrackd -c triggered hard 
lockups and didn't return to prompt. Here are the results:-

http://paste.pocoo.org/raw/561350/

In case it matters, the existing ssh session continued to respond to 
input but I was no longer able to initiate any new sessions.

> Regarding your previous email, I'm sorry, by reading your email I
> thought you were using 2.6.32 which was not the case, your
> configuration is perfectly reasonable.
>
> It seems we still have problems regarding early_drop, but this time
> with reliable event delivery enabled (15 seconds is the time that
> is required to retry sending the destroy event).
>
> If you can test the following patch, I'll appreciate.

Gladly. I applied the patch to my 3.3-rc5 tree, which is still carrying
the two patches discussed earlier in the thread. I then went through my
test case under normal circumstances, i.e. with all firewall rules in
place, nf_nat confirmed loaded before conntrackd was started, etc.
Again, conntrackd -c did not return to the prompt. Here are the
results:-

http://paste.pocoo.org/raw/561354/

Well, at least there was no oops this time. I should also add that the 
patch was present for both of the tests mentioned in this email.

---
Incidentally, I found out why the internal cache on the master was 
filling up to capacity. It was apparently due to the use of "iptables -I 
PREROUTING -t raw -j CT --ctevents assured". Perhaps I'm missing 
something but doesn't this stop events such as new and destroy from 
being propagated? An inspection with conntrack -E suggests so. Once I 
removed the above rule, I could see destroy events being propagated and 
the number of active connections in the cache no longer exceeded my 
chosen limit of 2097152 ...

# conntrack -S | head -n1; conntrackd -s | head -n2
entries                 725826
cache internal:
current active connections:          1409472
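
The comparison above could be scripted roughly as follows. This is only
a sketch: the helper name cache_within_limit is made up for
illustration, and it assumes that conntrackd -s keeps printing the
"current active connections:" line for the internal cache first, in the
format shown above.

```shell
# Hypothetical helper (not part of conntrack-tools): reads "conntrackd -s"
# output on stdin and succeeds if the first "current active connections:"
# value (the internal cache) does not exceed the limit given as $1.
cache_within_limit() {
    limit=$1
    active=$(awk '/current active connections:/ { print $NF; exit }')
    [ -n "$active" ] && [ "$active" -le "$limit" ]
}

# Example use (needs a running conntrackd):
#   conntrackd -s | cache_within_limit 2097152 && echo "cache within limit"
```

The function fails if the line is missing altogether, so a change in the
output format shows up as an error rather than a silent pass.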

Whatever the case, I'm quite happy to go without this rule as these 
systems are coping fine with the load incurred by conntrackd.
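
For anyone who would rather keep event filtering than drop the rule
entirely, something along these lines might work. It is an untested
sketch: the helper name ctevents_swap_cmds is hypothetical, the rule is
assumed to sit in raw/PREROUTING exactly as quoted above, and the wider
event list (new,destroy,assured) is an assumption about what conntrackd
needs, not something confirmed in this thread.

```shell
# Hypothetical helper: print (do not run) the iptables commands that would
# replace the restrictive CT rule with one that also lets new and destroy
# events through. $1 is the comma-separated event list for the new rule.
ctevents_swap_cmds() {
    events=${1:-new,destroy,assured}
    echo "iptables -t raw -D PREROUTING -j CT --ctevents assured"
    echo "iptables -t raw -I PREROUTING -j CT --ctevents $events"
}

# Review the output first, then run it as root, e.g.:
#   ctevents_swap_cmds | sh
# "conntrack -E -e NEW,DESTROY" can then be used to check whether those
# events are being delivered again.
```

Printing the commands instead of executing them keeps the sketch safe to
try on a production firewall.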

Cheers,

--Kerin
