Re: 2.6.33.6-rt28 kernel oops while stressing network

Hi John,

First of all, you will want to become familiar with the 'chrt' command
(also available as a busybox applet).  That command (plus some of the
finer details in the optional output of the 'ps' command) is how you
will tweak and experiment on a target system.
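
A rough sketch of the kind of experimenting I mean is below.  The
function names are made up for illustration, the PID and priority are
whatever you choose, and the ps columns assume the procps version of ps
(the busybox applet may not offer them).

```shell
#!/bin/sh
# Sketch only: helpers for experimenting with thread priorities on a target.
# The function names are hypothetical; only chrt and ps are real commands.

# Show the current scheduling policy and priority of one PID.
show_sched() {
    chrt -p "$1"
}

# Move one PID to SCHED_FIFO at the given priority (1..99).
# With DRY_RUN set, print the command instead of running it.
set_fifo() {
    ${DRY_RUN:+echo} chrt -f -p "$2" "$1"
}

# List every thread with its scheduling class and RT priority.
# Needs procps ps; the busybox ps applet may lack these columns.
list_rt() {
    ps -eLo pid,tid,class,rtprio,comm
}
```

Try it with DRY_RUN=1 first so you can see the chrt command line before
it touches a running thread.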

Then the attached files may be of use (I attached them because they are
small).  rtctl is a package that originates from Red Hat.  If you are
using the busybox shell (rather than full-blown bash), then apply the
attached patch to it.

Extract the rpm something like...

rpm2cpio rtctl-1.7-1.el5rt.src.rpm | cpio --extract --make-directories

...assuming you have rpm2cpio installed on your host.  Then extract any
tarballs that were bundled inside it.  Then apply the attached patch to
it (if not using bash on your target).

Then install to your target rootfs staging area something like...

install -D -m 755 rtctl $MY_STAGING_DIR_PATH/usr/sbin/rtctl
install -D -m 644 rtgroups $MY_STAGING_DIR_PATH/etc/rtgroups
install -D -m 755 rtctl.sysconfig \
    $MY_STAGING_DIR_PATH/etc/sysconfig/rtctl

The package includes more, but that's all I've ported and tried out so
far on a uclibc/buildroot style embedded Linux with an RT kernel.

Edit the rtgroups file to implement an RT policy that works for you
(based on the outcome of your experiments using chrt).
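
Going by the regexes in the attached patch, an rtgroups record looks
like name:sched:priority:affinity:regex (the older 4 field form leaves
out the affinity).  The sketch below splits one such record so you can
check your edits.  The function name and the sample record are made up
for illustration and are not from the stock Red Hat file.

```shell
#!/bin/sh
# Sketch only: split one 5-field rtgroups record into its parts.
# parse_rtgroup and the sample record are hypothetical.
parse_rtgroup() {
    rec=$1
    name=$(printf '%s\n' "$rec" | cut -d: -f1)
    # sched is handed to chrt as a flag: f=FIFO, r=RR, o=other, b=batch;
    # '*' means leave the policy alone.
    sched=$(printf '%s\n' "$rec" | cut -d: -f2)
    prio=$(printf '%s\n' "$rec" | cut -d: -f3)
    # affinity is a hex CPU mask for taskset, or '*' to leave it alone.
    affinity=$(printf '%s\n' "$rec" | cut -d: -f4)
    # -f5- keeps any ':' characters that appear inside the regex.
    regex=$(printf '%s\n' "$rec" | cut -d: -f5-)
    printf 'group=%s sched=%s prio=%s affinity=%s regex=%s\n' \
        "$name" "$sched" "$prio" "$affinity" "$regex"
}

parse_rtgroup 'netrx:f:60:*:my-net-thread'
```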

What hasn't been implemented yet is how to hook rtctl into your system
initialization scripts.  The stock content from the Red Hat package is
for their real time distro.  You'd need to invoke it from your scripts
accordingly.
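
As a starting point, a boot-time hook could be as small as the sketch
below.  It assumes rtctl was installed to /usr/sbin as in the install
commands above, and it assumes "reset" is the subcommand that applies
the defaults; that name is not visible in the patch, so check usage()
in your copy.  The function name and the init script name are made up.

```shell
#!/bin/sh
# Sketch only: a boot-time hook that applies the rtgroups policy once.
# ASSUMPTION: "reset" is the rtctl subcommand that applies the defaults;
# check the usage() text in your copy of rtctl before relying on it.
RTCTL=${RTCTL:-/usr/sbin/rtctl}

apply_rt_policy() {
    if [ ! -x "$RTCTL" ]; then
        echo "rtctl not found at $RTCTL" >&2
        return 1
    fi
    "$RTCTL" reset
}

# In an init script (for example a hypothetical /etc/init.d/S90rtctl):
#   case "$1" in
#       start) apply_rt_policy ;;
#   esac
```

Run it late in the boot sequence, since rtctl matches against the ps
output and can only adjust threads that already exist.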


Regards,

Darcy

On Fri, 2010-08-13 at 12:16 -0700, John Culvertson wrote:
> Thanks for the pointers.  I have seen others mention adjusting the
> soft irq thread priorities, etc.  Can you shed any light on how to go
> about doing that?  Is that in the kernel configuration, or do you have
> to modify the kernel source?
> 
> On Fri, Aug 13, 2010 at 2:14 PM, Darcy Watkins <DWatkins@xxxxxxxxxxx>
> wrote:
> > Hi John,
> >
> > In the 'make menuconfig', look for and double check any config
> > settings related to IRQ sharing, PCI, etc.  There are a lot of
> > tweaks.  Probably enough to write a PhD dissertation about.
> >
> > The other thing, if it only happens under high network stress, you
> > may want to check into tweaking the real time priorities of your
> > kernel threads, user space program threads and even IRQ threads.  RT
> > kernels tend to treat priorities more strictly so it is possible for a
> > high priority thread to hog the CPU and deplete resources before a
> > lower priority thread processes them and frees up the resources.
> >
> > Regards,
> >
> > Darcy
> >
> >
> > -----Original Message-----
> > From: John Culvertson [mailto:jculvertson@xxxxxxxxx]
> > Sent: Friday, August 13, 2010 11:06 AM
> > To: Darcy Watkins
> > Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
> >
> > Thanks for the suggestions.  I have tried the unpatched 2.6.33.7
> > kernel, and the problem does not occur.  The hardware is a single
> > board industrial computer with the network controllers onboard, so I
> > cannot easily try different NICs.  I have not seen the problem occur
> > with only one port in use, but I have not tested that long enough to
> > be positive.
> >
> > One thing that may be a little odd about this computer is that both
> > Ethernet controllers (Intel 82559) share the same PCI interrupt.
> > Interrupt sharing should be OK, but since adjacent PCI slots in normal
> > PCs generally use different interrupts, it may not occur often in
> > other systems.
> >
> > On Fri, Aug 13, 2010 at 1:56 PM, Darcy Watkins
> > <DWatkins@xxxxxxxxxxx> wrote:
> >> Hi John,
> >>
> >> I use Fedora 13 which has 2.6.33.6 as the kernel (without RT).
> >>
> >> My machine has three net i/f in it.  Two PCI net cards with Realtek
> >> chipset and Intel PRO built into the mainboard's chipset.
> >>
> >> When I installed Fedora using the netboot USB flash drive, it insisted
> >> on using one of the Realtek interfaces for Internet connection so that
> >> is my eth0.  All fine.
> >>
> >> More recently, I activated the other two net i/f for private LAN to
> >> target HW test network.  I set one to 10.0.0.1 and the other to
> >> 192.168.101.4 and connected them to the target network.  Note that they
> >> were both connected to the same network switch.
> >>
> >> Shortly after that, the system froze.  After reboot it would run for a
> >> while and then freeze.  I unplugged the net i/f based on the Intel PRO
> >> and all has been fine since.
> >>
> >> I mention all this because you once mentioned you were using two Intel
> >> net i/f.  It may not even be RT related.
> >>
> >> I suggest you try (not in any particular order, but each on its own)...
> >>
> >>   - running with only one net i/f connected
> >>   - building a side-by-side vanilla kernel 2.6.33.7 (without RT patch)
> >>     and running your two net i/f without RT
> >>   - using different net i/f cards (say based on Realtek or something
> >>     other than Intel) try running it with RT
> >>
> >> If you see your system behavior change related to any of these, it
> >> possibly may not be RT patch related (or it could be tied to a specific
> >> driver).
> >>
> >> Regards,
> >>
> >> Darcy
> >>
> >> -----Original Message-----
> >> From: linux-rt-users-owner@xxxxxxxxxxxxxxx
> >> [mailto:linux-rt-users-owner@xxxxxxxxxxxxxxx] On Behalf Of John
> >> Culvertson
> >> Sent: Friday, August 13, 2010 10:38 AM
> >> To: linux-rt-users@xxxxxxxxxxxxxxx
> >> Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
> >>
> >> Since it was my understanding that x86 was the most mature and stable
> >> architecture for preempt-rt, I was surprised when I immediately
> >> encountered problems.  Is this typical when trying the patches on a
> >> new platform?  Like I mentioned before, I am a newbie with preempt-rt.
> >>
> >> On Wed, Aug 11, 2010 at 12:53 PM, John Culvertson
> >> <jculvertson@xxxxxxxxx> wrote:
> >>> I updated to 2.6.33.7-rt29, and I am seeing similar symptoms.
> >>>
> >>> [ 2120.781166] BUG: unable to handle kernel paging request at c11cd497
> >>> [ 2120.784018] IP: [<c11d5ce2>] tcp_set_skb_tso_segs+0x33/0x85
> >>> [ 2120.784018] *pde = 1d7f6063 *pte = 011cd161
> >>> [ 2120.784018] Oops: 0003 [#1] PREEMPT
> >>
> >
> 
> 

Index: rtctl-1.7/rtctl
===================================================================
--- rtctl-1.7.orig/rtctl
+++ rtctl-1.7/rtctl
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 
 usage ()
 {
@@ -26,67 +26,62 @@ shift
 
 GROUPNAME=""
 
+ALL_GROUPS=`awk '/^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ { split($0, parts, ":") ; print parts[1] }' ${RTGROUPFILE}`
+
+group_properties_of()
+{
+	local grouprec=`awk '/^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:[*a-fA-F0-9]+:.+$/ { split($0, parts, ":") ; if (parts[1] == groupname) print }' groupname=$1 ${RTGROUPFILE}`
+	if [ -n "$grouprec" ] ; then
+		# 5 field record format
+		GROUP_AFFINITY=`echo $grouprec | cut -d ':' -f 4`
+		GROUP_REGEX=`echo $grouprec | cut -d ':' -f 5`
+	else
+		grouprec=`awk '/^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ { split($0, parts, ":") ; if (parts[1] == groupname) print }' groupname=$1 ${RTGROUPFILE}`
+		if [ -n "$grouprec" ] ; then
+			# 4 field legacy record format
+			GROUP_AFFINITY="*"
+			GROUP_REGEX=`echo $grouprec | cut -d ':' -f 4`
+		else
+			return 1
+		fi
+	fi
+	local gname=`echo $grouprec | cut -d ':' -f 1`
+	GROUP_SCHED=`echo $grouprec | cut -d ':' -f 2`
+	GROUP_PRIORITY=`echo $grouprec | cut -d ':' -f 3`
+	GROUP_PIDS=`ps -eo pid,cmd | fgrep -v $GROUP_REGEX | egrep $GROUP_REGEX | awk '{ print $1 }'`
+	return 0;
+}
+
 #
 # print the PIDs of processes belonging to ${GROUPNAME} as defined
 # in ${RTGROUPFILE}.
 #
 group_pids ()
 {
-  ps -eo pid,cmd | awk '
-    /^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ {
-      split($0, parts, ":") 
-      if (parts[1] == groupname) {
-	  nr_rules += 1
-	  regexp_offset = length(parts[1]) + length(parts[2]) + length(parts[3]) + 4
-	  if (length(parts) > 4) {
-	    regexp_offset += length(parts[4]) + 1
-	  }
-	  group_regexps[nr_rules] = substr($0,regexp_offset)
-      }
-    }
-    /^ *[0-9]+ .+$/ {
-      for (i = 1; i <= nr_rules; ++i) {
-	if (match($2, group_regexps[i])) {
-          print $1
-	  break
-        }
-      }
-    }' groupname=${GROUPNAME} ${RTGROUPFILE} -
+	if group_properties_of ${GROUPNAME} ; then
+		echo "$GROUP_PIDS"
+	else
+		return 1
+	fi
+	return 0
 }
 
 
 set_group_defaults ()
 {
-  ps -eo pid,cmd | awk '
-    /^[a-zA-Z_0-9-]+:[*orbf]:[0-9]+:.+$/ {
-      split($0, conf, ":")
-      if (groupname == "" || conf[1] == groupname) {
-        nr_rules += 1
-        group_sched[nr_rules] = conf[2]
-        group_prio[nr_rules] = conf[3]
-	regexp_offset = length(conf[1]) + length(conf[2]) + length(conf[3]) + 4
-	if (length(conf) < 5) {
-	  group_affinity[nr_rules] = "*"
-	} else {
-	  regexp_offset += length(conf[4]) + 1
-	  group_affinity[nr_rules] = conf[4]
-	}
-        group_regexps[nr_rules] = substr($0,regexp_offset)
-      }
-    }
-    /^ *[0-9]+ .+$/ {
-      for (i = nr_rules; i >= 1; --i) {
-	if (match($2, group_regexps[i])) {
-	  if (group_sched[i] != "*") {
-            print "chrt -p -" group_sched[i] " " group_prio[i] " " $1
-	  }
-	  if (group_affinity[i] != "*") {
-            print "taskset -p " group_affinity[i] " " $1 " > /dev/null"
-	  }
-          break
-        }
-      }
-    }' groupname=${GROUPNAME} ${RTGROUPFILE} - | sh
+	if group_properties_of ${GROUPNAME} ; then
+		for pid in $GROUP_PIDS ; do
+			if [ "$GROUP_SCHED" != "*" ] ; then
+				chrt -p -$GROUP_SCHED $GROUP_PRIORITY $pid
+			fi
+			if [ "$GROUP_AFFINITY" != "*" ] ; then
+				taskset -p $GROUP_AFFINITY $pid > /dev/null
+			fi
+		done
+	else
+		return 1
+	fi
+	return 0
 }
 
 
@@ -149,8 +144,13 @@ case "$CMD" in
         [ $# -gt 1 ] && usage
         if [ $# -ne 0 ]; then
             GROUPNAME=$1
+	        set_group_defaults
+		else
+			for grp in $ALL_GROUPS ; do
+	            GROUPNAME=$grp
+		        set_group_defaults
+			done
         fi
-        set_group_defaults
 	;;
 
     "show")

Attachment: rtctl-1.7-1.el5rt.src.rpm
Description: application/rpm

