cgroup_fj tests will stick the nort kernel

Qiang Huang <h.huangqiang@xxxxxxxxxx> · Fri, 19 Apr 2013 15:30:23 +0800

Hi,

I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
system gets stuck when the cpuset stress tests run, and it happens every time.

Here "stuck" means the system is almost unresponsive and we can hardly do
anything on the terminal, but the kernel has neither crashed nor deadlocked
(according to the lockdep messages), and it still responds occasionally.

The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK, but
without the RT patches, or with PREEMPT_RT_FULL enabled, the problem does not occur.
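
One way to check which of these cases a given kernel falls into is to look
the option up in its config file. This is only a minimal sketch: it assumes
the config is available as a plain-text file such as /boot/config-$(uname -r),
and the helper name config_is_set is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if OPTION is set to "y" in a kernel config file.
config_is_set() {
    local file=$1 option=$2
    grep -q "^${option}=y\$" "$file"
}

# Assumed location of the running kernel's config; adjust for your distribution.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    if config_is_set "$cfg" CONFIG_PREEMPT_RT_FULL; then
        echo "PREEMPT_RT_FULL is enabled"
    else
        echo "PREEMPT_RT_FULL is not enabled"
    fi
fi
```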

When the system is stuck, we get the following messages:
# dmesg
...
[96967.772181] NOHZ: local_softirq_pending 200
[96967.776398] NOHZ: local_softirq_pending 200
[96967.780212] NOHZ: local_softirq_pending 200
[96967.781215] NOHZ: local_softirq_pending 200
[96967.784152] NOHZ: local_softirq_pending 200
[96967.784310] NOHZ: local_softirq_pending 200
[96967.788239] NOHZ: local_softirq_pending 200
[96967.796092] NOHZ: local_softirq_pending 200
[96967.800089] NOHZ: local_softirq_pending 200
[96967.800225] NOHZ: local_softirq_pending 200
[97112.950055] ------------[ cut here ]------------
[97112.950068] WARNING: at /usr/src/packages/BUILD/kernel-default-3.4.24.03/linux-3.4/kernel/workqueue.c:1208 worker_enter_idle+0x1d3/0x200()
[97112.950073] Hardware name: Tecal RH2285
[97112.950076] Modules linked in: reiserfs minix hfs vfat fat tun xt_limit xt_tcpudp nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 x_tables dummy edd cpufreq_conservative cpufreq_userspace
cpufreq_powersave acpi_cpufreq mperf loop dm_mod coretemp crc32c_intel igb ghash_clmulni_intel aesni_intel cryptd aes_x86_64 aes_generic iTCO_wdt bnx2 iTCO_vendor_support i7core_edac pcspkr i2c_i801
dca edac_core button rtc_cmos microcode serio_raw i2c_core ses enclosure sg mptctl ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon
scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: ip_tables]
[97112.950178] Pid: 5331, comm: kworker/0:2 Tainted: GF       WC   3.4.24.03-0.1.2-default #1
[97112.950182] Call Trace:
[97112.950191]  [<ffffffff8105e2d2>] warn_slowpath_common+0xb2/0x120
[97112.950196]  [<ffffffff8105e365>] warn_slowpath_null+0x25/0x30
[97112.950202]  [<ffffffff81085593>] worker_enter_idle+0x1d3/0x200
[97112.950207]  [<ffffffff81084a95>] ? need_to_create_worker+0x15/0x50
[97112.950213]  [<ffffffff8108a308>] worker_thread+0x2a8/0x4f0
[97112.950218]  [<ffffffff8108a060>] ? rescuer_thread+0x320/0x320
[97112.950226]  [<ffffffff81091d86>] kthread+0xc6/0xe0
[97112.950233]  [<ffffffff81720454>] kernel_thread_helper+0x4/0x10
[97112.950239]  [<ffffffff81091cc0>] ? __init_kthread_worker+0x50/0x50
[97112.950244]  [<ffffffff81720450>] ? gs_change+0x13/0x13
[97112.950248] ---[ end trace 61f48fadbd018007 ]---
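
The value after local_softirq_pending is a bitmask of pending softirqs. The
sketch below decodes it under two assumptions that are not stated in the log
itself: that the value is printed in hexadecimal, and that the softirq
numbering follows the enum order in include/linux/interrupt.h of mainline 3.4
(the RT patch set may differ). decode_softirq_pending is a hypothetical name.

```shell
#!/bin/bash
# Softirq names in enum order (assumed to match mainline 3.4).
softirq_names=(HI TIMER NET_TX NET_RX BLOCK BLOCK_IOPOLL TASKLET SCHED HRTIMER RCU)

# Print the name of every softirq whose bit is set in the hex mask $1.
decode_softirq_pending() {
    local mask=$((16#$1)) bit
    for bit in "${!softirq_names[@]}"; do
        if (( mask & (1 << bit) )); then
            echo "${softirq_names[$bit]}"
        fi
    done
}

decode_softirq_pending 200    # prints "RCU"
```

Under those assumptions, 200 is bit 9, i.e. the RCU softirq is the one left
pending when the CPU tries to go idle.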

Here is a sample version of cgroup_fj that triggers this problem every time:
(make sure CONFIG_CGROUPS and CONFIG_CPUSETS are enabled :))
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# cat cgroup_fj.sh
#!/bin/bash
# bash is required: the script uses arrays and `let`, which plain sh lacks

LOGFILE=./cgroup_fj-output.txt
TMPFILE=/tmp/cgroup_fj_tempfile.txt

subsystem=2
subsystem_name="cpuset"

subgroup_num=100

cur_subgroup_path1=""

get_subgroup_path1()
{
        cur_subgroup_path1=""
        if [ "$#" -ne 1 ] || [ "$1" -lt 1 ] ; then
                return;
        fi

        cur_subgroup_path1="/dev/cgroup/subgroup_$1/"
}

cleanup()
{
        mount_str="`mount -l | grep /dev/cgroup`"
        if [ "$mount_str" != "" ]; then
                umount /dev/cgroup
        fi

        if [ -e /dev/cgroup ]; then
                rmdir /dev/cgroup
        fi
}

setup()
{
        mkdir /dev/cgroup
        mount -t cgroup -o $subsystem_name cgroup /dev/cgroup
}

reclaim_foundling()
{
        cat `find /dev/cgroup/subgroup_* -name "tasks"` > $TMPFILE
        nlines=`cat "$TMPFILE" | wc -l`
        for k in `seq 1 $nlines`
        do
                cur_pid=`sed -n "$k""p" $TMPFILE`
                if [ -e /proc/$cur_pid/ ];then
                        echo "pid $cur_pid reclaimed"
                        echo "$cur_pid" > "/dev/cgroup/tasks"
                fi
        done
}

##########################  main   #######################
echo "-------------------------------------------------------------------------" >> $LOGFILE

cleanup;

setup;

if [ $subsystem -eq 2 ]; then
        cpus=`cat /dev/cgroup/cpuset.cpus`
        mems=`cat /dev/cgroup/cpuset.mems`
fi

count=0
pathes[1]=""
for i in `seq 1 $subgroup_num`
do
        get_subgroup_path1 $i
        mkdir $cur_subgroup_path1

        if [ $subsystem -eq 2 ]; then
                echo "$cpus" > "$cur_subgroup_path1""cpuset.cpus"
                echo "$mems" > "$cur_subgroup_path1""cpuset.mems"
        fi

        let "count = $count + 1"
        pathes[$count]="$cur_subgroup_path1"
done

echo "...mkdired $count times" >> $LOGFILE

sleep 1

count2=$count
let "count2 = $count2 + 1"
pathes[0]="/dev/cgroup/"
pathes[$count2]="/dev/cgroup/"
for i in `seq 0 $count`
do
        j=$i
        let "j = $j + 1"
        cat "${pathes[$i]}tasks" > $TMPFILE
        nlines=`cat "$TMPFILE" | wc -l`
        for k in `seq 1 $nlines`
        do
                cur_pid=`sed -n "$k""p" $TMPFILE`
                if [ -e /proc/$cur_pid/ ];then
                        echo "$cur_pid" > "${pathes[$j]}tasks"
                        echo "task: $cur_pid" >> $LOGFILE
                        echo "target: ${pathes[$j]}tasks" >> $LOGFILE
                fi
        done
done

reclaim_foundling;

for i in `seq 1 $count`
do
        j=$i
        let "j = $count - $j + 1"
        rmdir ${pathes[$j]}
done

sleep 1

cleanup;

exit 0;
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
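
Before running the script above, it may help to confirm that the cpuset
controller is actually available on the running kernel. A minimal sketch,
assuming the usual /proc/cgroups layout (subsys_name, hierarchy, num_cgroups,
enabled); cgroup_subsys_enabled is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if SUBSYS is listed with enabled == 1 in a
# /proc/cgroups-style table (columns: name, hierarchy, num_cgroups, enabled).
cgroup_subsys_enabled() {
    local file=$1 subsys=$2
    awk -v s="$subsys" '$1 == s && $4 == 1 { found = 1 } END { exit !found }' "$file"
}

if [ -r /proc/cgroups ]; then
    if cgroup_subsys_enabled /proc/cgroups cpuset; then
        echo "cpuset controller is available"
    else
        echo "cpuset controller is missing or disabled"
    fi
fi
```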
