[BUG] cgroup writeback crash

Hi,
Cgroup-based writeback has a race condition that leads to a kernel crash.

When an inode's bdi_writeback is switched, inode_switch_wbs() takes an extra
reference on the inode and schedules the actual reassignment work to run later.
If the file is deleted and the filesystem unmounted before that work finishes,
the last reference drop (the iput() in inode_switch_wbs_work_fn()) tries to
evict the inode and so accesses filesystem data that has already been released.
In the ext4 trace below, that eviction reaches ext4_evict_inode(), which tries
to start a jbd2 handle and hits the BUG at fs/jbd2/transaction.c:319.

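Here is the shell script I am using to reproduce this (it does not reproduce
reliably). It mounts /dev/sdb at /mnt/sdb and takes two optional arguments:
the number of files (default 18) and the number of 4k blocks each dd call
writes (default 2):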

cat > repro.sh << "EOF"
#!/bin/bash
set -e

FILE_COUNT=${1:-18}
BLK_COUNT=${2:-2}


CGROUP_ROOT=/mnt-cgroup2

mkdir -p "$CGROUP_ROOT"

if ! mount | grep -qw cgroup2; then
    mount -t cgroup2 none "$CGROUP_ROOT"
fi

mkdir -p "$CGROUP_ROOT/mem1"
mkdir -p "$CGROUP_ROOT/mem2"

echo '+memory' > "$CGROUP_ROOT/cgroup.subtree_control"

mkdir -p /mnt/sdb

if mount | grep -qw /dev/sdb; then
    umount /dev/sdb &> /dev/null || true
fi

mount /dev/sdb /mnt/sdb

FILES=$(seq 1 "$FILE_COUNT")

for f in $FILES; do
    rm -f "/mnt/sdb/dd$f"
done

# Move to mem1 cgroup
echo $$ > "$CGROUP_ROOT/mem1/cgroup.procs"

for i in {1..10}; do
    for f in $FILES; do
        dd if=/dev/urandom of="/mnt/sdb/dd$f" conv=notrunc \
            bs=4k count="$BLK_COUNT" seek=$((BLK_COUNT * i)) &> /dev/null
    done
    sync

    # After first iteration, switch to mem2 cgroup
    if [[ "$i" == "1" ]]; then
        echo $$ > "$CGROUP_ROOT/mem2/cgroup.procs"
    fi
done

for f in $FILES; do
    rm -f "/mnt/sdb/dd$f"
done

umount /mnt/sdb

EOF

[  278.498009] ------------[ cut here ]------------
[  278.502764] kernel BUG at fs/jbd2/transaction.c:319!
[  278.507652] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  278.507652] CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
[  278.507652] Hardware name: Google Google, BIOS Google 01/01/2011
[  278.507652] Workqueue: events inode_switch_wbs_work_fn
[  278.507652] task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
[  278.507652] RIP: 0010:[<ffffffff803e6922>]  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
[  278.507652] RSP: 0018:ffff880209267c30  EFLAGS: 00010202
[  278.507652] RAX: 0000000000000031 RBX: ffff880213fba000 RCX: 0000000000000000
[  278.507652] RDX: 0000000000000001 RSI: 00000000000001ff RDI: ffff880213fba028
[  278.507652] RBP: ffff880209267cb0 R08: 0000000000002000 R09: 00000000000000ef
[  278.507652] R10: ffff880216085750 R11: 0000000000000006 R12: ffff880213fba024
[  278.507652] R13: ffff880213fba070 R14: ffff880216085750 R15: 00000000000000ef
[  278.507652] FS:  0000000000000000(0000) GS:ffff88021ef00000(0000) knlGS:0000000000000000
[  278.507652] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  278.507652] CR2: 00007f310a2a9095 CR3: 0000000000e0a000 CR4: 00000000000406e0
[  278.507652] Stack:
[  278.507652]  000000000000000a ffff880213fba3e0 024000400000000c ffff880209267c78
[  278.507652]  0000000000000000 0000003000000000 ffff880216085750 ffff880209267cb0
[  278.507652]  ffffffff8032d78a ffffffff803e6b69 ffff880216085720 ffff880213fba000
[  278.507652] Call Trace:
[  278.507652]  [<ffffffff8032d78a>] ? kmem_cache_alloc+0x10a/0x150
[  278.507652]  [<ffffffff803e6b69>] ? jbd2__journal_start+0x79/0x190
[  278.507652]  [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
[  278.507652]  [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
[  278.507652]  [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
[  278.507652]  [<ffffffff8035338b>] evict+0xbb/0x190
[  278.507652]  [<ffffffff80354190>] iput+0x130/0x190
[  278.507652]  [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
[  278.507652]  [<ffffffff80279819>] process_one_work+0x129/0x300
[  278.507652]  [<ffffffff802ab1a3>] ? try_to_del_timer_sync+0x43/0x60
[  278.507652]  [<ffffffff80279b16>] worker_thread+0x126/0x480
[  278.507652]  [<ffffffff802799f0>] ? process_one_work+0x300/0x300
[  278.507652]  [<ffffffff8027ed14>] kthread+0xc4/0xe0
[  278.507652]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.507652]  [<ffffffff809771df>] ret_from_fork+0x3f/0x70
[  278.507652]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.507652] Code: 00 00 e8 82 6d f4 ff 48 85 c0 48 89 45 a0 0f 85 28 fd ff ff 41 bf f4 ff ff ff e9 bf fe ff ff c7 45 a8 00 00 00 00 e9 b8 fc ff ff <0f> 0b 41 bf e2 ff ff ff e9 a6 fe ff ff 0f 0b 8b 4d a8 8b 55 ac
[  278.507652] RIP  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
[  278.507652]  RSP <ffff880209267c30>
[  278.775069] ---[ end trace b85bc47b5909067f ]---
[  278.779848] BUG: unable to handle kernel paging request at ffffffffffffffd8
[  278.787306] IP: [<ffffffff8027eebb>] kthread_data+0xb/0x20
[  278.789819] PGD e0b067 PUD e0d067 PMD 0
[  278.789819] Oops: 0000 [#2] SMP DEBUG_PAGEALLOC
[  278.789819] CPU: 1 PID: 29158 Comm: kworker/1:10 Tainted: G      D         4.5.0-rc3 #51
[  278.789819] Hardware name: Google Google, BIOS Google 01/01/2011
[  278.789819] task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
[  278.789819] RIP: 0010:[<ffffffff8027eebb>]  [<ffffffff8027eebb>] kthread_data+0xb/0x20
[  278.789819] RSP: 0018:ffff880209267930  EFLAGS: 00010002
[  278.789819] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[  278.789819] RDX: ffffffff80fbf880 RSI: 0000000000000001 RDI: ffff880213dbbd40
[  278.789819] RBP: ffff880209267930 R08: 00000040e892772b R09: 0000000000000000
[  278.789819] R10: 0000000000000000 R11: ffffea000857da00 R12: ffff880213dbc1f0
[  278.789819] R13: 0000000000013cc0 R14: 0000000000000001 R15: ffff880213dbbd40
[  278.789819] FS:  0000000000000000(0000) GS:ffff88021ef00000(0000) knlGS:0000000000000000
[  278.789819] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  278.789819] CR2: 0000000000000028 CR3: 0000000000e0a000 CR4: 00000000000406e0
[  278.789819] Stack:
[  278.789819]  ffff880209267948 ffffffff802793fc ffff88021ef13cc0 ffff880209267998
[  278.789819]  ffffffff80973ad7 ffff880213dbbd40 ffffffff802a6d22 ffff8802092679b0
[  278.789819]  ffff880209268000 ffff880213dbc0e0 ffffffff80c36d32 ffff880213dbbd40
[  278.789819] Call Trace:
[  278.789819]  [<ffffffff802793fc>] wq_worker_sleeping+0xc/0x90
[  278.789819]  [<ffffffff80973ad7>] __schedule+0x347/0x7d6
[  278.789819]  [<ffffffff802a6d22>] ? call_rcu_sched+0x12/0x20
[  278.789819]  [<ffffffff80973fc0>] schedule+0x30/0x80
[  278.789819]  [<ffffffff8026972a>] do_exit+0x5fa/0xa50
[  278.789819]  [<ffffffff80206058>] oops_end+0x68/0x90
[  278.789819]  [<ffffffff802061b6>] die+0x46/0x60
[  278.789819]  [<ffffffff802038b3>] do_trap+0xa3/0x140
[  278.789819]  [<ffffffff802039c2>] do_error_trap+0x72/0xe0
[  278.789819]  [<ffffffff803e6922>] ? start_this_handle+0x382/0x3e0
[  278.789819]  [<ffffffff80203c6b>] do_invalid_op+0x1b/0x20
[  278.789819]  [<ffffffff80978318>] invalid_op+0x18/0x20
[  278.789819]  [<ffffffff803e6922>] ? start_this_handle+0x382/0x3e0
[  278.789819]  [<ffffffff8032d78a>] ? kmem_cache_alloc+0x10a/0x150
[  278.789819]  [<ffffffff803e6b69>] ? jbd2__journal_start+0x79/0x190
[  278.789819]  [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
[  278.789819]  [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
[  278.789819]  [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
[  278.789819]  [<ffffffff8035338b>] evict+0xbb/0x190
[  278.789819]  [<ffffffff80354190>] iput+0x130/0x190
[  278.789819]  [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
[  278.789819]  [<ffffffff80279819>] process_one_work+0x129/0x300
[  278.789819]  [<ffffffff802ab1a3>] ? try_to_del_timer_sync+0x43/0x60
[  278.789819]  [<ffffffff80279b16>] worker_thread+0x126/0x480
[  278.789819]  [<ffffffff802799f0>] ? process_one_work+0x300/0x300
[  278.789819]  [<ffffffff8027ed14>] kthread+0xc4/0xe0
[  278.789819]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.789819]  [<ffffffff809771df>] ret_from_fork+0x3f/0x70
[  278.789819]  [<ffffffff8027ec50>] ? __kthread_parkme+0x70/0x70
[  278.789819] Code: 25 80 ac 00 00 48 8b 80 50 04 00 00 5d 48 8b 40 c8 48 d1 e8 83 e0 01 c3 0f 1f 84 00 00 00 00 00 55 48 8b 87 50 04 00 00 48 89 e5 <48> 8b 40 d8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[  278.789819] RIP  [<ffffffff8027eebb>] kthread_data+0xb/0x20
[  278.789819]  RSP <ffff880209267930>
[  278.789819] CR2: ffffffffffffffd8
[  278.789819] ---[ end trace b85bc47b59090680 ]---
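Because the script only triggers the crash some of the time, it can help to check each run's kernel log for this specific signature rather than for any oops. The function below is a hypothetical triage helper, not part of the original report. It reads log text on stdin and succeeds only if it finds a "kernel BUG at" line together with evict() and inode_switch_wbs_work_fn() frames, matching the first trace above. The function name and the exact grep patterns are assumptions chosen for illustration.

```shell
# Hypothetical triage helper (not from the original report).
# Reads kernel log text on stdin and returns 0 only if it contains all
# three markers of the crash in this report:
#   1. a "kernel BUG at" line,
#   2. a call-trace frame in inode_switch_wbs_work_fn, and
#   3. a call-trace frame in evict (the iput() -> evict() path).
# The patterns match symbol names rather than addresses, so they should
# hold across builds whose addresses and offsets differ.
is_wb_switch_crash() {
    local log
    log=$(cat)
    grep -q 'kernel BUG at' <<< "$log" || return 1
    grep -q 'inode_switch_wbs_work_fn+' <<< "$log" || return 1
    # Leading space keeps this from matching ext4_evict_inode+.
    grep -q ' evict+' <<< "$log"
}
```

For example, after a run where the machine stays up, `dmesg | is_wb_switch_crash && echo "wb switch crash reproduced"`; the same check works on a saved console log piped in from a file.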