Call traces on console from a test machine

Kelly Kane <kelly@xxxxxxxxxxxxxxx> · Thu, 13 Nov 2008 16:31:31 -0800

We have a production (yay!) ext4 server which has started spewing 
ext4_da_writepages errors on the console. The only change anyone can 
think of is that we started doing rsync backups of the machine to 
another. Perhaps this heavy I/O on user home directories is causing the 
problem?

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          51.46   20.91   19.90    0.63    0.00    7.10

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   165.48  0.00 40.61     0.00  1648.73    40.60     1.28   31.45   0.90   3.65
sda1              0.00   165.48  0.00 40.61     0.00  1648.73    40.60     1.28   31.45   0.90   3.65
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.51     0.00 31.98  0.00   795.94     0.00    24.89     0.27    8.76   6.73  21.52
sdb1              0.51     0.00 31.98  0.00   795.94     0.00    24.89     0.27    8.76   6.73  21.52

The errors are scrolling by pretty quickly on the serial console:

ext4_da_writepages: jbd2_start: 1024 pages, ino 3014931; err -30
Pid: 284, comm: pdflush Tainted: G        W 2.6.27-serf-xeon-c6.1-ext4-grsec #1

Call Trace:
 [<ffffffff8031d485>] ext4_da_writepages+0x2f5/0x320
 [<ffffffff80227cc5>] __dequeue_entity+0x55/0x80
 [<ffffffff80227d15>] set_next_entity+0x25/0x50
 [<ffffffff8026f570>] do_writepages+0x20/0x40
 [<ffffffff802b3717>] __writeback_single_inode+0x97/0x340
 [<ffffffff8022787f>] update_curr+0x3f/0x60
 [<ffffffff80227cc5>] __dequeue_entity+0x55/0x80
 [<ffffffff802b3e17>] generic_sync_sb_inodes+0x217/0x320
 [<ffffffff802b42ce>] writeback_inodes+0x7e/0xc0
 [<ffffffff8026ffc6>] wb_kupdate+0xa6/0x120
 [<ffffffff802704a0>] pdflush+0x0/0x220
 [<ffffffff802704a0>] pdflush+0x0/0x220
 [<ffffffff802705de>] pdflush+0x13e/0x220
 [<ffffffff8026ff20>] wb_kupdate+0x0/0x120
 [<ffffffff80246b6b>] kthread+0x4b/0x80
 [<ffffffff80203789>] child_rip+0xa/0x11
 [<ffffffff80246b20>] kthread+0x0/0x80
 [<ffffffff8020377f>] child_rip+0x0/0x11
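
For reference, the "err -30" in the first line looks like a negated errno
value. On Linux, errno 30 is EROFS (read-only file system), which would be
consistent with the filesystem or journal having gone read-only, though
that is an inference and has not been confirmed on this machine. A minimal
sketch for decoding such lines follows; the regular expression and the
decode_err helper are my own illustration, not part of any kernel tooling:

```python
import errno
import os
import re

# Matches the "ino <n>; err <code>" tail of the console line quoted above.
# The pattern is an assumption based on that single sample line.
LINE_RE = re.compile(r"ino (?P<ino>\d+); err (?P<err>-?\d+)")


def decode_err(line):
    """Return (inode, errno name, description), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group("err")))  # the kernel reports errno negated
    name = errno.errorcode.get(code, "UNKNOWN")
    return int(m.group("ino")), name, os.strerror(code)


if __name__ == "__main__":
    sample = "ext4_da_writepages: jbd2_start: 1024 pages, ino 3014931; err -30"
    print(decode_err(sample))
```

The errno names come from the platform Python runs on, so this should be
run on a Linux host to match the kernel's numbering.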

This is a vanilla 2.6.27 kernel + grsec + "2.6.27-ext4-2" patchset + the 
following patch per Sandeen:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3c37fc86d20fe35be656f070997d62f75c2e4874;hp=8c9fa93d51123c5540762b1a9e1919d6f9c4af7c

Unfortunately I do not have a reproducer yet, and the kernel is 
monolithic. It hasn't been rebooted (yet!), so I can still gather 
information from memory. If it crashes or proves unusable, though, I 
will have to reboot it.

We also changed the fstab entry to the following, but no one remembers 
remounting the filesystem afterwards:

/dev/sdb1  /home  ext4  defaults,noatime,nodiratime,nosuid,nodev,errors=remount-ro,data=writeback  0  0

Previously it had no "data=" option.
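
Since nobody remembers a remount, the options actually in effect may differ
from the fstab entry. One way to check is to read /proc/mounts and look at
the live option list for /home. The sketch below does that; the
mount_options helper is hypothetical, and it assumes the kernel lists "ro"
and "data=" among the options it reports, which I have not verified for
this kernel build:

```python
import os


def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style
    text, or None if the mount point does not appear."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry shadows earlier ones
    return found


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            opts = mount_options(f.read(), "/home")
        if opts is None:
            print("/home is not mounted")
        else:
            print("read-only:", "ro" in opts)
            print("data=writeback active:", "data=writeback" in opts)
```

If "ro" shows up for /home, that would suggest errors=remount-ro has
already triggered, which would fit the errors on the console.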

Kelly