mark_buffer_dirty WARN_ON_ONCE on buffer_uptodate

"Elliott, Robert (Server Storage)" <Elliott@xxxxxx> · Mon, 15 Sep 2014 21:56:44 +0000

While stress-testing blk-mq/scsi-mq (3.17-rc4/blk-next), I was running
fio + mkfs.ext4 + e2fsck against 16 mpt3sas devices and unplugged the
JBOD containing the SAS SSDs.  This triggered lots of mpt3sas,
SCSI midlayer, and block layer error messages, as expected.
The Linux device (/dev/sdc) does not disappear here; it just
starts generating errors for every IO.

After ext4 logged "Remounting filesystem read-only", a
WARN_ON_ONCE fired in mark_buffer_dirty in the filesystem
layer.  I don't know whether that is expected or desired
error-handling behavior.

Kernel log excerpt:
...
[18075.539314] Buffer I/O error on dev sdk, logical block 0, lost sync page write
[18075.539333] EXT4-fs (sdk): previous I/O error to superblock detected
<presumably one of those also appeared for sdc, but it is no longer in the buffer>

...
[18156.572672] mpt3sas0: log_info(0x311201ff): originator(PL), code(0x12), sub_code(0x01ff)
[18156.572676] sd 0:0:2:0: timing out command, waited 0s
[18156.572680] Buffer I/O error on dev sdc, logical block 48791552, lost sync page write
[18156.572699] JBD2: Error -5 detected when updating journal superblock for sdc-8.
[18156.582177] sd 0:0:2:0: timing out command, waited 0s
[18156.583969] Buffer I/O error on dev sdc, logical block 0, lost sync page write
[18156.586107] mpt3sas0: log_info(0x311201ff): originator(PL), code(0x12), sub_code(0x01ff)
[18156.589136] EXT4-fs error (device sdc): ext4_journal_check_start:56: Detected aborted journal
[18156.589137] EXT4-fs (sdc): Remounting filesystem read-only
[18156.589142] ------------[ cut here ]------------
[18156.589146] WARNING: CPU: 1 PID: 3368 at fs/buffer.c:1139 mark_buffer_dirty+0xb5/0xd0()
[18156.589171] Modules linked in: ftdi_sio usbserial nfsd nfs_acl exportfs autofs4 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc cpufreq_ondemand pcc_cpufreq dm_mirror dm_region_hash dm_log uinput ipv6 iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core hpilo hpwdt lpc_ich mfd_core ioatdma dca dm_mod wmi sg tg3 ptp pps_core ext4(E) jbd2(E) mbcache(E) sd_mod(E) crc_t10dif(E) crct10dif_common(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E) mpt3sas(E) scsi_transport_sas(E) raid_class(E)
[18156.589174] CPU: 1 PID: 3368 Comm: ddpt Tainted: G        W   EL 3.17.0-rc4+ #3
[18156.589175] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/08/2013
[18156.589177]  0000000000000473 ffff8800b777b9f8 ffffffff815a8b3f 0000000000000473
[18156.589179]  0000000000000000 ffff8800b777ba38 ffffffff8105267c ffffffff8181c8fa
[18156.589181]  ffff88037afd5800 ffff8803537cdd28 ffff8803ddda2400 0000000000000001
[18156.589182] Call Trace:
[18156.589186]  [<ffffffff815a8b3f>] dump_stack+0x49/0x62
[18156.589189]  [<ffffffff8105267c>] warn_slowpath_common+0x8c/0xc0
[18156.589191]  [<ffffffff810526ca>] warn_slowpath_null+0x1a/0x20
[18156.589193]  [<ffffffff811cc0b5>] mark_buffer_dirty+0xb5/0xd0
[18156.589205]  [<ffffffffa00d8a9a>] ext4_commit_super+0x18a/0x250 [ext4]
[18156.589215]  [<ffffffffa00d93c3>] save_error_info+0x23/0x30 [ext4]
[18156.589223]  [<ffffffffa00d9a6e>] __ext4_abort+0x10e/0x130 [ext4]
[18156.589226]  [<ffffffff815a93f9>] ? _cond_resched+0x9/0x40
[18156.589234]  [<ffffffffa00c638a>] ? ext4_da_write_begin+0x19a/0x2b0 [ext4]
[18156.589244]  [<ffffffffa00f0f88>] ext4_journal_check_start+0x68/0x90 [ext4]
[18156.589253]  [<ffffffffa00f13f1>] __ext4_journal_start_sb+0x41/0xf0 [ext4]
[18156.589261]  [<ffffffffa00c638a>] ext4_da_write_begin+0x19a/0x2b0 [ext4]
[18156.589265]  [<ffffffff8115d5dd>] ? iov_iter_fault_in_readable+0xd/0x80
[18156.589268]  [<ffffffff8113554a>] generic_perform_write+0xca/0x1c0
[18156.589270]  [<ffffffff815aef5d>] ? ftrace_call+0x5/0x2f
[18156.589273]  [<ffffffff811384ef>] __generic_file_write_iter+0x18f/0x390
[18156.589280]  [<ffffffffa00bcc19>] ext4_file_write_iter+0x109/0x420 [ext4]
[18156.589]  [<ffffffff8115d219>] ? iov_iter_init+0x9/0x40
[18156.589286]  [<ffffffff811996a2>] new_sync_write+0x92/0xd0
[18156.589289]  [<ffffffff81199bbe>] vfs_write+0xce/0x180
[18156.589291]  [<ffffffff8119a1fa>] SyS_write+0x5a/0xd0
[18156.589294]  [<ffffffff815ad152>] system_call_fastpath+0x16/0x1b
[18156.589296] ---[ end trace 72065e1b51c7c1cb ]---

That's apparently from this function:

static int ext4_commit_super(struct super_block *sb, int sync)
{
...
        if (!sbh || block_device_ejected(sb))
                return error;
        if (buffer_write_io_error(sbh)) {
                /*
                 * Oh, dear.  A previous attempt to write the
                 * superblock failed.  This could happen because the
                 * USB device was yanked out.  Or it could happen to
                 * be a transient write error and maybe the block will
                 * be remapped.  Nothing we can do but to retry the
                 * write and hope for the best.
                 */
                ext4_msg(sb, KERN_ERR, "previous I/O error to "
                       "superblock detected");
                clear_buffer_write_io_error(sbh);
                set_buffer_uptodate(sbh);
        }
...
        /*
         * If the file system is mounted read-only, don't update the
         * superblock write time.  This avoids updating the superblock
         * write time when we are mounting the root file system
         * read/only but we need to replay the journal; at that point,
         * for people who are east of GMT and who make their clock
         * tick in localtime for Windows bug-for-bug compatibility,
         * the clock is set in the future, and this will cause e2fsck
         * to complain and force a full file system check.
         */
...
        BUFFER_TRACE(sbh, "marking dirty");
        ext4_superblock_csum_set(sb);
        mark_buffer_dirty(sbh);
...

void mark_buffer_dirty(struct buffer_head *bh)
{
        WARN_ON_ONCE(!buffer_uptodate(bh));
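
To make the flag interaction in the two snippets above concrete, here is a
minimal userspace sketch.  It is not kernel code: struct fake_bh and every
fake_* function are hypothetical stand-ins invented for illustration.  It
assumes that a failed write completion sets the write-error flag and clears
the uptodate flag, and it models just one possible interleaving (a second
failure landing after the recovery branch).  It is not a diagnosis of the
trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct buffer_head: only the flags discussed. */
struct fake_bh {
    bool uptodate;
    bool write_io_error;
    bool dirty;
};

/* Assumption: a failed write completion records the error and clears
 * the uptodate flag. */
static void fake_write_failed(struct fake_bh *bh)
{
    bh->write_io_error = true;
    bh->uptodate = false;
}

/* Models the recovery branch quoted from ext4_commit_super(). */
static void fake_recover(struct fake_bh *bh)
{
    if (bh->write_io_error) {
        bh->write_io_error = false;
        bh->uptodate = true;
    }
}

/* Models the check at the top of mark_buffer_dirty().
 * Returns true if the warning would fire. */
static bool fake_mark_dirty(struct fake_bh *bh)
{
    bool warned = !bh->uptodate;

    if (warned)
        fprintf(stderr, "WARN: dirtying a buffer that is not uptodate\n");
    bh->dirty = true;
    return warned;
}

/* One commit attempt.  If failure_after_check is set, another write
 * failure is injected between the recovery branch and the dirtying
 * step (a hypothetical interleaving).  Returns true if the warning fires. */
static bool fake_commit_super(struct fake_bh *bh, bool failure_after_check)
{
    assert(bh != NULL);
    fake_recover(bh);
    if (failure_after_check)
        fake_write_failed(bh);
    return fake_mark_dirty(bh);
}
```

In this model, a commit that follows a single earlier failure does not warn,
because the recovery branch sets uptodate again.  The warning fires only when
uptodate is cleared again before the dirtying step.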

---
Rob Elliott    HP Server Storage
