Hello,

On Thu 16-03-17 11:37:29, piaojun wrote:
> I found a problem where 'ls /mnt/ocfs2/' fails, with the error below:
>
> # ls: /mnt/ocfs2: Input/output error
>
> Kernel log:
> Mar 16 10:27:45 linux-yxqzUv kernel: [169213.398778] (ls,19875,0):ocfs2_read_blocks:388 ERROR: E9854523FF8343F9AF043F1A5505B1E1: iblock(17), bh->state(0x44828)
> Mar 16 10:27:45 linux-yxqzUv kernel: [169213.398787] (ls,19875,0):ocfs2_assign_bh:776 ERROR: status = -5
> Mar 16 10:27:45 linux-yxqzUv kernel: [169213.398795] (ls,19875,0):ocfs2_dlm_inode_lock_full_nested:1937 ERROR: status = -5
> Mar 16 10:27:45 linux-yxqzUv kernel: [169213.398799] (ls,19875,0):ocfs2_xattr_get:1334 ERROR: status = -5
> Mar 16 10:27:45 linux-yxqzUv kernel: [169213.402691] (ls,19875,0):ocfs2_read_blocks:388 ERROR: E9854523FF8343F9AF043F1A5505B1E1: iblock(17), bh->state(0x44828)
> Mar 16 10:27:45 linux-yxqzUv kernel: [169213.402704] (ls,19875,0):ocfs2_dir_foreach_blk_id:1789 ERROR: E9854523FF8343F9AF043F1A5505B1E1: Unable to read inode block for dir 17
>
> Test environment:
> OS: SUSE 11 SP3
> kernel: 3.0.93-0.8.2
> filesystem: ocfs2

So this is a very old kernel (from the upstream point of view), which is
one reason people generally won't care. Furthermore, it is the heavily
patched kernel of an enterprise distribution, which is another reason
upstream people won't care. So please handle such issues through the
standard channels for reporting problems with SLES - SUSE Customer Care
-> bugzilla.suse.com ...

> Test steps:
> 1. mount the device at /mnt/ocfs2/
> 2. cut the storage link of the device
> 3. mkdir /mnt/ocfs2/123
> 4. restore the storage link of the device
> 5. ls /mnt/ocfs2/123, which fails
>
> The 'bh' is submitted to jbd2 after the 'mkdir', and the write-back
> thread of the device then submits the 'bh' to disk, which fails because
> of the bad storage link. The 'bh' state is therefore marked
> BH_Write_EIO, and jbd2 does not release the 'bh' in this case. So when
> ocfs2_read_blocks() is called, it finds the 'bh' still held by jbd2 but
> marked BH_Write_EIO, and returns failure.
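The sequence described above can be sketched as a small userspace model.
This is only an illustration of the state transitions, not kernel code:
the struct, the bit positions, and the helper names here are invented
for the sketch (the real definitions live in
include/linux/buffer_head.h and fs/jbd2/), and the jbd2/ocfs2 logic is
reduced to a single flag check.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Illustrative stand-ins for buffer_head state bits; the actual values
 * of enum bh_state_bits in the kernel differ. */
#define BH_UPTODATE   (1UL << 0)  /* buffer contains valid data        */
#define BH_DIRTY      (1UL << 1)  /* buffer needs write-back           */
#define BH_WRITE_EIO  (1UL << 2)  /* last write to this buffer failed  */
#define BH_JBD        (1UL << 3)  /* buffer is still owned by jbd2     */

struct buffer_head {
    unsigned long state;
};

/* Step 3 (mkdir): the metadata buffer is dirtied and handed to jbd2. */
static void journal_dirty(struct buffer_head *bh)
{
    bh->state |= BH_DIRTY | BH_JBD;
}

/* Write-back while the storage link is down (between steps 2 and 4):
 * the I/O fails, so the completion path clears the dirty/uptodate
 * state and records the write error on the buffer. */
static void writeback_fails(struct buffer_head *bh)
{
    bh->state &= ~(BH_DIRTY | BH_UPTODATE);
    bh->state |= BH_WRITE_EIO;
}

/* Step 5 (ls): the read path finds the buffer still held by jbd2 and
 * carrying a stale write error, so it cannot simply re-read it from
 * disk and the lookup fails with -EIO (status = -5 in the log). */
static int read_block(struct buffer_head *bh)
{
    if ((bh->state & BH_JBD) && (bh->state & BH_WRITE_EIO))
        return -EIO;
    return 0;
}
```

The point of the sketch is that restoring the link (step 4) repairs
nothing by itself: the error is latched in the buffer state, and neither
jbd2 nor ocfs2 clears it, so every later read of that block keeps
failing.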
> I wonder whether jbd2 should handle this problem, or whether the ocfs2
> filesystem should do it?

Once the I/O error propagates to the filesystem, you are out of luck;
generally it is not expected that you will be able to recover from that
situation. You can deal with link flaps by properly configuring
multipath or the like, but let's deal with that in bugzilla...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR