[Bug 200753] write I/O error for inode structure leads to operation failure without any warning or error

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Tue, 07 Aug 2018 18:16:19 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=200753

--- Comment #15 from Shehbaz (shehbazjaffer007@xxxxxxxxx) ---
Thanks Theodore and Eric,

> If you don't call fsync(2), the data blocks (or changes to the inode) may not
> be attempted to be written to disk before the userspace program exits so
> there is no guarantee that would be any opportunity for the system even
> *notice* that there is a problem.

I have changed my programs to call ftruncate(2), chmod(2), chown(2), creat(2)
and open(2), followed by fsync(2) and close(2). For each of the programs, I
still see fsync(2) and close(2) return success (strace output attached), even
after an I/O error is returned to the file system.
Since symlink(2) and utimes(2) operate on paths rather than fds, I did not
experiment with these.
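
For readers without the attachments, here is a minimal sketch of the kind of
checked sequence described above. It is not the attached truncate.c; the
helper name truncate_and_sync() and its return convention are hypothetical.
It only shows where a failure would surface to userspace if the kernel
reported one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the bug report): truncate `path` to
 * `length` bytes and request persistence with fsync(2).
 * Returns 0 on success, or the negated errno of the first failing call. */
int truncate_and_sync(const char *path, off_t length)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        int err = errno;
        fprintf(stderr, "open(%s): %s\n", path, strerror(err));
        return -err;
    }

    int rc = 0;
    if (ftruncate(fd, length) < 0) {
        rc = -errno;
        fprintf(stderr, "ftruncate: %s\n", strerror(-rc));
    } else if (fsync(fd) < 0) {
        /* A write-back error, if the kernel reports one, shows up here. */
        rc = -errno;
        fprintf(stderr, "fsync: %s\n", strerror(-rc));
    }

    /* close(2) can also fail; keep the first error if there was one. */
    if (close(fd) < 0 && rc == 0) {
        rc = -errno;
        fprintf(stderr, "close: %s\n", strerror(-rc));
    }
    return rc;
}
```

A caller would treat any non-zero return as "persistence not confirmed". The
behaviour reported in this comment is that such a sequence still returns 0
when the in-place inode write fails.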

> Also, as far as metadata blocks (such as inode table blocks), what's
> generally important is whether they are successfully written to the journal.

I agree. I observe the following: once the writes to the journal are done,
there is a follow-up write to the in-place location of the inode. When this
inode write fails, no error is reported. I assume that the journal also does
not care, since the write has already been made in place on the file system.
This is why, even though the write reached the journal, the updated inode
image is not seen after remount.

Thank you for explaining the other scenarios and real-world use cases; I will
take these into consideration when extending this work.

> For example a buffered write failure won't be seen by the application unless
> it i.e. calls fsync on the write.

I thought unmounting the file system would automatically ensure that any
blocks left in the buffer cache are flushed to disk. However, following your
feedback, I now make sure fsync(2), close(2) and open(2) are called at the
right places in the userspace programs.

> Now, I probably would expect to see some errors in dmesg if for example inode
> flushing fails at unmount time, though.

I agree.

> It looks like none of your operations make any data persistence calls, so
> you're not /asking/ for a return of success or failure of persistence.

In my current truncate.c, creat.c, chmod.c and chown.c, I do ask for data
persistence. I still do not see any observable user-space or kernel-space
warnings or errors.
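
As an aside on what "asking for persistence" involves for the creat case:
fsync(2) on a new file does not by itself guarantee that its directory entry
has reached disk, so the parent directory is usually synced as well. The
sketch below illustrates that pattern. It is not the attached creat.c, and
the helper name create_durably() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the bug report): create `name` inside
 * directory `dir`, then fsync both the new file and the directory so the
 * inode and its directory entry are both requested to be persistent.
 * Returns 0 on success, or the negated errno of the first failing call. */
int create_durably(const char *dir, const char *name, mode_t mode)
{
    assert(dir != NULL && name != NULL);

    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -errno;

    int rc = 0;
    int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_EXCL, mode);
    if (fd < 0) {
        rc = -errno;
    } else {
        /* Persist the new inode. */
        if (fsync(fd) < 0)
            rc = -errno;
        if (close(fd) < 0 && rc == 0)
            rc = -errno;
    }

    /* Persist the directory entry that names the new inode. */
    if (rc == 0 && fsync(dirfd) < 0)
        rc = -errno;
    if (close(dirfd) < 0 && rc == 0)
        rc = -errno;
    return rc;
}
```

With fault injection in place, each of these return values is a point where
an EIO could be reported back to the program.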

> We'll of course log errors writing into the journal

My observations here are somewhat different: an EIO during journal write I/O
also does not log any errors in kernel space. Unlike a failed inode block
write, this does not cause the operation to fail, because after the write to
the journal, the write is also made to the in-place location of the inode.
So if I fail a journal block write for a truncate operation and remount the
file system, I see the truncated file, since the in-place write succeeded. I
am sorry that I am unable to determine which data structure (journal
superblock, journal descriptor, journal commit or journal data) lies on the
failed block, but I can get back to you about this in a separate report.

Also, thank you for providing background on real disk behaviour. My only
concern is that if disk vendors only reply with EIO and depend on upper
layers to make sense of it, while file systems rely on disk vendors to log
errors, we might end up with silent corruption. Two log messages would be OK;
no log messages would be bad.

Please bear with me while I file my findings and I will follow up with patch
recommendations.

Thank you,
