Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> · Wed, 9 Sep 2020 12:52:48 +0200

Hello Jan,

Thank you for jumping in on this thread.

On 9/8/20 1:27 PM, Jan Kara wrote:
> Added Jeff to CC since he has written the code...
> 
> On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
>> [Widening the CC to include Andrew and linux-fsdevel@]
>> [Milan: thanks for the patch, but it's unclear to me from your commit
>> message how/if you verified the details.]
>>
>> Andrew, maybe you (or someone else) can comment, since long ago your
>>
>>     commit f79e2abb9bd452d97295f34376dedbec9686b986
>>     Author: Andrew Morton <akpm@xxxxxxxx>
>>     Date:   Fri Mar 31 02:30:42 2006 -0800
>>
>> included a comment that is referred to in  stackoverflow discussion
>> about this topic (that SO discussion is in turn referred to by
>> https://bugzilla.kernel.org/show_bug.cgi?id=194757).
>>
>> The essence as I understand it, is this:
>> (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
>> has not been synced.
>> (2) In this case, the EIO/ENOSPC setting is cleared so that...
>> (3) A subsequent fsync() might return success, but...
>> (4) That doesn't mean that the data in (1) landed on the disk.
> 
> Correct.

Thanks for the confirmation!

>> The proposed manual page patch below wants to document this, but I'd
>> be happy to have an FS-knowledgeable person comment before I apply.
> 
> Just a small comment below:
> 
>> On Sat, 29 Aug 2020 at 09:13, <milan.opensource@xxxxxxxxx> wrote:
>>>
>>> From: Milan Shah <milan.opensource@xxxxxxxxx>
>>>
>>> This Fix addresses Bug 194757.
>>> Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
>>> ---
>>>  man2/fsync.2 | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/man2/fsync.2 b/man2/fsync.2
>>> index 96401cd..f38b3e4 100644
>>> --- a/man2/fsync.2
>>> +++ b/man2/fsync.2
>>> @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
>>>  or
>>>  .BR sdparm (8)
>>>  to guarantee safe operation.
>>> +
>>> +When
>>> +.BR fsync ()
>>> +or
>>> +.BR fdatasync ()
>>> +returns
>>> +.B EIO
>>> +or
>>> +.B ENOSPC
>>> +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
>>> +will return without error. It is
>>> +.I not
>>> +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
>>>  .SH SEE ALSO
>>>  .BR sync (1),
>>>  .BR bdflush (2),
> 
> So the error state isn't really stored "on pages in the file mapping".
> Current implementation (since 4.14) is that error state is stored in struct
> file (I think this tends to be called "file description" in manpages) and

(Yes, "open file description" is the POSIX terminology for the thing that
sits between the FD and the inode--struct file in kernel parlance--and I
try to follow POSIX terminology in the manual pages where possible.

> so EIO / ENOSPC is reported once for each file description of the file that
> was open before the error happened. Not sure if we want to be so precise in
> the manpages or if it just confuses people. 

Well, people are confused now, so I think more detail would be good.

> Anyway your takeway that no
> error on subsequent fsync() does not mean data was written is correct.

Thanks. (See also my rply to Jeff.)

By the way, a question related to your comments above. In the 
errors section, there is this:

       EIO    An  error  occurred during synchronization.  This error may
              relate to data written to some other file descriptor on the
*             same  file.   Since Linux 4.13, errors from write-back will
              be reported to all file descriptors that might have written
              the  data  which  triggered  the  error.   Some filesystems
              (e.g., NFS) keep close track of  which  data  came  through
              which  file  descriptor,  and  give more precise reporting.
              Other  filesystems  (e.g.,  most  local  filesystems)  will
              report errors to all file descriptors that were open on the
*             file when the error was recorded.

In the marked (*) lines, we have the word "file". Is this accurate? I mean, I
would normally take "file" in this context to mean the inode ('struct inode').
But I wonder if really what is meant here is "open file description"
('struct file'). In other words, is the EIO being generated for all FDs 
connected to the same open file description, or for all FDs for all of the
open file descriptions connected to the inode? Your thoughts?

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/