Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace

On Wed, Nov 15, 2017 at 08:35:57PM +0100, Holger Hoffstätte wrote:
> On 11/15/17 19:54, Darrick J. Wong wrote:
> > On Wed, Nov 15, 2017 at 02:16:01PM +0100, Holger Hoffstätte wrote:
> >> On 11/14/17 22:46, Darrick J. Wong wrote:
> >> (snip)
> >>> +static void iomap_warn_stale_pagecache(struct inode *inode)
> >>> +{
> >>> +	errseq_set(&inode->i_mapping->wb_err, -EIO);
> >>> +	pr_crit_ratelimited("Stale pagecache contents after collision "
> >>> +			    "between direct and buffered write!\n");
> >>> +}
> >>
> >> In this form the error message is IMHO useless since it tells me
> >> neither the file in question nor the misbehaving application.
> >> "Something went wrong somewhere" is not actionable information
> >> and in practice will only be ignored.
> >>
> >> Since you already have the inode in question at hand, print at least
> >> the full path + filename so that it's clear where things are going
> >> wrong. Usually that will let people deduce which application is
> >> misbehaving.
> > 
> > The whole point of the errseq_set call in this patch is to record the
> > write collision so that all the writers of this file will receive an EIO
> > the next time they try to flush the file.  You can pinpoint exactly
> > which fd(s) in which application(s) caused the problem.  The old dmesg
> > spew only captured which program issued the dio write.
> 
> Then what is the use of printing the message? I'm not arguing against
> handling the error, which is of course much better than silent corruption;
> I'm asking what the point of the message is because it doesn't tell
> anything actionable after the fact. If you really want to rely on
> applications handling this condition (which I agree is the right thing
> to do!) then there is simply no need for the message; if I found it in
> the log one day, I'd have no idea what to do about it. That's all.

The message is there for the people who have to triage reports of
spurious EIO errors at the application level or, if the app ignores
the errors, reports of detected data corruption. The output tells us
the likely trigger of the problem, so we don't have to start
searching for phantom data corruption vectors that don't exist.

If we add the process name to the error message, then we have pretty
much all we need to correlate the "app misbehaving/data corrupted"
report with the cause...
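
As a rough illustration of what such a message could carry, here is a
userspace mock-up, not kernel code and not the patch itself. The helper
name format_stale_pagecache_msg(), the COMM_LEN constant and the
"Comm:/PID:" layout are assumptions made for this sketch; in the kernel
the values would presumably come from current->comm and
task_pid_nr(current) and go straight into the pr_crit_ratelimited()
format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace mock-up only. The process name and pid are plain parameters
 * here; a kernel version would read them from the current task instead.
 */
#define COMM_LEN 16	/* assumption: mirrors the kernel's TASK_COMM_LEN */

/*
 * Format the collision warning with the offending process appended.
 * Returns what snprintf() returns: the number of characters the full
 * message needs, excluding the terminating NUL.
 */
int format_stale_pagecache_msg(char *buf, size_t len,
			       const char *comm, int pid)
{
	assert(buf != NULL && comm != NULL);

	/* %.*s caps the name at COMM_LEN - 1 characters. */
	return snprintf(buf, len,
			"Stale pagecache contents after collision between "
			"direct and buffered write! Comm: %.*s PID: %d\n",
			COMM_LEN - 1, comm, pid);
}
```

With the name and pid in the log line, a report of "app X got EIO on
fsync" can be matched against dmesg by process rather than by guesswork.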

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx