Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal

Jan Kara <jack@xxxxxxx> · Wed, 23 Nov 2011 14:08:03 +0100



On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > On Wed, 16 Nov 2011 19:44:21 +0800
> > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > 
> > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > it would be safer to test this patch in linux-next until the next merge window,
> > 
> > Any such bugs will not be discovered in linux-next testing.
> 
> Yup, I'm afraid.
> 
> > The only way to find these things in a reasonable period of time is to
> > go in and find them.  For example, intensive fsx-linux testing with
> > concurrent heavy memory pressure on various filesystems with various
> > block sizes.  And of course concurrent signalling.  If you're talking
> > about O_DIRECT then iirc I hacked support for that into fsx-linux.  I
> > think.
> 
> How are we going to measure the success/failure? Check if it
> eventually resulted in filesystem corruption or whatever?
  There are a few different questions:
  1) Checking for filesystem corruption via fsck - I find it extremely
unlikely that stopping a write early would cause such corruption.
  2) Checking that we do not expose uninitialized data after a partial
(possibly O_DIRECT) write - I did not find a place where that could happen,
but this would be worth testing. I think I can write a test for this if
people are afraid of data exposure problems.
  3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
middle? That obviously does happen with my patches, so there is nothing
to test there. The question is whether anyone cares, and only a reality
check can answer that :). Since the signal is SIGKILL, the process itself
cannot notice the interrupted write, but someone else can. But as I
already said earlier, partial writes can already be observed when the
machine crashes, when the filesystem is close to ENOSPC, and so on.
Arguably these are more severe error conditions than an application
receiving SIGKILL, so my patch lowers the bar for observing partial writes.
But I wouldn't like to throw away a sensible thing - allowing SIGKILL to
interrupt a system call - just for fear that some broken app might rely on
the current behavior. Sure, if the reality check shows there are such broken
apps and users who care enough to report them, then I have nothing against
biting the bullet and reverting the change... Opinions?
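
  For illustration, a minimal sketch of what a test for question 2) might
look like. This is not the test referred to above. It assumes a buffered
write of a known pattern into a freshly created file, where any byte that is
neither the pattern nor zero (a hole or unwritten tail) would indicate
exposed stale data. The names first_stale_byte, killed_write_check and
PATTERN are hypothetical, and the check cannot detect stale data that
happens to be all zeros or all pattern bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define PATTERN 0xAA  /* hypothetical fill byte written by the child */

/* Return the offset of the first byte that is neither `pattern` nor zero,
 * or -1 if every byte is one of the two. */
long first_stale_byte(const unsigned char *buf, size_t len,
                      unsigned char pattern)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern && buf[i] != 0)
            return (long)i;
    return -1;
}

/* Fork a child that writes `total` pattern bytes to a new file at `path`
 * with a single write(2), SIGKILL it after `delay_ms`, then scan the file.
 * Returns 0 if clean, 1 if a stale byte was found, and -1 on error
 * (including the child being killed before it created the file). */
int killed_write_check(const char *path, size_t total, unsigned delay_ms)
{
    assert(total > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        unsigned char *buf = malloc(total);
        if (!buf)
            _exit(2);
        memset(buf, PATTERN, total);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            _exit(2);
        if (write(fd, buf, total) < 0)  /* may be cut short by SIGKILL */
            _exit(3);
        _exit(0);
    }

    /* Give the child time to get into write(2), then kill it. */
    struct timespec ts = { delay_ms / 1000, (long)(delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char chunk[65536];
    ssize_t n;
    int result = 0;
    while ((n = read(fd, chunk, sizeof chunk)) > 0) {
        if (first_stale_byte(chunk, (size_t)n, PATTERN) >= 0) {
            result = 1;
            break;
        }
    }
    if (n < 0)
        result = -1;
    close(fd);
    return result;
}
```

  To be meaningful, something like this would have to be run in a loop with
varying delays, under memory pressure, and on several filesystems and block
sizes, as suggested earlier in the thread. An O_DIRECT variant would also
need an aligned buffer.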

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR