Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal

Wu Fengguang <fengguang.wu@xxxxxxxxx> · Wed, 23 Nov 2011 21:27:59 +0800

On Wed, Nov 23, 2011 at 09:08:03PM +0800, Jan Kara wrote:
> On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > > On Wed, 16 Nov 2011 19:44:21 +0800
> > > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > > 
> > > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > > it would be safer to test this patch in linux-next until next merge window,
> > > 
> > > Any such bugs will not be discovered in linux-next testing.
> > 
> > Yup, I'm afraid.
> > 
> > > The only way to find these things in a reasonable period of time is to
> > > go in and find them.  For example, intensive fsx-linux testing with
> > > concurrent heavy memory pressure on various filesystems with various
> > > block sizes.  And of course concurrent signalling.  If you're talking
> > > about O_DIRECT then iirc I hacked support for that into fsx-linux.  I
> > > think.
> > 
> > How are we going to measure the success/failure? Check if it
> > eventually resulted in filesystem corruption or whatever?
>   There are a few different questions:
>   1) Checking for filesystem corruption via fsck - I find such corruption
> caused by stopping write early extremely unlikely.

Agreed.

>   2) Checking that we do not expose uninitialized data after a partial
> (possibly DIRECT_IO) write - I did not find a place where that could happen
> but this would be worth testing. I think I can write a test for this if
> people are afraid of data exposure problems.

Do we already have this kind of test in xfstests? If not, it sounds
like a good gap to fill :-)
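
As a rough sketch of what the verification half of such a test might
look like (not an existing xfstests helper): assume the writer fills
the file with a single known byte, and that the underlying blocks were
filled beforehand with some other non-zero poison value. After the
write has been interrupted, every byte within i_size should then be
either the pattern or zero. The name first_unexpected_byte() and the
return conventions are made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical checker: scan the whole file and return the offset of
 * the first byte that is neither `pattern` nor zero.
 * Returns -1 if no such byte exists, -2 on a seek or read error.
 */
static off_t first_unexpected_byte(int fd, unsigned char pattern)
{
	unsigned char buf[4096];
	off_t off = 0;
	ssize_t n;

	if (lseek(fd, 0, SEEK_SET) < 0)
		return -2;

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++) {
			/* Anything other than pattern or zero is stale data. */
			if (buf[i] != pattern && buf[i] != 0)
				return off + i;
		}
		off += n;
	}

	return n < 0 ? -2 : -1;
}
```

A real test would still need the part that kills the writer in the
middle of a large (possibly O_DIRECT) write, which is not shown here.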

>   3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
> middle? That obviously does happen with my patches so there's no reason
> to test that. The question is whether someone cares or not and that can be
> tested only by reality check :). Since the signal is SIGKILL, the process
> itself cannot notice the interrupted write but someone else can. But as I
> already said earlier, partial writes can already be observed when the
> machine crashes, filesystem is close to ENOSPC or so. Arguably these are
> more severe error conditions than application catching SIGKILL so my
> patch lowers the bar for observing partial writes. But I wouldn't like to
> throw away a sensible thing - allow SIGKILL to interrupt a system call -
> just because of a fear that some broken app could rely on this.
> Sure if the reality check shows there are such broken apps and users who
> care enough to report, then I have nothing against biting the bullet
> and reverting the change... Opinions?

Reading Ted's information feed, I tend to disregard the partial write
issue: since the "broken" applications will already fail and get
punished in various other cases, I don't mind adding one more penalty
case for them :-P
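
For reference, a minimal sketch of the writer side done right, i.e. a
caller that does not assume write(2) is all-or-nothing. write_all() is
a made-up name, and treating a zero return as EIO is just one
defensive choice, not anything the patch requires.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write(2) until all `len` bytes
 * have been written or a real error occurs.
 * Returns len on success, -1 with errno set on failure.
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any byte was written */
			return -1;
		}
		if (n == 0) {
			/* No progress on a non-empty write: bail out instead of spinning. */
			errno = EIO;
			return -1;
		}
		/* Short write: advance past what was accepted and retry the rest. */
		p += n;
		left -= (size_t)n;
	}

	return (ssize_t)len;
}
```

Of course a loop like this cannot help the writer that got SIGKILL,
since it never runs again. In that case it is the readers of the file
that have to cope with a short result, just as they already must after
a crash or ENOSPC.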

Thanks,
Fengguang