On Wed, Nov 23, 2011 at 09:08:03PM +0800, Jan Kara wrote:
> On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > > On Wed, 16 Nov 2011 19:44:21 +0800
> > > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > >
> > > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > > it would safer to test this patch in linux-next until next merge window,
> > >
> > > Any such bugs will not be discovered in linux-next testing.
> >
> > Yup, I'm afraid.
> >
> > > The only way to find these things in a reasonable period of time is to
> > > go in and find them. For example, intensive fsx-linux testing with
> > > concurrent heavy memory pressure on various filesystems with various
> > > block sizes. And of course concurrent signalling. If you're talking
> > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > > think.
> >
> > How are we going to measure the success/failure? Check if it
> > eventually resulted in filesystem corruption or whatever?
> There are a few different questions:
> 1) Checking for filesystem corruption via fsck - I find such corruption
> caused by stopping write early extremely unlikely.

Agreed.

> 2) Checking that we do not expose uninitialized data after a partial
> (possibly DIRECT_IO) write - I did not find a place where that could happen
> but this would be worth testing. I think I can write a test for this if
> people are afraid of data exposure problems.

Do we already have this kind of test in xfstests? If not, it sounds
like a good gap to fill :-)

> 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
> middle? That obviously does happen with my patches so there's no reason
> to test that. The question is whether someone cares or not and that can be
> tested only by reality check :). Since the signal is SIGKILL, the process
> itself cannot notice the interrupted write but someone else can.
> But as I already said earlier, partial writes can already be observed
> when the machine crashes, filesystem is close to ENOSPC or so. Arguably
> these are more severe error conditions than application catching SIGKILL
> so my patch lowers the bar for observing partial writes. But I wouldn't
> like to throw away a sensible thing - allow SIGKILL to interrupt a system
> call - just because of fear of possibility some broken app could rely on
> this. Sure if the reality check shows there are such broken apps and users
> who care enough to report, then I have nothing against biting the bullet
> and reverting the change... Opinions?

Reading Ted's information feed, I tend to disregard the partial write
issue: since the "broken" applications will already fail and get
punished in various other cases, I don't mind adding one more penalty
case for them :-P

Thanks,
Fengguang
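
For illustration only, here is a minimal sketch of what a "non-broken" application does about partial writes: it checks the byte count returned by write and retries with the remainder. The helper name write_all is hypothetical and does not come from this thread. The sketch assumes only that os.write returns the number of bytes actually accepted, which may be fewer than requested.

```python
import os


def write_all(fd: int, data: bytes) -> int:
    """Write all of data to fd, retrying after short (partial) writes.

    Hypothetical helper for illustration. Returns the total number of
    bytes written, which equals len(data) on success. Errors such as
    ENOSPC surface as OSError raised by os.write.
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write may accept fewer bytes than requested; it returns
        # the count it actually wrote, so continue from that offset.
        n = os.write(fd, view[written:])
        if n == 0:
            # No progress was made; stop rather than spin forever.
            raise OSError("write returned 0 bytes; cannot make progress")
        written += n
    return written
```

A loop like this covers the cases where the writer survives to see the short count. It cannot help with SIGKILL, since the killed process never gets to retry; in that case only another observer of the file can see the partial write, which is the situation discussed in point 3) above.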