On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > On Wed, 16 Nov 2011 19:44:21 +0800
> > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> >
> > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > it would safer to test this patch in linux-next until next merge window,
> >
> > Any such bugs will not be discovered in linux-next testing.
>
> Yup, I'm afraid.
>
> > The only way to find these things in a reasonable period of time is to
> > go in and find them. For example, intensive fsx-linux testing with
> > concurrent heavy memory pressure on various filesystems with various
> > block sizes. And of course concurrent signalling. If you're talking
> > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > think.
>
> How are we going to measure the success/failure? Check if it
> eventually resulted in filesystem corruption or whatever?

There are a few different questions:

1) Checking for filesystem corruption via fsck - I find it extremely
   unlikely that stopping a write early would cause such corruption.

2) Checking that we do not expose uninitialized data after a partial
   (possibly DIRECT_IO) write - I did not find a place where that could
   happen, but this would be worth testing. I think I can write a test for
   this if people are afraid of data exposure problems.

3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
   middle? That obviously does happen with my patches, so there's no reason
   to test that. The question is whether someone cares or not, and that can
   be tested only by a reality check :). Since the signal is SIGKILL, the
   process itself cannot notice the interrupted write, but someone else can.
   But as I already said earlier, partial writes can already be observed
   when the machine crashes, when the filesystem is close to ENOSPC, or so.
   Arguably these are more severe error conditions than an application
   getting SIGKILL, so my patch lowers the bar for observing partial writes.
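For question 2), a userspace check could look roughly like the sketch below. It is only an illustration, not the test mentioned in the mail: the helper names, the fill pattern, the 16 MB write size, and the 1 ms delay are all assumptions, and it covers only buffered writes (O_DIRECT would additionally need aligned buffers). A child issues one large write(2) of a known pattern into a fresh file and is killed with SIGKILL; the parent then checks that every byte visible within the file size carries the pattern. Any other byte, zeros included, is flagged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define PATTERN    0xAB                 /* assumed fill byte */
#define WRITE_SIZE ((size_t)16 << 20)   /* assumed: big enough to be killed mid-write */

/* Offset of the first byte that is not PATTERN; -1 if all bytes match,
 * -2 on I/O error. */
static long first_bad_offset(int fd)
{
    static unsigned char buf[1 << 16];
    long off = 0;
    ssize_t n;

    if (lseek(fd, 0, SEEK_SET) < 0)
        return -2;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != PATTERN)
                return off + i;
        off += n;
    }
    return n < 0 ? -2 : -1;
}

/* Returns 1 if a non-pattern byte is visible after the killed write,
 * 0 if the file is clean, -1 on setup or I/O error. */
static int killed_write_exposes_data(const char *path)
{
    assert(path != NULL);

    /* Fill the buffer before fork() so the child can call write(2) at once. */
    unsigned char *buf = malloc(WRITE_SIZE);
    if (!buf)
        return -1;
    memset(buf, PATTERN, WRITE_SIZE);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: a single large write that SIGKILL may cut short. */
        ssize_t n = write(fd, buf, WRITE_SIZE);
        _exit(n < 0 ? 1 : 0);
    }

    /* Crude timing: give the child a moment to get into write(2). */
    struct timespec delay = { 0, 1000 * 1000 };
    nanosleep(&delay, NULL);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    free(buf);

    long bad = first_bad_offset(fd);
    close(fd);
    unlink(path);
    if (bad == -2)
        return -1;
    return bad >= 0 ? 1 : 0;
}
```

Whether the signal actually lands in the middle of the write depends on timing and on the kernel in use, so a run like this would have to be repeated many times, under memory pressure and on several filesystems and block sizes, before a clean result means much.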
But I wouldn't like to throw away a sensible thing - allowing SIGKILL to
interrupt a system call - just for fear that some broken app might rely on
writes not being interrupted. Sure, if the reality check shows there are
such broken apps and users who care enough to report them, then I have
nothing against biting the bullet and reverting the change...

Opinions?

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR