On Wed, Mar 11, 2009 at 03:11:17PM -0700, Dave Hansen wrote: > I'm feeling a bit better about these, although I am still honestly quite > afraid of the barriers. I also didn't like all the #ifdefs much, but > here's some help on that. FWIW, we have this in suse kernels because page fault performance was so bad compared with SLES10. mnt_want_write & co was I think the 2nd biggest offender for file backed mappings (after pvops). I think we're around parity again even with pvops. Basically I think we have to improve this one way or another in mainline too. Is there any way to make you feel better about the barriers? More comments? mnt_make_readonly() mnt_want_write() 1. mnt_flags |= MNT_WRITE_HOLD A. mnt_writers[x]++ 2. smp_mb() B. smp_mb() 3. count += mnt_writers[0] C. while (mnt_flags & MNT_WRITE_HOLD) ; ... D. smp_rmb() count += mnt_writers[N] E. if (mnt_flags & MNT_READONLY) 4. if (count == 0) F. mnt_writers[x]-- /* fail */ 5. mnt_flags |= MNT_READONLY G. else /* success */ 6. else /* fail */ 7. smp_wmb() 8. mnt_flags &= ~MNT_WRITE_HOLD * 2 ensures that 1 is visible before 3 is loaded * B ensures that A is visible before C is loaded * Therefore, either count != 0 at 4, or C will loop (or both) * If count == 0 * (make_readonly success) * C will loop until 8 * D ensures E is not loaded until loop ends * 7 ensures 5 is visible before 8 is * Therefore E will find MNT_READONLY (want_write fail) * If C does not loop * 4 will find count != 0 (make_readonly fail) * Therefore 5 is not executed. * Therefore E will not find MNT_READONLY (want_write success) * If count != 0 and C loops * (make_readonly fail) * 5 will not be executed * Therefore E will not find MNT_READONLY (want_write success) I don't know if that helps (I should reference which statements rely on which). I think it shows that either one or the other only must succeed. It does not illustrate how the loop in the want_write side prevents the sumation from getting confused by decrementing count on a different CPU than it was incremented, but I've commented that case in the code fairly well I think. > How about this on top of what you have as a bit of a cleanup? It gets > rid of all the new #ifdefs in .c files? > > Did I miss the use of get_mnt_writers_ptr()? I don't think I actually > saw it used anywhere in this pair of patches, so I've stolen it. I > think gcc should compile all this new stuff down to be basically the > same as you had before. The one thing I'm not horribly sure of is the > "out_free_devname:" label. It shouldn't be reachable in the !SMP case. > > I could also consolidate the header #ifdefs into a single one if you > think that looks better. I don't like the get_mnt_writers_ptr terribly. The *_mnt_writers functions are quite primitive and just happen to be in the .c file because they're private to it. The alloc/free_mnt_writers is good (they could be in the .c file too?). Another thing I should probably do is slash away most of the crap from mnt_want_write in the UP case. It only needs to do a preempt_disable, test MNT_READONLY, increment mnt_writers (and similarly for mnt_make_readonly) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html