Re: [patch 1/2] fs: mnt_want_write speedup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 11, 2009 at 03:11:17PM -0700, Dave Hansen wrote:
> I'm feeling a bit better about these, although I am still honestly quite
> afraid of the barriers.  I also didn't like all the #ifdefs much, but
> here's some help on that.

FWIW, we have this in suse kernels because page fault performance was
so bad compared with SLES10. mnt_want_write & co was I think the 2nd
biggest offender for file backed mappings (after pvops). I think we're
around parity again even with pvops.

Basically I think we have to improve this one way or another in mainline
too. Is there any way to make you feel better about the barriers? More
comments?

mnt_make_readonly()                    mnt_want_write()
1. mnt_flags |= MNT_WRITE_HOLD         A. mnt_writers[x]++
2. smp_mb()                            B. smp_mb()
3. count += mnt_writers[0]             C. while (mnt_flags & MNT_WRITE_HOLD) ;
   ...                                 D. smp_rmb()
   count += mnt_writers[N]             E. if (mnt_flags & MNT_READONLY)
4. if (count == 0)                     F.    mnt_writers[x]-- /* fail */
5.     mnt_flags |= MNT_READONLY       G. else /* success */
6. else /* fail */
7. smp_wmb()
8. mnt_flags &= ~MNT_WRITE_HOLD

* 2 ensures that 1 is visible before 3 is loaded
* B ensures that A is visible before C is loaded
* Therefore, either count != 0 at 4, or C will loop (or both)
* If count == 0
  * (make_readonly success)
  * C will loop until 8
  * D ensures E is not loaded until loop ends
  * 7 ensures 5 is visible before 8 is
  * Therefore E will find MNT_READONLY (want_write fail)
* If C does not loop
  * 4 will find count != 0 (make_readonly fail)
  * Therefore 5 is not executed.
  * Therefore E will not find MNT_READONLY (want_write success)
* If count != 0 and C loops
  * (make_readonly fail)
  * 5 will not be executed
  * Therefore E will not find MNT_READONLY (want_write success)

I don't know if that helps (I should reference which statements rely
on which). I think it shows that either one or the other only must
succeed.

It does not illustrate how the loop in the want_write side prevents
the sumation from getting confused by decrementing count on a different
CPU than it was incremented, but I've commented that case in the code
fairly well I think.


> How about this on top of what you have as a bit of a cleanup?  It gets
> rid of all the new #ifdefs in .c files?
> 
> Did I miss the use of get_mnt_writers_ptr()?  I don't think I actually
> saw it used anywhere in this pair of patches, so I've stolen it.  I
> think gcc should compile all this new stuff down to be basically the
> same as you had before.  The one thing I'm not horribly sure of is the
> "out_free_devname:" label.  It shouldn't be reachable in the !SMP case. 
> 
> I could also consolidate the header #ifdefs into a single one if you
> think that looks better.

I don't like the get_mnt_writers_ptr terribly. The *_mnt_writers functions
are quite primitive and just happen to be in the .c file because they're
private to it. The alloc/free_mnt_writers is good (they could be
in the .c file too?).

Another thing I should probably do is slash away most of the crap from
mnt_want_write in the UP case. It only needs to do a preempt_disable,
test MNT_READONLY, increment mnt_writers (and similarly for mnt_make_readonly)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux