On Mon, Aug 19, 2013 at 3:17 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> On Thu, Aug 15, 2013 at 04:01:49PM +1000, Dave Chinner wrote:
>> On Wed, Aug 14, 2013 at 09:32:13PM -0700, Andy Lutomirski wrote:
>> > On Wed, Aug 14, 2013 at 7:10 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > > On Wed, Aug 14, 2013 at 09:11:01PM -0400, Theodore Ts'o wrote:
>> > >> On Wed, Aug 14, 2013 at 04:38:12PM -0700, Andy Lutomirski wrote:
>> > >> > > It would be better to write zeros to it, so we aren't measuring the
>> > >> > > cost of the unwritten->written conversion.
>> > >> >
>> > >> > At the risk of beating a dead horse, how hard would it be to defer
>> > >> > this part until writeback?
>> > >>
>> > >> Part of the work has to be done at write time because we need to
>> > >> update allocation statistics (i.e., so that we don't have ENOSPC
>> > >> problems). The unwritten->written conversion does happen at writeback
>> > >> (as does the actual block allocation if we are doing delayed
>> > >> allocation).
>> > >>
>> > >> The point is that if the goal is to measure page fault scalability, we
>> > >> shouldn't have this other stuff happening at the same time as the page
>> > >> fault workload.
>> > >
>> > > Sure, but the real problem is not the block mapping or allocation
>> > > path - even if the test is changed to take that out of the picture,
>> > > we still have timestamp updates being done on every single page
>> > > fault. ext4, XFS and btrfs all do transactional timestamp updates
>> > > and have nanosecond granularity, so every page fault is resulting in
>> > > a transaction to update the timestamp of the file being modified.
>> >
>> > I have (unmergeable) patches to fix this:
>> >
>> > http://comments.gmane.org/gmane.linux.kernel.mm/92476
>>
>> The big problem with this approach is that not doing the
>> timestamp update on page faults is going to break the inode change
>> version counting because for ext4, btrfs and XFS it takes a
>> transaction to bump that counter. NFS needs to know the moment a
>> file is changed in memory, not when it is written to disk.
>
> I don't think the in-memory updates of the data and the version have to
> be completely atomic, if that's what you mean.
>
>> Also, NFS
>> requires the change to the counter to be persistent over server
>> failures, so it needs to be changed as part of a transaction....
>
> I'm not sure those two updates have to be a single atomic transaction on
> disk, either.
>

I hope not, because they aren't currently in the same transaction, and
putting them in the same transaction would require starting a
transaction on page fault and doing the equivalent of writepages when
that same transaction is committed.

With my changes [1], they still aren't, but putting them in the same
transaction would probably take only a couple of lines of code, and it
would actually improve performance. (I won't write those couple of lines
because I don't know anything at all about jbd2.)

[1] https://lkml.org/lkml/2013/8/16/510

--Andy
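Dave's point about nanosecond timestamps, and the trade-off of deferring the update to writeback, can be made concrete with a toy model. This is a hypothetical user-space simulation, not kernel code and not a description of Andy's patches: `InodeModel`, `stamp`, and `run` are invented names. The "eager" scheme bumps the timestamp and a change counter at fault time, but only when the truncated clock reading differs from the stored one; the "deferred" scheme just records that the file changed and stamps it once at writeback. It assumes every fault reads a distinct clock value at full nanosecond resolution (a real kernel's clock may be coarser, so the eager count is an upper bound) and it ignores re-faults after writeback.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass
class InodeModel:
    """Toy stand-in for an inode (hypothetical, not a kernel structure)."""
    mtime_ns: int = 0      # last recorded modification time
    version: int = 0       # models the change counter NFS relies on
    transactions: int = 0  # journal transactions spent on timestamp updates
    pending: bool = False  # deferred scheme: written via mmap, not yet stamped


def truncate(t_ns: int, granularity_ns: int) -> int:
    """Round a timestamp down to the filesystem's timestamp granularity."""
    return t_ns - t_ns % granularity_ns


def stamp(inode: InodeModel, now_ns: int, granularity_ns: int) -> None:
    """Record a new mtime and bump the version; each call costs one transaction."""
    inode.mtime_ns = truncate(now_ns, granularity_ns)
    inode.version += 1
    inode.transactions += 1


def run(faults_ns, scheme, granularity_ns, writeback_ns=None):
    """Replay write faults at the given times under one update scheme."""
    inode = InodeModel()
    for t in faults_ns:
        if scheme == "eager":
            # Update at fault time, skipping the update when the truncated
            # time equals the one already stored.
            if truncate(t, granularity_ns) != inode.mtime_ns:
                stamp(inode, t, granularity_ns)
        elif scheme == "deferred":
            # Only remember that the file changed; stamp it at writeback.
            inode.pending = True
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    # Deferred scheme: a single stamp when (and if) writeback happens.
    if scheme == "deferred" and writeback_ns is not None and inode.pending:
        inode.pending = False
        stamp(inode, writeback_ns, granularity_ns)
    return inode
```

Replaying 1,000 write faults spaced 1 microsecond apart starting at t = 5 s, the eager scheme spends 1,000 transactions at 1 ns granularity but only 1 at 1 s granularity, while the deferred scheme spends 1 at writeback. Before writeback, however, the deferred inode's `version` is still 0, which is the NFS visibility concern raised in the thread.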