Theodore Ts'o <tytso@xxxxxxx> writes:

> On Wed, Nov 12, 2014 at 04:47:42PM +0300, Dmitry Monakhov wrote:
>> Also, sync mtime updates are a great pain for the AIO submitter,
>> because AIO submission may be blocked for seconds (up to 5 seconds
>> in my case) if the inode is part of the currently committing
>> transaction; see do_get_write_access.
>
> 5 seconds?!?  So you're seeing cases where the jbd2 layer is taking
> that long to close a commit?  It might be worth looking at that so we
> can understand why that is happening, and to see if there's anything
> we might do to improve things on that front.  Even if we can get rid
> of most of the mtime updates, there will be other cases where a commit
> that takes a long time to complete will cause all sorts of other very
> nasty latencies on the entire system.

Our chunk server workload is quite generic:

submit_task: performs aio-dio requests to multiple chunk files from
several threads; this task should not block for too long.

sync_task: performs fsync/fdatasync on demand for modified chunk files
before we can ACK the write-op to the user; this task may block.

Here is the chunk server simulation load:

# TEST_CASE assumes that the target fs is mounted at /mnt
# Perform random aio-dio writes (bsz:64k) to preallocated files
# (size:128M) from 32 threads, issuing fdatasync on each 32'th write
# operation
$ fio ./aio-dio.fio

# Measure AIO-DIO write submission latency
$ dd if=/dev/zero of=/mnt/f bs=1M count=1
$ ioping -A -C -D -WWW /mnt/f
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=1 time=410 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=2 time=430 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=3 time=370 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=4 time=400 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=5 time=1.9 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=6 time=4.2 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=7 time=3.8 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=8 time=3.7 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=9 time=4.1 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=10 time=1.9 s

>> Yeah, we also have a ticket for that :)
>> https://jira.sw.ru/browse/PSBM-20411
>
> Is this supposed to be a URL to a publicly visible web page?
>
> Host jira.sw.ru not found: 3(NXDOMAIN)

Ohh, unfortunately this host is not visible from outside.

>> > +	if (flags & S_VERSION)
>> > +		inode_inc_iversion(inode);
> ....
>> Since we want to update all in-memory data, we also have to
>> explicitly update inode->i_version, which was previously updated
>> implicitly here:
>>
>> mark_inode_dirty_sync()
>>  ->__mark_inode_dirty
>>   ->ext4_dirty_inode
>>    ->ext4_mark_inode_dirty
>>     ->ext4_mark_iloc_dirty
>>      ->inode_inc_iversion(inode);
>
> It's not necessary to add another call to inode_inc_iversion(), since
> we already incremented i_version if S_VERSION is set, and S_VERSION
> gets set when it's necessary to handle incrementing i_version.
>
> The inode_inc_iversion() in ext4_mark_iloc_dirty() is probably not
> necessary, since we should already be incrementing i_version whenever
> ctime and mtime get updated.  The inode_inc_iversion() there is more
> of a "belt and suspenders" safety thing, on the theory that the extra
> bump in i_version won't hurt anything.
>
> Cheers,
>
> 					- Ted
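[The actual ./aio-dio.fio job file was not posted in the thread. A
sketch consistent with the description above might look like the
following; the file size, block size, thread count, fdatasync interval,
and mount point come from the description, while the job name, runtime,
and remaining options are assumptions.]

```ini
; aio-dio.fio -- reconstructed sketch, not the original job file
[global]
directory=/mnt       ; test case assumes target fs is mounted at /mnt
ioengine=libaio      ; Linux native AIO submission
direct=1             ; O_DIRECT, i.e. aio-dio
rw=randwrite         ; random writes
bs=64k               ; bsz:64k
size=128m            ; preallocated file size:128M
fdatasync=32         ; issue fdatasync after every 32'th write
runtime=60           ; assumed; any long-enough runtime will do
time_based

[chunk-server-sim]
numjobs=32           ; threads:32, one file per job
thread
```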