On 10/04/2011 01:38 AM, Lukas Czerner wrote:
On Mon, 3 Oct 2011, Allison Henderson wrote:
Hi all,
I've been working on locating all the existing uses of i_mutex in the current
ext4 code because I know we are planning to reduce the usage of i_mutex in
ext4. So I've gone through the ext4 code and also the vfs code and come up
with a list of ext4 items that appear to be protected under i_mutex. I'm
thinking about doing a patch to replace i_mutex with a private ext4 mutex, and
I wanted to update folks on this idea and pick up any feed back people might
have.
I'm thinking maybe we can have a separate mutex for functions that only modify
meta data like ext4_ioctl and ext4_setattr to help relieve unneeded
contention. And then the rest of functions that are modifying data can go
under a data mutex (including truncate since sometimes ext4_ioctl and
ext4_setattr will call ext4_truncate if they modify i_size).
Just the other day I was talking with Christoph (adding him to cc) about
this, but unfortunately I still did not have time to look at this, but I
am glad that someone did.
His suggestion was a bit more general than creating separate ext4
specific mutex. His idea was to change i_mutex to union of plain mutex
for directories and a rwlock for regular files. Then this union can be
used in other file systems as well, for example to replace xfs_iolock in
xfs.
Also it might be nice to do something smarter than just a rwlock for
regular files. It would be nice to have an structure of extent locks, so
we can use it for file system using extents, which will improve
scalability while hammering a single file from different processes.
Note that currently ext4 concurrent read/write are atomic only wrt
individual pages, but not on the system call as the whole. This might
cause read() to return data mixed from several different writes, which
is not posix conform. That could be solved with the generic rwlock for
files, or even better with the system of extent locking.
But Christoph, can probably describe hi idea a bit better.
Thanks!
-Lukas
Hi Lukas,
Sorry for the delay, and thanks for the response :) Alrighty, I will
have to do some prototyping and see if I can work in some of these
concepts into a solution. At the moment, Im trying to make sure I come
up with something that still provides all the existing functionality so
I dont introduce any new race problems, but there's certainly a lot of
room for optimizing too. Thx!
Allison Henderson
So these are ext4 functions that currently lock i_mutex:
ext4_sync_file
ext4_fallocate
ext4_move_extents via two helper routines:
mext_inode_double_lock and mext_inode_double_unlock
ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
ext4_quota_write
ext4_llseek
ext4_end_io_work
ext4_evict_inode (only while calling ext4_flush_completed_IO)
ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)
And these are ext4 functions that have i_mutex locked by the vfs layer. So we
will need to lock the new private mutex here too if we want them to be
synchronous with the above functions.
ext4_setattr
ext4_da_writepages
ext4_rmdir
ext4_unlink
ext4_symlink
ext4_link
ext4_rename
And one unique case:
ext4_fiemap calls generic_block_fiemap and passes it a function pointer to
ext4_get_block. generic_block_fiemap will lock i_mutex before calling the
pointer. I dont think ext4_get_block needs i_mutex locked all the time, so I
think we can just make a wrapper for ext4_get_block that locks the new private
mutex and then we can pass a pointer to the wrapper.
That's my list so far, if anyone knows of one I missed please let me know, and
also if you spot any other places where we can reduce unneeded contention by
using a separate lock. Thx!
Allison Henderson
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html