On 08/27/2011 02:33 AM, Yongqiang Yang wrote:
On Sat, Aug 27, 2011 at 5:04 PM, Yongqiang Yang <xiaoqiangnk@xxxxxxxxx> wrote:
On Sat, Aug 27, 2011 at 6:35 AM, Allison Henderson
<achender@xxxxxxxxxxxxxxxxxx> wrote:
On 08/25/2011 07:53 PM, Yongqiang Yang wrote:
Hi Allison,
Currently, punch hole flushes all pages to disk, releases them from the
page cache, and then calls ext4_ext_map_blocks.
Suppose a new page in the punched range is mapped after the pages are
released and before down_write(i_data_sem). ext4_ext_map_blocks will
then remove the page's mapping from the extent tree, but the upper
layers do not know this and still think the page is mapped.
I cannot find how punch hole handles this situation. Could you shed
some light on it?
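To make the interleaving concrete, here is a minimal user-space sketch of it. It is only a toy model under stated assumptions: the struct and every race_* name are hypothetical and do not exist in ext4, the "extent tree" and "page cache" are plain arrays, and the three steps are replayed in one thread in the order described above.

```c
#include <assert.h>
#include <stdbool.h>

#define RACE_NBLOCKS 4

/* Toy model only: none of these names exist in ext4. */
struct race_inode {
	bool extent[RACE_NBLOCKS]; /* block is present in the "extent tree" */
	bool page[RACE_NBLOCKS];   /* a cached page believes it is mapped */
};

/* Step 1 of the punch: flush and drop the cached pages in [first, last]. */
static void race_release_pages(struct race_inode *ino, int first, int last)
{
	for (int b = first; b <= last; b++)
		ino->page[b] = false;
}

/* A concurrent access brings a page back in and maps it to its block. */
static void race_map_page(struct race_inode *ino, int b)
{
	if (ino->extent[b])
		ino->page[b] = true;
}

/* Step 2 of the punch: drop the extents without touching page state. */
static void race_remove_extents(struct race_inode *ino, int first, int last)
{
	for (int b = first; b <= last; b++)
		ino->extent[b] = false;
}

/* Count pages that claim a mapping the extent tree no longer has. */
static int race_stale_pages(const struct race_inode *ino)
{
	int n = 0;

	for (int b = 0; b < RACE_NBLOCKS; b++)
		if (ino->page[b] && !ino->extent[b])
			n++;
	return n;
}

/* Replay the interleaving: release, map (inside the window), remove. */
static int race_demo(void)
{
	struct race_inode ino = {
		.extent = { true, true, true, true },
		.page   = { true, true, true, true },
	};

	assert(race_stale_pages(&ino) == 0); /* consistent before the punch */
	race_release_pages(&ino, 1, 2);
	race_map_page(&ino, 1);              /* lands in the race window */
	race_remove_extents(&ino, 1, 2);
	return race_stale_pages(&ino);
}
```

In this model race_demo() ends with one page that still claims a mapping for a block the extent array no longer holds, which is the inconsistency in question.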
Hi Yongqiang
This is a really good question, and at the moment I'm still looking into it.
:) The calling sequence in punch hole was modeled after truncate, which
also only locks i_data_sem when modifying the extent tree.
ext4_ext_map_blocks, when called with the punch hole flag, only releases
blocks in the extent tree, using the same routines truncate does; it
does not modify the state of the pages. That still does not prevent
the race condition you describe, though, so I am still investigating it.
I've found that I can catch a lot of race conditions by simply running the
stress test overnight, and so far I haven't had anything like this come up,
but that certainly doesn't mean it's not there. I will let you know what I
find. Thx!
Hi Allison,
I had a look at the truncate code. Truncates and writes are serialized
by inode->i_mutex in the VFS layer, but fallocate does not take i_mutex,
so I think we need to take i_mutex when punching a hole as well. Plain
fallocate behaves differently from hole punching, so it is safe without
taking i_mutex.
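A rough sketch of why holding the mutex across the whole punch would close the window for writes. Again this is a single-threaded toy model, not kernel code: the lock_* names are hypothetical, "i_mutex" is modeled as a plain flag, and a writer that would sleep on the real mutex is modeled as simply being refused.

```c
#include <assert.h>
#include <stdbool.h>

#define LOCK_NBLOCKS 4

/* Toy model only; "i_mutex" here is a plain flag, not the kernel mutex. */
struct lock_inode {
	bool i_mutex_held;
	bool extent[LOCK_NBLOCKS];
	bool page[LOCK_NBLOCKS];
};

/*
 * A write path that honours i_mutex. In the kernel the writer would sleep
 * until the mutex is free; this model just reports that it would have to
 * wait (returns false) and maps nothing.
 */
static bool lock_try_write(struct lock_inode *ino, int b)
{
	if (ino->i_mutex_held)
		return false;
	if (ino->extent[b])
		ino->page[b] = true;
	return true;
}

/*
 * Punch [first, last] with the mutex held across both steps. racing_block
 * is a write attempted inside what used to be the race window.
 * Returns true if that write managed to get in.
 */
static bool lock_punch_hole(struct lock_inode *ino, int first, int last,
			    int racing_block)
{
	bool write_got_in;

	ino->i_mutex_held = true;
	for (int b = first; b <= last; b++)
		ino->page[b] = false;           /* release pages */
	write_got_in = lock_try_write(ino, racing_block);
	for (int b = first; b <= last; b++)
		ino->extent[b] = false;         /* remove extents */
	ino->i_mutex_held = false;
	return write_got_in;
}

/* Count pages that claim a mapping the extent tree no longer has. */
static int lock_stale_pages(const struct lock_inode *ino)
{
	int n = 0;

	for (int b = 0; b < LOCK_NBLOCKS; b++)
		if (ino->page[b] && !ino->extent[b])
			n++;
	return n;
}

/* Returns the number of stale pages, or -1 if the racing write got in. */
static int lock_demo(void)
{
	struct lock_inode ino = {
		.i_mutex_held = false,
		.extent = { true, true, true, true },
		.page   = { true, true, true, true },
	};

	assert(lock_stale_pages(&ino) == 0); /* consistent before the punch */
	if (lock_punch_hole(&ino, 1, 2, 1))
		return -1;
	/* The writer runs after the punch; block 1 is now a hole. */
	lock_try_write(&ino, 1);
	return lock_stale_pages(&ino);
}
```

Here the write is held off until both steps have finished, so it sees the hole rather than a stale mapping. The model says nothing about reads, which do not take i_mutex.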
It seems that a race exists between reads and hole punching as well. If
a read comes in after the pages are released and before
down_write(i_data_sem), a page will be mapped; if that page is written
later, it will introduce an error. Truncate avoids this situation by
setting the file size before truncating pages.
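The ordering truncate relies on can be sketched the same way. This is a toy model under loud assumptions: the size_* names are hypothetical, i_size is counted in blocks for simplicity, and the racing read is replayed in one thread between the two truncate steps.

```c
#include <assert.h>
#include <stdbool.h>

#define SIZE_NBLOCKS 4

/* Toy model only: none of these names exist in ext4 or the VFS. */
struct size_inode {
	int i_size;                /* file size, in blocks for simplicity */
	bool extent[SIZE_NBLOCKS];
	bool page[SIZE_NBLOCKS];
};

/* A read only maps pages below i_size; past EOF it maps nothing. */
static bool size_read_block(struct size_inode *ino, int b)
{
	if (b >= ino->i_size)
		return false;
	if (ino->extent[b])
		ino->page[b] = true;
	return ino->page[b];
}

/* Truncate to new_size blocks; racing_block is a read in the window. */
static void size_truncate(struct size_inode *ino, int new_size,
			  int racing_block)
{
	ino->i_size = new_size;                 /* 1. shrink the size first */
	for (int b = new_size; b < SIZE_NBLOCKS; b++)
		ino->page[b] = false;           /* 2. drop the pages */
	size_read_block(ino, racing_block);     /* read arrives in the window */
	for (int b = new_size; b < SIZE_NBLOCKS; b++)
		ino->extent[b] = false;         /* 3. free the blocks */
}

/* Count pages that claim a mapping the extent tree no longer has. */
static int size_stale_pages(const struct size_inode *ino)
{
	int n = 0;

	for (int b = 0; b < SIZE_NBLOCKS; b++)
		if (ino->page[b] && !ino->extent[b])
			n++;
	return n;
}

/* Truncate a 4-block file to 2 blocks while block 3 is being read. */
static int size_demo(void)
{
	struct size_inode ino = {
		.i_size = SIZE_NBLOCKS,
		.extent = { true, true, true, true },
		.page   = { true, true, true, true },
	};

	assert(size_stale_pages(&ino) == 0); /* consistent before truncate */
	size_truncate(&ino, 2, 3);
	return size_stale_pages(&ino);
}
```

Because the size is already reduced when the read arrives, the read is treated as past EOF and leaves no page mapped behind it.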
Yongqiang.
Hi Yongqiang,
Alrighty, I found the code for truncate that you are referring to and
what you are saying makes a lot of sense, so I will add a fix for it in
the punch hole patch set I am working on at the moment. Thx for finding
this one for me :)
Allison Henderson
What's your opinion?
Yongqiang.
Allison Henderson
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Best Wishes
Yongqiang Yang