On 2019/4/13 7:40, Mike Kravetz wrote:
This specific part of the patch made me think,
@@ -497,12 +497,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
static void hugetlbfs_evict_inode(struct inode *inode)
{
struct resv_map *resv_map;
+ struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
remove_inode_hugepages(inode, 0, LLONG_MAX);
- resv_map = (struct resv_map *)inode->i_mapping->private_data;
+ resv_map = info->resv_map;
/* root inode doesn't have the resv_map, so we should check it */
- if (resv_map)
+ if (resv_map) {
resv_map_release(&resv_map->refs);
+ info->resv_map = NULL;
+ }
clear_inode(inode);
}
If inode->i_mapping may not be associated with the hugetlbfs inode, then
remove_inode_hugepages() will also have problems. It will want to operate
on the address space associated with the inode. So, there are more issues
than just the resv_map. When I looked at the first few lines of
remove_inode_hugepages(), I was surprised to see:
struct address_space *mapping = &inode->i_data;
Good catch!
So remove_inode_hugepages is explicitly using the original address space
that is embedded in the inode. As a result, it is not impacted by changes
to inode->i_mapping. Using git history I was unable to determine why
remove_inode_hugepages is the only place in hugetlbfs code doing this.
With this in mind, a simple change like the following will fix the original
leak issue as well as the potential issues mentioned in this patch.
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 53ea3cef526e..9f0719bad46f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -511,6 +511,11 @@ static void hugetlbfs_evict_inode(struct inode *inode)
{
struct resv_map *resv_map;
+ /*
+ * Make sure we are operating on original hugetlbfs address space.
+ */
+ inode->i_mapping = &inode->i_data;
+
remove_inode_hugepages(inode, 0, LLONG_MAX);
resv_map = (struct resv_map *)inode->i_mapping->private_data;
/* root inode doesn't have the resv_map, so we should check it */
I don't know why hugetlbfs code would ever want to operate on any address
space but the one embedded within the inode. However, my uderstanding of
the vfs layer is somewhat limited. I'm wondering if the hugetlbfs code
(helper routines mostly) should perhaps use &inode->i_data instead of
inode->i_mapping. Does it ever make sense for hugetlbfs code to operate
on inode->i_mapping if inode->i_mapping != &inode->i_data ?
I also feel very confused.
Yufen
thanks.