On 4/7/21 12:24 AM, Miaohe Lin wrote: > Hi: > On 2021/4/7 10:49, Mike Kravetz wrote: >> On 4/2/21 2:32 AM, Miaohe Lin wrote: >>> The resv_map could be NULL since this routine can be called in the evict >>> inode path for all hugetlbfs inodes. So we could have chg = 0 and this >>> would result in a negative value when chg - freed. This is unexpected for >>> hugepage_subpool_put_pages() and hugetlb_acct_memory(). >> >> I am not sure if this is possible. >> >> It is true that resv_map could be NULL. However, I believe resv map >> can only be NULL for inodes that are not regular or link inodes. This >> is the inode creation code in hugetlbfs_get_inode(). >> >> /* >> * Reserve maps are only needed for inodes that can have associated >> * page allocations. >> */ >> if (S_ISREG(mode) || S_ISLNK(mode)) { >> resv_map = resv_map_alloc(); >> if (!resv_map) >> return NULL; >> } >> > > Agree. > >> If resv_map is NULL, then no hugetlb pages can be allocated/associated >> with the file. As a result, remove_inode_hugepages will never find any >> huge pages associated with the inode and the passed value 'freed' will >> always be zero. >> > > But I am confused now. AFAICS, remove_inode_hugepages() searches the address_space of > the inode to remove the hugepages while does not care if inode has associated resv_map. > How does it prevent hugetlb pages from being allocated/associated with the file if > resv_map is NULL? Could you please explain this more? > Recall that there are only two ways to get huge pages associated with a hugetlbfs file: fallocate and mmap/write fault. Directly writing to hugetlbfs files is not supported. If you take a closer look at hugetlbfs_get_inode, it has that code to allocate the resv map mentioned above as well as the following: switch (mode & S_IFMT) { default: init_special_inode(inode, mode, dev); break; case S_IFREG: inode->i_op = &hugetlbfs_inode_operations; inode->i_fop = &hugetlbfs_file_operations; break; case S_IFDIR: inode->i_op = &hugetlbfs_dir_inode_operations; inode->i_fop = &simple_dir_operations; /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); break; case S_IFLNK: inode->i_op = &page_symlink_inode_operations; inode_nohighmem(inode); break; } Notice that only S_IFREG inodes will have i_fop == &hugetlbfs_file_operations. hugetlbfs_file_operations contain the hugetlbfs specific mmap and fallocate routines. Hence, only files with S_IFREG inodes can potentially have associated huge pages. S_IFLNK inodes can as well via file linking. If an inode is not S_ISREG(mode) || S_ISLNK(mode), then it will not have a resv_map. In addition, it will not have hugetlbfs_file_operations and can not have associated huge pages. I looked at this closely when adding commits 58b6e5e8f1ad hugetlbfs: fix memory leak for resv_map f27a5136f70a hugetlbfs: always use address space in inode for resv_map pointer I may not be remembering all of the details correctly. Commit f27a5136f70a added the comment that resv_map could be NULL to hugetlb_unreserve_pages. -- Mike Kravetz