Re: [PATCH] hugetlbfs: move resv_map to hugetlbfs_inode_info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2019/4/16 1:11, Mike Kravetz wrote:
On 4/15/19 2:15 AM, Michal Hocko wrote:
On Mon 15-04-19 06:16:15, Naoya Horiguchi wrote:
On Fri, Apr 12, 2019 at 04:40:01PM -0700, Mike Kravetz wrote:
On 4/11/19 9:02 PM, Yufen Yu wrote:
Commit 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
...
However, for inode mode that is 'S_ISBLK', hugetlbfs_evict_inode() may
free or modify i_mapping->private_data that is owned by bdev inode,
which is not expected!
...
We fix the problem by moving resv_map to hugetlbfs_inode_info. It may
be more reasonable.
Your patches force me to consider these potential issues.  Thank you!

The root of all these problems (including the original leak) is that the
open of a block special inode will result in bd_acquire() overwriting the
value of inode->i_mapping.  Since hugetlbfs inodes normally contain a
resv_map at inode->i_mapping->private_data, a memory leak occurs if we do
not free the initially allocated resv_map.  In addition, when the
inode is evicted/destroyed inode->i_mapping may point to an address space
not associated with the hugetlbfs inode.  If code assumes inode->i_mapping
points to hugetlbfs inode address space at evict time, there may be bad
data references or worse.
Let me ask a kind of elementary question: is there any good reason/purpose
to create and use block special files on hugetlbfs?  I never heard about
such usecases.
I am not aware of this as a common use case.  Yufen Yu may be able to provide
more details about how the issue was discovered.  My guess is that it was
discovered via code inspection.

In fact, we discover the issue by running syzkaller. The program like:

15:39:59 executing program 0:
r0 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0/file0\x00', 0x44000, 0x1)
r1 = syz_open_dev$vcsn(&(0x7f00000000c0)='/dev/vcs#\x00', 0x3f, 0x202000)
renameat2(r0, &(0x7f0000000140)='./file0\x00', r0, &(0x7f0000000180)='./file0/file0/file0\x00', 0x4)
mkdir(&(0x7f0000000300)='./file0\x00', 0x0)
mount(0x0, &(0x7f0000000200)='./file0\x00', &(0x7f0000000240)='hugetlbfs\x00', 0x0, 0x0) mknod$loop(&(0x7f0000000000)='./file0/file0\x00', 0x6000, 0xffffffffffffffff)

Yufen
Thanks


                 I guess that the conflict of the usage of ->i_mapping is
discovered recently and that's because block special files on hugetlbfs are
just not considered until recently or well defined.  So I think that we might
be better to begin with defining it first.
Unless I am mistaken, this is just like creating a device special file
in any other filesystem.  Correct?  hugetlbfs is just some place for the
inode/file to reside.  What happens when you open/ioctl/close/etc the file
is really dependent on the vfs layer and underlying driver.

A absolutely agree. Hugetlbfs is overly complicated even without that.
So if this is merely "we have tried it and it has blown up" kinda thing
then just refuse the create blockdev files or document it as undefined.
You need a root to do so anyway.
Can we just refuse to create device special files in hugetlbfs?  Do we need
to worry about breaking any potential users?  I honestly do not know if anyone
does this today.  However, if they did I believe things would "just work".
The only known issue is leaking a resv_map structure when the inode is
destroyed.  I doubt anyone would notice that leak today.

Let me do a little more research.  I think this can all be cleaned up by
making hugetlbfs always operate on the address space embedded in the inode.
If nothing else, a change or explanation should be added as to why most code
operates on inode->mapping and one place operates on &inode->i_data.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux