On Wed 12-07-17 09:55:48, Mike Kravetz wrote:
> On 07/12/2017 04:46 AM, Michal Hocko wrote:
> > On Tue 11-07-17 11:23:19, Mike Kravetz wrote:
> >> On 07/11/2017 05:36 AM, Michal Hocko wrote:
> > [...]
> >>> Anyway the patch should fail with -EINVAL on private mappings as Kirill
> >>> already pointed out
> >>
> >> Yes. I think this should be a separate patch. As mentioned earlier,
> >> mremap today creates a new/additional private mapping if called in this
> >> way with old_size == 0. To me, this is a bug.
> >
> > Not only that. It clears existing ptes in the old mapping so the content
> > is lost. That is quite unexpected behavior. Now it is hard to assume
> > whether somebody relies on the behavior (I can easily imagine somebody
> > doing backup&clear in atomic way) so failing with EINVAL might break
> > userspace so I am not longer sure. Anyway this really needs to be
> > documented.
>
> I am pretty sure it does not clear ptes in the old mapping, or modify it
> in any way. Are you thinking they are cleared as part of the call to
> move_page_tables? Since old_size == 0 (len as passed to move_page_tables),
> the for loop in move_page_tables is not run and it doesn't do much of
> anything in this case.

Dang. I completely missed that we pass old_len as the len parameter.
Then it is clear that this old_len == 0 trick never really worked for
MAP_PRIVATE, because it simply breaks the main invariant that the
content at the new location matches the old one. Care to send a patch
to clarify that and return EINVAL, or should I do it?

> My plan is to look into adding hugetlbfs support to memfd_create, as this
> would meet the user's needs. And, this is a much more sane API than this
> mremap(old_size == 0) behavior.

Agreed.

> If adding hugetlbfs support to memfd_create works out, I would like to
> see mremap(old_size == 0) support dropped. Nobody here (kernel mm
> development) seems to like it. However, as you note there may be somebody
> depending on this behavior. What would be the process for removing
> such support? AFAIK, it is not documented anywhere. If we do document
> the behavior, then we will certainly be stuck with it for a long time.

I would rather document it than remove it. We know from the past that
there are users, and my experience tells me that once something is used
it basically lives on forever. Moreover, supporting the hack does not
cost us any maintenance burden. Just make it more obvious so that we do
not have to rediscover it each time.

--
Michal Hocko
SUSE Labs
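
For anybody who wants to check the behavior discussed above from userspace, here is a minimal sketch. It is not taken from any patch in this thread: probe_old_size_zero() and the PROBE_* values are hypothetical names made up for illustration, and the use of anonymous one-page mappings is an assumption chosen to keep the test small. It calls mremap() with old_size == 0 on a mapping of the given type and reports whether the new mapping shows the old content, which is the invariant mentioned above. It also checks that the old mapping is left unmodified.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result codes, for illustration only. */
enum probe_result {
    PROBE_OLD_CHANGED = -2, /* old mapping lost its content */
    PROBE_REFUSED     = -1, /* mmap or mremap failed (e.g. EINVAL) */
    PROBE_NOT_ALIASED =  0, /* new mapping exists, content differs */
    PROBE_ALIASED     =  1, /* new mapping shows the old content */
};

/* map_type is MAP_SHARED or MAP_PRIVATE; the mapping is anonymous. */
static int probe_old_size_zero(int map_type)
{
    static const char msg[] = "original";
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    char *old = (char *)mmap(NULL, len, PROT_READ | PROT_WRITE,
                             map_type | MAP_ANONYMOUS, -1, 0);
    if (old == MAP_FAILED)
        return PROBE_REFUSED;
    memcpy(old, msg, sizeof(msg));

    /* old_size == 0: ask for an additional mapping, not a move. */
    char *dup = (char *)mremap(old, 0, len, MREMAP_MAYMOVE);
    if (dup == MAP_FAILED) {
        int intact = memcmp(old, msg, sizeof(msg)) == 0;
        munmap(old, len);
        return intact ? PROBE_REFUSED : PROBE_OLD_CHANGED;
    }

    int result;
    if (memcmp(old, msg, sizeof(msg)) != 0)
        result = PROBE_OLD_CHANGED;
    else if (memcmp(dup, msg, sizeof(msg)) == 0)
        result = PROBE_ALIASED;
    else
        result = PROBE_NOT_ALIASED;

    munmap(dup, len);
    munmap(old, len);
    return result;
}
```

Calling probe_old_size_zero(MAP_SHARED) is expected to report an aliased mapping on kernels that support the trick. For MAP_PRIVATE the outcome depends on the kernel: either the call is refused, or a new mapping appears that does not carry the old content.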