On Tue, May 21, 2013 at 10:07:52AM +0800, Tang Chen wrote:
....
> I'm not saying using two callbacks before and after migration is better.
> I don't want to use address_space_operations is because there is no such
> member for anonymous pages.

That depends on the nature of the pinning.  For the general case of
get_user_pages(), you're correct that it won't work for anonymous memory.

> In your idea, using a file mapping will create a
> address_space_operations. But I really don't think we can modify the way
> of memory allocation for all the subsystems who has this problem. Maybe
> not just aio and cma. That means if you want to pin pages in memory, you
> have to use a file mapping. This makes the memory allocation more
> complicated. And the idea should be known by all the subsystem
> developers. Is that going to happen ?

Different subsystems will need to use different approaches to fixing the
issue.  I doubt any single approach will work for everything.

> I also thought about reuse one field of struct page. But as you said,
> there may not be many users of this functionality. Reusing a field of
> struct page will make things more complicated and lead to high coupling.

What happens when more than one subsystem tries to pin a particular page?
What if it's a shared page rather than an anonymous page?

> So, how about the other idea that Mel mentioned ?
>
> We create a 1-1 mapping of pinned page ranges and the pinner (subsystem
> callbacks and data), maybe a global list or a hash table. And then, we
> can find the callbacks.

Maybe that is the simplest approach, but it's going to make
get_user_pages() slower and more complicated (as if it wasn't already).
Maybe with all the bells and whistles of per-cpu data structures and such
you can make it work, but I'm pretty sure someone running the large
unmentionable benchmark will complain about the performance regressions
you're going to introduce.
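For concreteness, the kind of table being proposed might look roughly like
the userspace sketch below.  Every name in it (struct pin_entry,
pin_register(), pin_notify(), PIN_BUCKETS, count_cb()) is made up for
illustration and none of it is existing kernel API.  It has no locking and
keys on a single page frame number rather than a range.  Unlike a strict 1-1
mapping it allows several pinners per page, since that case has to be
handled somehow.  The point is only that every pin has to insert into, and
every migration has to search, a shared structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not existing kernel API. */
#define PIN_BUCKETS 64

typedef void (*pin_migrate_cb)(unsigned long pfn, void *data);

struct pin_entry {
	unsigned long pfn;       /* page frame number of the pinned page */
	pin_migrate_cb cb;       /* pinner's callback, run before migration */
	void *data;              /* pinner's private data */
	struct pin_entry *next;  /* next entry in the same bucket */
};

static struct pin_entry *pin_table[PIN_BUCKETS];

static unsigned int pin_hash(unsigned long pfn)
{
	return (unsigned int)(pfn % PIN_BUCKETS);
}

/* Record a pin on 'pfn'.  Returns 0 on success, -1 if allocation fails. */
static int pin_register(unsigned long pfn, pin_migrate_cb cb, void *data)
{
	struct pin_entry *e;
	unsigned int h = pin_hash(pfn);

	assert(cb != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->pfn = pfn;
	e->cb = cb;
	e->data = data;
	e->next = pin_table[h];  /* push onto the bucket's chain */
	pin_table[h] = e;
	return 0;
}

/* Run the callback of every pinner of 'pfn'.  Returns how many ran. */
static int pin_notify(unsigned long pfn)
{
	struct pin_entry *e;
	int n = 0;

	for (e = pin_table[pin_hash(pfn)]; e; e = e->next) {
		if (e->pfn == pfn) {
			e->cb(pfn, e->data);
			n++;
		}
	}
	return n;
}

/* Example callback: counts notifications in the int that 'data' points to. */
static void count_cb(unsigned long pfn, void *data)
{
	(void)pfn;
	(*(int *)data)++;
}
```

Two subsystems that both call pin_register() on the same pfn each get their
callback run by a single pin_notify() on that pfn; a real version would also
need an unregister path and locking on every one of these operations.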
At least in the case of the AIO ring buffer, using the address_space
approach doesn't introduce any new performance issues.  There's also the
bigger question of whether you can exclude get_user_pages_fast() from this.
In short: you've got a lot more work on your hands to do.

> Thanks. :)

Cheers,

		-ben
-- 
"Thought is the essence of where you are now."