On Thu, 29 Nov 2012 14:54:58 +0800 Lin Feng <linfeng@xxxxxxxxxxxxxx> wrote:

> Hi all,
>
> We encountered a "Resource temporarily unavailable" failure while trying
> to offline a memory section in a movable zone. We found that there are
> some pages that can't be migrated. The offline operation fails in
> migrate_page_move_mapping(), which keeps returning -EAGAIN until timeout
> because the check 'page_count(page) != 1' is true.
> I wonder, in the case 'page_count(page) != 1', should we always wait
> (return -EAGAIN)? Or in other words, can we do something here for
> migration if we know where the pages come from?
>
> We finally found that such pages are used by /sbin/multipathd in the form
> of aio ring_pages. Besides the one increment introduced by the offline
> call chain, another increment is added by aio_setup_ring() via
> get_user_pages(), and it won't be dropped until we call aio_free_ring().
>
> The dump_page info in the offline context is shown below:
> page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
>
> multipathd never seems to release the ring_pages until we reboot the box.
> Furthermore, someone may write an app that calls io_setup() but never
> calls io_destroy(), because it has to keep the aio context for a long
> time, or just forgets to, or even does so on purpose; we can't predict that.
> So I think the mm-hotplug framework should be able to deal with such
> situations. Should we consider adding migration support for such pages?
>
> However, I don't know whether there are other kinds of such pages in the
> current kernel. If unluckily there are many, it will be hard to handle
> them all, and just adding migration support for aio ring_pages is
> insufficient.
>
> But if luckily there are not, can we use the private field of struct page
> to track the ring_pages[] pointer so that we can retrieve the user when
> migrating? Doing so raises another problem: how to distinguish such
> special pages? Using a page flag may affect the current page flag layout,
> and adding a new page flag also seems to be impossible.
>
> I'm not sure what the right approach is, so I'm asking for help.
> Any comments are greatly appreciated, thanks :)

Tricky.

I expect the same problem would occur with pages which are under
O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
periods, but the durations could still be lengthy (seconds).

Worse is a futex page, which could easily remain pinned indefinitely.

The best I can think of is to make changes in or around
get_user_pages(), to steal the pages from userspace and replace them
with non-movable ones before pinning them.  The performance cost of
something like this would surely be unacceptable for direct-io, but
maybe OK for the aio ring and futexes.
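
A rough userspace illustration of the "replace before pinning" idea, as a
toy model only and not kernel code. The struct and every toy_* helper are
hypothetical names invented for this sketch, and the reference counting is
heavily simplified. It models the 'page_count(page) != 1' check from the
report and shows why pinning a non-movable replacement would leave the
original movable page free to be offlined.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct toy_page {
	int  count;    /* reference count, playing the role of page_count() */
	bool movable;  /* true if the page sits in a movable zone */
	int  data;     /* stands in for the page contents */
};

/*
 * Models the offline path: it takes its own reference, then refuses to
 * migrate if anyone else holds one (the 'page_count(page) != 1' check).
 */
int toy_offline(struct toy_page *page)
{
	int ret;

	assert(page != NULL);
	page->count++;                          /* offline path's reference */
	ret = (page->count != 1) ? -EAGAIN : 0;
	page->count--;                          /* drop it again */
	return ret;
}

/* Plain long-term pin, like the aio ring taking a reference. */
struct toy_page *toy_pin(struct toy_page *page)
{
	assert(page != NULL);
	page->count++;
	return page;
}

/*
 * Hypothetical variant: if the page is movable, copy it into a
 * non-movable spare page and pin the spare instead. The caller would
 * then use the returned page in place of the original.
 */
struct toy_page *toy_pin_unmovable(struct toy_page *page,
				   struct toy_page *spare)
{
	assert(page != NULL && spare != NULL);
	if (!page->movable)
		return toy_pin(page);

	spare->data = page->data;   /* "steal" the contents */
	spare->movable = false;
	spare->count = 0;
	return toy_pin(spare);
}
```

In this model a page pinned with toy_pin() makes toy_offline() return
-EAGAIN for as long as the pin is held, while a page pinned through
toy_pin_unmovable() leaves the original with no extra reference, so
toy_offline() on it succeeds. The copy on every pin is the cost that
would hurt direct-io.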