On 6/1/2016 10:00 PM, Minchan Kim wrote:
On Wed, Jun 01, 2016 at 01:40:48PM -0700, Shi, Yang wrote:
On 5/29/2016 11:11 PM, Minchan Kim wrote:
On Fri, May 27, 2016 at 11:16:41AM -0700, Shi, Yang wrote:
<snip>
If we goes this way, how to guarantee this race?
Thanks for pointing out this. It sounds reasonable. However, this
should be only possible to happen on 32 bit since just 32 bit
version page_is_idle() calls lookup_page_ext(), it doesn't do it on
64 bit.
And, such race condition should exist regardless of whether DEBUG_VM
is enabled or not, right?
rcu might be good enough to protect it.
A quick fix may look like:
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index 8f5d4ad..bf0cd6a 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -77,8 +77,12 @@ static inline bool
test_and_clear_page_young(struct page *page)
static inline bool page_is_idle(struct page *page)
{
struct page_ext *page_ext;
+
+ rcu_read_lock();
page_ext = lookup_page_ext(page);
+ rcu_read_unlock();
+
if (unlikely(!page_ext))
return false;
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 56b160f..94927c9 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -183,7 +183,6 @@ struct page_ext *lookup_page_ext(struct page *page)
{
unsigned long pfn = page_to_pfn(page);
struct mem_section *section = __pfn_to_section(pfn);
-#if defined(CONFIG_DEBUG_VM) || defined(CONFIG_PAGE_POISONING)
/*
* The sanity checks the page allocator does upon freeing a
* page can reach here before the page_ext arrays are
@@ -195,7 +194,7 @@ struct page_ext *lookup_page_ext(struct page *page)
*/
if (!section->page_ext)
return NULL;
-#endif
+
return section->page_ext + pfn;
}
@@ -279,7 +278,8 @@ static void __free_page_ext(unsigned long pfn)
return;
base = ms->page_ext + pfn;
free_page_ext(base);
- ms->page_ext = NULL;
+ rcu_assign_pointer(ms->page_ext, NULL);
+ synchronize_rcu();
How does it fix the problem?
I cannot understand your point.
Assigning NULL pointer to page_Ext will be blocked until
rcu_read_lock critical section is done, so the lookup and writing
operations will be serialized. And, rcu_read_lock disables preempt
too.
I meant your rcu_read_lock in page_idle should cover test_bit op.
Yes, definitely. Thanks for catching it.
One more thing, you should use rcu_dereference.
I will check which one is the best since I saw some use rcu_assign_pointer.
As well, please cover memory onlining case I mentioned in another
thread as well as memory offlining.
I will look into it too.
Thanks,
Yang
Anyway, to me, every caller of page_ext should prepare lookup_page_ext
can return NULL anytime and they should use rcu_read_[un]lock, which
is not good. :(
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>