On Thu, 15 Oct 2015, Andrea Arcangeli wrote: > The stable_nodes can become stale at any time if the underlying pages > gets freed. The stable_node gets collected and removed from the stable > rbtree if that is detected during the rbtree tree lookups. > > Don't fail the lookup if running into stale stable_nodes, just restart > the lookup after collecting the stale entries. Otherwise the CPU spent > in the preparation stage is wasted and the lookup must be repeated at > the next loop potentially failing a second time in a second stale > entry. > > This also will contribute to pruning the stable tree and releasing the > stable_node memory more efficiently. > > Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> I'll say Acked-by: Hugh Dickins <hughd@xxxxxxxxxx> as a gesture of goodwill, but in honesty I'm sitting on the fence, and couldn't decide. I think I've gone back and forth on this in my own mind in the past, worried that we might get stuck a long time going back round to "again". In the past I've felt that to give up with NULL is consistent with KSM's willingness to give way to any obstruction; but if you're finding "goto again" a better strategy, sure, go ahead. And at least there's a cond_resched() just above the diff context shown. A dittoed nit below... > --- > mm/ksm.c | 30 +++++++++++++++++++++++++++--- > 1 file changed, 27 insertions(+), 3 deletions(-) > > diff --git a/mm/ksm.c b/mm/ksm.c > index 39ef485..929b5c2 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -1225,7 +1225,18 @@ again: > stable_node = rb_entry(*new, struct stable_node, node); > tree_page = get_ksm_page(stable_node, false); > if (!tree_page) > - return NULL; > + /* > + * If we walked over a stale stable_node, > + * get_ksm_page() will call rb_erase() and it > + * may rebalance the tree from under us. So > + * restart the search from scratch. Returning > + * NULL would be safe too, but we'd generate > + * false negative insertions just because some > + * stable_node was stale which would waste CPU > + * by doing the preparation work twice at the > + * next KSM pass. > + */ > + goto again; When a comment gets that long, in fact even if it were only one line, I'd much prefer that block inside braces. I think I noticed Linus feeling the same way a few days ago, when he fixed up someone's patch. > > ret = memcmp_pages(page, tree_page); > put_page(tree_page); > @@ -1301,12 +1312,14 @@ static struct stable_node *stable_tree_insert(struct page *kpage) > unsigned long kpfn; > struct rb_root *root; > struct rb_node **new; > - struct rb_node *parent = NULL; > + struct rb_node *parent; > struct stable_node *stable_node; > > kpfn = page_to_pfn(kpage); > nid = get_kpfn_nid(kpfn); > root = root_stable_tree + nid; > +again: > + parent = NULL; > new = &root->rb_node; > > while (*new) { > @@ -1317,7 +1330,18 @@ static struct stable_node *stable_tree_insert(struct page *kpage) > stable_node = rb_entry(*new, struct stable_node, node); > tree_page = get_ksm_page(stable_node, false); > if (!tree_page) > - return NULL; > + /* > + * If we walked over a stale stable_node, > + * get_ksm_page() will call rb_erase() and it > + * may rebalance the tree from under us. So > + * restart the search from scratch. Returning > + * NULL would be safe too, but we'd generate > + * false negative insertions just because some > + * stable_node was stale which would waste CPU > + * by doing the preparation work twice at the > + * next KSM pass. > + */ > + goto again; Ditto. > > ret = memcmp_pages(kpage, tree_page); > put_page(tree_page); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>