Hi Hugh, first of all, please let me apologize for the delay in my response and thank you for your extensive review! On Sun, 30 Sep 2012, Hugh Dickins wrote: > Andrea's point about ksm_migrate_page() is an important one, and I've > answered that against his mail, but here's some other easier points. > > On Mon, 24 Sep 2012, Petr Holasek wrote: > > > Introduces new sysfs boolean knob /sys/kernel/mm/ksm/merge_across_nodes > > which control merging pages across different numa nodes. > > When it is set to zero only pages from the same node are merged, > > otherwise pages from all nodes can be merged together (default behavior). > > ... > > > > v4: - merge_nodes was renamed to merge_across_nodes > > - share_all debug knob was dropped > > Yes, you were right to drop the share_all knob for now. I do like the > idea, but it was quite inappropriate to mix it in with this NUMAnode > patch. And although I like the idea, I think it wants a bit more: I > already have a hacky "allksm" boot option patch to mm/mmap.c for my > own testing, which adds VM_MERGEABLE everywhere. If I gave that up > (I'd like to!) in favour of yours, I think I would still be missing > all the mms that are created before there's a chance to switch the > share_all mode on. Or maybe I misread your v3. Anyway, that's a > different topic, happily taken off today's agenda. > Agreed, it hid original purpose of this patch and made it more difficult for eventual merging. So let's move it lower on the ksm todo list for this time :) > > diff --git a/mm/ksm.c b/mm/ksm.c > > index 47c8853..7c82032 100644 > > --- a/mm/ksm.c > > +++ b/mm/ksm.c > > @@ -36,6 +36,7 @@ > > #include <linux/hash.h> > > #include <linux/freezer.h> > > #include <linux/oom.h> > > +#include <linux/numa.h> > > > > #include <asm/tlbflush.h> > > #include "internal.h" > > @@ -140,7 +141,10 @@ struct rmap_item { > > unsigned long address; /* + low bits used for flags below */ > > unsigned int oldchecksum; /* when unstable */ > > union { > > - struct rb_node node; /* when node of unstable tree */ > > + struct { > > + struct rb_node node; /* when node of unstable tree */ > > + struct rb_root *root; > > + }; > > This worries me a little, enlarging struct rmap_item: there may > be very many of them in the system, best to minimize their size. > > (This struct rb_root *root is used for one thing only, isn't it? For the > unstable rb_erase() in remove_rmap_item_from_tree(). It annoys me that > we need to record root just for that, but I don't see an alternative.) Yes, I've played a quite lot with this issue, but wasn't able to find an alternative solution, too. > > The 64-bit case can be easily improved by locating unsigned int nid > after oldchecksum instead. The 32-bit case (which supports smaller > NODES_SHIFT: 6 was the largest I found) could be "improved" by keeping > nid in the lower bits of address along with (moved) UNSTABLE_FLAG and > STABLE_FLAG and reduced SEQNR_MASK - we really need only 1 bit for that, > but 2 bits would be nice for keeping the BUG_ON(age > 1). > > But you may think I'm going too far there, and prefer just to put > #ifdef CONFIG_NUMA around the unsigned int nid, so at least it does > not enlarge the more basic 32-bit configuration. > I like your idea of unsigned int nid, will implement it in next version. ... > > > > - for (node = rb_first(&root_stable_tree); node; node = rb_next(node)) { > > - struct stable_node *stable_node; > > + for (i = 0; i < MAX_NUMNODES; i++) > > It's irritating to have to do this outer nid loop, but I think you're > right, that the memory hotremove notification does not quite tell us > which node to look at. Or can we derive that from start_pfn? Would > it be safe to assume that end_pfn-1 must be in the same node? > I had assumed that we can't rely on end_pfn-1 in the same node, but now mm/memory_hotremove.c looks to me that we can rely on it because memory hotremove callback is triggered for each zone where we are removing memory. So I think yes, we can optimize it in the way you mentioned above. If I am wrong correct me, please :) ... > > Looks nice - thank you. > Thanks for your help! Petr -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>