On Fri, Nov 09, 2012 at 03:42:57PM +0100, Andrea Arcangeli wrote:
> Hi Mel,
>
> On Tue, Nov 06, 2012 at 09:14:36AM +0000, Mel Gorman wrote:
> > This series addresses part of the integration and sharing problem by
> > implementing a foundation that either the policy for schednuma or autonuma
> > can be rebased on. The actual policy it implements is a very stupid
> > greedy policy called "Migrate On Reference Of pte_numa Node (MORON)".
> > While stupid, it can be faster than the vanilla kernel and the expectation
> > is that any clever policy should be able to beat MORON. The advantage is
> > that it still defines how the policy needs to hook into the core code --
> > scheduler and mempolicy mostly so many optimisations (such as native THP
> > migration) can be shared between different policy implementations.
>
> I haven't had much time to look into it yet, because I've been
> attending KVM Forum the last few days,

That's fine. I knew you were travelling and that there would be a delay.

> but this foundation looks ok
> with me as a starting base and I ack it for merging it upstream. I'll
> try to rebase on top of this and send you some patches.
>

Thanks, that's great news!

It's not quite ready for merging yet. I found a few bugs in the foundation
that I have ironed out since and I would like to have better figures for
specjbb. With that in mind I'm still in the process of implementing
something like cpu-follow-memory on top. I'll post it early next week even
if the figures are crap, for the purposes of illustration and to get the
existing fixes out there. Even if you think the version of the cpu-follow
implementation is complete crap you'll at least see what I thought the
integration points would look like and we'll come up with an alternative.

My hope is that we layer the smallest amount on top each iteration with
benchmark validation at each step until we get something approaching
autonuma or schednuma in terms of performance.
Which one we use as the performance target will depend on whether
schednuma or autonuma was better on that particular test. I'll be using
mmtests on a 4-node machine at each step but obviously other testers would
be very welcome.

As things stand right now I just finished a script to show where threads
are running and what their per-node memory usage is and it's showing that
specjbb threads are not converging at all. I'm not losing sleep over it
just yet as I would be incredibly surprised if I got this right first time
even with having schednuma and autonuma to look at :)

> > Patch 14 adds a MPOL_MF_LAZY mempolicy that an interested application can use.
> > On the next reference the memory should be migrated to the node that
> > references the memory.
>
> This approach of starting with a stripped down foundation won't allow
> for easy backportability anyway, so merging the userland API at the
> first step shouldn't provide any benefit for the work that is ahead of
> us. I would leave this for later and not part of the foundation.
>

This needs a bit more consensus. I'm happy to drop the userspace API until
all this settles down but will initially try and keep the internal
mempolicy aspects. Initially I preserved the userspace API because I
understood Peter's logic that we should help application developers as
much as possible before depending entirely on the automatic approach
offered by both autonuma and schednuma. Peter?

> All we need is a failsafe runtime and boot time turn off knob, just in
> case.

Yes, fully agreed. It's on the TODO list and I consider it a requirement
before it's merged. THP experience has told us that being able to turn it
off at runtime was very handy for debugging.

Thanks Andrea.

-- 
Mel Gorman
SUSE Labs