On Thu, Oct 04, 2012 at 01:50:49AM +0200, Andrea Arcangeli wrote:
> Define the two data structures that collect the per-process (in the
> mm) and per-thread (in the task_struct) statistical information that
> are the input of the CPU follow memory algorithms in the NUMA
> scheduler.
>
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> ---
>  include/linux/autonuma_types.h |  107 ++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 107 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/autonuma_types.h
>
> diff --git a/include/linux/autonuma_types.h b/include/linux/autonuma_types.h
> new file mode 100644
> index 0000000..9673ce8
> --- /dev/null
> +++ b/include/linux/autonuma_types.h
> @@ -0,0 +1,107 @@
> +#ifndef _LINUX_AUTONUMA_TYPES_H
> +#define _LINUX_AUTONUMA_TYPES_H
> +
> +#ifdef CONFIG_AUTONUMA
> +
> +#include <linux/numa.h>
> +
> +
> +/*
> + * Per-mm (per-process) structure that contains the NUMA memory
> + * placement statistics generated by the knuma scan daemon. This
> + * structure is dynamically allocated only if AutoNUMA is possible on
> + * this system. They are linked togehter in a list headed within the

s/togehter/together/

> + * knumad_scan structure.
> + */
> +struct mm_autonuma {

Nit, but this is very similar in principle to mm_slot for transparent
huge pages. It might be worth renaming both to mm_thp_slot and
mm_autonuma_slot to set the expectation that they are very similar in
nature. They could potentially be made generic, but that is probably
overkill.

> +	/* link for knuma_scand's list of mm structures to scan */
> +	struct list_head mm_node;
> +	/* Pointer to associated mm structure */
> +	struct mm_struct *mm;
> +
> +	/*
> +	 * Zeroed from here during allocation, check
> +	 * mm_autonuma_reset() if you alter the below.
> +	 */
> +
> +	/*
> +	 * Pass counter for this mm. This exist only to be able to
> +	 * tell when it's time to apply the exponential backoff on the
> +	 * task_autonuma statistics.
> +	 */
> +	unsigned long mm_numa_fault_pass;
> +	/* Total number of pages that will trigger NUMA faults for this mm */
> +	unsigned long mm_numa_fault_tot;
> +	/* Number of pages that will trigger NUMA faults for each [nid] */
> +	unsigned long mm_numa_fault[0];
> +	/* do not add more variables here, the above array size is dynamic */
> +};

How cache hot is this structure? Nodes are sharing counters in the same
cache lines, so if updates are frequent this will bounce like a mad
yoke. Profiles will tell for sure, but it's possible that some sort of
per-cpu hilarity will be necessary here in the future; a rough sketch
of what that could look like is below.
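The shape of it would be to replicate the fault counters per cpu and
only fold them together on the (rare) read side. Everything below is a
rough, uncompiled illustration; the names (mm_autonuma_pcpu,
mm_numa_fault_alloc/_inc/_read) are invented here and are not part of
the patch:

#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/nodemask.h>
#include <linux/errno.h>

/* illustrative stand-in for the dynamically sized [nid] counter array */
struct mm_autonuma_pcpu {
	/* one counter per node, replicated for every cpu */
	unsigned long __percpu *mm_numa_fault;
};

static inline int mm_numa_fault_alloc(struct mm_autonuma_pcpu *ma)
{
	ma->mm_numa_fault = __alloc_percpu(nr_node_ids * sizeof(unsigned long),
					   __alignof__(unsigned long));
	return ma->mm_numa_fault ? 0 : -ENOMEM;
}

/* NUMA hinting fault path: cheap local increment, no shared cacheline */
static inline void mm_numa_fault_inc(struct mm_autonuma_pcpu *ma, int nid)
{
	this_cpu_inc(ma->mm_numa_fault[nid]);
}

/* slow path (e.g. knuma_scand): fold the per-cpu copies into one value */
static inline unsigned long mm_numa_fault_read(struct mm_autonuma_pcpu *ma,
					       int nid)
{
	unsigned long sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += per_cpu_ptr(ma->mm_numa_fault, cpu)[nid];
	return sum;
}

The trade-off is that reads become O(nr_cpus) per node and the memory
footprint grows with the number of cpus, so whether it is worth it
really does depend on what the profiles say.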
> +
> +extern int alloc_mm_autonuma(struct mm_struct *mm);
> +extern void free_mm_autonuma(struct mm_struct *mm);
> +extern void __init mm_autonuma_init(void);
> +
> +/*
> + * Per-task (thread) structure that contains the NUMA memory placement
> + * statistics generated by the knuma scan daemon. This structure is
> + * dynamically allocated only if AutoNUMA is possible on this
> + * system. They are linked togehter in a list headed within the
> + * knumad_scan structure.
> + */
> +struct task_autonuma {
> +	/* node id the CPU scheduler should try to stick with (-1 if none) */
> +	int task_selected_nid;
> +
> +	/*
> +	 * Zeroed from here during allocation, check
> +	 * mm_autonuma_reset() if you alter the below.
> +	 */
> +
> +	/*
> +	 * Pass counter for this task. When the pass counter is found
> +	 * out of sync with the mm_numa_fault_pass we know it's time
> +	 * to apply the exponential backoff on the task_autonuma
> +	 * statistics, and then we synchronize it with
> +	 * mm_numa_fault_pass. This pass counter is needed because in
> +	 * knuma_scand we work on the mm and we've no visibility on
> +	 * the task_autonuma. Furthermore it would be detrimental to
> +	 * apply exponential backoff to all task_autonuma associated
> +	 * to a certain mm_autonuma (potentially zeroing out the trail
> +	 * of statistical data in task_autonuma) if the task is idle
> +	 * for a long period of time (i.e. several knuma_scand passes).
> +	 */
> +	unsigned long task_numa_fault_pass;
> +	/* Total number of eligible pages that triggered NUMA faults */
> +	unsigned long task_numa_fault_tot;
> +	/* Number of pages that triggered NUMA faults for each [nid] */
> +	unsigned long task_numa_fault[0];
> +	/* do not add more variables here, the above array size is dynamic */
> +};
> +

Same question about cache hotness. As an aside, a sketch of the
pass-counter backoff described in the comment above is appended at the
end of this mail.

> +extern int alloc_task_autonuma(struct task_struct *tsk,
> +			       struct task_struct *orig,
> +			       int node);
> +extern void __init task_autonuma_init(void);
> +extern void free_task_autonuma(struct task_struct *tsk);
> +
> +#else /* CONFIG_AUTONUMA */
> +
> +static inline int alloc_mm_autonuma(struct mm_struct *mm)
> +{
> +	return 0;
> +}
> +static inline void free_mm_autonuma(struct mm_struct *mm) {}
> +static inline void mm_autonuma_init(void) {}
> +
> +static inline int alloc_task_autonuma(struct task_struct *tsk,
> +				      struct task_struct *orig,
> +				      int node)
> +{
> +	return 0;
> +}
> +static inline void task_autonuma_init(void) {}
> +static inline void free_task_autonuma(struct task_struct *tsk) {}
> +
> +#endif /* CONFIG_AUTONUMA */
> +
> +#endif /* _LINUX_AUTONUMA_TYPES_H */

-- 
Mel Gorman
SUSE Labs
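(The promised aside: for anyone trying to follow the pass-counter
comment in task_autonuma above, the intent boils down to roughly the
following. This is only an illustration; the helper name and the
halve-on-new-pass decay factor are invented here, not taken from the
series, which may age the statistics differently.)

#include <linux/nodemask.h>
#include <linux/autonuma_types.h>	/* the structures from this patch */

static void task_autonuma_backoff(struct task_autonuma *ta,
				  struct mm_autonuma *mma)
{
	int nid;

	/* still inside the same knuma_scand pass: nothing to decay */
	if (ta->task_numa_fault_pass == mma->mm_numa_fault_pass)
		return;

	/*
	 * Exponential backoff: age the old statistics (halving is an
	 * assumption made for this sketch)...
	 */
	ta->task_numa_fault_tot >>= 1;
	for_each_node(nid)
		ta->task_numa_fault[nid] >>= 1;

	/* ...and resynchronize with the mm-wide pass counter */
	ta->task_numa_fault_pass = mma->mm_numa_fault_pass;
}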