Re: [PATCH RFC 02/12] mm: add config option and per-NUMA node VMS support

On 1/3/2024 10:43 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 28 Dec 2023, artem.kuzin@xxxxxxxxxx wrote:
>
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -626,7 +628,14 @@ struct mm_struct {
>>         unsigned long mmap_compat_legacy_base;
>> #endif
>>         unsigned long task_size;    /* size of task vm space */
>> -        pgd_t * pgd;
>> +#ifndef CONFIG_KERNEL_REPLICATION
>> +        pgd_t *pgd;
>> +#else
>> +        union {
>> +            pgd_t *pgd;
>> +            pgd_t *pgd_numa[MAX_NUMNODES];
>> +        };
>> +#endif
>
>
> Hmmm... This is adding the pgd pointers for all mm_structs. But we only need the numa pgs pointers for the init_mm. Can this be a separate variable? There are some architecures with larger number of nodes.
>
>
>

Hi, Christoph.

Sorry for the delayed reply.

We already have a per-NUMA node init_mm, but this is not enough.
We need this array of pointers in mm_struct because the proper (per-NUMA node) pgd must be used for the threads of a process that spans more than one NUMA node.
On x86 there is one translation table per process, and it contains both the kernel and the user space parts. With kernel text and rodata replication enabled, we need to take
the per-NUMA node kernel text and rodata replicas into account during context switch, etc. For example, if a particular thread runs a system call, we need to use the
kernel replica that corresponds to the NUMA node the thread is running on. At the same time, the process can span several NUMA nodes, and threads running on different
NUMA nodes should observe the same user space version, but different kernel versions (per-NUMA node replicas).
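
To illustrate the idea only, here is a rough userspace sketch, not the code from the patch set. The union layout follows the quoted hunk; the names mm_sketch and mm_pgd_for_node, the pgd_t stand-in type, and the small MAX_NUMNODES value are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 4          /* assumption: small value for illustration */

typedef unsigned long pgd_t;    /* stand-in for the kernel's pgd_t */

/* Mirrors the union from the quoted hunk: pgd aliases pgd_numa[0]. */
struct mm_sketch {
    union {
        pgd_t *pgd;
        pgd_t *pgd_numa[MAX_NUMNODES];
    };
};

/*
 * Hypothetical helper: pick the top-level table for the node the
 * thread currently runs on. Every per-node table would map the same
 * user space, but a node-local kernel text/rodata replica.
 */
static pgd_t *mm_pgd_for_node(struct mm_sketch *mm, int node)
{
    assert(node >= 0 && node < MAX_NUMNODES);
    return mm->pgd_numa[node];
}
```

In this sketch, code that only knows about mm->pgd keeps working and sees the node 0 table, while a context switch path could call something like mm_pgd_for_node() with the node of the target CPU.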

But you are right that this place should be optimized. We do not need this array for processes that are not expected to run across NUMA nodes. Possibly, we
need to implement some "lazy" approach to allocating the per-NUMA node translation tables. The current version of kernel replication support is implemented so as to
keep everything as simple as possible.
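
One possible shape of such a "lazy" approach, again only as a userspace sketch under assumptions and not something the current series implements: per-node tables stay NULL until a thread of the process first runs on that node. The names mm_lazy_sketch, mm_lazy_pgd and PTRS_PER_PGD (as used here) are hypothetical, and the plain memcpy stands in for whatever the kernel would really do to build a table that points at the node-local kernel replica.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NUMNODES 4          /* assumption: small value for illustration */
#define PTRS_PER_PGD 8          /* assumption: tiny table for illustration */

typedef unsigned long pgd_t;    /* stand-in for the kernel's pgd_t */

struct mm_lazy_sketch {
    pgd_t *pgd_numa[MAX_NUMNODES];  /* entry 0 is always populated */
};

/*
 * Hypothetical helper: return the table for 'node', creating it on
 * first use by copying the node 0 table. Falls back to the node 0
 * table if the allocation fails.
 */
static pgd_t *mm_lazy_pgd(struct mm_lazy_sketch *mm, int node)
{
    assert(node >= 0 && node < MAX_NUMNODES);
    assert(mm->pgd_numa[0] != NULL);

    if (mm->pgd_numa[node] == NULL) {
        pgd_t *copy = malloc(PTRS_PER_PGD * sizeof(*copy));

        if (copy == NULL)
            return mm->pgd_numa[0];
        memcpy(copy, mm->pgd_numa[0], PTRS_PER_PGD * sizeof(*copy));
        mm->pgd_numa[node] = copy;
    }
    return mm->pgd_numa[node];
}
```

A process that never leaves one node would then pay for a single table only. This does not by itself shrink the pointer array in mm_struct, which would need a separate change.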

Thank you!

Best regards,
Artem




