On Wed, Jan 31, 2024 at 02:43:12PM +0800, Huang, Ying wrote: > Gregory Price <gourry.memverge@xxxxxxxxx> writes: > > > > +static unsigned int weighted_interleave_nodes(struct mempolicy *policy) > > +{ > > + unsigned int node = current->il_prev; > > + > > + if (!current->il_weight || !node_isset(node, policy->nodes)) { > > + node = next_node_in(node, policy->nodes); > > + /* can only happen if nodemask is being rebound */ > > + if (node == MAX_NUMNODES) > > + return node; > > I feel a little unsafe to read policy->nodes at same time of writing in > rebound. Is it better to use a seqlock to guarantee its consistency? > It's unnecessary to be a part of this series though. > I think this is handled already? It is definitely an explicit race condition that is documented elsewhere: /* * mpol_rebind_policy - Migrate a policy to a different set of nodes * * Per-vma policies are protected by mmap_lock. Allocations using per-task * policies are protected by task->mems_allowed_seq to prevent a premature * OOM/allocation failure due to parallel nodemask modification. */ example from slub: do { cpuset_mems_cookie = read_mems_allowed_begin(); zonelist = node_zonelist(mempolicy_slab_node(), pc->flags); ... } while (read_mems_allowed_retry(cpuset_mems_cookie)); quick perusal through other allocators, show similar checks. page_alloc.c - check_retry_cpusetset() filemap.c - filemap_alloc_folio() If we ever want mempolicy to be swappable from outside the current task context, this will have to change most likely - but that's another feature for another day. > > + while (target) { > > + /* detect system default usage */ > > + weight = table ? table[nid] : 1; > > + weight = weight ? weight : 1; > > I found duplicated pattern as above in this patch. Can we define a > function like below to remove the duplication? > > u8 __get_il_weight(u8 *table, int nid) > { > u8 weight; > > weight = table ? table[nid] : 1; > return weight ? : 1; > } > When we implement the system-default array, this will change to: weight = sysfs_table ? sysfs_table[nid] : default_table[nid]; This cleanup will get picked up in that patch set since this code is going to change anyway. > > + if (delta == weight) { > > + /* boundary: resume from next node/weight */ > > + resume_node = next_node_in(node, nodes); > > + resume_weight = weights[resume_node]; > > + } else { > > + /* remainder: resume this node w/ remainder */ > > + resume_node = node; > > + resume_weight = weight - delta; > > + } > > If we are comfortable to leave resume_weight == 0, then the above > branch can be simplified to. > > resume_node = node; > resume_weight = weight - delta; > > But, this is a style issue again. I will leave it to you to decide. Good point, and in fact there's a similar branch in the first half of the function that can be simplified. Will follow up with a style patch. mm/mempolicy.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-) My favorite style of patch :D Andrew if you happen to be monitoring, this is the patch (not tested yet, but it's pretty obvious, otherwise i'll submit individually tomorrow). diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 2c1aef8eab70..b0ca9bcdd64c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2405,15 +2405,9 @@ static unsigned long alloc_pages_bulk_array_weighted_interleave(gfp_t gfp, page_array += nr_allocated; total_allocated += nr_allocated; /* if that's all the pages, no need to interleave */ - if (rem_pages < weight) { - /* stay on current node, adjust il_weight */ + if (rem_pages <= weight) { me->il_weight -= rem_pages; return total_allocated; - } else if (rem_pages == weight) { - /* move to next node / weight */ - me->il_prev = next_node_in(node, nodes); - me->il_weight = get_il_weight(me->il_prev); - return total_allocated; } /* Otherwise we adjust remaining pages, continue from there */ rem_pages -= weight; @@ -2460,17 +2454,10 @@ static unsigned long alloc_pages_bulk_array_weighted_interleave(gfp_t gfp, node_pages += weight; delta -= weight; } else if (delta) { + /* when delta is deleted, resume from that node */ node_pages += delta; - /* delta may deplete on a boundary or w/ a remainder */ - if (delta == weight) { - /* boundary: resume from next node/weight */ - resume_node = next_node_in(node, nodes); - resume_weight = weights[resume_node]; - } else { - /* remainder: resume this node w/ remainder */ - resume_node = node; - resume_weight = weight - delta; - } + resume_node = node; + resume_weight = weight - delta; delta = 0; } /* node_pages can be 0 if an allocation fails and rounds == 0 */ > > So, except the issue you pointed out already. All series looks good to > me! Thanks! Feel free to add > > Reviewed-by: "Huang, Ying" <ying.huang@xxxxxxxxx> > > to the whole series. > Thank you so much for your patience with me! I appreciate all the help. I am looking forward to this feature very much! ~Gregory