Gregory Price <gregory.price@xxxxxxxxxxxx> writes: > On Mon, Jan 15, 2024 at 11:18:00AM +0800, Huang, Ying wrote: >> Gregory Price <gourry.memverge@xxxxxxxxx> writes: >> >> > +static struct iw_table default_iw_table; >> > +/* >> > + * iw_table is the sysfs-set interleave weight table, a value of 0 >> > + * denotes that the default_iw_table value should be used. >> > + * >> > + * iw_table is RCU protected >> > + */ >> > +static struct iw_table __rcu *iw_table; >> > +static DEFINE_MUTEX(iw_table_mtx); >> >> I greped "mtx" in kernel/*.c and mm/*.c and found nothing. To following >> the existing coding convention, better to name this as iw_table_mutex or >> iw_table_lock? >> > > ack. > >> And, I think this is used to protect both iw_table and default_iw_table? >> If so, it deserves some comments. >> > > Right now default_iw_table cannot be updated, and so it is neither > protected nor requires protection. > > I planned to add the protection comment in the next patch series, which > would implement the kernel-side interface for updating the default > weights during boot/hotplug. > > We haven't had the discussion on how/when this should happen yet, > though, and there's some research to be done. (i.e. when should DRAM > weights be set? should the entire table be reweighted on hotplug? etc) Before that, I'm OK to remove default_iw_table and use hard coded "1" as default weight for now. >> > +static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr, >> > + const char *buf, size_t count) >> > +{ >> > + struct iw_node_attr *node_attr; >> > + struct iw_table __rcu *new; >> > + struct iw_table __rcu *old; >> > + u8 weight = 0; >> > + >> > + node_attr = container_of(attr, struct iw_node_attr, kobj_attr); >> > + if (count == 0 || sysfs_streq(buf, "")) >> > + weight = 0; >> > + else if (kstrtou8(buf, 0, &weight)) >> > + return -EINVAL; >> > + >> > + new = kmalloc(sizeof(*new), GFP_KERNEL); >> > + if (!new) >> > + return -ENOMEM; >> > + >> > + mutex_lock(&iw_table_mtx); >> > + old = rcu_dereference_protected(iw_table, >> > + lockdep_is_held(&iw_table_mtx)); >> > + /* If value is 0, revert to default weight */ >> > + weight = weight ? weight : default_iw_table.weights[node_attr->nid]; >> >> If we change the default weight in default_iw_table.weights[], how do we >> identify whether the weight has been customized by users via sysfs? So, >> I suggest to use 0 in iw_table for not-customized weight. >> >> And if so, we need to use RCU to access default_iw_table too. >> > > Dumb simplification on my part, I'll walk this back and add the > > if (!weight) weight = default_iw_table[node] > > logic back into the allocator paths accordinly. > >> > + memcpy(&new->weights, &old->weights, sizeof(new->weights)); >> > + new->weights[node_attr->nid] = weight; >> > + rcu_assign_pointer(iw_table, new); >> > + mutex_unlock(&iw_table_mtx); >> > + kfree_rcu(old, rcu); >> >> synchronize_rcu() should be OK here. It's fast enough in this cold >> path. This make it good to define iw_table as >> > I'll take a look. > >> u8 __rcu *iw_table; >> >> Then, we only need to allocate nr_node_ids elements now. >> > > We need nr_possible_nodes to handle hotplug correctly. nr_node_ids >= num_possible_nodes(). It's larger than any possible node ID. > I decided to simplify this down to MAX_NUMNODES *juuuuuust in case* > "true node hotplug" ever becomes a reality. If that happens, then > only allocating space for possible nodes creates a much bigger > headache on hotplug. > > For the sake of that simplification, it seemed better to just eat the > 1KB. If you really want me to do that, I will, but the MAX_NUMNODES > choice was an explicitly defensive choice. When "true node hotplug" becomes reality, we can make nr_node_ids == MAX_NUMNODES. So, it's safe to use it. Please take a look at setup_nr_node_ids(). >> > +static int __init mempolicy_sysfs_init(void) >> > +{ >> > + /* >> > + * if sysfs is not enabled MPOL_WEIGHTED_INTERLEAVE defaults to >> > + * MPOL_INTERLEAVE behavior, but is still defined separately to >> > + * allow task-local weighted interleave and system-defaults to >> > + * operate as intended. >> > + * >> > + * In this scenario iw_table cannot (presently) change, so >> > + * there's no need to set up RCU / cleanup code. >> > + */ >> > + memset(&default_iw_table.weights, 1, sizeof(default_iw_table)); >> >> This depends on sizeof(default_iw_table.weights[0]) == 1, I think it's >> better to use explicit loop here to make the code more robust a little. >> > > oh hm, you're right. rookie mistake on my part. > -- Best Regards, Huang, Ying