On Sat, Aug 31, 2019 at 06:09:39PM +0800, Yunsheng Lin wrote: > > > On 2019/8/31 16:55, Peter Zijlstra wrote: > > On Sat, Aug 31, 2019 at 01:58:16PM +0800, Yunsheng Lin wrote: > >> According to Section 6.2.14 from ACPI spec 6.3 [1], the setting > >> of proximity domain is optional, as below: > >> > >> This optional object is used to describe proximity domain > >> associations within a machine. _PXM evaluates to an integer > >> that identifies a device as belonging to a Proximity Domain > >> defined in the System Resource Affinity Table (SRAT). > > > > That's just words.. what does it actually mean? > > It means the dev_to_node(dev) may return -1 if the bios does not > implement the proximity domain feature, user may use that value > to call cpumask_of_node and cpumask_of_node does not protect itself > from node id being -1, which causes out of bound access. > >> @@ -69,6 +69,12 @@ extern const struct cpumask *cpumask_of_node(int node); > >> /* Returns a pointer to the cpumask of CPUs on Node 'node'. */ > >> static inline const struct cpumask *cpumask_of_node(int node) > >> { > >> + if (node >= nr_node_ids) > >> + return cpu_none_mask; > >> + > >> + if (node < 0 || !node_to_cpumask_map[node]) > >> + return cpu_online_mask; > >> + > >> return node_to_cpumask_map[node]; > >> } > >> #endif > > > > I _reallly_ hate this. Users are expected to use valid numa ids. Now > > we're adding all this checking to all users. Why do we want to do that? > > As above, the dev_to_node(dev) may return -1. > > > > > Using '(unsigned)node >= nr_nods_ids' is an error. > > 'node >= nr_node_ids' can be dropped if all user is expected to not call > cpumask_of_node with node id greater or equal to nr_nods_ids. you copied my typo :-) > From what I can see, the problem can be fixed in three place: > 1. Make user dev_to_node return a valid node id even when proximity > domain is not set by bios(or node id set by buggy bios is not valid), > which may need info from the numa system to make sure it will return > a valid node. > > 2. User that call cpumask_of_node should ensure the node id is valid > before calling cpumask_of_node, and user also need some info to > make ensure node id is valid. > > 3. Make sure cpumask_of_node deal with invalid node id as this patchset. > > Which one do you prefer to make sure node id is valid, or do you > have any better idea? > > Any detail advice and suggestion will be very helpful, thanks. 1) because even it is not set, the device really does belong to a node. It is impossible a device will have magic uniform access to memory when CPUs cannot. 2) is already true today, cpumask_of_node() requires a valid node_id. 3) is just wrong and increases overhead for everyone.