On Tue, Feb 27, 2024 at 01:59:26PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@xxxxxxxxxxxx> writes:
>
> > I have to press this issue: Is this an actual, practical, concern?
>
> I don't know who have large machine like that.  But I guess that it's
> possible in the long run.
>

Certainly possible, although that seems like a hyper-specialized case
of a supercomputer.  I suppose still worth considering for a bit.

> > I suppose another strategy is to calculate the interleave weights
> > un-bounded from the raw bandwidth - but continuously force reductions
> > (through some yet-undefined algorithm) until at least one node reaches a
> > weight of `1`.  This suffers from the opposite problem: what if the top
> > node has a value greater than 255? Do we just cap it at 255?  That seems
> > the opposite form of problematic.
> >
> > (Large numbers are quite pointless, as it is essentially the antithesis
> > of interleave)
>
> Yes.  So I suggest to use a relative small number as the default weight
> to start with for normal DRAM.  We will have to floor/ceiling the weight
> value.

Yeah more concretely, I was thinking something like

unsigned int *temp_weights; /* sizeof nr_node_ids */

memcpy(temp_weights, node_bandwidth);

while min(temp_weights) > 1:
	- attempt GCD reduction
	- if failed (GCD=1), adjust all odd numbers to be even (+1), try again

for weight in temp_weights:
	iw_table[N] = (weight > 255) ? 255 : (unsigned char)weight;

Something like this.  Of course this breaks if you have two nodes with a
massively different bandwidth ratio (> 255:1), but that seems unrealistic
given the intent of the devices.

~Gregory
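
For illustration, the reduction loop above could be sketched in plain
userspace C roughly as follows.  This is only one possible reading of the
pseudocode, not kernel code: reduce_weights(), gcd_u() and MAX_WEIGHT are
names made up for this sketch, and it assumes every bandwidth value is
nonzero and below UINT_MAX (so the +1 adjustment cannot wrap).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WEIGHT 255u	/* hypothetical cap: weights are stored as unsigned char */

/* Euclid's algorithm; gcd_u(x, 0) == x. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Reduce raw per-node bandwidth values into interleave weights.
 * bw[] is used as scratch space and is modified in place.
 * Assumption: n > 0 and every bw[i] is nonzero and below UINT_MAX.
 */
static void reduce_weights(unsigned int *bw, unsigned char *iw, size_t n)
{
	size_t i;

	assert(n > 0);

	for (;;) {
		unsigned int min = bw[0], g = bw[0];

		for (i = 1; i < n; i++) {
			if (bw[i] < min)
				min = bw[i];
			g = gcd_u(g, bw[i]);
		}
		if (min <= 1)
			break;		/* some node already has weight 1 */
		if (g == 1) {
			/* No common factor: round odd values up to even, retry. */
			for (i = 0; i < n; i++)
				bw[i] += bw[i] & 1;
			continue;
		}
		for (i = 0; i < n; i++)
			bw[i] /= g;
	}

	/* Cap anything that still does not fit in an unsigned char. */
	for (i = 0; i < n; i++)
		iw[i] = bw[i] > MAX_WEIGHT ? MAX_WEIGHT : (unsigned char)bw[i];
}
```

With inputs {300, 100, 50} this yields weights {6, 2, 1}, and {7, 3}
becomes {8, 4} after the odd-to-even adjustment and then {2, 1}.  The loop
terminates because every pass through it strictly shrinks all values while
the minimum is still above 1.  A ratio beyond 255:1, such as {600, 2},
ends up capped at {255, 1}, which is the breakage case mentioned above.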