Gregory Price <gregory.price@xxxxxxxxxxxx> writes:

> On Tue, Feb 27, 2024 at 01:59:26PM +0800, Huang, Ying wrote:
>> Gregory Price <gregory.price@xxxxxxxxxxxx> writes:
>>
>> > I have to press this issue: is this an actual, practical concern?
>>
>> I don't know who has a machine that large.  But I guess it's
>> possible in the long run.
>>
>
> Certainly possible, although that seems like a hyper-specialized case of
> a supercomputer.  I suppose it is still worth considering for a bit.
>
>> > I suppose another strategy is to calculate the interleave weights
>> > un-bounded from the raw bandwidth - but continuously force reductions
>> > (through some yet-undefined algorithm) until at least one node reaches
>> > a weight of `1`.  This suffers from the opposite problem: what if the
>> > top node has a value greater than 255?  Do we just cap it at 255?
>> > That seems the opposite form of problematic.
>> >
>> > (Large numbers are quite pointless, as they are essentially the
>> > antithesis of interleave.)
>>
>> Yes.  So I suggest using a relatively small number as the default
>> weight to start with for normal DRAM.  We will have to floor/ceiling
>> the weight value.
>
> Yeah, more concretely, I was thinking something like:
>
>	unsigned int *temp_weights;	/* nr_node_ids entries */
>	int i;
>
>	memcpy(temp_weights, node_bandwidth,
>	       nr_node_ids * sizeof(*temp_weights));
>	while (min(temp_weights) > 1) {
>		/* attempt GCD reduction */
>		/* if that fails (GCD == 1), bump all odd values up
>		 * to even (+1) and try again */
>	}
>
>	for (i = 0; i < nr_node_ids; i++)
>		iw_table[i] = (temp_weights[i] > 255) ?
>			      255 : (unsigned char)temp_weights[i];
>
> Something like this.  Of course this breaks if two nodes have a
> massively different bandwidth ratio (> 255:1), but that seems
> unrealistic given the intent of the devices.

It would be better to evaluate the maximum error the reduction
introduces.  For example, a 3:2 bandwidth ratio could come out as 2:1.
That error appears unacceptable.

--
Best Regards,
Huang, Ying
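
For illustration, here is a minimal, self-contained userspace sketch of
the reduction loop sketched above.  Everything here is a stand-in: the
node count, the bandwidth figures, and the plain-array `iw_table` are
invented for the example (the kernel would source bandwidth from HMAT
and store the weights elsewhere); only the reduce-by-GCD /
round-odd-up / clamp-to-255 logic follows the proposal.

	#include <stdio.h>

	#define MAX_NODES  8
	#define WEIGHT_MAX 255

	/* Hypothetical per-node bandwidths (GB/s); zero = no memory. */
	static unsigned int node_bandwidth[MAX_NODES] = { 300, 200 };
	static unsigned char iw_table[MAX_NODES];

	static unsigned int gcd(unsigned int a, unsigned int b)
	{
		while (b) {
			unsigned int t = a % b;
			a = b;
			b = t;
		}
		return a;
	}

	static void reduce_weights(unsigned int *w, int n)
	{
		for (;;) {
			unsigned int g = 0, min = 0;
			int i;

			for (i = 0; i < n; i++) {
				if (!w[i])
					continue;  /* node without memory */
				g = gcd(g, w[i]);
				if (!min || w[i] < min)
					min = w[i];
			}
			if (min <= 1)
				break;  /* smallest weight hit 1: done */
			if (g == 1) {
				/* GCD reduction failed: round odd values
				 * up to even, then try again. */
				for (i = 0; i < n; i++)
					if (w[i] & 1)
						w[i]++;
				continue;
			}
			for (i = 0; i < n; i++)
				w[i] /= g;  /* divide out shared factor */
		}
	}

	int main(void)
	{
		unsigned int temp_weights[MAX_NODES];
		int i;

		for (i = 0; i < MAX_NODES; i++)
			temp_weights[i] = node_bandwidth[i];

		reduce_weights(temp_weights, MAX_NODES);

		for (i = 0; i < MAX_NODES; i++) {
			iw_table[i] = temp_weights[i] > WEIGHT_MAX ?
				      WEIGHT_MAX : (unsigned char)temp_weights[i];
			if (node_bandwidth[i])
				printf("node %d: bw=%u weight=%u\n",
				       i, node_bandwidth[i], iw_table[i]);
		}
		return 0;
	}

This also reproduces the error case above: 300:200 first reduces to
3:2, the odd adjustment turns that into 4:2, and the final weights are
2:1, roughly 33% off the true 1.5:1 bandwidth ratio; this is exactly
the maximum-error concern being raised.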