On Tue, May 26, 2020 at 07:53:35PM +0200, Hannes Reinecke wrote: > On 5/26/20 4:27 PM, Matthew Wilcox wrote: > > Ah, OK. I think for these arrays you'd be better off accepting the cost > > of an extra 4 bytes in the struct scsi_device rather than the cost of > > storing the scsi_device at the LUN. > > > > Let's just work an example where you have a 64-bit LUN with 4 ranges, > > each of 64 entries (this is almost a best-case scenario for the XArray). > > [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63]. > > > > If we store them sequentially in an allocating XArray, we take up 256 * > > 4 bytes = 1kB extra space in the scsi_device. The XArray will allocate > > four nodes plus one node to hold the four nodes, which is 5 * 576 bytes > > (2780 bytes) for a total of 3804 bytes. > > > > Storing them in at their LUN will allocate a top level node which covers > > bits 60-66, then four nodes, each covering bits of 54-59, another four > > nodes covering bits 48-53, four nodes for 42-47, ... I make it 41 nodes, > > coming to 23616 bytes. And the pointer chase to get to each LUN is > > ten deep. It'll mostly be cached, but still ... > > > Which is my worry, too. > In the end we're having a massively large array space (128bit if we take the > numbers as they stand today), of which only a _tiny_ fraction is actually > allocated. In your scheme, yes. Do you ever need to look up scsi_device from a scsi_host with only the C:T:L as a key (and it's a situation where speed matters)? Everything I've seen so far is about iterating every sdev in an shost, but maybe this is a "when you have a hammer" situation. > We can try to reduce the array space by eg. restricting channels and targets > to be 16 bits, and the LUN to be 32 bits. > But then we'll be having a hard time arguing; "Hey, we have a cool new > feature, which has a really nice interface, but we can't support the same > set of devices as we have now...". > That surely will go down well. But using the allocating XArray will outperform both the linked list and the direct-store XArray that you're proposing. > Maybe one should look at using an rbtree directly; _that_ could be made to > work with an 128bit index, and lookup should be fast, too. > Let's see. The rbtree is even worse than the linked list if your primary API is "iterate all objects".