On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
> > You aren't the first person to ask about having a 64-bit lookup on
> > 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
> > at LinuxCon in Prague.  I have always been reluctant to add such a thing
> > because the XArray uses quite a naive data type underneath.  It works
> > well with dense arrays but becomes very wasteful of memory for sparse
> > arrays.
> >
> > My understanding of SCSI-world is that most devices have a single
> > LUN 0.  Most devices that have multiple LUNs number them sequentially.
> > Some devices have either an internal structure or essentially pick random
> > LUNs for the devices they expose.
>
> Not quite. You are correct that most devices have a single LUN 0
> (think of libata :-), but those with several LUNs typically
> enumerate them. In most cases the enumeration starts at 0 (or 1,
> if LUN 0 is used for a management LUN), and reaches up to 256.
> Some arrays use a different LUN layout, which means that the top
> two bits of the LUN number are set, and possibly some intermediate
> numbers, too. But the LUNs themselves are numbered consecutively, too;
> it's just at a certain offset.
> I've never seen anyone picking LUN numbers at random.

Ah, OK.  I think for these arrays you'd be better off accepting the cost
of an extra 4 bytes in the struct scsi_device rather than the cost of
storing the scsi_device at the LUN.

Let's just work an example where you have a 64-bit LUN with 4 ranges,
each of 64 entries (this is almost a best-case scenario for the XArray).
[0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].

If we store them sequentially in an allocating XArray, we take up 256 *
4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
(2880 bytes) for a total of 3904 bytes.
Storing them at their LUN will allocate a top level node which covers
bits 60-65, then four nodes, each covering bits 54-59, another four
nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
coming to 23616 bytes.  And the pointer chase to get to each LUN is
ten deep.  It'll mostly be cached, but still ...

> But still, the original question still stands: what would be the most
> efficient way using xarrays here?
> We have a four-level hierarchy Host:Channel:Target:LUN
> and we need to lookup devices (and, occasionally, targets) per host.
> At this time, 'channel' and 'target' are unsigned integer, and
> LUNs are 64 bit.

It certainly seems sensible to me to have a per-host allocating XArray
to store the targets that belong to that host.  I imagine you also want
a per-target XArray for the LUNs that belong to that target.

Do you also want a per-host XArray to store the LUNs so you can iterate
all LUNs per host as a single lookup rather than indirecting through
the target XArray?  That's a design decision for you to make.
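
For anyone who wants to check the node counts in the worked example,
here is a minimal userspace sketch.  It is not kernel code and does not
use the XArray API; it only models a radix tree with 64 slots (6 index
bits) per node, which is the assumption behind the numbers above.  The
names count_nodes(), fill_ranges() and prefix() are made up for this
illustration, and the 576-byte node size is simply the figure quoted in
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SHIFT 6    /* assumed: each node indexes 6 bits (64 slots) */
#define NODE_BYTES  576  /* node size quoted in the worked example */

/* index >> shift, defined as 0 once the shift reaches the word size. */
static uint64_t prefix(uint64_t index, unsigned int shift)
{
	return shift >= 64 ? 0 : index >> shift;
}

/*
 * Count the nodes a 64-way radix tree needs to hold every index in
 * idx[0..n-1].  idx must be sorted in ascending order.  Simplification:
 * a real XArray can store a lone entry at index 0 without any node;
 * this model always counts at least one node for a non-empty set.
 */
static size_t count_nodes(const uint64_t *idx, size_t n)
{
	unsigned int top = 0, shift;
	size_t nodes = 0, i;

	if (n == 0)
		return 0;

	/* Grow the tree until the top node's 64 slots cover the largest index. */
	while (prefix(idx[n - 1], top) >= 64)
		top += CHUNK_SHIFT;

	/* A node at a given level is identified by the index bits above it. */
	for (shift = 0; shift <= top; shift += CHUNK_SHIFT) {
		size_t distinct = 1;

		for (i = 1; i < n; i++) {
			assert(idx[i] >= idx[i - 1]);
			if (prefix(idx[i], shift + CHUNK_SHIFT) !=
			    prefix(idx[i - 1], shift + CHUNK_SHIFT))
				distinct++;
		}
		nodes += distinct;
	}
	return nodes;
}

/* Fill out[] with per_range consecutive indices starting at each base. */
static size_t fill_ranges(uint64_t *out, const uint64_t *bases,
			  size_t nbases, size_t per_range)
{
	size_t b, i, n = 0;

	for (b = 0; b < nbases; b++)
		for (i = 0; i < per_range; i++)
			out[n++] = bases[b] + i;
	return n;
}
```

Filling one array with indices 0..255 (the allocating case) and another
with the four ranges at bases 0, 2^62, 2^63 and 2^63+2^62 (the
store-at-LUN case), count_nodes() returns 5 and 41 respectively.
Multiplying by NODE_BYTES gives 2880 and 23616 bytes; the allocating
case then adds the 1kB of per-device IDs on top.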