On Mon, Feb 1, 2021 at 11:13 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
>
> On 21-02-01 12:54:00, Konrad Rzeszutek Wilk wrote:
> > > +#define cxl_doorbell_busy(cxlm) \
> > > +	(cxl_read_mbox_reg32(cxlm, CXLDEV_MB_CTRL_OFFSET) & \
> > > +	 CXLDEV_MB_CTRL_DOORBELL)
> > > +
> > > +#define CXL_MAILBOX_TIMEOUT_US 2000
> >
> > You been using the spec for the values. Is that number also from it ?
> >
>
> Yes it is. I'll add a comment with the spec reference.
>
> > > +
> > > +enum opcode {
> > > +	CXL_MBOX_OP_IDENTIFY = 0x4000,
> > > +	CXL_MBOX_OP_MAX = 0x10000
> > > +};
> > > +
> > > +/**
> > > + * struct mbox_cmd - A command to be submitted to hardware.
> > > + * @opcode: (input) The command set and command submitted to hardware.
> > > + * @payload_in: (input) Pointer to the input payload.
> > > + * @payload_out: (output) Pointer to the output payload. Must be allocated by
> > > + *               the caller.
> > > + * @size_in: (input) Number of bytes to load from @payload.
> > > + * @size_out: (output) Number of bytes loaded into @payload.
> > > + * @return_code: (output) Error code returned from hardware.
> > > + *
> > > + * This is the primary mechanism used to send commands to the hardware.
> > > + * All the fields except @payload_* correspond exactly to the fields described in
> > > + * Command Register section of the CXL 2.0 spec (8.2.8.4.5). @payload_in and
> > > + * @payload_out are written to, and read from the Command Payload Registers
> > > + * defined in (8.2.8.4.8).
> > > + */
> > > +struct mbox_cmd {
> > > +	u16 opcode;
> > > +	void *payload_in;
> > > +	void *payload_out;
> >
> > On a 32-bit OS (not that we use those that more, but lets assume
> > someone really wants to), the void is 4-bytes, while on 64-bit it is
> > 8-bytes.
> >
> > `pahole` is your friend as I think there is a gap between opcode and
> > payload_in in the structure.
> >
> > > +	size_t size_in;
> > > +	size_t size_out;
> >
> > And those can also change depending on 32-bit/64-bit.
> >
> > > +	u16 return_code;
> > > +#define CXL_MBOX_SUCCESS 0
> > > +};
> >
> > Do you want to use __packed to match with the spec?
> >
> > Ah, reading later you don't care about it.
> >
> > In that case may I recommend you move 'return_code' (or perhaps just
> > call it rc?) to be right after opcode? Less of gaps in that structure.
> >
>
> I guess I hadn't realized we're supposed to try to fully pack structs by
> default.

This is just the internal parsed context of a command, I can't imagine
packing is relevant here. pahole optimization feels premature as well.

>
> > > +
> > > +static int cxl_mem_wait_for_doorbell(struct cxl_mem *cxlm)
> > > +{
> > > +	const int timeout = msecs_to_jiffies(CXL_MAILBOX_TIMEOUT_US);
> > > +	const unsigned long start = jiffies;
> > > +	unsigned long end = start;
> > > +
> > > +	while (cxl_doorbell_busy(cxlm)) {
> > > +		end = jiffies;
> > > +
> > > +		if (time_after(end, start + timeout)) {
> > > +			/* Check again in case preempted before timeout test */
> > > +			if (!cxl_doorbell_busy(cxlm))
> > > +				break;
> > > +			return -ETIMEDOUT;
> > > +		}
> > > +		cpu_relax();
> > > +	}
> >
> > Hm, that is not very scheduler friendly. I mean we are sitting here for
> > 2000us (2 ms) - that is quite the amount of time spinning.
> >
> > Should this perhaps be put in a workqueue?
>
> So let me first point you to the friendlier version which was shot down:
> https://lore.kernel.org/linux-cxl/20201111054356.793390-8-ben.widawsky@xxxxxxxxx/
>
> I'm not opposed to this being moved to a workqueue at some point, but I think
> that's unnecessary complexity currently. The reality is that it's expected that
> commands will finish way sooner than this or be implemented as background
> commands. I've heard a person who makes a lot of the spec decisions say, "if
> it's 2 seconds, nobody will use these things".

That said, asynchronous probe needs to be enabled for the next driver update.