On Sun, Dec 18, 2022 at 03:55:53PM +0000, Jonathan Cameron wrote:
> On Sun, 18 Dec 2022 08:25:34 +0800
> johnny <johnny.li@xxxxxxxxxxxxxxxx> wrote:
>

[snip]

> > >
> > > > > +	}
> > > > > +
> > > > > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > > > > +		.opcode = CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> > > > > +		.payload_in = &payload,
> > > > > +		.size_in = pl_size,
> > > >
> > > > This payload size should be whatever we need to store the records,
> > > > not the max size possible.  Particularly as that size is currently
> > > > bigger than the mailbox might be.
> > >
> > > But the above check and set ensures that does not happen.
> > >
> > > >
> > > > It shouldn't fail (I think) simply because a later version of the spec might
> > > > add more to this message and things should still work, but definitely not
> > > > good practice to tell the hardware this is much longer than it actually is.
> > >
> > > I don't follow.
> > >
> > > The full payload is going to be sent even if we are just clearing 1 record
> > > which is inefficient but it should never overflow the hardware because it is
> > > limited by the check above.
> > >
> > > So why would this be a problem?
> > >
> >
> > per spec3.0, Event Record Handles field is "A list of Event Record Handles the
> > host has consumed and the device shall now remove from its internal Event Log
> > store.". Extra unused handle list does not follow above description. And also
> > spec mentions "All event record handles shall be nonzero value. A value of 0
> > shall be treated by the device as an invalid handle.". So if there is value 0
> > in extra unused handles, device shall return invalid handle error code
>
> I don't think we call into that particular corner as the number of event
> record handles is set correctly.
> Otherwise I agree this isn't following the
> spec - though I think key here is that it won't be broken against CXL 3.0 devices
> (with that rather roundabout argument that a CXL 3.0 device should handle later
> spec messages as those should be backwards compatible) but it might be broken
> against CXL 3.0+ ones that interpret the 0s at the end as having meaning.

I'm respinning this to add the pci_set_master() anyway.  So I'm going to change
this as well.

I really don't see how hardware would go off anything but the number of records
to process the handles, but I could see some overly strict firmware wanting to
validate the size being exactly equal to the number specified rather than just
less than (which is what I would anticipate an issue with).

Dan has agreed to land the patch I need, which moves the trace point
definition to drivers/cxl, in cxl/next.  After that I will rebase and send out.

Ira

>
> Thanks,
>
> Jonathan
>
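
For reference, a rough userspace sketch of the sizing being discussed: report
an input size that covers only the handles actually being cleared, rather than
the maximum payload.  The struct below is a hypothetical stand-in, and its
field layout is my assumption rather than something copied from the patch; the
helper names clear_event_size_in() and clear_event_max_handles() are likewise
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the Clear Event Records input payload.
 * The layout is an assumption for illustration only.
 */
struct clear_event_payload {
	uint8_t event_log;
	uint8_t clear_flags;
	uint8_t nr_recs;	/* number of valid entries in handles[] */
	uint8_t reserved[3];
	uint16_t handles[];	/* valid handles are nonzero */
};

/* The fixed header in this sketch is 6 bytes, followed by 2 bytes per handle. */
static_assert(offsetof(struct clear_event_payload, handles) == 6,
	      "unexpected header size");

/* Bytes to report as the input size when sending exactly nr_recs handles. */
static inline size_t clear_event_size_in(uint8_t nr_recs)
{
	return offsetof(struct clear_event_payload, handles) +
	       (size_t)nr_recs * sizeof(uint16_t);
}

/*
 * Largest handle count that fits in a mailbox payload of mbox_size bytes.
 * The count is capped at UINT8_MAX because nr_recs is a single byte here.
 */
static inline uint8_t clear_event_max_handles(size_t mbox_size)
{
	size_t hdr = offsetof(struct clear_event_payload, handles);
	size_t n;

	if (mbox_size < hdr)
		return 0;

	n = (mbox_size - hdr) / sizeof(uint16_t);
	return n > UINT8_MAX ? UINT8_MAX : (uint8_t)n;
}
```

With this shape, clearing a single record reports the header plus one handle
instead of the full buffer, so no trailing zero handles are sent and the size
matches the record count exactly.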