Re: [PATCH V10 6/9] cxl/port: Read CDAT table

Ira Weiny <ira.weiny@xxxxxxxxx> · Wed, 8 Jun 2022 14:27:14 -0700

On Mon, Jun 06, 2022 at 11:15:41AM -0700, Ben Widawsky wrote:
> On 22-06-04 17:50:46, ira.weiny@xxxxxxxxx wrote:
> > From: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > 

[snip]

> > +
> > +static int cxl_cdat_get_length(struct cxl_port *port, size_t *length)
> > +{
> > +	u32 cdat_request_pl = CDAT_DOE_REQ(0);
> > +	u32 cdat_response_pl[32];
> > +	DECLARE_COMPLETION_ONSTACK(c);
> > +	struct pci_doe_task task = {
> > +		.prot.vid = PCI_DVSEC_VENDOR_ID_CXL,
> > +		.prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS,
> > +		.request_pl = &cdat_request_pl,
> > +		.request_pl_sz = sizeof(cdat_request_pl),
> > +		.response_pl = cdat_response_pl,
> > +		.response_pl_sz = sizeof(cdat_response_pl),
> > +		.complete = cxl_doe_task_complete,
> > +		.private = &c,
> > +	};
> 
> This is looking like something that could be nicely populated with a macro.

Probably.  But I'll leave that for another day.

> 
> > +	int rc = 0;
> > +
> > +	if (!port->cdat_mb) {
> > +		dev_err(&port->dev, "No CDAT mailbox\n");
> > +		return -EIO;
> > +	}
> 
> AIUI, !port->cdat_mb isn't actually an error.

It was when I was trying to get this to work...  ;-)  I'll change it to dev_dbg().

> Does it make sense to simply
> return 0 here?

No, because this is just a helper for read_cdat_data() below.  0 could be used
to indicate 'no data', but it is easier to return an obvious error.
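
A minimal userspace sketch of that helper contract, for illustration only.
struct port_model, get_length_model() and read_cdat_model() are hypothetical
stand-ins, not kernel code.  It assumes the task's return value counts bytes
received (as the later "task.rv / sizeof(u32)" suggests), and it checks for two
dwords before reading index 1, which is stricter than the "task.rv < 1" test in
the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the port: mailbox presence plus a canned response. */
struct port_model {
	bool has_mailbox;
	const uint32_t *resp;	/* resp[0]: table access header, resp[1]: first CDAT dword */
	int rv;			/* assumed: bytes received, or negative on failure */
};

/* Helper: any failure, including a missing mailbox, is an explicit error. */
static int get_length_model(const struct port_model *port, size_t *length)
{
	if (!port->has_mailbox)
		return -EIO;

	/* Need the header dword and the length dword before reading resp[1]. */
	if (port->rv < (int)(2 * sizeof(uint32_t)))
		return -EIO;

	*length = port->resp[1];
	return 0;
}

/* Caller: gives up quietly on any helper error, as read_cdat_data() does. */
static bool read_cdat_model(const struct port_model *port, size_t *length)
{
	return get_length_model(port, length) == 0;
}
```

With this shape the caller never has to tell "length 0" apart from "no
mailbox"; any non-zero return means there is nothing to cache.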

> 
> > +
> > +	rc = pci_doe_submit_task(port->cdat_mb, &task);
> > +	if (rc < 0) {
> > +		dev_err(&port->dev, "DOE submit failed: %d", rc);
> > +		return rc;
> > +	}
> > +	wait_for_completion(&c);
> > +
> > +	if (task.rv < 1)
> > +		return -EIO;
> > +
> > +	*length = cdat_response_pl[1];
> > +	dev_dbg(&port->dev, "CDAT length %zu\n", *length);
> > +
> > +	return rc;
> > +}
> > +
> > +static int cxl_cdat_read_table(struct cxl_port *port,
> > +			       struct cxl_cdat *cdat)
> > +{
> > +	size_t length = cdat->length;
> > +	u32 *data = cdat->table;
> > +	int entry_handle = 0;
> > +	int rc = 0;
> > +
> > +	if (!port->cdat_mb) {
> > +		dev_err(&port->dev, "No CDAT mailbox\n");
> > +		return -EIO;
> > +	}
> 
> Similar to above, maybe just return 0?

Same response.  But I'll change the messages to dev_dbg().

> 
> > +
> > +	do {
> > +		u32 cdat_request_pl = CDAT_DOE_REQ(entry_handle);
> > +		u32 cdat_response_pl[32];
> > +		DECLARE_COMPLETION_ONSTACK(c);
> > +		struct pci_doe_task task = {
> > +			.prot.vid = PCI_DVSEC_VENDOR_ID_CXL,
> > +			.prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS,
> > +			.request_pl = &cdat_request_pl,
> > +			.request_pl_sz = sizeof(cdat_request_pl),
> > +			.response_pl = cdat_response_pl,
> > +			.response_pl_sz = sizeof(cdat_response_pl),
> > +			.complete = cxl_doe_task_complete,
> > +			.private = &c,
> > +		};
> > +		size_t entry_dw;
> > +		u32 *entry;
> > +
> > +		rc = pci_doe_submit_task(port->cdat_mb, &task);
> > +		if (rc < 0) {
> > +			dev_err(&port->dev, "DOE submit failed: %d", rc);
> > +			return rc;
> > +		}
> > +		wait_for_completion(&c);
> 
> I'd use the timeout variant, but if you don't want to, see below. I can't quite
> tell if pci_doe_submit_task() is guaranteed to end with FLAG_DEAD at some
> point...

Yes, it will if things go south.  The issue with a timeout here is: what
timeout value should this layer expect?

> 
> > +
> > +		entry = cdat_response_pl + 1;
> > +		entry_dw = task.rv / sizeof(u32);
> > +		/* Skip Header */
> > +		entry_dw -= 1;
> > +		entry_dw = min(length / 4, entry_dw);
> > +		memcpy(data, entry, entry_dw * sizeof(u32));
> > +		length -= entry_dw * sizeof(u32);
> > +		data += entry_dw;
> > +		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, cdat_response_pl[0]);
> 
> [0] looks suspicious...

Actually, I have to claim ignorance on this one.  I've carried this over from
Jonathan's original patches.  I'm not that worried about the [0], as that is
just the first dword.  But I'm now confused about this entry handle.

Jonathan?  Help?
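
For reference while this gets sorted out, a minimal userspace model of the loop
in question.  It is a sketch under stated assumptions, not the patch's
implementation: it assumes the next entry handle sits in bits 31:16 of the
first response dword and that 0xFFFF marks the last entry.  doe_xfer_fn,
fake_xfer() and struct fake_dev are hypothetical stand-ins for one DOE
exchange.  Unlike the patch, it rejects a response shorter than one dword
before subtracting the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: entry handle in bits 31:16 of the first response dword. */
#define ENTRY_HANDLE_MASK	0xffff0000u
#define ENTRY_HANDLE_SHIFT	16
#define LAST_ENTRY		0xffffu

/* Hypothetical DOE exchange: fills resp, returns bytes received (<0 on error). */
typedef int (*doe_xfer_fn)(void *ctx, uint16_t handle, uint32_t *resp,
			   size_t resp_dw);

static int cdat_read_table_model(doe_xfer_fn xfer, void *ctx,
				 uint32_t *data, size_t length)
{
	uint16_t handle = 0;

	assert(xfer && data);

	do {
		uint32_t resp[32];
		size_t entry_dw;
		int rv = xfer(ctx, handle, resp, 32);

		/* Need at least the header dword before touching resp[0]. */
		if (rv < (int)sizeof(uint32_t))
			return -1;

		/* Skip the header, then clamp to the space left in data. */
		entry_dw = (size_t)rv / sizeof(uint32_t) - 1;
		if (entry_dw > length / sizeof(uint32_t))
			entry_dw = length / sizeof(uint32_t);

		memcpy(data, resp + 1, entry_dw * sizeof(uint32_t));
		data += entry_dw;
		length -= entry_dw * sizeof(uint32_t);

		/* The header of this response names the next entry to request. */
		handle = (resp[0] & ENTRY_HANDLE_MASK) >> ENTRY_HANDLE_SHIFT;
	} while (handle != LAST_ENTRY);

	return 0;
}

/* Fake device for illustration: three entries of two dwords each. */
struct fake_dev {
	unsigned int calls;
};

static int fake_xfer(void *ctx, uint16_t handle, uint32_t *resp, size_t resp_dw)
{
	struct fake_dev *dev = ctx;
	uint16_t next = (handle == 2) ? LAST_ENTRY : (uint16_t)(handle + 1);

	if (handle > 2 || resp_dw < 3)
		return -1;

	dev->calls++;
	resp[0] = (uint32_t)next << ENTRY_HANDLE_SHIFT;
	resp[1] = 0x100u + handle;
	resp[2] = 0x200u + handle;
	return 3 * (int)sizeof(uint32_t);
}
```

In this model [0] is only the header dword of each response, and the loop ends
solely because the device eventually reports 0xFFFF there; a device that never
does so would keep the loop running.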

> 
> > +
> > +	} while (entry_handle != 0xFFFF);
> > +
> > +	return rc;
> > +}
> > +
> > +void read_cdat_data(struct cxl_port *port)
> 
> I think you need kdoc here, specifically because you've opted not to do a
> timed wait, which means it's possible to wait forever.

Sure, but the DOE spec means we are not going to wait forever.  I'll document
that.

> 
> > +{
> > +	struct device *dev = &port->dev;
> > +	size_t cdat_length;
> > +	int ret;
> > +
> > +	if (cxl_cdat_get_length(port, &cdat_length))
> > +		return;
> > +
> > +	port->cdat.table = devm_kzalloc(dev, cdat_length, GFP_KERNEL);
> > +	if (!port->cdat.table) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
> > +
> > +	port->cdat.length = cdat_length;
> > +	ret = cxl_cdat_read_table(port, &port->cdat);
> > +	if (ret) {
> > +		devm_kfree(dev, port->cdat.table);
> 
> Usually, when I see devm_kfree, it's a sign that it might not be a good
> candidate for devm. You could consider plain kzalloc, and then putting the kfree
> in the port destructor. I don't see anything incorrect though, so it's up to
> you.

I like it this way because we only call devm_kfree() in the error path, and
devm is less error prone.  Technically devm_kfree() does not even need to be
here, except that without it we could have a lot of cdat tables floating
around until the port goes away.

I can add a comment explaining why this anti-pattern is used here.

[snip]

> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index ddbb8b77752e..71009a167a92 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -75,4 +75,5 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> >  struct cxl_dev_state;
> >  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
> >  void cxl_cache_cdat_mb(struct cxl_port *port);
> > +void read_cdat_data(struct cxl_port *port);
> >  #endif /* __CXL_PCI_H__ */
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index 04f3d1fc6e07..fdff20cf79e6 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -50,6 +50,8 @@ static int cxl_port_probe(struct device *dev)
> >  		return PTR_ERR(cxlhdm);
> >  
> >  	cxl_cache_cdat_mb(port);
> > +	/* Cache the data early to ensure is_visible() works */
> > +	read_cdat_data(port);
> >  
> >  	if (is_cxl_endpoint(port)) {
> >  		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > @@ -80,10 +82,58 @@ static int cxl_port_probe(struct device *dev)
> >  	return 0;
> >  }
> >  
> > +static ssize_t cdat_read(struct file *filp, struct kobject *kobj,
> > +			 struct bin_attribute *bin_attr, char *buf,
> > +			 loff_t offset, size_t count)
> > +{
> > +	struct device *dev = kobj_to_dev(kobj);
> > +	struct cxl_port *port = to_cxl_port(dev);
> > +
> > +	if (!port->cdat.table)
> > +		return 0;
> 
> With visibility setup below, do you need this?

Not currently.  I was envisioning a later dynamic state for cdat.table where
it could be set to NULL on error.

Ira

[snip]