RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc driver

Ben Levinsky <BLEVINSK@xxxxxxxxxx> · Thu, 20 Aug 2020 15:13:17 +0000



> -----Original Message-----
> From: Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>
> Sent: Wednesday, August 19, 2020 2:21 PM
> To: Ben Levinsky <BLEVINSK@xxxxxxxxxx>
> Cc: Stefano Stabellini <stefanos@xxxxxxxxxx>; Michal Simek
> <michals@xxxxxxxxxx>; devicetree@xxxxxxxxxxxxxxx;
> mathieu.poirier@xxxxxxxxxx; Ed T. Mooring <emooring@xxxxxxxxxx>; linux-
> remoteproc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> robh+dt@xxxxxxxxxx; Jiaying Liang <jliang@xxxxxxxxxx>; linux-arm-
> kernel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [PATCH v8 5/5] remoteproc: Add initial zynqmp R5 remoteproc
> driver
> 
> On Tue, 18 Aug 2020, Ben Levinsky wrote:
> > > > +/**
> > > > + * struct zynqmp_r5_mem - zynqmp rpu memory data
> > > > + * @pnode_id: TCM power domain ids
> > > > + * @res: memory resource
> > > > + * @node: list node
> > > > + */
> > > > +struct zynqmp_r5_mem {
> > > > +	u32 pnode_id[MAX_MEM_PNODES];
> > > > +	struct resource res;
> > > > +	struct list_head node;
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct zynqmp_r5_pdata - zynqmp rpu remote processor private data
> > > > + * @dev: device of RPU instance
> > > > + * @rproc: rproc handle
> > > > + * @pnode_id: RPU CPU power domain id
> > > > + * @mems: memory resources
> > > > + * @is_r5_mode_set: indicate if r5 operation mode is set
> > > > + * @tx_mc: tx mailbox client
> > > > + * @rx_mc: rx mailbox client
> > > > + * @tx_chan: tx mailbox channel
> > > > + * @rx_chan: rx mailbox channel
> > > > + * @mbox_work: mbox_work for the RPU remoteproc
> > > > + * @tx_mc_skbs: socket buffers for tx mailbox client
> > > > + * @rx_mc_buf: rx mailbox client buffer to save the rx message
> > > > + */
> > > > +struct zynqmp_r5_pdata {
> > > > +	struct device dev;
> > > > +	struct rproc *rproc;
> > > > +	u32 pnode_id;
> > > > +	struct list_head mems;
> > > > +	bool is_r5_mode_set;
> > > > +	struct mbox_client tx_mc;
> > > > +	struct mbox_client rx_mc;
> > > > +	struct mbox_chan *tx_chan;
> > > > +	struct mbox_chan *rx_chan;
> > > > +	struct work_struct mbox_work;
> > > > +	struct sk_buff_head tx_mc_skbs;
> > > > +	unsigned char rx_mc_buf[RX_MBOX_CLIENT_BUF_MAX];
> > >
> > > A simple small reordering of the struct fields would lead to small
> > > memory savings due to alignment.
> > >
> > >
> > [Ben Levinsky] will do. Do you mean ordering in either ascending or
> descending order?
> 
> Each field has a different alignment in the struct, so for example after
> pnode_id there are probably 4 unused bytes because mems is 64bit
> aligned.
> 
> 
[Ben Levinsky] ok will update this so the alignments are done with less unused memory per struct allocation.
> > > > +/*
> > > > + * TCM needs mapping to R5 relative address and xilinx platform mgmt
> call
> > > > + */
> > > > +struct rproc_mem_entry *handle_tcm_parsing(struct device *dev,
> > > > +					    struct reserved_mem *rmem,
> > > > +					    struct device_node *node,
> > > > +					    int lookup_idx)
> > > > +{
> > > > +	void *va;
> > > > +	dma_addr_t dma;
> > > > +	resource_size_t size;
> > > > +	int ret;
> > > > +	u32 pnode_id;
> > > > +	struct resource rsc;
> > > > +	struct rproc_mem_entry *mem;
> > > > +
> > > > +	pnode_id =  tcm_addr_to_pnode[lookup_idx][1];
> > > > +	ret = zynqmp_pm_request_node(pnode_id,
> > > > +				     ZYNQMP_PM_CAPABILITY_ACCESS, 0,
> > > > +				     ZYNQMP_PM_REQUEST_ACK_BLOCKING);
> > > > +	if (ret < 0) {
> > > > +		dev_err(dev, "failed to request power node: %u\n",
> > > pnode_id);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	ret = of_address_to_resource(node, 0, &rsc);
> > > > +	if (ret < 0) {
> > > > +		dev_err(dev, "failed to get resource of memory %s",
> > > > +			of_node_full_name(node));
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	size = resource_size(&rsc);
> > > > +	va = devm_ioremap_wc(dev, rsc.start, size);
> > > > +	if (!va)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	/* zero out tcm base address */
> > > > +	if (rsc.start & 0xffe00000) {
> > > > +		/* R5 can't see anything past 0xfffff so wipe it */
> > > > +		rsc.start &= 0x000fffff;
> > >
> > > If that is the case why not do:
> > >
> > >   rsc.start &= 0x000fffff;
> > >
> > > unconditionally? if (rsc.start & 0xffe00000) is superfluous.
> > >
> > > But I thought that actually the R5s could see TCM at both the low
> > > address (< 0x000fffff) and also at the high address (i.e. 0xffe00000).
> > >
> > >
> > [Ben Levinsky] Here yes can make rsc.start &= 0x000fffff undconditional. Will
> update in v9 as such
> > 		Also, this logic is because this is only for the Xilinx R5
> mappings of TCM banks that are at (from the TCM point of view) 0x0 and
> 0x2000
> 
> What I meant is that as far as I understand from the TRM the RPU should
> also be able to access the same banks at the same address of the APU,
> which I imagine is in the 0xffe00000 range. But I could be wrong on
> this.
> 
> If we could use the same addresses for RPU and APU, it would simplify
> this driver.
> 
> 
> > > > +		/*
> > > > +		 * handle tcm banks 1 a and b (0xffe9000 and
> > > > +		 * 0xffeb0000)
> > > > +		 */
> > > > +		if (rsc.start & 0x80000)
> > > > +			rsc.start -= 0x90000;
> > >
> > > It is very unclear to me why we have to do this
> > >
> > >
> > [Ben Levinsky] This is for TCM bank 0B and bank 1B to map them to
> 0x00020000 so that the TCM relative addressing lines up. For example
> (0xffe90000 & 0x000fffff) - 0x90000 = 0x20000
> 
> Could you please explain the mapping in an in-code comment?
> The comment currently mentions 0xffe9000 and 0xffeb0000 but:
> 
> - 0xffe9000 & 0x000fffff = 0xe9000
>   0xe9000 - 0x90000 = 0x59000
> 
> - 0xffeb0000 & 0x000fffff = 0xeb000
>   0xeb000 - 0x90000 = 0xeb000
> 
> Either way we don't get 0x20000. What am I missing?
> 
[Ben Levinsky] I apologize there is typo in the comment... it should be 0xffe90000 and 0xffeb0000
The output is:
0xffe90000 & 0x000fffff = 0x90000
0x90000 - 0x90000 = 0x0

And 
0xffeb0000 & 0x000fffff = 0xB0000 
0xB0000 - 0x90000 = 0x20000

So these line up for the relative addressing for RPU's view of TCMs
> 
> 
> > > > +	}
> > > > +
> > > > +	dma = (dma_addr_t)rsc.start;
> > >
> > > Given the way the dma parameter is used by
> > > rproc_alloc_registered_carveouts, I think it might be best to pass the
> > > original start address (i.e. 0xffe00000) as dma.
> > >
> > >
> > > > +	mem = rproc_mem_entry_init(dev, va, dma, (int)size, rsc.start,
> > > > +				   NULL, zynqmp_r5_mem_release,
> > >
> > > I don't know too much about the remoteproc APIs, but shouldn't you be
> > > passing zynqmp_r5_rproc_mem_alloc to it instead of NULL?
> > >
> > >
> > [Ben Levinsky] the difference is that for TCM we have to do make the
> relative address work for TCM, so the dma input to rproc_mem_entry_init is
> different in TCM case.
> 
> The dma address is the address as seen by the RPU, is that right?
> So you are trying to set the dma address to something in the range 0 -
> 0x20000?
> 
> 
[Ben Levinsky] yes 
> > > > +				   rsc.name);
> > > > +	if (!mem)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	return mem;
> > > > +}
> > > > +
> > > > +static int parse_mem_regions(struct rproc *rproc)
> > > > +{
> > > > +	int num_mems, i;
> > > > +	struct zynqmp_r5_pdata *pdata = rproc->priv;
> > > > +	struct device *dev =  &pdata->dev;
> > > > +	struct device_node *np = dev->of_node;
> > > > +	struct rproc_mem_entry *mem;
> > > > +
> > > > +	num_mems = of_count_phandle_with_args(np, "memory-region",
> > > NULL);
> > > > +	if (num_mems <= 0)
> > > > +		return 0;
> > > > +	for (i = 0; i < num_mems; i++) {
> > > > +		struct device_node *node;
> > > > +		struct reserved_mem *rmem;
> > > > +
> > > > +		node = of_parse_phandle(np, "memory-region", i);
> > >
> > > Check node != NULL ?
> > >
> > [Ben Levinsky] will add this in v9
> > >
> > > > +		rmem = of_reserved_mem_lookup(node);
> > > > +		if (!rmem) {
> > > > +			dev_err(dev, "unable to acquire memory-region\n");
> > > > +			return -EINVAL;
> > > > +		}
> > > > +
> > > > +		if (strstr(node->name, "vdev0buffer")) {
> > >
> > > vdev0buffer is not described in the device tree bindings, is that
> > > normal/expected?
> > >
> > >
> > [Ben Levinsky] vdev0buffer is not required, as there might be simple
> firmware loading with no IPC. Vdev0buffer only needed for IPC.
> 
> OK, good. It should probably still be described in the device tree
> binding as optional property.
> 
> 
> > > > +			/* Register DMA region */
> > > > +			mem = rproc_mem_entry_init(dev, NULL,
> > > > +						   (dma_addr_t)rmem->base,
> > > > +						   rmem->size, rmem->base,
> > > > +						   NULL, NULL,
> > > > +						   "vdev0buffer");
> > > > +			if (!mem) {
> > > > +				dev_err(dev, "unable to initialize memory-
> > > region %s\n",
> > > > +					node->name);
> > > > +				return -ENOMEM;
> > > > +			}
> > > > +			dev_dbg(dev, "parsed %s at  %llx\r\n", mem->name,
> > > > +				mem->dma);
> > > > +		} else if (strstr(node->name, "vdev0vring")) {
> > >
> > > Same here
> > >
> > >
> > > > +			int vring_id;
> > > > +			char name[16];
> > > > +
> > > > +			/*
> > > > +			 * can be 1 of multiple vring IDs per IPC channel
> > > > +			 * e.g. 'vdev0vring0' and 'vdev0vring1'
> > > > +			 */
> > > > +			vring_id = node->name[14] - '0';
> > >
> > > Where does the "14" comes from? Are we sure it is not possible to have a
> > > node->name smaller than 14 chars?
> > >
> > [Ben Levinsky] Presently there are only 2 vrings used per Xilinx OpenAMP
> channel to RPU. In Xilinx kernel we have hard-coded names as these are the
> only nodes that use it. For example RPU0vdev0vring0 and RPU1vdev0vring0.
> Hence we only check for vdev0vring and not a sscanf("%*s%i") to parse out
> the vring ID or other, cleaner solution.
> 
> OK, but I think it is best if we use node->name[14] only if we
> explicitly check for a string of at least 14 characters. Using strstr,
> it could return the string "vdev0vring" which is less than 14 chars,
> leading to a bug.
> 
> 
> > >
> > > > +			snprintf(name, sizeof(name), "vdev0vring%d",
> > > vring_id);
> > > > +			/* Register vring */
> > > > +			mem = rproc_mem_entry_init(dev, NULL,
> > > > +						   (dma_addr_t)rmem->base,
> > > > +						   rmem->size, rmem->base,
> > > > +
> > > zynqmp_r5_rproc_mem_alloc,
> > > > +
> > > zynqmp_r5_rproc_mem_release,
> > > > +						   name);
> > > > +			dev_dbg(dev, "parsed %s at %llx\r\n", mem->name,
> > > > +				mem->dma);
> > > > +		} else {
> > > > +			int idx;
> > > > +
> > > > +			/*
> > > > +			 * if TCM update address space for R5 and
> > > > +			 * make xilinx platform mgmt call
> > > > +			 */
> > > > +			for (idx = 0; idx < ZYNQMP_R5_NUM_TCM_BANKS;
> > > idx++) {
> > > > +				if (tcm_addr_to_pnode[idx][0] == rmem-
> > > >base)
> > > > +					break;
> > >
> > > There is something I don't quite understand. If the memory region to use
> > > is TCM, why would it be also described under reserved-memory?
> > > Reserved-memory is for normal memory being reserved, while TCM is not
> > > normal memory. Am I missing something?
> > >
> > [Ben Levinsky] I can change this in v9 as discussed. That is, have no TCM
> under reserved mem. Instead have the banks as nodes in device tree with
> status="[enabled|disabled]" and if enabled, then try to add memories in
> parse_fw call.
> > >
> > > > +			}
> > > > +
> > > > +			if (idx != ZYNQMP_R5_NUM_TCM_BANKS) {
> > > > +				mem = handle_tcm_parsing(dev, rmem, node,
> > > idx);
> > > > +			} else {
> > > > +				mem = rproc_mem_entry_init(dev, NULL,
> > > > +						   (dma_addr_t)rmem->base,
> > > > +						   rmem->size, rmem->base,
> > > > +
> > > zynqmp_r5_rproc_mem_alloc,
> > > > +
> > > zynqmp_r5_rproc_mem_release,
> > > > +						   node->name);
> > >
> > > This case looks identical to the vdev0vring above. Is the difference
> > > really just in the "name"? If so, can we merge the two cases into one?
> > > no, because the devm_ioremap_wc call has to be done before changing
> the dma address to relative for TCM banks.
> > >
> > > > +			}
> > > > +
> > > > +			if (!mem) {
> > > > +				dev_err(dev,
> > > > +					"unable to init memory-region %s\n",
> > > > +					node->name);
> > > > +				return -ENOMEM;
> > > > +			}
> > > > +		}
> > > > +		rproc_add_carveout(rproc, mem);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int zynqmp_r5_parse_fw(struct rproc *rproc, const struct
> firmware
> > > *fw)
> > > > +{
> > > > +	int ret;
> > > > +	struct zynqmp_r5_pdata *pdata = rproc->priv;
> > > > +	struct device *dev = &pdata->dev;
> > > > +
> > > > +	ret = parse_mem_regions(rproc);
> > > > +	if (ret) {
> > > > +		dev_err(dev, "parse_mem_regions failed %x\n", ret);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	ret = rproc_elf_load_rsc_table(rproc, fw);
> > > > +	if (ret == -EINVAL) {
> > > > +		dev_info(dev, "no resource table found.\n");
> > > > +		ret = 0;
> > >
> > > Why do we want to continue ignoring the error in this case?
> > >
> > [Ben Levinsky] as there can be simple firmware loaded onto R5 that do not
> have resource table. Resource table only needed for specific IPC case.
> 
> OK, an in-code comment would be good
> 
> 
> > > > +	struct device *dev;
> > > > +	struct zynqmp_r5_mem *mem;
> > > > +	int ret;
> > > > +	struct property *prop;
> > > > +	const __be32 *cur;
> > > > +	u32 val;
> > > > +	int i;
> > > > +
> > > > +	dev = &pdata->dev;
> > > > +	mem = devm_kzalloc(dev, sizeof(*mem), GFP_KERNEL);
> > > > +	if (!mem)
> > > > +		return -ENOMEM;
> > > > +	ret = of_address_to_resource(node, 0, &mem->res);
> > > > +	if (ret < 0) {
> > > > +		dev_err(dev, "failed to get resource of memory %s",
> > > > +			of_node_full_name(node));
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	/* Get the power domain id */
> > > > +	i = 0;
> > > > +	if (of_find_property(node, "pnode-id", NULL)) {
> > > > +		of_property_for_each_u32(node, "pnode-id", prop, cur, val)
> > > > +			mem->pnode_id[i++] = val;
> > > > +	}
> > > > +	list_add_tail(&mem->node, &pdata->mems);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * zynqmp_r5_release() - ZynqMP R5 device release function
> > > > + * @dev: pointer to the device struct of ZynqMP R5
> > > > + *
> > > > + * Function to release ZynqMP R5 device.
> > > > + */
> > > > +static void zynqmp_r5_release(struct device *dev)
> > > > +{
> > > > +	struct zynqmp_r5_pdata *pdata;
> > > > +	struct rproc *rproc;
> > > > +	struct sk_buff *skb;
> > > > +
> > > > +	pdata = dev_get_drvdata(dev);
> > > > +	rproc = pdata->rproc;
> > > > +	if (rproc) {
> > > > +		rproc_del(rproc);
> > > > +		rproc_free(rproc);
> > > > +	}
> > > > +	if (pdata->tx_chan)
> > > > +		mbox_free_channel(pdata->tx_chan);
> > > > +	if (pdata->rx_chan)
> > > > +		mbox_free_channel(pdata->rx_chan);
> > > > +	/* Discard all SKBs */
> > > > +	while (!skb_queue_empty(&pdata->tx_mc_skbs)) {
> > > > +		skb = skb_dequeue(&pdata->tx_mc_skbs);
> > > > +		kfree_skb(skb);
> > > > +	}
> > > > +
> > > > +	put_device(dev->parent);
> > > > +}
> > > > +
> > > > +/**
> > > > + * event_notified_idr_cb() - event notified idr callback
> > > > + * @id: idr id
> > > > + * @ptr: pointer to idr private data
> > > > + * @data: data passed to idr_for_each callback
> > > > + *
> > > > + * Pass notification to remoteproc virtio
> > > > + *
> > > > + * Return: 0. having return is to satisfy the idr_for_each() function
> > > > + *          pointer input argument requirement.
> > > > + **/
> > > > +static int event_notified_idr_cb(int id, void *ptr, void *data)
> > > > +{
> > > > +	struct rproc *rproc = data;
> > > > +
> > > > +	(void)rproc_vq_interrupt(rproc, id);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * handle_event_notified() - remoteproc notification work funciton
> > > > + * @work: pointer to the work structure
> > > > + *
> > > > + * It checks each registered remoteproc notify IDs.
> > > > + */
> > > > +static void handle_event_notified(struct work_struct *work)
> > > > +{
> > > > +	struct rproc *rproc;
> > > > +	struct zynqmp_r5_pdata *pdata;
> > > > +
> > > > +	pdata = container_of(work, struct zynqmp_r5_pdata, mbox_work);
> > > > +
> > > > +	(void)mbox_send_message(pdata->rx_chan, NULL);
> > > > +	rproc = pdata->rproc;
> > > > +	/*
> > > > +	 * We only use IPI for interrupt. The firmware side may or may
> > > > +	 * not write the notifyid when it trigger IPI.
> > > > +	 * And thus, we scan through all the registered notifyids.
> > > > +	 */
> > > > +	idr_for_each(&rproc->notifyids, event_notified_idr_cb, rproc);
> > >
> > > This looks expensive. Should we at least check whether the notifyid was
> > > written as first thing before doing the scan?
> > >
> > [Ben Levinsky] this will be at most 2 vrings presently per firmware-load and
> only done when the firmware is loaded so the latency so should not impact
> performace or user
> 
> OK
> 
> 
> > > > +	/* Get R5 power domain node */
> > > > +	ret = of_property_read_u32(node, "pnode-id", &pdata->pnode_id);
> > > > +	if (ret) {
> > > > +		dev_err(dev, "failed to get power node id.\n");
> > > > +		goto error;
> > > > +	}
> > > > +
> > > > +	/* TODO Check if R5 is running */
> > > > +
> > > > +	/* Set up R5 if not already setup */
> > > > +	ret = pdata->is_r5_mode_set ? 0 : r5_set_mode(pdata);
> > > > +	if (ret) {
> > > > +		dev_err(dev, "failed to set R5 operation mode.\n");
> > > > +		return ret;
> > > > +	}
> > >
> > > is_r5_mode_set is set by r5_set_mode(), which is only called here.
> > > So it looks like this check is important in cases where
> > > zynqmp_r5_probe() is called twice for the same R5 node. But I don't
> > > think that is supposed to happen?
> > >
> > [Ben Levinsky] this is needed as there are cases where user can repeatedly
> load different firmware so the check is needed in cases like this where rpu is
> already configured. It is also possible that a user might repeatedly
> load/unload the module
> 
> OK