Hi Hans,

Thank you for your advice.
I have prepared a description of the DNN accelerator and its usage.

#### Handling memory blocks for Visconti5 accelerators

Visconti5 Image Processing Accelerators (IPAs) do not have a fine-grained IOMMU as the CPU does.
Therefore, a memory region passed to the accelerators must be physically contiguous.
We use DMA-BUF backed by CMA (Contiguous Memory Allocator) to allocate memory regions shared between the CPU and the IPAs.
Originally, in the v4.19-based implementation, the ION allocator was used to allocate DMA-BUF instances.
The latest implementation uses DMA-BUF heaps.

Two structure types are used to represent a memory region passed to the drivers:

* struct drv_ipa_buffer_info
  * describes a whole DMA-BUF instance
* struct drv_ipa_addr
  * describes a memory region within a DMA-BUF instance

For details, see the usage sample of each IPA driver.

#### Image Processing Accelerators overview

The Visconti5 SoC has the following image processing accelerators:

* AFFINE: 1 input image, 1 output image; Affine transform, Homography transform, Polynomial lens distortion, LUT transform
* DNN: N input feature vectors, N output feature vectors; Deep neural network operation
* PYRAMID: 3 input images, 3 * N output images; Resize grayscale/color image with N different parameters
* DSPIF: M input images, N output images; Various operations on images
* HOX: 1 input image (multi ROI), 1 input dictionary, 1 likelihood/feature vector; Extended Histogram of Oriented Gradient based pattern matching
* HAMAT: 2 input feature vectors, 1 output coordinate vector; Hamming distance matching for stereo vision
* FLMAT: 3 input images, N input feature points, N output matched points; Optical flow matching
* SMLDB: 1 input image, N input feature points, N output feature vectors; Accelerated-KAZE feature descriptor accelerator
* STMAT: 2 input images, 1 output disparity image; Stereo disparity

See [0] Fig 7.2.1 for a block diagram (of the prototype chip).

#### DNN accelerator overview

The DNN accelerator is a proprietary
CNN/DCNN processing accelerator developed by Toshiba.
The Visconti5 SoC has 2 instances of the DNN accelerator hardware.
Users convert existing Caffe/ONNX models to Visconti-compatible models with an offline tool.
A converted model, the "Configuration Binary", includes:

* the instruction sequence for the given network
* weight/bias information
* DMA configuration from/to global memory (for input/output features)

The DNN accelerator can handle either 1 plane or multiple ROIs in a single call.

See [0] Fig 7.2.2 for a block diagram of the DNN accelerator.

CNN: Convolutional Neural Network
DCNN: Deep Convolutional Neural Network

#### Input / Output

Input image or feature: the base type is one of FP16, FP32, INT8, UINT8, INT16
Output feature vector: the base type is one of FP16, FP32, INT8, UINT8, INT16

Input, output, weight and bias data can be placed in global memory and loaded/stored by DMA within the DNN accelerator.
These data in global memory can be specified as either:

* a single address pointing to a single data block
* a list of addresses pointing to multiple data blocks (i.e. ROIs)

The DNN accelerator driver accepts an instance of "struct drv_dnn_descriptor", which includes the addresses of the input/output features and a configuration binary.

#### Descriptor Builder at userland

The following APIs are provided to build a descriptor instance at userland.

    /* defined in drv_dnn_util.h */
    int32_t drv_DNN_config_descript_init(struct drv_dnn_descriptor *desc,
                                         struct drv_ipa_buffer_info *buffer,
                                         int32_t buffer_num);
    int32_t drv_DNN_config_exec_configuration(struct drv_dnn_descriptor *desc,
                                              const void *configuration_binary,
                                              struct drv_ipa_addr configuration_binary_addr,
                                              struct drv_ipa_addr *src_list,
                                              struct drv_ipa_addr *dst_list,
                                              int32_t list_num,
                                              struct drv_ipa_addr temporary_addr,
                                              int32_t temporary_size);
    int32_t drv_DNN_config_descript_finalize(struct drv_dnn_descriptor *desc);

struct drv_dnn_descriptor is defined in drivers/soc/visconti/uapi/dnn.h.
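As a rough illustration of the "list of addresses" form described above, the following sketch fills an address list for several equally sized blocks (e.g. ROIs) packed back to back in one DMA-BUF. struct sketch_ipa_addr and sketch_fill_roi_list() are hypothetical stand-ins and not part of the driver. Only the field names buffer_index and offset come from the usage sample; their types, the packed layout and the absence of alignment constraints are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct drv_ipa_addr.
 * Field names follow the usage sample; the types are assumed. */
struct sketch_ipa_addr {
    uint32_t buffer_index; /* index into the buffer_info array */
    uint32_t offset;       /* byte offset inside that DMA-BUF */
};

/* Fill list[0..num-1] with the addresses of `num` blocks of `block_size`
 * bytes, assumed to be packed back to back in buffer `buffer_index`.
 * Returns the total number of bytes covered by the list. */
static uint32_t sketch_fill_roi_list(struct sketch_ipa_addr *list, size_t num,
                                     uint32_t buffer_index, uint32_t block_size)
{
    assert(list != NULL || num == 0);
    for (size_t i = 0; i < num; i++) {
        list[i].buffer_index = buffer_index;
        list[i].offset = (uint32_t)i * block_size;
    }
    return (uint32_t)num * block_size;
}
```

With the real headers, lists built this way would presumably be passed as src_list/dst_list to drv_DNN_config_exec_configuration(), with list_num set to the number of entries. The returned byte count can be compared against the size of the allocated DMA-BUF before starting the accelerator.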
I think the dnn.h header should be placed somewhere else so that it is collected by the "make headers_install" step of the kernel build.

#### Usage sample (without error handlers)

    #include <fcntl.h>
    #include <poll.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/dma-heap.h>
    #include "drv_ipa.h"
    #include "drv_dnn.h"
    #include "drv_dnn_util.h"

    /* allocate a DMA-BUF from the given heap; returns its fd, or -1 on failure */
    int allocate_buffer(int fd_heap, int size)
    {
        struct dma_heap_allocation_data heap_data_in = {0};
        int ret;

        heap_data_in.len = ROUNDUP_POW2(size);
        heap_data_in.fd_flags = O_RDWR | O_CLOEXEC;

        ret = ioctl(fd_heap, DMA_HEAP_IOCTL_ALLOC, &heap_data_in);
        if (ret < 0)
            return -1;
        return heap_data_in.fd;
    }

    void dnn_sample(int fd_dnn, int fd_conf, int fd_src, int fd_dst, int fd_temp)
    {
        int32_t ret;
        /* one entry per DMA-BUF; drv_ipa_addr.buffer_index refers to this array */
        struct drv_ipa_buffer_info bufinfo[4] = {
            {.fd=fd_conf, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
            {.fd=fd_src,  .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
            {.fd=fd_dst,  .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
            {.fd=fd_temp, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
        };
        struct drv_ipa_addr conf_addr = {.buffer_index=0, .offset=0};
        struct drv_ipa_addr src_addr  = {.buffer_index=1, .offset=0};
        struct drv_ipa_addr dst_addr  = {.buffer_index=2, .offset=0};
        struct drv_ipa_addr temp_addr = {.buffer_index=3, .offset=0};
        struct drv_dnn_descriptor desc;

        struct drv_ipa_addr src_list[] = {src_addr};
        struct drv_ipa_addr dst_list[] = {dst_addr};

        /* the descriptor builder reads the configuration binary through this mapping */
        uint8_t *config = (uint8_t*)mmap(NULL, DNN_CONF_BIN_SIZE, PROT_READ, MAP_SHARED, fd_conf, 0);

        drv_DNN_config_descript_init(&desc, bufinfo, 4);
        drv_DNN_config_exec_configuration(&desc, config, conf_addr, src_list, dst_list, 1, temp_addr, TEMP_BUF_SIZE);
        drv_DNN_config_descript_finalize(&desc);

        ioctl(fd_dnn, IOC_IPA_START, &desc);

        {
            /* wait up to 1000 ms for an event on the DNN device */
            struct pollfd fds[] = {{.fd=fd_dnn, .events=POLLIN, .revents=0}};
            poll(fds, 1, 1000);
        }
    }

    void sample(void)
    {
        int fd_dnn, fd_heap, fd_conf, fd_src, fd_dst, fd_temp;

        fd_dnn = open("/dev/dnn0", O_RDWR);
        fd_heap = open("/dev/dma_heap/linux,cma", O_RDWR);
        fd_conf = allocate_buffer(fd_heap, DNN_CONF_BIN_ALLOC_SIZE);
        fd_src = allocate_buffer(fd_heap, INPUT_IMG_ALLOC_SIZE);
        fd_dst = allocate_buffer(fd_heap, OUTPUT_IMG_ALLOC_SIZE);
        fd_temp = allocate_buffer(fd_heap, TEMP_BUF_ALLOC_SIZE);

        /* fill in input image and configuration here */

        dnn_sample(fd_dnn, fd_conf, fd_src, fd_dst, fd_temp);

        ...
    }

#### Reference

* [0] https://toshiba.semicon-storage.com/content/dam/toshiba-ss-v2/master/en/company/technical-review/pdf/technical-review-18_e.pdf
  * Fig 7.2.1 shows the whole architecture of the prototype chip
  * Fig 7.2.2 shows the architecture of the DNN accelerator

Regards,
Yuji

> -----Original Message-----
> From: Hans Verkuil <hverkuil@xxxxxxxxx>
> Sent: Friday, May 20, 2022 7:03 PM
> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx;
> iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-media@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> linaro-mm-sig@xxxxxxxxxxxxxxxx
> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> accelerator driver
>
> Hi Yuji,
>
> On 5/20/22 11:48, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > Hi Hans,
> >
> > Thank you for your comment.
> > I agree that this submission lacks documents sharing basic idea of the
> > accelerators; what do they accept and what do they yield.
> > Where can I put a new document? Can I put it as a comment in a source? Can
> > I add a file under Documentation/misc-devices directory?
>
> Start with explaining it by replying to this mail. Without knowing anything about
> the hardware, it is difficult to say what the best place is. Usually it is either the
> public API header, or somewhere in Documentation.
>
> The first step is to have a better understanding of the Visconti image hardware
> and to see what the best subsystem would be to support that hardware.
>
> Regards,
>
> Hans
>
> >
> > Thanks,
> > Yuji Ishikawa
> >
> >> -----Original Message-----
> >> From: Hans Verkuil <hverkuil@xxxxxxxxx>
> >> Sent: Thursday, May 12, 2022 8:15 PM
> >> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> >> <yuji2.ishikawa@xxxxxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>;
> >> iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT)
> >> <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>; Sumit Semwal
> >> <sumit.semwal@xxxxxxxxxx>; Christian König <christian.koenig@xxxxxxx>
> >> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> >> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> >> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> >> accelerator driver
> >>
> >> Hi Yuji,
> >>
> >> On 4/28/22 15:11, Yuji Ishikawa wrote:
> >>> This series is the DNN image processing accelerator driver for
> >>> Toshiba's ARM SoC, Visconti[0].
> >>> This provides DT binding documentation, device driver, MAINTAINER files.
> >>>
> >>> The second patch "soc: visconti: Add Toshiba Visconti image
> >>> processing accelerator common source"
> >>> and the fourth patch "MAINTAINERS: ..." are the same as the ones in
> >>> the preceding post for affine driver.
> >>
> >> There appears to be no documentation whatsoever, unless I am missing
> >> something.
> >>
> >> How is the uAPI supposed to be used? What does it do? What formats
> >> does it accept or produce?
> >>
> >> If this processes images, then (as Laurent mentioned) this is more
> >> suitable as a V4L2 mem2mem driver.
> >>
> >> See
> >> https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dev-mem2mem.html
> >> and the many drivers in drivers/media that use it (git grep v4l2-mem2mem.h).
> >>
> >> But without any explanation whatsoever I have no idea what does or
> >> does not make sense.
> >>
> >> Regards,
> >>
> >> Hans
> >>
> >>>
> >>> Best regards,
> >>> Yuji
> >>>
> >>> [0]:
> >>> https://toshiba.semicon-storage.com/ap-en/semiconductor/product/image-recognition-processors-visconti.html
> >>>
> >>> Yuji Ishikawa (4):
> >>>   dt-bindings: soc: visconti: Add Toshiba Visconti DNN image processing
> >>>     accelerator bindings
> >>>   soc: visconti: Add Toshiba Visconti image processing accelerator
> >>>     common source
> >>>   soc: visconti: Add Toshiba Visconti DNN image processing accelerator
> >>>   MAINTAINERS: Add entries for Toshiba Visconti DNN image processing
> >>>     accelerator
> >>>
> >>>  .../soc/visconti/toshiba,visconti-dnn.yaml    |  54 ++
> >>>  MAINTAINERS                                   |   2 +
> >>>  drivers/soc/Kconfig                           |   1 +
> >>>  drivers/soc/Makefile                          |   1 +
> >>>  drivers/soc/visconti/Kconfig                  |   7 +
> >>>  drivers/soc/visconti/Makefile                 |   8 +
> >>>  drivers/soc/visconti/dnn/Makefile             |   6 +
> >>>  drivers/soc/visconti/dnn/dnn.c                | 533 ++++++++++++++++++
> >>>  drivers/soc/visconti/dnn/hwd_dnn.c            | 183 ++++++
> >>>  drivers/soc/visconti/dnn/hwd_dnn.h            |  68 +++
> >>>  drivers/soc/visconti/dnn/hwd_dnn_reg.h        | 228 ++++++++
> >>>  drivers/soc/visconti/ipa_common.c             |  55 ++
> >>>  drivers/soc/visconti/ipa_common.h             |  18 +
> >>>  drivers/soc/visconti/uapi/dnn.h               |  77 +++
> >>>  drivers/soc/visconti/uapi/ipa.h               |  88 +++
> >>>  15 files changed, 1329 insertions(+)
> >>>  create mode 100644 Documentation/devicetree/bindings/soc/visconti/toshiba,visconti-dnn.yaml
> >>>  create mode 100644 drivers/soc/visconti/Kconfig
> >>>  create mode 100644 drivers/soc/visconti/Makefile
> >>>  create mode 100644 drivers/soc/visconti/dnn/Makefile
> >>>  create mode 100644 drivers/soc/visconti/dnn/dnn.c
> >>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.c
> >>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.h
> >>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn_reg.h
> >>>  create mode 100644 drivers/soc/visconti/ipa_common.c
> >>>  create mode 100644 drivers/soc/visconti/ipa_common.h
> >>>  create mode 100644 drivers/soc/visconti/uapi/dnn.h
> >>>  create mode 100644 drivers/soc/visconti/uapi/ipa.h
> >>>