My apologies for the late reply... On 01/06/2022 03:40, yuji2.ishikawa@xxxxxxxxxxxxx wrote: > Hi Hans, > > Thank you for your advice. > I prepared some description of DNN accelerator and its usage. > > #### Handling memory blocks for Visconti5 accelerators > > Visconti5 Image-Processing-Accelerators do not have fine grained IOMMU, as CPU have. > Therefore, memory region to be passed to the accelerators should be physically contiguous. > We use DMA-BUF backed by CMA (Contiguous Memory Allocator) to allocate memory regions for sharing between CPU/IPAs. > Originally, in v4.19 based implementation, the ION allocator was used to allocate DMA-BUF instances. > For the latest implementation, DMA-BUF HEAPS is used. > > Two structure types are used to represent memory region passed to drivers. > * struct drv_ipa_buffer_info > * to describe whole DMA-BUF instance > * struct drv_ipa_addr > * to describe a memory region in a DMA-BUF instance > > for details, see usage sample of each IPA driver > > > #### Image Processing Accelerators overview > > Visconti5 SoC has following image processing accererators > > * AFFINE: 1 input image, 1 output image; Affine transform, Homography transform, Polynomial lens distortion, LUT transform > * DNN: N input feature vector, N output feature vector; Deep neural network operation > * PYRAMID 3 input image, 3 * N output image; Resize grayscale/color image with N different parameters > * DSPIF: M input image, N output image; Various opeations on images > * HOX: 1 input image (multi ROI), 1 input dictionary1 likelihood/feature vector; Extended Histogram of Oriented Gradient based pattern matching > * HAMAT: 2 input feature vectors: 1 output corrdinate vector; Hamming distance matching for stereo vision > * FLMAT: 3 input image, N input feature point, N output matched point; Optical flow matching > * SMLDB: 1 input image, N input feature point, N output feature vector; Accelerated-KAZE feature descriptor accelerator > * STMAT: 2 input image, 1 output disparity image; Stereo disparity It's not really easy to decide what is best. I would say that if both input and output are images (RGB, YUV, Grayscale), then a V4L2 memory-to-memory driver is what I would expect to see. Where that is not the case, then a more custom approach makes sense. In the list above I would put AFFINE, PYRAMID, DSPIF and possible STMAT in the V4L2 driver group, and the others more as custom drivers. I think it also depends on how it is used: if a captured sensor image is typically passed in for further processing, i.e. it is closely related to the video ISP, then V4L2 is a reasonable choice. A DNN driver, on the other hand, isn't using images at all, so for that something like this driver makes sense. Regards, Hans > > see [0] Fig 7.2.1 for block diagram (of prototype chip) > > > #### DNN accelerator overview > > DNN accelerator is a proprietary CNN/DCNN processing accelerator developed by Toshiba. > Visconti5 SoC has 2 instances of DNN acclerator hardware. > Users convert existing Caffe/ONNX models to Visconti compatible models with an offline tool. > A converted model "Configuration Binary" includes: > * instruction sequence for given network > * weight/bias information > * DMA configuration from/to global memory (for input/output feature) > > DNN acccelerator can handle either 1 plane or multiple ROIs at a single call. > > see [0] Fig 7.2.2 for block diagram of DNN accelerator > > CNN: Convolutional Neural Network > DCNN: Deep Convolutional Neural Network > > > #### Input / Output > > Input image or feature: base type is either of FP16, FP32, INT8, UINT8, INT16 > Output feature vector: base type is either of FP16, FP32, INT8, UINT8, INT16 > > Input, Output, Weight, Bias can be placed on global memory and loaded/stored with DMA within DNN accelerator. > These data on global memory can be specified as either of: > * single address to point single data block > * list of address to point multiple data blocks (i.e. ROIs) > > DNN acclerator driver accepts an instance of "struct drv_dnn_descriptor" which includes addresses of input/output features and a configuration binary. > > > #### Descriptor Builder at userland > > Following APIs are provided to build a descriptor instance at userland. > > /* defined in drv_dnn_util.h */ > int32_t drv_DNN_config_descript_init(struct drv_dnn_descriptor *desc, struct drv_ipa_buffer_info *buffer, int32_t buffer_num); > int32_t drv_DNN_config_exec_configuration(struct drv_dnn_descriptor *desc, const void *configuration_binary, > struct drv_ipa_addr configuration_binary_addr, struct drv_ipa_addr *src_list, > struct drv_ipa_addr *dst_list, int32_t list_num, struct drv_ipa_addr temporary_addr, > int32_t temporary_size); > int32_t drv_DNN_config_descript_finalize(struct drv_dnn_descriptor *desc); > > struct drv_dnn_descriptor is defined in drivers/soc/visconti/uapi/dnn.h. > I think this header should be placed anywhere else to be collected on "make headers_install" action of kernel building. > > > #### Usage sample (without error handlers) > > #include <linux/dma-heap.h> > #include "drv_ipa.h" > #include "drv_dnn.h" > #include "drv_dnn_util.h" > > int allocate_buffer(int fd_heap, int size) > { > struct dma_heap_allocation_data heap_data_in={0}; > int ret; > > heap_data_in.len = ROUNDUP_POW2(size); > heap_data_in.fd_flags = O_RDWR | O_CLOEXEC; > > ret = ioctl(fd_heap, DMA_HEAP_IOCTL_ALLOC, &heap_data_in); > if (ret <0) > return -1; > else > return heap_data_in.fd; > } > > void dnn_sample(int fd_dnn, int fd_conf, int fd_src, int fd_dst, int fd_temp) > { > int32_t ret; > struct drv_ipa_buffer_info bufinfo[4] = { > {.fd=fd_conf, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE}, > {.fd=fd_src, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE}, > {.fd=fd_dst, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE}, > {.fd=fd_temp, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE}, > }; > struct drv_ipa_addr conf_addr = {.buffer_index=0, .offset=0}; > struct drv_ipa_addr src_addr = {.buffer_index=1, .offset=0}; > struct drv_ipa_addr dst_addr = {.buffer_index=2, .offset=0}; > struct drv_ipa_addr temp_addr = {.buffer_index=3, .offset=0}; > struct drv_dnn_descriptor desc; > > struct drv_ipa_addr src_list[] = {src_addr}; > struct drv_ipa_addr dst_list[] = {dst_addr}; > > uint8_t *config = (uint8_t*)mmap(NULL, DNN_CONF_BIN_SIZE, PROT_READ, MAP_SHARED, fd_conf, 0); > > drv_DNN_config_descript_init(&desc, bufinfo, 4); > drv_DNN_config_exec_configuration(&desc, config, conf_addr, src_list, dst_list, 1, temp_addr, TEMP_BUF_SIZE); > drv_DNN_config_descript_finalize(&desc); > > ioctl(fd_dnn, IOC_IPA_START, &desc); > > { > struct pollfd fds[] = {.fd=fd_dnn, .events=POLL_IN, .revents=0}; > poll(fds, 1, 1000); > } > } > > void sample() > { > int fd_dnn, fd_heap, fd_conf, fd_src, fd_dst, fd_temp; > > fd_dnn = open("/dev/dnn0", O_RDWR); > fd_heap = open("/dev/dma_heap/linux,cma", O_RDWR); > fd_conf = allocate_buffer(fd_heap, DNN_CONF_BIN_ALLOC_SIZE); > fd_src = allocate_buffer(fd_heap, INPUT_IMG_ALLOC_SIZE); > fd_dst = allocate_buffer(fd_heap, OUTPUT_IMG_ALLOC_SIZE); > fd_temp = allocate_buffer(fd_heap, TEMP_BUF_ALLOC_SIZE); > > /* fill in input image and configuration here */ > > dnn_sample(fd_dnn, fd_conf, fd_src, fd_dst, fd_temp); > > ... > }; > > > #### Reference > > * [0] https://toshiba.semicon-storage.com/content/dam/toshiba-ss-v2/master/en/company/technical-review/pdf/technical-review-18_e.pdf > * Fig 7.2.1 shows the whole architecture of prototype chip > * Fig 7.2.2 shows the architecture of DNN accelerator > > > Regards, > Yuji > >> -----Original Message----- >> From: Hans Verkuil <hverkuil@xxxxxxxxx> >> Sent: Friday, May 20, 2022 7:03 PM >> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) >> <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu nobuhiro(岩松 >> 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>; >> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx >> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; >> linux-media@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; >> linaro-mm-sig@xxxxxxxxxxxxxxxx >> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing >> accelerator driver >> >> Hi Yuji, >> >> On 5/20/22 11:48, yuji2.ishikawa@xxxxxxxxxxxxx wrote: >>> Hi Hans, >>> >>> Thank you for your comment. >>> I agree that this submission lacks documents sharing basic idea of the >> accelerators; what do they accept and what do they yield. >>> Where can I put a new document? Can I put it as a comment in a source? Can >> I add a file under Documentation/misc-devices directory? >> >> Start with explaining it by replying to this mail. Without knowing anything about >> the hardware, it is difficult to say what the best place is. Usually it is either the >> public API header, or somewhere in Documentation. >> >> The first step is to have a better understanding of the Visconti image hardware >> and to see what the best subsystem would be to support that hardware. >> >> Regards, >> >> Hans >> >>> >>> Thanks, >>> Yuji Ishikawa >>> >>>> -----Original Message----- >>>> From: Hans Verkuil <hverkuil@xxxxxxxxx> >>>> Sent: Thursday, May 12, 2022 8:15 PM >>>> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) >>>> <yuji2.ishikawa@xxxxxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>; >>>> iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT) >>>> <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>; Sumit Semwal >>>> <sumit.semwal@xxxxxxxxxx>; Christian König >> <christian.koenig@xxxxxxx> >>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; >>>> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx; >>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx >>>> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing >>>> accelerator driver >>>> >>>> Hi Yuji, >>>> >>>> On 4/28/22 15:11, Yuji Ishikawa wrote: >>>>> This series is the DNN image processing accelerator driver for >>>>> Toshiba's ARM >>>> SoC, Visconti[0]. >>>>> This provides DT binding documentation, device driver, MAINTAINER >> files. >>>>> >>>>> The second patch "soc: visconti: Add Toshiba Visconti image >>>>> processing >>>> accelerator common source" >>>>> and the fourth patch "MAINTAINERS: ..." are the same as the ones in >>>>> the >>>> preceding post for affine driver. >>>> >>>> There appears to be no documentation whatsoever, unless I am missing >>>> something. >>>> >>>> How is the uAPI supposed to be used? What does it do? What formats >>>> does it accept or produce? >>>> >>>> If this processes images, then (as Laurent mentioned) this is more >>>> suitable as a >>>> V4L2 mem2mem driver. >>>> >>>> See >>>> https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dev- >>>> me >>>> m2mem.html >>>> and the many drivers in drivers/media that use it (git grep >> v4l2-mem2mem.h). >>>> >>>> But without any explanation whatsoever I have no idea what does or >>>> does not make sense. >>>> >>>> Regards, >>>> >>>> Hans >>>> >>>>> >>>>> Best regards, >>>>> Yuji >>>>> >>>>> [0]: >>>>> >>>> >> https://toshiba.semicon-storage.com/ap-en/semiconductor/product/image >>>> - >>>>> recognition-processors-visconti.html >>>>> >>>>> Yuji Ishikawa (4): >>>>> dt-bindings: soc: visconti: Add Toshiba Visconti DNN image processing >>>>> accelerator bindings >>>>> soc: visconti: Add Toshiba Visconti image processing accelerator >>>>> common source >>>>> soc: visconti: Add Toshiba Visconti DNN image processing accelerator >>>>> MAINTAINERS: Add entries for Toshiba Visconti DNN image processing >>>>> accelerator >>>>> >>>>> .../soc/visconti/toshiba,visconti-dnn.yaml | 54 ++ >>>>> MAINTAINERS | 2 + >>>>> drivers/soc/Kconfig | 1 + >>>>> drivers/soc/Makefile | 1 + >>>>> drivers/soc/visconti/Kconfig | 7 + >>>>> drivers/soc/visconti/Makefile | 8 + >>>>> drivers/soc/visconti/dnn/Makefile | 6 + >>>>> drivers/soc/visconti/dnn/dnn.c | 533 >>>> ++++++++++++++++++ >>>>> drivers/soc/visconti/dnn/hwd_dnn.c | 183 ++++++ >>>>> drivers/soc/visconti/dnn/hwd_dnn.h | 68 +++ >>>>> drivers/soc/visconti/dnn/hwd_dnn_reg.h | 228 ++++++++ >>>>> drivers/soc/visconti/ipa_common.c | 55 ++ >>>>> drivers/soc/visconti/ipa_common.h | 18 + >>>>> drivers/soc/visconti/uapi/dnn.h | 77 +++ >>>>> drivers/soc/visconti/uapi/ipa.h | 88 +++ >>>>> 15 files changed, 1329 insertions(+) create mode 100644 >>>>> Documentation/devicetree/bindings/soc/visconti/toshiba,visconti-dnn. >>>>> ya ml create mode 100644 drivers/soc/visconti/Kconfig create mode >>>>> 100644 drivers/soc/visconti/Makefile create mode 100644 >>>>> drivers/soc/visconti/dnn/Makefile create mode 100644 >>>>> drivers/soc/visconti/dnn/dnn.c create mode 100644 >>>>> drivers/soc/visconti/dnn/hwd_dnn.c >>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.h >>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn_reg.h >>>>> create mode 100644 drivers/soc/visconti/ipa_common.c create mode >>>>> 100644 drivers/soc/visconti/ipa_common.h create mode 100644 >>>>> drivers/soc/visconti/uapi/dnn.h create mode 100644 >>>>> drivers/soc/visconti/uapi/ipa.h >>>>>