Hi Hans,

I'm going to submit the DNN driver (with misc driver style) again.
The updated version will include the corrections suggested in the reviews, for example:

* apply checkpatch.pl --strict
* uppercase letters in function names -> make them lowercase
* use of integer types: uint32_t -> u32 (__u32 in the user API header; a small
  before/after sketch is appended at the end of this mail)

Please let me know if there are further actions I should take before submission.

Regards,
Yuji Ishikawa

> -----Original Message-----
> From: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> Sent: Tuesday, July 5, 2022 10:17 AM
> To: Hans Verkuil <hverkuil@xxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu
> nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-media@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> linaro-mm-sig@xxxxxxxxxxxxxxxx
> Subject: RE: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> accelerator driver
>
> Hi Hans,
>
> Thank you for your advice.
> I understand that a V4L2 M2M driver is recommended for the group including
> Affine and Pyramid.
>
> As for DSPIF, our application SW engineer said that there is another version
> of the firmware that handles neural networks.
> Therefore, the DSP can handle not only images but also array/list-typed data.
> It would be better to handle DSPIF as a misc device, as with DNN.
>
> It is still difficult for me to imagine the overall structure of an M2M-type
> driver and the corresponding userland application.
> What do they look like? Is there an existing driver suitable for reference?
>
> Let me think about a Pyramid HW block which receives a planar image (R/G/B as
> separate planes, not in interleaved RGBRGB format) and yields 3 planar images.
> Is it covered by a single fd, or are multiple fds required?
> * An fd for the Pyramid driver:
>   * OUTPUT queue for receiving a picture?
>     * OUTPUT or OUTPUT_MPLANE?
>     * Q: Typically, an RGB image is handled with a single-plane buffer in V4L2.
>       RGB images do not have a fourcc for a multi-planar format (while YUV
>       images do).
>       * How should I explicitly request an R/G/B planar buffer in V4L2
>         semantics?
>   * CAPTURE queue for yielding a picture?
>     * CAPTURE or CAPTURE_MPLANE?
>     * Again: RGB interleaved vs R/G/B planar
>     * Q: If I want to capture multiple images (with different resolutions) at
>       a time, how should I do that?
>       * Do I have to add other fds for the 2nd and 3rd CAPTURE queues?
>       * Or can multiple images be packed into the same CAPTURE buffer?
>       * Or can multiple images be placed in a CAPTURE_MPLANE buffer?
>   * Compound control to receive a configuration.
>
> Regards,
> Yuji
>
> > -----Original Message-----
> > From: Hans Verkuil <hverkuil@xxxxxxxxx>
> > Sent: Wednesday, June 29, 2022 10:44 PM
> > To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> > <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu
> > nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> > sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> > dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> > accelerator driver
> >
> > My apologies for the late reply...
> >
> > On 01/06/2022 03:40, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > > Hi Hans,
> > >
> > > Thank you for your advice.
> > > I prepared some description of the DNN accelerator and its usage.
> > >
> > > #### Handling memory blocks for Visconti5 accelerators
> > >
> > > The Visconti5 image processing accelerators do not have a fine-grained
> > > IOMMU as the CPU does.
> > > Therefore, memory regions passed to the accelerators must be physically
> > > contiguous.
> > > We use DMA-BUF backed by CMA (the Contiguous Memory Allocator) to allocate
> > > memory regions shared between the CPU and the IPAs.
> > > Originally, in the v4.19-based implementation, the ION allocator was used
> > > to allocate DMA-BUF instances.
> > > The latest implementation uses DMA-BUF heaps instead.
> > >
> > > Two structure types are used to represent a memory region passed to the
> > > drivers:
> > > * struct drv_ipa_buffer_info
> > >   * describes a whole DMA-BUF instance
> > > * struct drv_ipa_addr
> > >   * describes a memory region within a DMA-BUF instance
> > >
> > > For details, see the usage sample of each IPA driver (a rough sketch of
> > > both types is also included just before the usage sample below).
> > >
> > >
> > > #### Image Processing Accelerators overview
> > >
> > > The Visconti5 SoC has the following image processing accelerators:
> > >
> > > * AFFINE: 1 input image, 1 output image;
> > >   affine transform, homography transform, polynomial lens distortion,
> > >   LUT transform
> > > * DNN: N input feature vectors, N output feature vectors;
> > >   deep neural network operation
> > > * PYRAMID: 3 input images, 3 * N output images;
> > >   resizes grayscale/color images with N different parameters
> > > * DSPIF: M input images, N output images;
> > >   various operations on images
> > > * HOX: 1 input image (multiple ROIs), 1 input dictionary, 1 output
> > >   likelihood/feature vector; extended Histogram of Oriented Gradients
> > >   based pattern matching
> > > * HAMAT: 2 input feature vectors, 1 output coordinate vector;
> > >   Hamming-distance matching for stereo vision
> > > * FLMAT: 3 input images, N input feature points, N output matched points;
> > >   optical flow matching
> > > * SMLDB: 1 input image, N input feature points, N output feature vectors;
> > >   Accelerated-KAZE feature descriptor accelerator
> > > * STMAT: 2 input images, 1 output disparity image;
> > >   stereo disparity
> >
> > It's not really easy to decide what is best. I would say that if both
> > input and output are images (RGB, YUV, grayscale), then a V4L2
> > memory-to-memory driver is what I would expect to see.
> >
> > Where that is not the case, a more custom approach makes sense.
> >
> > In the list above I would put AFFINE, PYRAMID, DSPIF and possibly STMAT
> > in the V4L2 driver group, and the others more as custom drivers.
> >
> > I think it also depends on how it is used: if a captured sensor image
> > is typically passed in for further processing, i.e. it is closely
> > related to the video ISP, then V4L2 is a reasonable choice.
> >
> > A DNN driver, on the other hand, isn't using images at all, so for
> > that something like this driver makes sense.
> >
> > Regards,
> >
> > Hans
> >
> > >
> > > See [0] Fig 7.2.1 for a block diagram (of the prototype chip).
> > >
> > >
> > > #### DNN accelerator overview
> > >
> > > The DNN accelerator is a proprietary CNN/DCNN processing accelerator
> > > developed by Toshiba.
> > > The Visconti5 SoC has 2 instances of the DNN accelerator hardware.
> > > Users convert existing Caffe/ONNX models to Visconti-compatible models
> > > with an offline tool.
> > > A converted model ("Configuration Binary") includes:
> > > * the instruction sequence for the given network
> > > * weight/bias information
> > > * the DMA configuration from/to global memory (for input/output features)
> > >
> > > The DNN accelerator can handle either a single plane or multiple ROIs in a
> > > single call.
> > >
> > > See [0] Fig 7.2.2 for a block diagram of the DNN accelerator.
> > >
> > > CNN: Convolutional Neural Network
> > > DCNN: Deep Convolutional Neural Network
> > >
> > >
> > > #### Input / Output
> > >
> > > Input image or feature: base type is one of FP16, FP32, INT8, UINT8, INT16.
> > > Output feature vector: base type is one of FP16, FP32, INT8, UINT8, INT16.
> > >
> > > Input, output, weight and bias data can be placed in global memory and are
> > > loaded/stored via DMA within the DNN accelerator.
> > > Each of these data blocks in global memory can be specified as either:
> > > * a single address pointing to a single data block
> > > * a list of addresses pointing to multiple data blocks (i.e. ROIs)
> > >
> > > The DNN accelerator driver accepts an instance of "struct drv_dnn_descriptor",
> > > which includes the addresses of the input/output features and a
> > > configuration binary.
> > >
> > >
> > > #### Descriptor Builder at userland
> > >
> > > The following APIs are provided to build a descriptor instance in userland:
> > >
> > > /* defined in drv_dnn_util.h */
> > > int32_t drv_DNN_config_descript_init(struct drv_dnn_descriptor *desc,
> > >                 struct drv_ipa_buffer_info *buffer, int32_t buffer_num);
> > > int32_t drv_DNN_config_exec_configuration(struct drv_dnn_descriptor *desc,
> > >                 const void *configuration_binary,
> > >                 struct drv_ipa_addr configuration_binary_addr,
> > >                 struct drv_ipa_addr *src_list, struct drv_ipa_addr *dst_list,
> > >                 int32_t list_num, struct drv_ipa_addr temporary_addr,
> > >                 int32_t temporary_size);
> > > int32_t drv_DNN_config_descript_finalize(struct drv_dnn_descriptor *desc);
> > >
> > > struct drv_dnn_descriptor is defined in drivers/soc/visconti/uapi/dnn.h.
> > > I think this header should be moved elsewhere so that it is collected by
> > > the "make headers_install" step of the kernel build.
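> > >
> > > As a rough, purely illustrative sketch, the two region types mentioned
> > > earlier could look like the following. The field names and types here are
> > > inferred only from how they are used in the usage sample below, not copied
> > > from the real header; see drivers/soc/visconti/uapi/ipa.h for the
> > > authoritative definitions.
> > >
> > > struct drv_ipa_buffer_info {
> > >         __s32 fd;            /* DMA-BUF fd backing the whole buffer */
> > >         __u8  coherent;      /* set if no explicit cache maintenance is needed */
> > >         __u32 direction;     /* DRV_IPA_DIR_TO_DEVICE or DRV_IPA_DIR_FROM_DEVICE */
> > > };
> > >
> > > struct drv_ipa_addr {
> > >         __u32 buffer_index;  /* index into the drv_ipa_buffer_info array */
> > >         __u32 offset;        /* byte offset of the region inside that DMA-BUF */
> > > };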
> > >
> > > #### Usage sample (without error handlers)
> > >
> > > /* system headers needed to make the sample self-contained */
> > > #include <fcntl.h>
> > > #include <poll.h>
> > > #include <stdbool.h>
> > > #include <stdint.h>
> > > #include <unistd.h>
> > > #include <sys/ioctl.h>
> > > #include <sys/mman.h>
> > > #include <linux/dma-heap.h>
> > > #include "drv_ipa.h"
> > > #include "drv_dnn.h"
> > > #include "drv_dnn_util.h"
> > >
> > > /* allocate a physically contiguous buffer from a dma-heap, return its fd */
> > > int allocate_buffer(int fd_heap, int size)
> > > {
> > >         struct dma_heap_allocation_data heap_data_in = { 0 };
> > >         int ret;
> > >
> > >         heap_data_in.len = ROUNDUP_POW2(size);
> > >         heap_data_in.fd_flags = O_RDWR | O_CLOEXEC;
> > >
> > >         ret = ioctl(fd_heap, DMA_HEAP_IOCTL_ALLOC, &heap_data_in);
> > >         if (ret < 0)
> > >                 return -1;
> > >         else
> > >                 return heap_data_in.fd;
> > > }
> > >
> > > void dnn_sample(int fd_dnn, int fd_conf, int fd_src, int fd_dst, int fd_temp)
> > > {
> > >         struct drv_ipa_buffer_info bufinfo[4] = {
> > >                 { .fd = fd_conf, .coherent = true, .direction = DRV_IPA_DIR_TO_DEVICE },
> > >                 { .fd = fd_src,  .coherent = true, .direction = DRV_IPA_DIR_TO_DEVICE },
> > >                 { .fd = fd_dst,  .coherent = true, .direction = DRV_IPA_DIR_FROM_DEVICE },
> > >                 { .fd = fd_temp, .coherent = true, .direction = DRV_IPA_DIR_FROM_DEVICE },
> > >         };
> > >         struct drv_ipa_addr conf_addr = { .buffer_index = 0, .offset = 0 };
> > >         struct drv_ipa_addr src_addr  = { .buffer_index = 1, .offset = 0 };
> > >         struct drv_ipa_addr dst_addr  = { .buffer_index = 2, .offset = 0 };
> > >         struct drv_ipa_addr temp_addr = { .buffer_index = 3, .offset = 0 };
> > >         struct drv_dnn_descriptor desc;
> > >
> > >         struct drv_ipa_addr src_list[] = { src_addr };
> > >         struct drv_ipa_addr dst_list[] = { dst_addr };
> > >
> > >         /* map the configuration binary so the descriptor builder can read it */
> > >         uint8_t *config = (uint8_t *)mmap(NULL, DNN_CONF_BIN_SIZE,
> > >                                           PROT_READ, MAP_SHARED, fd_conf, 0);
> > >
> > >         /* build the descriptor: buffers, configuration, src/dst lists, temp area */
> > >         drv_DNN_config_descript_init(&desc, bufinfo, 4);
> > >         drv_DNN_config_exec_configuration(&desc, config, conf_addr,
> > >                         src_list, dst_list, 1, temp_addr, TEMP_BUF_SIZE);
> > >         drv_DNN_config_descript_finalize(&desc);
> > >
> > >         /* kick the DNN accelerator */
> > >         ioctl(fd_dnn, IOC_IPA_START, &desc);
> > >
> > >         {
> > >                 /* wait up to 1000 ms for completion */
> > >                 struct pollfd fds[1] = {
> > >                         { .fd = fd_dnn, .events = POLLIN, .revents = 0 }
> > >                 };
> > >                 poll(fds, 1, 1000);
> > >         }
> > > }
> > >
> > > void sample(void)
> > > {
> > >         int fd_dnn, fd_heap, fd_conf, fd_src, fd_dst, fd_temp;
> > >
> > >         fd_dnn = open("/dev/dnn0", O_RDWR);
> > >         fd_heap = open("/dev/dma_heap/linux,cma", O_RDWR);
> > >         fd_conf = allocate_buffer(fd_heap, DNN_CONF_BIN_ALLOC_SIZE);
> > >         fd_src = allocate_buffer(fd_heap, INPUT_IMG_ALLOC_SIZE);
> > >         fd_dst = allocate_buffer(fd_heap, OUTPUT_IMG_ALLOC_SIZE);
> > >         fd_temp = allocate_buffer(fd_heap, TEMP_BUF_ALLOC_SIZE);
> > >
> > >         /* fill in input image and configuration here */
> > >
> > >         dnn_sample(fd_dnn, fd_conf, fd_src, fd_dst, fd_temp);
> > >
> > >         ...
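> > >         /*
> > >          * Illustrative teardown only (not part of the original sample):
> > >          * once the results in fd_dst have been consumed, the DMA-BUF fds,
> > >          * the heap fd and the device fd can simply be closed. In a real
> > >          * application the mmap() of the configuration binary inside
> > >          * dnn_sample() would likewise be munmap()'ed.
> > >          */
> > >         close(fd_temp);
> > >         close(fd_dst);
> > >         close(fd_src);
> > >         close(fd_conf);
> > >         close(fd_heap);
> > >         close(fd_dnn);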
> > > }
> > >
> > >
> > > #### Reference
> > >
> > > * [0] https://toshiba.semicon-storage.com/content/dam/toshiba-ss-v2/master/en/company/technical-review/pdf/technical-review-18_e.pdf
> > >   * Fig 7.2.1 shows the whole architecture of the prototype chip
> > >   * Fig 7.2.2 shows the architecture of the DNN accelerator
> > >
> > >
> > > Regards,
> > > Yuji
> > >
> > >> -----Original Message-----
> > >> From: Hans Verkuil <hverkuil@xxxxxxxxx>
> > >> Sent: Friday, May 20, 2022 7:03 PM
> > >> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> > >> <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu
> > >> nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> > >> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> > >> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > >> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> > >> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> > >> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> > >> accelerator driver
> > >>
> > >> Hi Yuji,
> > >>
> > >> On 5/20/22 11:48, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > >>> Hi Hans,
> > >>>
> > >>> Thank you for your comment.
> > >>> I agree that this submission lacks documentation sharing the basic idea
> > >>> of the accelerators: what they accept and what they yield.
> > >>> Where can I put a new document? Can I put it as a comment in a source
> > >>> file? Can I add a file under the Documentation/misc-devices directory?
> > >>
> > >> Start with explaining it by replying to this mail. Without knowing
> > >> anything about the hardware, it is difficult to say what the best
> > >> place is. Usually it is either the public API header, or somewhere in
> > >> Documentation.
> > >>
> > >> The first step is to have a better understanding of the Visconti
> > >> image hardware and to see what the best subsystem would be to support
> > >> that hardware.
> > >>
> > >> Regards,
> > >>
> > >> Hans
> > >>
> > >>>
> > >>> Thanks,
> > >>> Yuji Ishikawa
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Hans Verkuil <hverkuil@xxxxxxxxx>
> > >>>> Sent: Thursday, May 12, 2022 8:15 PM
> > >>>> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> > >>>> <yuji2.ishikawa@xxxxxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>;
> > >>>> iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT)
> > >>>> <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>; Sumit Semwal
> > >>>> <sumit.semwal@xxxxxxxxxx>; Christian König <christian.koenig@xxxxxxx>
> > >>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > >>>> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> > >>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> > >>>> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> > >>>> accelerator driver
> > >>>>
> > >>>> Hi Yuji,
> > >>>>
> > >>>> On 4/28/22 15:11, Yuji Ishikawa wrote:
> > >>>>> This series is the DNN image processing accelerator driver for
> > >>>>> Toshiba's ARM SoC, Visconti[0].
> > >>>>> It provides the DT binding documentation, the device driver, and
> > >>>>> MAINTAINERS entries.
> > >>>>>
> > >>>>> The second patch "soc: visconti: Add Toshiba Visconti image processing
> > >>>>> accelerator common source"
> > >>>>> and the fourth patch "MAINTAINERS: ..." are the same as the ones in
> > >>>>> the preceding post for the affine driver.
> > >>>>
> > >>>> There appears to be no documentation whatsoever, unless I am
> > >>>> missing something.
> > >>>>
> > >>>> How is the uAPI supposed to be used? What does it do?
> > >>>> What formats does it accept or produce?
> > >>>>
> > >>>> If this processes images, then (as Laurent mentioned) this is more
> > >>>> suitable as a V4L2 mem2mem driver.
> > >>>>
> > >>>> See
> > >>>> https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dev-mem2mem.html
> > >>>> and the many drivers in drivers/media that use it (git grep v4l2-mem2mem.h).
> > >>>>
> > >>>> But without any explanation whatsoever I have no idea what does
> > >>>> or does not make sense.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Hans
> > >>>>
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Yuji
> > >>>>>
> > >>>>> [0]:
> > >>>>> https://toshiba.semicon-storage.com/ap-en/semiconductor/product/image-recognition-processors-visconti.html
> > >>>>>
> > >>>>> Yuji Ishikawa (4):
> > >>>>>   dt-bindings: soc: visconti: Add Toshiba Visconti DNN image processing
> > >>>>>     accelerator bindings
> > >>>>>   soc: visconti: Add Toshiba Visconti image processing accelerator
> > >>>>>     common source
> > >>>>>   soc: visconti: Add Toshiba Visconti DNN image processing accelerator
> > >>>>>   MAINTAINERS: Add entries for Toshiba Visconti DNN image processing
> > >>>>>     accelerator
> > >>>>>
> > >>>>>  .../soc/visconti/toshiba,visconti-dnn.yaml |  54 ++
> > >>>>>  MAINTAINERS                                |   2 +
> > >>>>>  drivers/soc/Kconfig                        |   1 +
> > >>>>>  drivers/soc/Makefile                       |   1 +
> > >>>>>  drivers/soc/visconti/Kconfig               |   7 +
> > >>>>>  drivers/soc/visconti/Makefile              |   8 +
> > >>>>>  drivers/soc/visconti/dnn/Makefile          |   6 +
> > >>>>>  drivers/soc/visconti/dnn/dnn.c             | 533 ++++++++++++++++++
> > >>>>>  drivers/soc/visconti/dnn/hwd_dnn.c         | 183 ++++++
> > >>>>>  drivers/soc/visconti/dnn/hwd_dnn.h         |  68 +++
> > >>>>>  drivers/soc/visconti/dnn/hwd_dnn_reg.h     | 228 ++++++++
> > >>>>>  drivers/soc/visconti/ipa_common.c          |  55 ++
> > >>>>>  drivers/soc/visconti/ipa_common.h          |  18 +
> > >>>>>  drivers/soc/visconti/uapi/dnn.h            |  77 +++
> > >>>>>  drivers/soc/visconti/uapi/ipa.h            |  88 +++
> > >>>>>  15 files changed, 1329 insertions(+)
> > >>>>>  create mode 100644 Documentation/devicetree/bindings/soc/visconti/toshiba,visconti-dnn.yaml
> > >>>>>  create mode 100644 drivers/soc/visconti/Kconfig
> > >>>>>  create mode 100644 drivers/soc/visconti/Makefile
> > >>>>>  create mode 100644 drivers/soc/visconti/dnn/Makefile
> > >>>>>  create mode 100644 drivers/soc/visconti/dnn/dnn.c
> > >>>>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.c
> > >>>>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.h
> > >>>>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn_reg.h
> > >>>>>  create mode 100644 drivers/soc/visconti/ipa_common.c
> > >>>>>  create mode 100644 drivers/soc/visconti/ipa_common.h
> > >>>>>  create mode 100644 drivers/soc/visconti/uapi/dnn.h
> > >>>>>  create mode 100644 drivers/soc/visconti/uapi/ipa.h
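
As a footnote to the integer-type item at the top of this mail: the conversion
follows the usual kernel convention, i.e. u32/s32 in kernel-internal code and
__u32/__s32 (from <linux/types.h>) in the user-visible uapi header. Below is a
minimal before/after sketch; the struct and field names are made up for
illustration and are not taken from the actual uapi/dnn.h.

/* Before: C99 fixed-width types in the user API header. */
#include <stdint.h>

struct drv_dnn_example_before {
        uint32_t buffer_num;
        uint32_t list_num;
};

/* After: the uapi header uses the __u32 family from <linux/types.h>,
 * while kernel-internal code uses u32 instead. */
#include <linux/types.h>

struct drv_dnn_example_after {
        __u32 buffer_num;
        __u32 list_num;
};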