On Fri, Jul 28, 2023 at 4:25 PM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:
>
> On 7/28/23 15:03, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> >
> > On Fri, Jul 28, 2023 at 3:55 PM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:
> >>
> >> On 7/28/23 14:46, Tomasz Figa wrote:
> >>> On Mon, Jul 17, 2023 at 4:44 PM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 7/12/23 18:48, Tomasz Figa wrote:
> >>>>> On Mon, Jul 03, 2023 at 04:35:30PM +0800, Hsia-Jun Li wrote:
> >>>>>> On 7/3/23 16:09, Benjamin Gaignard wrote:
> >>>>>>> On 30/06/2023 at 11:51, Hsia-Jun Li wrote:
> >>>>>>>> On 6/22/23 21:13, Benjamin Gaignard wrote:
> >>>>>>>>> After changing bufs arrays to a dynamic allocated array
> >>>>>>>>> VB2_MAX_FRAME doesn't mean anything for videobuf2 core.
> >>>>>>>> I think make it 64 which is the VB2_MAX_FRAME in Android GKI kernel is
> >>>>>>>> more reasonable.
> >>>>>>>>
> >>>>>>>> It would be hard to iterate the whole array, it would go worse with a
> >>>>>>>> filter. Such iterate may need to go twice because you mix
> >>>>>>>> post-processing buffer and decoding buffer(with MV) in the same array.
> >>>>>>> Here I don't want to change drivers behavior so I keep the same value.
> >>>>>>> If it happens that they need more buffers, like for dynamic resolution
> >>>>>>> change feature for Verisilicon VP9 decoder, case by case patches will
> >>>>>>> be needed.
> >>>>>>>
> >>>>>> I just don't like the idea that using a variant length array here.
> >>>>>>
> >>>>> "I don't like" is not an argument. We had a number of arguments for
> >>>>> using a generic helper (originally idr, but later decided to go with
> >>>>> XArray, because the former is now deprecated) that we pointed out in
> >>>>> our review comments for previous revisions. It wasn't really about the
> >>>>> size being variable, but rather avoiding open coding things in vb2 and
> >>>>> duplicating what's already implemented in generic code.
> >>>>
> >>>> I just want to say I don't think we need a variable length array to
> >>>> store the buffer here.
> >>>>
> >>>> And the below is the reason that such a case could be avoided in the
> >>>> first place.
> >>>>
> >>>>>
> >>>>>> And I could explain why you won't need so many buffers for the performance
> >>>>>> of decoding.
> >>>>>>
> >>>>>> VP9 could support 10 reference frames in dpb.
> >>>>>>
> >>>>>> Even for those frequent resolution changing test set, it would only happen
> >>>>>> to two resolutions,
> >>>>>>
> >>>>>> 32 would be enough for 20 buffers of two resolution plus golden frames. It
> >>>>>> also leaves enough slots for re-order latency.
> >>>>>>
> >>>>>> If your case had more two resolutions, likes low->medium->high.
> >>>>>>
> >>>>>> I would suggest just skip the medium resolutions, just allocate the lower
> >>>>>> one first for fast playback then the highest for all the possible
> >>>>>> medium cases. Reallocation happens frequently would only cause memory
> >>>>>> fragment, nothing benefits your performance.
> >>>>>>
> >>>>> We have mechanisms in the kernel to deal with memory fragmentation
> >>>>> (migration/compaction) and it would still only matters for the
> >>>>> pathologic cases of hardware that require physically contiguous memory.
> >>>>> Modern hardware with proper DMA capabilities can either scatter-gather
> >>>>> or are equipped with an IOMMU, so the allocation always happens in page
> >>>>> granularity and fragmentation is avoided.
> >>>>
> >>>> Unfortunately, there are more devices that didn't have a IOMMU attached
> >>>> to it, supporting scatter gather is more odd.
> >>>>
> >>>> It would be more likely that IOMMU would be disabled for the performance
> >>>> reason.
> >>>
> >>> These days IOMMU is totally mandatory if you want to think about
> >>> having any level of security in your system. Sure, there could be some
> >>> systems that are completely isolated from any external environment,
> >>> like some offline industry automation machines, but then arguably
> >>> their running conditions would also be quite static and require very
> >>> little memory re-allocation.
> >> Vendor just decided not to included such hardware.
> >> That is why From ION to DMA-heap, people like to allocate from a cavout
> >> out memory.
> >>>
> >>> I also don't buy the performance reason. CPUs have been behind MMUs
> >>> for ages and nobody is running them with paging disabled for
> >>> performance reasons. Similarly, most of the modern consumer systems
> >> Page lookup would increase the delay. Besides a few upstream devices
> >> prove them only use a level 1 page table without TBL.
> >
> > That's just an excuse for a bad hardware design/implementation. As I
> > said, there are good IOMMU implementations out there that don't suffer
> > from performance issues.
> >
> I could do nothing about that.
> Besides, even with TLB, cache missing could happen frequently,
> especially we need to access many (5~16, 10 usually) buffers and more
> 11MBytes each in a hardware processing.
> You can't have a very large TLB.

Right, but as I wrote in my previous emails, we have the right methods
in the kernel for providing drivers with contiguous memory and those
can be used for those special cases.

> >>> (mobile phones, PCs) run with IOMMUs enabled for pretty much anything
> >>> because of the security reason and they don't seem to be having any
> >> If the page is secure, you can't operate it in a insecure IOMMU or MMU.
> >> The most security way here, we should use a dedicated memory(or a zone
> >> in unified memory).
> >
> > You still need something to enforce that the hardware is not accessing
> > memory that it's not supposed to access. How do you do that without an
> > IOMMU?
> >
> If you know the arm security pipeline and security controller, you could
> found we could reserved a range of memory for a security id(devices in
> secure world may be a different security domain).
> Besides, a MMU or security MPU could mark some pages for the secure
> world access only, it doesn't mean the device need an IOMMU to access
> them. The MPU could filter the access through the AXI id.
> >> I believe there are more users in mobile for DMA-heap than kernel's dma
> >> allocation API.
> >
> > Yes, but that's completely separate from whether there is an IOMMU or
> > not. It's just a different allocation API.
> >
> The memory heap would mean a dedicated memory usually(we don't talk
> about system heap or why there are many vendor heaps). Dedicated memory
> means contiguous memory in the most of cases.

No, and no.

First no - a DMA-buf heap doesn't imply dedicated memory, and usually one
wants to completely avoid carving out memory, because it becomes
useless if the specific use case is not active.

Second no - there are ways to provide dedicated memory regions to the
DMA mapping API, such as a shared or restricted DMA pool [1].
[1] https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml

Best regards,
Tomasz

> >>> performance issues. In fact, it can improve the performance, because
> >>> memory allocation is much easier and without contiguous careouts (as
> >>> we used to have long ago on Android devices) the extra memory can be
> >>> used for buffers and caches to improve system performance.
> >>>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>>>
> >>>>>
> >>>>> Best regards,
> >>>>> Tomasz
> >>>>>
> >>>>>>>>> Remove it from the core definitions but keep it for drivers internal
> >>>>>>>>> needs.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@xxxxxxxxxxxxx>
> >>>>>>>>> ---
> >>>>>>>>>  drivers/media/common/videobuf2/videobuf2-core.c | 2 ++
> >>>>>>>>>  drivers/media/platform/amphion/vdec.c | 1 +
> >>>>>>>>>  .../media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c | 2 ++
> >>>>>>>>>  drivers/media/platform/qcom/venus/hfi.h | 2 ++
> >>>>>>>>>  drivers/media/platform/verisilicon/hantro_hw.h | 2 ++
> >>>>>>>>>  drivers/staging/media/ipu3/ipu3-v4l2.c | 2 ++
> >>>>>>>>>  include/media/videobuf2-core.h | 1 -
> >>>>>>>>>  include/media/videobuf2-v4l2.h | 4 ----
> >>>>>>>>>  8 files changed, 11 insertions(+), 5 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> >>>>>>>>> index 86e1e926fa45..899783f67580 100644
> >>>>>>>>> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> >>>>>>>>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> >>>>>>>>> @@ -31,6 +31,8 @@
> >>>>>>>>>
> >>>>>>>>>  #include <trace/events/vb2.h>
> >>>>>>>>>
> >>>>>>>>> +#define VB2_MAX_FRAME 32
> >>>>>>>>> +
> >>>>>>>>>  static int debug;
> >>>>>>>>>  module_param(debug, int, 0644);
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/media/platform/amphion/vdec.c b/drivers/media/platform/amphion/vdec.c
> >>>>>>>>> index 3fa1a74a2e20..b3219f6d17fa 100644
> >>>>>>>>> --- a/drivers/media/platform/amphion/vdec.c
> >>>>>>>>> +++ b/drivers/media/platform/amphion/vdec.c
> >>>>>>>>> @@ -28,6 +28,7 @@
> >>>>>>>>>
> >>>>>>>>>  #define VDEC_MIN_BUFFER_CAP 8
> >>>>>>>>>  #define VDEC_MIN_BUFFER_OUT 8
> >>>>>>>>> +#define VB2_MAX_FRAME 32
> >>>>>>>>>
> >>>>>>>>>  struct vdec_fs_info {
> >>>>>>>>>  	char name[8];
> >>>>>>>>> diff --git a/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c b/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> >>>>>>>>> index 6532a69f1fa8..a1e0f24bb91c 100644
> >>>>>>>>> --- a/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> >>>>>>>>> +++ b/drivers/media/platform/mediatek/vcodec/vdec/vdec_vp9_req_lat_if.c
> >>>>>>>>> @@ -16,6 +16,8 @@
> >>>>>>>>>  #include "../vdec_drv_if.h"
> >>>>>>>>>  #include "../vdec_vpu_if.h"
> >>>>>>>>>
> >>>>>>>>> +#define VB2_MAX_FRAME 32
> >>>>>>>>> +
> >>>>>>>>>  /* reset_frame_context defined in VP9 spec */
> >>>>>>>>>  #define VP9_RESET_FRAME_CONTEXT_NONE0 0
> >>>>>>>>>  #define VP9_RESET_FRAME_CONTEXT_NONE1 1
> >>>>>>>>> diff --git a/drivers/media/platform/qcom/venus/hfi.h b/drivers/media/platform/qcom/venus/hfi.h
> >>>>>>>>> index f25d412d6553..bd5ca5a8b945 100644
> >>>>>>>>> --- a/drivers/media/platform/qcom/venus/hfi.h
> >>>>>>>>> +++ b/drivers/media/platform/qcom/venus/hfi.h
> >>>>>>>>> @@ -10,6 +10,8 @@
> >>>>>>>>>
> >>>>>>>>>  #include "hfi_helper.h"
> >>>>>>>>>
> >>>>>>>>> +#define VB2_MAX_FRAME 32
> >>>>>>>>> +
> >>>>>>>>>  #define VIDC_SESSION_TYPE_VPE 0
> >>>>>>>>>  #define VIDC_SESSION_TYPE_ENC 1
> >>>>>>>>>  #define VIDC_SESSION_TYPE_DEC 2
> >>>>>>>>> diff --git a/drivers/media/platform/verisilicon/hantro_hw.h b/drivers/media/platform/verisilicon/hantro_hw.h
> >>>>>>>>> index e83f0c523a30..9e8faf7ba6fb 100644
> >>>>>>>>> --- a/drivers/media/platform/verisilicon/hantro_hw.h
> >>>>>>>>> +++ b/drivers/media/platform/verisilicon/hantro_hw.h
> >>>>>>>>> @@ -15,6 +15,8 @@
> >>>>>>>>>  #include <media/v4l2-vp9.h>
> >>>>>>>>>  #include <media/videobuf2-core.h>
> >>>>>>>>>
> >>>>>>>>> +#define VB2_MAX_FRAME 32
> >>>>>>>>> +
> >>>>>>>>>  #define DEC_8190_ALIGN_MASK 0x07U
> >>>>>>>>>
> >>>>>>>>>  #define MB_DIM 16
> >>>>>>>>> diff --git a/drivers/staging/media/ipu3/ipu3-v4l2.c b/drivers/staging/media/ipu3/ipu3-v4l2.c
> >>>>>>>>> index e530767e80a5..6627b5c2d4d6 100644
> >>>>>>>>> --- a/drivers/staging/media/ipu3/ipu3-v4l2.c
> >>>>>>>>> +++ b/drivers/staging/media/ipu3/ipu3-v4l2.c
> >>>>>>>>> @@ -10,6 +10,8 @@
> >>>>>>>>>  #include "ipu3.h"
> >>>>>>>>>  #include "ipu3-dmamap.h"
> >>>>>>>>>
> >>>>>>>>> +#define VB2_MAX_FRAME 32
> >>>>>>>>> +
> >>>>>>>>>  /******************** v4l2_subdev_ops ********************/
> >>>>>>>>>
> >>>>>>>>>  #define IPU3_RUNNING_MODE_VIDEO 0
> >>>>>>>>> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> >>>>>>>>> index 77921cf894ef..080b783d608d 100644
> >>>>>>>>> --- a/include/media/videobuf2-core.h
> >>>>>>>>> +++ b/include/media/videobuf2-core.h
> >>>>>>>>> @@ -20,7 +20,6 @@
> >>>>>>>>>  #include <media/media-request.h>
> >>>>>>>>>  #include <media/frame_vector.h>
> >>>>>>>>>
> >>>>>>>>> -#define VB2_MAX_FRAME (32)
> >>>>>>>>>  #define VB2_MAX_PLANES (8)
> >>>>>>>>>
> >>>>>>>>>  /**
> >>>>>>>>> diff --git a/include/media/videobuf2-v4l2.h b/include/media/videobuf2-v4l2.h
> >>>>>>>>> index 5a845887850b..88a7a565170e 100644
> >>>>>>>>> --- a/include/media/videobuf2-v4l2.h
> >>>>>>>>> +++ b/include/media/videobuf2-v4l2.h
> >>>>>>>>> @@ -15,10 +15,6 @@
> >>>>>>>>>  #include <linux/videodev2.h>
> >>>>>>>>>  #include <media/videobuf2-core.h>
> >>>>>>>>>
> >>>>>>>>> -#if VB2_MAX_FRAME != VIDEO_MAX_FRAME
> >>>>>>>>> -#error VB2_MAX_FRAME != VIDEO_MAX_FRAME
> >>>>>>>>> -#endif
> >>>>>>>>> -
> >>>>>>>>>  #if VB2_MAX_PLANES != VIDEO_MAX_PLANES
> >>>>>>>>>  #error VB2_MAX_PLANES != VIDEO_MAX_PLANES
> >>>>>>>>>  #endif
> >>>>>>>>> --
> >>>>>>>>> 2.39.2
> >>>>>>>>>
> >>>>>> --
> >>>>>> Hsia-Jun(Randy) Li
> >>>>>>
> >>>> --
> >>>> Hsia-Jun(Randy) Li
> >>>>
> >>
> >> --
> >> Hsia-Jun(Randy) Li
>
> --
> Hsia-Jun(Randy) Li