Testing the RK3288 VPU with static data on mainline kernels (Re: VPU tests)

I'm adding the Linux Rockchip LKML and Linux IOMMU LKML since mimicking
old 4.4 code leads me to other issues.

ayaka wrote:
> Have you tried my new driver?
MPP_service? I'd like to, but since the 4.4 Rockchip branch is
difficult to recompile these days, I have to make do with the old
prepackaged Rockchip-linux specific 4.4 kernel and its "vpu-service" driver.
I could try to port the code, but then I'd run into other issues, as
stated below.
>
> I don't see the configure to the iommu and the iommu is not set to
> bypass either.
Well, a simple call to iommu_get_dma_cookie triggered an ENODEV error,
which leads me to an old issue with RK3288 systems and mainline kernels:
CONFIG_IOMMU_DMA is not set by default when you select the Rockchip
IOMMU driver.
It's only enabled if you also enable the MediaTek IOMMU driver. So, I
guess that it's only enabled when using global configuration files that
target many boards at once.

I'm adding the Rockchip LKML, since I'd like to know why
CONFIG_IOMMU_DMA is neither enabled nor tested by default when selecting
the Rockchip IOMMU driver.
The old 4.4 drivers seem to rely heavily on it, making the whole
porting process more difficult.

Now, forcing CONFIG_IOMMU_DMA on mainline kernels breaks the Video
Output MMU initialization, which leads to a lot of BUG_ON from the DRM
drivers.
Unplugging the screen before the system starts allows me to boot the
system correctly (but without screen) and SSH into it.

That said, enabling this option doesn't solve the issues with my VPU
driver: the VPU still starts, triggers the IRQ, stops, and nothing
is written to the output...
And now I also have no usable screen.

I tried adding the good old dance:
driver_data->iommu_domain = iommu_domain_alloc(vpu_dev->bus);
iommu_get_dma_cookie(driver_data->iommu_domain);
group = iommu_group_get(vpu_dev);
iommu_dma_init_domain(driver_data->iommu_domain, 0x10000000, SZ_2G,
                      vpu_dev);
iommu_group_put(group);

But that doesn't change anything. The output DMA buffer is still
untouched and my custom IOMMU Fault handler is not triggered.
I'll give the DMA-Debug API a try.

Meanwhile, I'm also adding the Linux IOMMU LKML, since I'd like to know
what's the recommended way to initialize a device to perform DMA
operations, when there's an IOMMU, on mainline kernels ?
I see a lot of legacy code (from 4.4 kernels) that tends to use the
IOMMU and DMA API in ways that have been removed, or seem rather unused
(grep or bootlin doesn't show much use).

For example, do I still need to do iommu_get_dma_cookie ?
rk_iommu_domain_alloc seems to perform the operation automatically, and
the domain allocation is also done automatically with
iommu_get_domain_for_dev .
Should I still call iommu_dma_init_domain ?
Also, does calling dma_set_max_seg_size make sense for a device driver?
That function seems to be reserved for DMA drivers, yet I saw it in
multiple implementations of the VPU driver in the 4.4 kernels:
https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/video/rockchip/vpu/vpu_iommu_drm.c#L139
https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/media/platform/rockchip-vpu/rockchip_vpu_hw.c#L179

Do you still need to attach the device with iommu_attach_device if its
IOMMU is already referenced in its DTS node?

>
>
> On 08/18/2018 09:41 AM, Miouyouyou (Myy) wrote:
>> Greetings,
>>
>> I'm currently testing the RK3288 VPU driver on mainline kernels 4.18+
>> (soon 4.19-rc1).
>> The boards I'm using to perform the tests are :
>> * A Tinkerboard with a mainline kernel patched by myself (
>> https://github.com/Miouyouyou/RockMyy )
>> * A MiQi with 4.4 kernel packaged by Armbian, MPV and a modified version
>> of RKMPP, version 20171218 .
>>
>> Right now I'm testing the unit that decodes H264 frames. This unit seems
>> to be referred to as "hw_vpu_4831" in the old VPU "vcodec_service.c" driver
>> used on Rockchip 4.4 kernels.
>> My current goal is to perform a single H264 decode pass using static
>> data, in order to avoid being bothered by issues that are not directly
>> related to the VPU.
>> If that works, then it means that the main part works and I can use this
>> as a basis to port the MPP Service driver and the V4L2 Chromium driver.
>> Static data allows for determinism, which is extremely useful when
>> dealing with something as complex as H264 decoders.
>>
>>
>> To get this static data, I did the following:
>>
>> 1. Modify an old version of RKMPP ( mpp-release_20171218 ) to take
>> snapshots of :
>>   * the 101 registers sent to the VPU;
>>   * the encoded frame to decode;
>>   * the quantization table used for this frame;
>> when decoding the 120 first frames of an H264 movie (played through MPV,
>> with the RKMPP backend).
>>
>> 2. Write a kernel driver that :
>>
>>   * Incorporates these snapshots (registers, encoded frame, generated
>> quantization table) as static arrays
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test_static_data.h
>>
>> )
>>
>>   * Allocates 3 DMA buffers for the encoded frame, the quantization
>> table and the output.
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L755
>>
>> )
>>
>>   * Copies the encoded frame and the quantization table into the
>> respective DMA buffers.
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L771
>>
>> )
>>
>>   * Modifies the register snapshot, replacing the file descriptor
>> references with the actual IOVAs of the respective DMA buffers.
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L305
>>
>> )
>>
>>   * Sets up the clocks and the IRQ handlers
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L445
>>
>> )
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L812
>>
>> )
>>
>>   * Executes a decode pass
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L830
>>
>> )
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L372
>>
>> )
>>   (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L424
>>
>> )
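
The register patching step listed above can be sketched in plain user-space C. This is only a minimal illustration, not the code from the linked driver: `struct reg_patch` and `patch_regs` are hypothetical names, and the indices used in the usage note (12 for the input, 40 for the quantization table) are simply the positions where those IOVAs show up in the register dump further down in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* One relocation: which register receives which buffer address.
 * Hypothetical helper type, not taken from the driver. */
struct reg_patch {
    size_t   reg_index;  /* index into the register snapshot */
    uint32_t iova;       /* device address of the DMA buffer */
};

/* Overwrites the listed registers with the buffer addresses.
 * Returns 0 on success, or -1 if any index is out of range, in which
 * case the snapshot is left untouched. */
static int patch_regs(uint32_t *regs, size_t n_regs,
                      const struct reg_patch *patches, size_t n_patches)
{
    size_t i;

    /* Validate everything first so a bad entry cannot leave the
     * snapshot half-patched. */
    for (i = 0; i < n_patches; i++)
        if (patches[i].reg_index >= n_regs)
            return -1;

    for (i = 0; i < n_patches; i++)
        regs[patches[i].reg_index] = patches[i].iova;

    return 0;
}
```

With a 101-entry snapshot, passing `{12, 0x007ea000}` and `{40, 0x007e9000}` would place the Input and QTable addresses where the dump shows them.
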
>>
>> What currently happens after the decode pass is that the IRQ handler
>> gets called.
>> When checking the first register (SwReg01) state in this handler, it is
>> always set to 0x00010100.
>> I write 0 to this register (SwReg01) in order to end the current VPU
>> job.
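
For what it's worth, 0x00010100 has exactly two bits set, bit 8 and bit 16. A small helper like the following (a sketch with a hypothetical name, `set_bits`) lists the set bits of a status word so they can be compared against whatever register description is available, for instance the headers of the Chromium V4L2 driver. It deliberately assumes no bit names.

```c
#include <stdint.h>

/* Stores the positions of the set bits of `value` in `out`, lowest
 * first, and returns how many positions were stored (0 to 32). */
static int set_bits(uint32_t value, int out[32])
{
    int n = 0;
    int bit;

    for (bit = 0; bit < 32; bit++)
        if (value & (UINT32_C(1) << bit))
            out[n++] = bit;

    return n;
}
```
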
>>
>> However, my issue is that the output buffer remains untouched.
>> The content of the output buffer is memset to 0xff on initialization and
>> then checked by mmap'ing the DMA buffer from user-space, and writing the
>> content into a file, using the simple following program :
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/user-mode/test-mmap.c
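
The "is the buffer still untouched?" check can also be done directly on the mapped memory instead of inspecting a dumped file by hand. A minimal sketch, assuming the buffer was filled with a single byte value (0xff here) before the decode pass; `first_modified` is a hypothetical name and is not part of the linked test program.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first byte that differs from `fill`,
 * or `len` when the whole buffer still holds the fill pattern. */
static size_t first_modified(const uint8_t *buf, size_t len, uint8_t fill)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] != fill)
            break;

    return i;
}
```

A return value equal to the buffer length means the VPU wrote nothing that differs from the fill byte. Any smaller value is the offset of the first byte that changed.
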
>>
>>
>> This simple program is also used to check the VPU first 60 registers,
>> which are always :
>>
>> uint32_t regs[60] = {
>>          0x67313688, 0x00000000, 0xfff80510, 0x00081201,
>>          0x3c022004, 0x00ef4000, 0xa40017f0, 0xb8040000,
>>          0x50050000, 0x00090007, 0x128398a4, 0x1ee6b16a,
>>          0x007ea00d, 0x00000000, 0x00000000, 0x00000000,
>>          0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>          0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>          0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>          0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>          0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>          0x00000000, 0x00000000, 0x00000000, 0x00000000,
>>          0x007e9000, 0x0002fd00, 0x04208400, 0x0a521063,
>>          0x10839cc6, 0x16b52929, 0x1ce6b58c, 0x062081ef,
>>          0x00000000, 0x007fb050, 0xfbb56f80, 0x00000000,
>>          0x00000000, 0x00000000, 0xe5da0000, 0x00000008,
>>          0x00000000, 0x000000de, 0x00000001, 0x00000000,
>> };
>>
>> The IOVA used during the pass are :
>> Output : 0x00000000 ( 1920 * 1080 * 4 bytes long )
>> QTable : 0x007e9000
>> Input : 0x007ea000
>>
>> Note that the IOVA of the output buffer is 0x00000000 .
>> That's why regs[13] to regs[29] are set to 0x00000000 .
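
These three addresses are consistent with buffers mapped back to back: 1920 * 1080 * 4 = 8294400 = 0x7e9000, which is exactly the QTable IOVA, and the Input IOVA follows 0x1000 bytes later. A minimal user-space sketch of that sanity check; the struct and function names are hypothetical, and the 0x1000 QTable length is only inferred from the gap between the two addresses, not taken from the driver.

```c
#include <stdint.h>

/* One DMA buffer as seen by the VPU: start IOVA and length in bytes. */
struct iova_range {
    uint32_t start;
    uint32_t len;
};

/* Size in bytes of a frame buffer with the given dimensions. */
static uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width * height * bpp;
}

/* First address past the end of a range. */
static uint32_t iova_end(struct iova_range r)
{
    return r.start + r.len;
}

/* Returns 1 when the two ranges share at least one byte, 0 otherwise. */
static int iova_overlap(struct iova_range a, struct iova_range b)
{
    return a.start < iova_end(b) && b.start < iova_end(a);
}
```

If two of the ranges did overlap, a decode pass could overwrite its own input, so this is a cheap thing to rule out before looking at the VPU itself.
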
>>
>> I see that :
>> * regs[0] (SwReg00) is set to some value, but the register is not
>> documented.
>> * regs[3] (SwReg03) is set to 0x00081201 instead of 0x00081200.
>>    The last bit set is named "sw_dec_axi_wr_id" in the RKMPP sources,
>>    but I have no idea what it means.
>> * regs[12] (SwReg12) is set to 0x007ea00d after the decode pass.
>>    Before the decode pass, it was set to 0x007ea000, the Input IOVA.
>>    What does the "d" (0b1101) mean here?
>> * regs[50] (SwReg50) and regs[54] (SwReg54) are set to some value. Do
>> these values have any meaning ?
>> * regs[58] (SwReg58) is set to 1. What does it mean ?
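
Comparing a register before and after the pass is easier with an XOR, which isolates exactly the bits that changed. A minimal sketch with hypothetical helper names; it does not say anything about what the changed bits mean.

```c
#include <stdint.h>

/* Bits that differ between two reads of the same register. */
static uint32_t reg_changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Keeps only the lowest `low_bits` bits of a register value.
 * `low_bits` must be smaller than 32. */
static uint32_t reg_low_bits(uint32_t value, unsigned int low_bits)
{
    return value & ((UINT32_C(1) << low_bits) - 1u);
}
```

For SwReg12, `reg_changed_bits(0x007ea000, 0x007ea00d)` gives 0xd: only the low four bits differ, and the address part of the register is unchanged.
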
>>
>>
>> I've set up an IOMMU fault handler to catch potential DMA issues but the
>> fault handler is never called.
>> (
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L740
>>
>> )
>>
>>
>> So, basically, I got a VPU that runs, calls the IRQ handler and provides
>> zero output for reasons I do not understand.
>> And I got no useful error messages. No crashes. No freezes. No warnings
>> in dmesg logs.
>> Nothing. It just runs, calls the IRQ handler, stops and does nothing
>> useful.
>> The only messages I get in the logs are from the "printk" I set up in
>> the IRQ handler. (IRQ : 60 - State : 0x00010100).
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L140
>>
>>
>> Since I'm using only static data, the result is deterministic. Meaning
>> that there should not be any random changes.
>>
>> Therefore I got a few questions, since you are more knowledgeable than
>> me about the internals of the VPU.
>>
>>
>> 1. If the VPU fails to decode a frame, which registers are set ?
>> Or to rephrase it : How do I know that the VPU failed to decode a
>> frame ?
>> And does the VPU provide some information about why it failed ?
>>
>> 2. What needs to be enabled to perform a VPU decode pass, besides
>> setting the VPU registers ?
>> Meaning :
>> * What clocks are needed and what are their default rates ?
>>    "Aclk" and "Iface" clocks are enabled and set to 200 MHz and 50 MHz
>> respectively, in my driver.
>>    The video power domain (pd_video) is also set during the
>> initialization of the VPU IOMMU but I have no idea of its clock rate.
>>    Note that I'm not using the HEVC unit, so I don't enable the
>> HEVC-related clocks.
>> * What else needs to be enabled ?
>>    In the Chromium V4L2 driver for the RK3288 VPU, it seems that Tomasz
>> Figa only enables these two clocks, sets up the IOMMU (this seems to be
>> done automatically now in mainline kernels, but I have to contact Jeffy
>> Chen just to be sure), sets up the registers, writes them and gets the
>> result.
>> https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/media/platform/rockchip-vpu/rockchip_vpu_hw.c
>>
>> https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/media/platform/rockchip-vpu/rk3288_vpu_hw_h264d.c
>>
>>
>> 3. To rephrase question 2 : Is there a checklist of actions to perform
>> to be sure that the RK3288 VPU will decode correctly ?
>>
>> 4. Do you have any files to perform single H264 frame decoding tests ? I
>> see that the recent RKMPP releases have a "Single Frame Decoding" IOCTL.
>> Are there any files to test this with ?
>>
>>
>>
>> Note that the snapshots I'm using are available here :
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/refs_dumps/refs.tar.xz
>>
>> https://github.com/Miouyouyou/Mainline-Rockchip-VPU/tree/dev/refs_dumps
>>
>> The snapshots were done by modifying :
>> mpp/hal/rkdec/h264d/hal_h264d_vdpu1.c
>>
>> And adding the following function :
>>
>> /* Needs <fcntl.h>, <stdint.h>, <stdio.h> and <unistd.h>. */
>> static void myy_dump_frame_and_regs(
>>      H264dHalCtx_t *p_hal,
>>      H264dVdpu1Regs_t *p_regs)
>> {
>>      static uint8_t dumps = 0;
>>      /* 25 bytes: the longest name is 24 characters, plus the NUL. */
>>      char regs_name[25];
>>      char frame_name[25];
>>      char qtable_name[25];
>>
>>      //mpp_err_f("%s", "dumping");
>>      if (dumps < 120)
>>      {
>>          /* Pass the full buffer size: with a limit of 24, the
>>           * "_frame" name lost its last character. */
>>          snprintf(regs_name, sizeof(regs_name),
>>              "/tmp/mpp_dump_%04d_regs", dumps);
>>          snprintf(frame_name, sizeof(frame_name),
>>              "/tmp/mpp_dump_%04d_frame", dumps);
>>          snprintf(qtable_name, sizeof(qtable_name),
>>              "/tmp/mpp_dump_%04d_qtbl", dumps);
>>
>>          /* open() returns -1 on failure; 0 is a valid descriptor. */
>>          int fd = open(regs_name, O_CREAT | O_RDWR, 00644);
>>          if (fd >= 0) {
>>              int const bytes_written = write(fd,
>>                  p_regs, sizeof(H264dVdpu1Regs_t));
>>              //mpp_err_f("Logging regs to %s", regs_name);
>>              //mpp_err_f("Wrote %d bytes", bytes_written);
>>              close(fd);
>>          }
>>          fd = open(frame_name, O_CREAT | O_RDWR, 00644);
>>          if (fd >= 0) {
>>              int const bytes_written = write(fd,
>>                  p_hal->bitstream, p_hal->strm_len);
>>              //mpp_err_f("Logging frames to %s", frame_name);
>>              //mpp_err_f("Wrote %d bytes", bytes_written);
>>              close(fd);
>>          }
>>          fd = open(qtable_name, O_CREAT | O_RDWR, 00644);
>>          if (fd >= 0) {
>>              int const bytes_written = write(fd,
>>                  p_hal->cabac_buf,
>>                  VDPU_CABAC_TAB_SIZE
>>                  + VDPU_SCALING_LIST_SIZE
>>                  + VDPU_POC_BUF_SIZE);
>>              //mpp_err_f("Logging qtable to %s", qtable_name);
>>              //mpp_err_f("Wrote %d bytes", bytes_written);
>>              close(fd);
>>          }
>>          dumps++;
>>      }
>> }
>>
>> And executing this at the end of the vdpu1_h264d_gen_regs phase :
>> MPP_RET vdpu1_h264d_gen_regs(void *hal, HalTaskInfo *task)
>> {
>>      // ...
>>
>>      myy_dump_frame_and_regs(p_hal, (H264dVdpu1Regs_t *) p_hal->regs);
>> __RETURN:
>>      return ret = MPP_OK;
>> __FAILED:
>>      return ret;
>> }
>>
>> And then using the RKMPP backend of MPV to read an H264 movie.
>> My modified copy of RKMPP to perform the snapshots is available here :
>> https://github.com/Miouyouyou/rkmpp-reverse-engineering
>>
>



_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-rockchip



