Greetings,
I'm currently testing the RK3288 VPU driver on mainline kernels 4.18+
(soon 4.19-rc1).
The boards I'm using to perform the tests are :
* A Tinkerboard with a mainline kernel patched by myself (
https://github.com/Miouyouyou/RockMyy )
* A MiQi with the 4.4 kernel packaged by Armbian, MPV and a modified
version of RKMPP, version 20171218 .
Right now I'm testing the unit that decodes H264 frames. This unit seems
to be referred to as "hw_vpu_4831" in the old VPU "vcodec_service.c" driver
used on Rockchip 4.4 kernels.
My current goal is to perform a single H264 decode pass using static
data, in order to avoid being bothered by issues that are not directly
related to the VPU.
If that works, then it means that the main part works and I can use this
as a basis to port the MPP Service driver and the V4L2 Chromium driver.
Static data allows for determinism, which is extremely useful when
dealing with something as complex as H264 decoders.
To get this static data, I did the following :
1. Modify an old version of RKMPP ( mpp-release_20171218 ) to take
snapshots of :
* the 101 registers sent to the VPU;
* the encoded frame to decode;
* the quantization table used for this frame;
when decoding the first 120 frames of an H264 movie (played through MPV,
with the RKMPP backend).
2. Write a kernel driver that :
* Incorporates these snapshots (registers, encoded frame, generated
quantization table) as static arrays
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test_static_data.h
)
* Allocates 3 DMA buffers for the encoded frame, the quantization table
and the output.
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L755
)
* Copies the encoded frame and the quantization table into their
respective DMA buffers.
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L771
)
* Modifies the registers snapshot, by replacing the file descriptor
references with the actual IOVAs of the respective DMA buffers.
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L305
)
* Sets up the clocks and the IRQ handlers
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L445
)
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L812
)
* Executes a decode pass
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L830
)
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L372
)
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L424
)
What currently happens after the decode pass is that the IRQ handler
gets called.
When checking the first register (SwReg01) state in this handler, it is
always set to 0x00010100 .
I write 0 to this register (SwReg01) in order to end the current VPU
job.
However, my issue is that the output buffer remains untouched : nothing
in it changes.
The content of the output buffer is memset to 0xff on initialization and
then checked by mmap'ing the DMA buffer from user-space and writing the
content into a file, using the following simple program :
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/user-mode/test-mmap.c
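The "untouched" check on the dumped output can be sketched as below. This is only a minimal illustration, not the code of test-mmap.c : the function name is hypothetical, and it assumes the buffer content has already been read into memory.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the offset of the first byte that differs from `fill`,
 * or `len` when the whole buffer still holds the fill pattern
 * (meaning that nothing wrote into it).
 */
static size_t first_modified_byte(const uint8_t *buf, size_t len,
				  uint8_t fill)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] != fill)
			return i;
	}
	return len;
}
```

With the 0xff initialization described above, first_modified_byte(buf, len, 0xff) == len means that the VPU wrote nothing into the output buffer.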
This simple program is also used to check the first 60 VPU registers,
which are always :
uint32_t regs[60] = {
0x67313688, 0x00000000, 0xfff80510, 0x00081201,
0x3c022004, 0x00ef4000, 0xa40017f0, 0xb8040000,
0x50050000, 0x00090007, 0x128398a4, 0x1ee6b16a,
0x007ea00d, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x007e9000, 0x0002fd00, 0x04208400, 0x0a521063,
0x10839cc6, 0x16b52929, 0x1ce6b58c, 0x062081ef,
0x00000000, 0x007fb050, 0xfbb56f80, 0x00000000,
0x00000000, 0x00000000, 0xe5da0000, 0x00000008,
0x00000000, 0x000000de, 0x00000001, 0x00000000,
};
The IOVAs used during the pass are :
Output : 0x00000000 ( 1920 * 1080 * 4 bytes long )
QTable : 0x007e9000
Input : 0x007ea000
Note that the IOVA of the output buffer is 0x00000000 .
That's why regs[13] to regs[29] are set to 0x00000000 .
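The relation between these IOVAs and the register snapshot can be sketched as follows. This is a simplified illustration, not the code of the test driver : the names are hypothetical, the indices 12 and 40 are read off the dump above, and index 13 as the output base is an assumption deduced from the zeroed registers. The reference frame registers (regs[14] to regs[29]) are left out.

```c
#include <stdint.h>

/* Register slots as they appear in the dump above.
 * REG_OUTPUT = 13 is an assumption, not a documented fact. */
enum {
	REG_INPUT  = 12,
	REG_OUTPUT = 13,
	REG_QTABLE = 40,
};

/* IOVAs of the three DMA buffers used for one decode pass */
struct test_iovas {
	uint32_t input;
	uint32_t output;
	uint32_t qtable;
};

/* Overwrite the address slots of a register snapshot with real IOVAs */
static void patch_snapshot(uint32_t *regs, const struct test_iovas *iova)
{
	regs[REG_INPUT]  = iova->input;
	regs[REG_OUTPUT] = iova->output;
	regs[REG_QTABLE] = iova->qtable;
}
```

Called with { 0x007ea000, 0x00000000, 0x007e9000 }, this gives the values of regs[12] (before the pass), regs[13] and regs[40] listed above.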
I see that :
* regs[0] (SwReg00) is set to some value, but the register is not
documented.
* regs[3] (SwReg03) is set to 0x00081201 instead of 0x00081200.
The last bit set is named "sw_dec_axi_wr_id" in the RKMPP sources, but
I have no idea what it means.
* regs[12] (SwReg12) is set to 0x007ea00d after the decode pass.
Before the decode pass, it was set to 0x007ea000, the Input IOVA.
What does the "d" (0b1101) mean here ?
* regs[50] (SwReg50) and regs[54] (SwReg54) are set to some value. Do
these values have any meaning ?
* regs[58] (SwReg58) is set to 1. What does it mean ?
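Observations like these can be collected systematically by comparing the registers written before the pass with the registers read back after it. The helper below is a minimal sketch with a hypothetical name, assuming both snapshots are available as arrays of the same length.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Prints every register whose value changed between two snapshots,
 * along with the XOR of both values (the bits that flipped).
 * Returns the number of registers that changed.
 */
static size_t diff_regs(const uint32_t *before, const uint32_t *after,
			size_t count)
{
	size_t i, changed = 0;

	for (i = 0; i < count; i++) {
		if (before[i] == after[i])
			continue;
		printf("regs[%zu] : 0x%08x -> 0x%08x (xor 0x%08x)\n", i,
		       (unsigned)before[i], (unsigned)after[i],
		       (unsigned)(before[i] ^ after[i]));
		changed++;
	}
	return changed;
}
```

For regs[12], the XOR of 0x007ea000 and 0x007ea00d is 0x0000000d, so only bits 0, 2 and 3 differ from the Input IOVA.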
I've set up an IOMMU fault handler to catch potential DMA issues but the
fault handler is never called.
(
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L740
)
So, basically, I have a VPU that runs, calls the IRQ handler and produces
no output for reasons I do not understand.
And I get no useful error messages. No crashes. No freezes. No warnings
in dmesg logs.
Nothing. It just runs, calls the IRQ handler, stops and does nothing
useful.
The only messages I get in the logs are the "printk" I set up in the IRQ
handler. (IRQ : 60 - State : 0x00010100).
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/test-devicetree-dma-to-from-user.c#L140
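To make the state value easier to compare against a register description, it can be broken down into bit positions. The helper below is a minimal sketch with a hypothetical name; it only lists which bits are set and makes no claim about what they mean.

```c
#include <stdint.h>

/*
 * Stores the positions of the set bits of `value` into `bits`
 * (which must have room for 32 entries), lowest bit first.
 * Returns the number of set bits.
 */
static int set_bits(uint32_t value, int bits[32])
{
	int n = 0;
	int pos;

	for (pos = 0; pos < 32; pos++) {
		if (value & (UINT32_C(1) << pos))
			bits[n++] = pos;
	}
	return n;
}
```

For 0x00010100 this reports two bits, 8 and 16.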
Since I'm using only static data, the result is deterministic. Meaning
that there should not be any random changes.
Therefore I have a few questions, since you are more knowledgeable than
me about the internals of the VPU.
1. If the VPU fails to decode a frame, which registers are set ?
Or to rephrase it : How do I know that the VPU failed to decode a
frame ?
And does the VPU provide some information about why it failed ?
2. What needs to be enabled to perform a VPU decode pass, besides setting
the VPU registers ?
Meaning :
* What clocks are needed and what are their default rates ?
"Aclk" and "Iface" clocks are enabled and set to 200 MHz and 50 MHz
respectively, in my driver.
The video power domain (pd_video) is also set up during the
initialization of the VPU IOMMU but I have no idea of its clock rate.
Note that I'm not using the HEVC unit, so I don't enable the HEVC
related clocks.
* What else needs to be enabled ?
In the Chromium V4L2 driver for RK3288 VPU, it seems that Tomasz Figa
only enables these two clocks, sets up the IOMMU (it seems to be done
automatically now, in mainline kernels, but I have to contact Jeffy Chen
just to be sure), sets up the registers, writes them and gets the result.
https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/media/platform/rockchip-vpu/rockchip_vpu_hw.c
https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/media/platform/rockchip-vpu/rk3288_vpu_hw_h264d.c
3. To rephrase question 2 : Is there a checklist of actions to perform
to be sure that the RK3288 VPU will decode correctly ?
4. Do you have any files to perform Single H264 Frame Decoding tests ? I
see that the recent RKMPP releases have a "Single Frame Decoding" IOCTL.
Are there any files to test this with ?
Note that the snapshots I'm using are available here :
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/blob/dev/refs_dumps/refs.tar.xz
https://github.com/Miouyouyou/Mainline-Rockchip-VPU/tree/dev/refs_dumps
The snapshots were done by modifying :
mpp/hal/rkdec/h264d/hal_h264d_vdpu1.c
And adding the following function :
/* Needs <fcntl.h>, <stdint.h>, <stdio.h> and <unistd.h> */
static void myy_dump_frame_and_regs(
	H264dHalCtx_t *p_hal,
	H264dVdpu1Regs_t *p_regs)
{
	static uint8_t dumps = 0;
	/* "/tmp/mpp_dump_0000_frame" is 24 characters long, so a size of 24
	 * truncated it to "/tmp/mpp_dump_0000_fram".
	 * 32 bytes leave room for the whole name and the NUL. */
	char regs_name[32];
	char frame_name[32];
	char qtable_name[32];
	//mpp_err_f("%s", "dumping");
	/* Only the first 120 frames are dumped */
	if (dumps < 120)
	{
		snprintf(regs_name, sizeof(regs_name),
			"/tmp/mpp_dump_%04d_regs", dumps);
		snprintf(frame_name, sizeof(frame_name),
			"/tmp/mpp_dump_%04d_frame", dumps);
		snprintf(qtable_name, sizeof(qtable_name),
			"/tmp/mpp_dump_%04d_qtbl", dumps);
		/* The registers sent to the VPU.
		 * open() returns -1 on failure; 0 is a valid descriptor. */
		int fd = open(regs_name, O_CREAT | O_RDWR, 00644);
		if (fd >= 0) {
			int const bytes_written = write(fd,
				p_regs, sizeof(H264dVdpu1Regs_t));
			//mpp_err_f("Logging regs to %s", regs_name);
			//mpp_err_f("Wrote %d bytes", bytes_written);
			close(fd);
		}
		/* The encoded frame to decode */
		fd = open(frame_name, O_CREAT | O_RDWR, 00644);
		if (fd >= 0) {
			int const bytes_written = write(fd,
				p_hal->bitstream, p_hal->strm_len);
			//mpp_err_f("Logging frames to %s", frame_name);
			//mpp_err_f("Wrote %d bytes", bytes_written);
			close(fd);
		}
		/* The quantization table (CABAC table, scaling list, POC) */
		fd = open(qtable_name, O_CREAT | O_RDWR, 00644);
		if (fd >= 0) {
			int const bytes_written = write(fd,
				p_hal->cabac_buf,
				VDPU_CABAC_TAB_SIZE
				+ VDPU_SCALING_LIST_SIZE
				+ VDPU_POC_BUF_SIZE);
			//mpp_err_f("Logging qtable to %s", qtable_name);
			//mpp_err_f("Wrote %d bytes", bytes_written);
			close(fd);
		}
		dumps++;
	}
}
And executing this at the end of the vdpu1_h264d_gen_regs phase :
MPP_RET vdpu1_h264d_gen_regs(void *hal, HalTaskInfo *task)
{
	// ...
	myy_dump_frame_and_regs(p_hal, (H264dVdpu1Regs_t *) p_hal->regs);
__RETURN:
	return ret = MPP_OK;
__FAILED:
	return ret;
}
And then using the RKMPP backend of MPV to read an H264 movie.
My modified copy of RKMPP to perform the snapshots is available here :
https://github.com/Miouyouyou/rkmpp-reverse-engineering