RFC MPEG encoding and decoding V4L2/DVB API additions
Version 0.3

(This third revision incorporates the comments and suggestions that
resulted from discussing this RFC with Ralph Metzler. This should be
pretty much the final version. I hope.)

This RFC adds new functionality to the V4L2/DVB API in order to properly
support MPEG hardware encoders and decoders. This is mostly driven by the
work to get the ivtv driver (www.ivtvdriver.org) into the kernel, but it
can also benefit other hardware encoders and decoders, which is why this
RFC is cross-posted to the dxr3-devel mailing list as well.

A general note: while MPEG-1/2/4 is currently the codec most often found,
this RFC should also work for other compressed-stream formats, possibly
with some later additions.

This RFC only deals with the encoding and decoding part. The cx23415 also
supports an On-Screen Display (OSD). Another RFC will appear for that
later; I need to do some more research before I can issue it.

This RFC is divided into several sections. The first section describes a
few additional MPEG compression controls. It is followed by a description
of the new MPEG Index functionality. Then a description is given of the
actual MPEG encoding commands (start, stop, pause, resume) and how to
handle timing information.

This is followed by a description of the MPEG decoding API, in particular
how the DVB decoding API maps to what is needed for the ivtv driver, and
how it can be extended to support the functionality of the driver.

Changes since 0.2:

Added VIDEO_COMMAND and VIDEO_TRY_COMMAND.


Part I: MPEG encoding
=====================

This API has been reviewed by Mauro and his suggestions have been
incorporated. As far as I am concerned this is pretty much the definitive
API for MPEG encoding.

MPEG compression controls
-------------------------

V4L2_CID_MPEG_VIDEO_MUTE

Type: integer
Description: Mutes the video to a fixed color when capturing.
This is useful for testing as it creates a fixed and reproducible video
bitstream. The supplied 32-bit integer has the following value:

	Bit 0		'0' = video not muted
			'1' = video muted, creates frames with the YUV
			      color defined below
	Bits 1:7	Unused, set to 0.
	Bits 8:15	V chrominance information
	Bits 16:23	U chrominance information
	Bits 24:31	Y luminance information

V4L2_CID_MPEG_AUDIO_MUTE

Type: bool
Description: Mutes the audio when capturing. This is not done by muting
the audio hardware, which can still produce a slight hiss, but in the
encoder itself, guaranteeing a fixed and reproducible audio bitstream.
0 = unmuted, 1 = muted.

V4L2_CID_MPEG_CX2341X_STREAM_INSERT_NAV_PACKETS

Type: bool
Description: This control is specific to the CX23415/6. If set, it
enables navigation pack insertion for DVD. To be precise: it adds 0xbf
(private stream 2) packets to the MPEG stream. The size of these packets
is 2048 bytes (including the 6-byte header). The payload is zeroed and it
is up to the application to fill them in. These packets are inserted
every four frames.
0 = do not insert, 1 = insert DVD navigation packets.

MPEG Index
----------

#define V4L2_ENC_IDX_FRAME_I    (0)
#define V4L2_ENC_IDX_FRAME_P    (1)
#define V4L2_ENC_IDX_FRAME_B    (2)
#define V4L2_ENC_IDX_FRAME_MASK (0xf)

struct v4l2_enc_idx_entry {
	__u64 offset;
	__u64 pts;
	__u32 length;
	__u32 flags;
	__u32 reserved[2];
};

#define V4L2_ENC_IDX_ENTRIES (64)
struct v4l2_enc_idx {
	__u32 entries;
	__u32 entries_cap;
	__u32 reserved[4];
	struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
};

#define VIDIOC_G_ENC_INDEX	_IOR('V', 64, struct v4l2_enc_idx)

Return MPEG stream indices: each entry says that a frame (I/P/B
according to the flags) starts at the given offset, with the given PTS
(Presentation Time Stamp) and length. The offset may never exceed the
number of bytes actually read, i.e. the ioctl should never return
'future events'.

'entries' is the number of entries filled in the entry array.
'entries_cap' is the capacity of the index in the driver.
This may be larger or smaller than V4L2_ENC_IDX_ENTRIES. 'entries' will
always be less than or equal to min(entries_cap, V4L2_ENC_IDX_ENTRIES).

If this ioctl is called when no capture is in progress, then 'entries' is
0 and 'entries_cap' should be set to the capacity. This way applications
can check beforehand how frequently the index should be obtained.

MPEG Encoding commands
----------------------

#define V4L2_ENC_CMD_START      (0)
#define V4L2_ENC_CMD_STOP       (1)
#define V4L2_ENC_CMD_PAUSE      (2)
#define V4L2_ENC_CMD_RESUME     (3)

/* Flags for V4L2_ENC_CMD_STOP */
#define V4L2_ENC_CMD_STOP_AT_GOP_END    (1 << 0)

struct v4l2_encoder_cmd {
	__u32 cmd;
	__u32 flags;
	union {
		struct {
			__u32 data[8];
		} raw;
	};
};

#define VIDIOC_ENCODER_CMD	_IOWR('V', 69, struct v4l2_encoder_cmd)
#define VIDIOC_TRY_ENCODER_CMD	_IOWR('V', 70, struct v4l2_encoder_cmd)

Before calling these ioctls the unused fields of v4l2_encoder_cmd must be
zeroed.

'cmd' is set by the user and is the command for the encoder.

'flags' is currently only used by the STOP command and contains one bit:
if V4L2_ENC_CMD_STOP_AT_GOP_END is set, then the capture continues until
the end of the GOP, otherwise it stops immediately.

These ioctls will check whether the command is supported (-EINVAL is
returned if not) and modify any arguments if needed to make it a valid
call for the available hardware. The modified arguments are returned.
VIDIOC_TRY_ENCODER_CMD is identical to VIDIOC_ENCODER_CMD, except that
the TRY ioctl does not actually execute the command.

Note that a read() from a stopped encoder implies a V4L2_ENC_CMD_START.
A close() of an encoder that is currently encoding implies an immediate
V4L2_ENC_CMD_STOP.

When the encoder has no more pending data after a STOP has been issued,
the read() call will return 0 to indicate that the encoder has stopped.
The next read will start the encoder again.

MPEG Timing
-----------

The DVB API contains two ioctls: AUDIO_GET_PTS and VIDEO_GET_PTS.
For the Conexant chips the way to obtain PTS values during MPEG encoding
is through the VIDIOC_G_ENC_INDEX ioctl. The only time when the PTS is
needed in ivtv is when capturing raw PCM and YUV. Since these two raw
streams are not in sync you need the actual PTS value from each in order
to synchronize them. For that you can use the DVB API. The PCM device
will change to an ALSA device in the future anyway, and this feature is
of very limited interest.


Part II: MPEG decoding
======================

For MPEG decoding there is a DVB API available (media/video.h). After
researching this API it has become clear that it can be used for most of
the ivtv functionality, especially if some small additions can be made.
Together with Ralph Metzler I arrived at the following additions:

MPEG Decoding commands
----------------------

In this section I will examine how to implement the decoding
functionality of the Conexant cx23415 in terms of the DVB API, and what,
if any, additions to that API are needed to support it fully.

1) Start/Stop/Pause/Resume decoding

After discussing this with Ralph it became clear that it was best to add
two new ioctls (as designed in the first version of this RFC) since the
existing VIDEO_PLAY/STOP/FREEZE/CONTINUE did not provide the required
functionality.

The existing ioctls can still be used, but they only perform the simple
action. For more refined control (and better support for future
extensions) new VIDEO_COMMAND and VIDEO_TRY_COMMAND ioctls are added.
This ensures that existing apps won't break, while the cx23415 is still
fully supported. Future extensions also become much easier.

#define VIDEO_CMD_PLAY        (0)
#define VIDEO_CMD_STOP        (1)
#define VIDEO_CMD_FREEZE      (2)
#define VIDEO_CMD_CONTINUE    (3)

/* Flags for VIDEO_CMD_FREEZE */
#define VIDEO_CMD_PAUSE_TO_BLACK	(1 << 0)

/* Flags for VIDEO_CMD_STOP */
#define VIDEO_CMD_STOP_TO_BLACK		(1 << 0)

/* Flags for VIDEO_CMD_PLAY */
#define VIDEO_CMD_PLAY_SPEED_MUTE_AUDIO	(1 << 0)

/* Play input formats: */
/* The decoder has no special format requirements */
#define VIDEO_PLAY_FMT_NONE         (0)
/* The decoder requires full GOPs */
#define VIDEO_PLAY_FMT_GOP          (1)

struct video_command {
	__u32 cmd;
	__u32 flags;
	union {
		struct {
			__u64 pts;
		} stop;

		struct {
			__s32 speed;
			__u32 format;
		} play;

		struct {
			__u32 data[16];
		} raw;
	};
};

#define VIDEO_COMMAND		_IOWR('o', 58, struct video_command)
#define VIDEO_TRY_COMMAND	_IOWR('o', 59, struct video_command)

Before calling these ioctls the unused fields of video_command must be
zeroed.

'cmd' is set by the user and is the command for the decoder.

'flags' is used by several commands: FREEZE (pause) and STOP can either
leave the last frame on screen or clear the output to black, depending
on the specified flag. VIDEO_CMD_PLAY_SPEED_MUTE_AUDIO selects whether
the audio should be muted when decoding at non-standard speed.

Some extra arguments are available for specific commands:

STOP can set the PTS it should stop at. If pts == 0, then the decoder
stops accepting new data immediately. In order to wait until the decoder
has finished, a new event is added: VIDEO_EVENT_DECODER_STOPPED. You can
select() or poll() on the video device to wait for an exception and use
VIDEO_GET_EVENT to query it. This is valid for both the stop
VIDEO_COMMAND and for the VIDEO_STOP ioctl.

PLAY has a speed setting as extra argument. PLAY can be called again
when already playing in order to change the speed.
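
As an illustration of the zeroing rule and the per-command arguments
described above, here is a minimal sketch of how an application might
fill in a video_command. The definitions are local copies of the ones
proposed in this RFC, using <stdint.h> types so the sketch compiles
without any new kernel header. The helper names video_cmd_stop_at() and
video_cmd_play_at() are hypothetical and not part of the proposal, and
'speed' is assumed to be a signed field so that it can carry the
negative (reverse) values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the definitions proposed in this RFC (illustration only). */
#define VIDEO_CMD_PLAY                  (0)
#define VIDEO_CMD_STOP                  (1)
#define VIDEO_CMD_STOP_TO_BLACK         (1 << 0)
#define VIDEO_CMD_PLAY_SPEED_MUTE_AUDIO (1 << 0)
#define VIDEO_PLAY_FMT_NONE             (0)

struct video_command {
	uint32_t cmd;
	uint32_t flags;
	union {
		struct { uint64_t pts; } stop;
		/* Assumption: speed is signed so reverse play fits. */
		struct { int32_t speed; uint32_t format; } play;
		struct { uint32_t data[16]; } raw;
	};
};

/*
 * Hypothetical helper: build a STOP command that stops at 'pts'
 * (0 means stop immediately) and optionally clears the output to black.
 */
static struct video_command video_cmd_stop_at(uint64_t pts, int to_black)
{
	struct video_command c;

	memset(&c, 0, sizeof(c));	/* unused fields must be zeroed */
	c.cmd = VIDEO_CMD_STOP;
	if (to_black)
		c.flags |= VIDEO_CMD_STOP_TO_BLACK;
	c.stop.pts = pts;
	return c;
}

/*
 * Hypothetical helper: build a PLAY command for the requested speed,
 * optionally asking for muted audio at non-standard speeds.
 */
static struct video_command video_cmd_play_at(int32_t speed, int mute_audio)
{
	struct video_command c;

	memset(&c, 0, sizeof(c));	/* unused fields must be zeroed */
	c.cmd = VIDEO_CMD_PLAY;
	if (mute_audio)
		c.flags |= VIDEO_CMD_PLAY_SPEED_MUTE_AUDIO;
	c.play.speed = speed;
	/* The driver is expected to report the format it requires. */
	c.play.format = VIDEO_PLAY_FMT_NONE;
	return c;
}
```

An application would hand the result to VIDEO_TRY_COMMAND or
VIDEO_COMMAND with ioctl() on the video device. That call is left out
here because it needs a driver that implements this proposal.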

For the speed setting of the play command I suggest that the
DVB_VIDEO_PLAY proposal from the DVB V4 API document is followed. The
speed argument would be interpreted as follows:

	speed == 0 || speed == 1000:	normal speed
	speed == 1:			single step forward
	speed == -1:			single step backward
	1 < speed < 1000:		slow forward
	speed > 1000:			fast forward
	speed == -1000:			reverse play at normal speed
	-1000 < speed < -1:		slow reverse
	speed < -1000:			fast reverse

The driver will return the closest actual speed that it can handle,
together with the required input format. E.g. for reverse playback the
cx23415 requires full GOPs, fed into the decoder in reverse order. An
error is returned if the requested feature is completely unsupported
(e.g. if the hardware cannot do single stepping or reverse playback).

These ioctls will check whether the command is supported (-EINVAL is
returned if not) and modify any arguments if needed to make it a valid
call for the available hardware. The modified arguments are returned.
VIDEO_TRY_COMMAND is identical to VIDEO_COMMAND, except that the TRY
ioctl does not actually execute the command.

Note that a write() to a stopped decoder implies a VIDEO_CMD_PLAY.
A close() of a decoder that is currently decoding implies an immediate
VIDEO_CMD_STOP.

When the decoder stops accepting data after a STOP has been issued, the
write() call will return 0 to indicate that the decoder has stopped and
accepts no more data. The next write will start the decoder again.

2) Passthrough

The Passthrough feature of the cx23415 does the following: if the
passthrough mode is started then the video/audio input from the MPEG
encoder is routed straight to the video/audio output. This is done
internally in the cx23415. While Passthrough is on, it is still possible
to record from the input at the same time. It's basically live TV
functionality.

For this the VIDEO_SELECT_SOURCE ioctl is actually a good choice.
Selecting VIDEO_SOURCE_DEMUX will select passthrough mode, selecting
VIDEO_SOURCE_MEMORY will use MPEG/YUV input.

3) Timing information on the displayed frame

Use VIDEO_GET_PTS. There is currently no method of retrieving the
SCR/PCR clock, though, but I don't think anyone is using that. In the
future it might be possible to use DMX_GET_STC for this.

More problematic is that MythTV is using the frame counter (i.e. how
many frames have been played back since the start of the stream). For
that I would need a VIDEO_GET_FRAME_COUNT ioctl:

#define VIDEO_GET_FRAME_COUNT	_IOR('o', 60, __u64)

4) Wait for next frame to be displayed

Several applications need to know when a new frame is displayed. This
usually triggers some On-Screen Display update or something like that.
This too is easy to implement using events. All that is needed is a new
event VIDEO_EVENT_DECODER_VSYNC.

5) Audio mode selection

The cx23415 allows automatic selection of the audio mode (stereo, left,
right, mono or swapped channels) for both a normal stereo capture and a
bilingual capture. The AUDIO_CHANNEL_SELECT ioctl comes close. If the
audio_channel_select_t enum was extended with AUDIO_MONO and
AUDIO_STEREO_SWAPPED and an AUDIO_BILINGUAL_CHANNEL_SELECT ioctl was
added, then this would fully implement this feature.

6) Scaling and positioning of the video

The cx23415 can take the MPEG stream, scale it to an arbitrary width and
height and position it anywhere on the TV-out screen. So you can get
effects like having the MPEG output in the top left corner and an OSD in
the lower right corner.

With VIDIOC_S_FMT I can set the width and height, but there is no
provision for an x and y coordinate. Can struct v4l2_pix_format be
expanded to include this? It would be the logical place for it. For most
devices x and y would always be 0, so I don't think it would be a
problem.

This concludes this RFC. Comments are welcome!

Regards,

	Hans Verkuil