RFC MPEG encoding and decoding V4L2/DVB API additions

Version 0.2

(The latest version of this RFC can be found here as well: http://ivtvdriver.org/viewcvs/ivtv/trunk/doc/)

This RFC adds new functionality to the V4L2/DVB API in order to properly support MPEG hardware encoders and decoders. This is mostly driven by the work to get the ivtv driver (www.ivtvdriver.org) into the kernel, but it can also benefit other hardware encoders and decoders, which is why this RFC is cross-posted to the dxr3-devel mailing list as well.

A general note: while MPEG-1/2/4 is currently the codec family most often found, this RFC should also work for other compressed-stream formats, possibly with some later additions.

This RFC only deals with the encoding and decoding part. The cx23415 also supports an On-Screen Display (OSD). Another RFC will appear for that later; I need to do some more research on it first.

This RFC is divided into several sections. The first section describes a few additional MPEG compression controls. It is followed by a description of the new MPEG Index functionality. Then a description is given of the actual MPEG encoding commands (start, stop, pause, resume) and how to handle timing information. This is followed by a description of the MPEG decoding API, in particular how the DVB decoding API maps to what is needed for the ivtv driver, and how it can be extended to support the functionality of the driver.


Part I: MPEG encoding
=====================

This API has been reviewed by Mauro and his suggestions have been incorporated. As far as I am concerned this is pretty much the definitive API for MPEG encoding.


MPEG compression controls
-------------------------

V4L2_CID_MPEG_VIDEO_MUTE

Type: integer

Description: Mutes the video to a fixed color when capturing. This is useful for testing as it creates a fixed and reproducible video bitstream.
The supplied 32-bit integer has the following layout:

    Bits    Meaning
    0       '0' = video not muted
            '1' = video muted, creates frames with the YUV color defined below
    1:7     Unused, set to 0.
    8:15    V chrominance information
    16:23   U chrominance information
    24:31   Y luminance information


V4L2_CID_MPEG_AUDIO_MUTE

Type: bool

Description: Mutes the audio when capturing. This is not done by muting the audio hardware, which can still produce a slight hiss, but in the encoder itself, guaranteeing a fixed and reproducible audio bitstream. 0 = unmuted, 1 = muted.


V4L2_CID_MPEG_CX2341X_STREAM_INSERT_NAV_PACKETS

Type: bool

Description: This control is specific to the CX23415/6. If set, it enables navigation pack insertion for DVD. To be precise: it adds 0xbf (private stream 2) packets to the MPEG stream. The size of these packets is 2048 bytes (including the 6-byte header). The payload is zeroed and it is up to the application to fill it in. These packets are inserted every four frames. 0 = do not insert, 1 = insert DVD navigation packets.


MPEG Index
----------

    #define V4L2_ENC_IDX_FRAME_I    (0)
    #define V4L2_ENC_IDX_FRAME_P    (1)
    #define V4L2_ENC_IDX_FRAME_B    (2)
    #define V4L2_ENC_IDX_FRAME_MASK (0xf)

    struct v4l2_enc_idx_entry {
        __u64 offset;
        __u64 pts;
        __u32 length;
        __u32 flags;
        __u32 reserved[2];
    };

    #define V4L2_ENC_IDX_ENTRIES (64)

    struct v4l2_enc_idx {
        __u32 entries;
        __u32 entries_cap;
        __u32 reserved[4];
        struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
    };

    #define VIDIOC_G_ENC_INDEX _IOR('V', 64, struct v4l2_enc_idx)

Returns MPEG stream index entries. Each entry says that a frame of the type given in the flags (I, P or B) starts at the given offset, with the given PTS (Presentation Time Stamp) and length. The offset may never exceed the number of bytes actually read, i.e. the ioctl should never return 'future events'.

'entries' is the number of entries filled in the entry array. 'entries_cap' is the capacity of the index in the driver. This may be larger or smaller than V4L2_ENC_IDX_ENTRIES.
'entries' will always be less than or equal to min(entries_cap, V4L2_ENC_IDX_ENTRIES). If this ioctl is called when no capture is in progress, then 'entries' is 0 and 'entries_cap' should be set to the capacity. This way applications can check beforehand how frequently the index should be obtained.


MPEG Encoding commands
----------------------

    #define V4L2_ENC_CMD_START      (0)
    #define V4L2_ENC_CMD_STOP       (1)
    #define V4L2_ENC_CMD_PAUSE      (2)
    #define V4L2_ENC_CMD_RESUME     (3)

    /* Flags for V4L2_ENC_CMD_STOP */
    #define V4L2_ENC_CMD_STOP_AT_GOP_END    (1 << 0)

    struct v4l2_encoder_cmd {
        __u32 cmd;
        __u32 flags;
        union {
            struct {
                __u32 data[8];
            } raw;
        };
    };

    #define VIDIOC_ENCODER_CMD      _IOWR('V', 69, struct v4l2_encoder_cmd)
    #define VIDIOC_TRY_ENCODER_CMD  _IOWR('V', 69, struct v4l2_encoder_cmd)

Before calling this ioctl the unused fields of v4l2_encoder_cmd must be zeroed.

'cmd' is set by the user and is the command for the encoder.

'flags' is currently only used by the STOP command and contains one bit: if V4L2_ENC_CMD_STOP_AT_GOP_END is set, then the capture continues until the end of the GOP, otherwise it stops immediately.

These ioctls will check whether the command is supported (-EINVAL is returned if not) and modify any arguments if needed to make it a valid call for the available hardware. The modified arguments are returned. VIDIOC_TRY_ENCODER_CMD is identical to VIDIOC_ENCODER_CMD, except that the TRY ioctl does not actually execute the command.

Note that a read() on a stopped encoder implies a V4L2_ENC_CMD_START. A close() of an encoder that is currently encoding implies an immediate V4L2_ENC_CMD_STOP.

Once the encoder has no more pending data after a STOP, the read() call will return 0 to indicate that the encoder has stopped. The next read() will start the encoder again.


MPEG Timing
-----------

The DVB API contains two ioctls: AUDIO_GET_PTS and VIDEO_GET_PTS. For the Conexant chips the way to obtain PTS values during MPEG encoding is through the VIDIOC_G_ENC_INDEX ioctl.
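As a rough illustration of how an application might use the index returned by VIDIOC_G_ENC_INDEX to obtain a PTS, here is a minimal sketch that looks up the most recent I-frame at or before a given byte position. The structs are copied from the MPEG Index section with <stdint.h> types substituted for the kernel types. The helper last_iframe_before() is hypothetical and not part of this proposal, and the ioctl call itself is left out so that the sketch builds without the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Definitions copied from the RFC text, with <stdint.h> types so the
 * sketch builds in user space without kernel headers. */
#define V4L2_ENC_IDX_FRAME_I    (0)
#define V4L2_ENC_IDX_FRAME_P    (1)
#define V4L2_ENC_IDX_FRAME_B    (2)
#define V4L2_ENC_IDX_FRAME_MASK (0xf)
#define V4L2_ENC_IDX_ENTRIES    (64)

struct v4l2_enc_idx_entry {
    uint64_t offset;
    uint64_t pts;
    uint32_t length;
    uint32_t flags;
    uint32_t reserved[2];
};

struct v4l2_enc_idx {
    uint32_t entries;
    uint32_t entries_cap;
    uint32_t reserved[4];
    struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
};

/* Hypothetical helper (not part of the RFC): return the I-frame entry
 * with the largest offset that is still <= byte_pos, or NULL if the
 * index holds no such entry. */
static const struct v4l2_enc_idx_entry *
last_iframe_before(const struct v4l2_enc_idx *idx, uint64_t byte_pos)
{
    const struct v4l2_enc_idx_entry *best = NULL;
    uint32_t n = idx->entries;
    uint32_t i;

    /* Never trust 'entries' beyond the size of the array. */
    if (n > V4L2_ENC_IDX_ENTRIES)
        n = V4L2_ENC_IDX_ENTRIES;

    for (i = 0; i < n; i++) {
        const struct v4l2_enc_idx_entry *e = &idx->entry[i];

        if ((e->flags & V4L2_ENC_IDX_FRAME_MASK) != V4L2_ENC_IDX_FRAME_I)
            continue;
        if (e->offset <= byte_pos &&
            (best == NULL || e->offset >= best->offset))
            best = e;
    }
    return best;
}
```

An application would fill the struct by calling the ioctl on the capture device and then read the pts field of the returned entry. Since the index never reports offsets beyond what has been read, byte_pos would normally be the number of bytes the application has read so far.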
The only time the PTS is needed in ivtv is when capturing raw PCM and YUV. Since these two raw streams are not in sync, you need the actual PTS value from each in order to synchronize them. For that you can use the DVB API. The PCM device will change to an ALSA device in the future anyway, and this feature is of very limited interest.


Part II: MPEG decoding
======================

For MPEG decoding there is a DVB API available (media/video.h). After researching this API it has become clear that it can be used for most of the ivtv functionality, especially if some small additions can be made.

This has been discussed with Mauro, but needs review from Ralph Metzler and Mauro.


MPEG Decoding commands
----------------------

In this section I will examine how to implement the decoding functionality of the Conexant cx23415 in terms of the DVB API, and what, if any, additions to that API are needed to support it fully.

1) Start decoding

Use VIDEO_PLAY (but see item 5, Speed control, for extra changes).

2) Stop decoding

Use VIDEO_STOP. The cx23415 can keep showing the last frame or go to black. That can be implemented by VIDEO_SET_BLANK. However, I would suggest an addition to VIDEO_STOP: pass a STOP_TO_BLACK flag as argument. That puts this setting in the place where it belongs, instead of requiring the application to keep track of the previous SET_BLANK setting. ivtv currently uses a mechanism similar to SET_BLANK and it is very awkward to work with. In practice you have to first call SET_BLANK, followed by STOP, to be sure you have the correct BLANK setting.

ivtv also has an option to wait until the decoder has finished with all pending MPEG data. This can be implemented perfectly well using the EVENT mechanism. All that is needed is a new event: VIDEO_EVENT_DECODER_STOPPED. You can select() or poll() on that, and it is much better than my original proposal.

Finally, you can specify a PTS value at which the decoder should stop.
There is currently no way of doing that in the DVB API. One option might be to add a VIDEO_S_PTS ioctl and a USE_PTS flag that can be specified with VIDEO_STOP. Not terribly elegant, though. A VIDEO_STOP_AT_PTS ioctl might be better.

3) Pause decoding

Use VIDEO_FREEZE. The cx23415 can keep showing the last frame or go to black. That can be implemented by VIDEO_SET_BLANK. However, I would suggest an addition to VIDEO_FREEZE: pass a PAUSE_TO_BLACK flag as argument, which puts this setting in the place where it belongs.

4) Resume decoding

Use VIDEO_CONTINUE.

5) Speed control

The DVB API has two relevant ioctls: VIDEO_FAST_FORWARD and VIDEO_SLOWMOTION. Currently the argument of these ioctls is ignored in the av7110 implementation.

The cx23415 can do fast forward and backward at 1.5x and 2x normal speed, and slow motion at various speeds. It can also single step forwards or backwards. Furthermore it can specify whether audio should be muted or not (only relevant for 1.5x normal speed).

My suggestion would be to follow the DVB_VIDEO_PLAY ioctl as proposed in the DVB V4 API document. The VIDEO_PLAY argument would be interpreted as follows:

    speed == 0 || speed == 1000:  normal speed
    speed == 1:                   single step forward
    speed == -1:                  single step backward
    1 < speed < 1000:             slow forward
    speed > 1000:                 fast forward
    speed == -1000:               reverse play at normal speed
    -1000 < speed < -1:           slow reverse
    speed < -1000:                fast reverse

This change implies that it is possible to call VIDEO_PLAY when already playing in order to change the speed/direction.

VIDEO_PLAY will map the speed to the closest speed setting possible. It will return an error if the requested functionality is not possible (e.g. if no reverse playback is supported, or if there is no single step). Should VIDEO_GET_CAPABILITIES return which of the above speed combinations are possible?

A method of retrieving the actual speed would also be nice. Unfortunately, struct video_status has no room for additional fields.
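The speed interpretation listed above can be sketched as a small classification function. This is only an illustration of the proposed ranges: the enum, its constants and the name classify_speed() are made up for this example and are not part of the DVB API or of this proposal.

```c
/* Hypothetical playback modes, one per row of the speed table. */
enum play_mode {
    PLAY_NORMAL,        /* speed == 0 || speed == 1000 */
    PLAY_STEP_FWD,      /* speed == 1 */
    PLAY_STEP_REV,      /* speed == -1 */
    PLAY_SLOW_FWD,      /* 1 < speed < 1000 */
    PLAY_FAST_FWD,      /* speed > 1000 */
    PLAY_NORMAL_REV,    /* speed == -1000 */
    PLAY_SLOW_REV,      /* -1000 < speed < -1 */
    PLAY_FAST_REV       /* speed < -1000 */
};

/* Map a VIDEO_PLAY speed argument to one of the modes above.
 * The exact values are tested first, then the open ranges. */
static enum play_mode classify_speed(int speed)
{
    if (speed == 0 || speed == 1000)
        return PLAY_NORMAL;
    if (speed == 1)
        return PLAY_STEP_FWD;
    if (speed == -1)
        return PLAY_STEP_REV;
    if (speed == -1000)
        return PLAY_NORMAL_REV;
    if (speed > 1000)
        return PLAY_FAST_FWD;
    if (speed > 1)
        return PLAY_SLOW_FWD;   /* 1 < speed < 1000 */
    if (speed < -1000)
        return PLAY_FAST_REV;
    return PLAY_SLOW_REV;       /* -1000 < speed < -1 */
}
```

A driver would still have to map the result to the closest speed its hardware supports, or return an error when the mode is not available at all.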
The audio mute could be implemented through AUDIO_MUTE. It has a similar problem to the STOP_TO_BLACK flag in that it really belongs to the VIDEO_PLAY ioctl as an atomic action.

6) Passthrough

The Passthrough feature of the cx23415 does the following: if the passthrough mode is started then the video/audio input from the MPEG encoder is routed straight to the video/audio output. This is done internally in the cx23415. While Passthrough is on, it is still possible to record from the input at the same time. It's basically live TV functionality.

For this the VIDEO_SELECT_SOURCE ioctl is actually a good choice, provided I can add VIDEO_SOURCE_ENCODER as a new source to the video_stream_source_t enum. I think the current _DEMUX source does not have quite the same meaning. I might be wrong on that, though.

7) Timing information on the displayed frame

Use VIDEO_GET_PTS. There is currently no method of retrieving the SCR/PCR clock, though. But I don't think anyone is using that.

More problematic is that MythTV is using the frame counter (i.e. how many frames have been played back since the start of the stream). For that I would need a VIDEO_GET_FRAME_COUNT ioctl.

8) Wait for next frame to be displayed

Several applications need to know when a new frame is displayed. This usually triggers some On-Screen Display update or something like that. This too is easy to implement using events. All that is needed is a new event: VIDEO_EVENT_DECODER_VSYNC.

9) Audio mode selection

The cx23415 allows automatic selection of the audio mode (stereo, left, right, mono or swapped channels) for both a normal stereo capture and a bilingual capture. The AUDIO_CHANNEL_SELECT ioctl comes close. If the audio_channel_select_t enum was extended with AUDIO_MONO and AUDIO_STEREO_SWAPPED and an AUDIO_BILINGUAL_CHANNEL_SELECT ioctl was added, then this would fully implement this feature. An alternative approach is if AUDIO_CHANNEL_SELECT received a bitmask, e.g.
the low 8 bits are the channel select for a stereo MPEG, and bits 8-15 are the channel select for a bilingual MPEG.

10) Scaling and positioning of the video

The cx23415 can take the MPEG stream, scale it to an arbitrary width and height, and position it anywhere on the TV-out screen. So you can get effects like having the MPEG output in the top left corner and an OSD in the lower right corner.

With VIDIOC_S_FMT I can set the width and height, but there is no provision for an x and y coordinate. Can struct v4l2_pix_format be expanded to include this? It would be the logical place for it. For most devices the x and y would always be 0, so I don't think it would be a problem.

This concludes this RFC. Comments are welcome!

Regards,

	Hans Verkuil