On Wed, 6 Jun 2018 13:19:39 +0900, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:

> On Mon, Jun 4, 2018 at 10:33 PM Mauro Carvalho Chehab
> <mchehab+samsung@xxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > I hopefully consolidated all the comments I received on the past
> > announcement with regards to the complex camera workshop we're planning
> > to happen in Tokyo, just before the Open Source Summit in Japan.
> >
> > The main focus of the workshop is to allow supporting devices with
> > MC-based hardware connected to a camera.
> >
> > I'm enclosing a detailed description of the problem, in order to
> > allow the interested parties to be on the same page.
> >
> > We need to work towards an agenda for the meeting.
> >
> > From my side, I think we should have at least the following topics on
> > the agenda:
> >
> > - a quick review of what's currently in libv4l2;
> > - a presentation about the PipeWire solution;
> > - a discussion of the requirements for the new solution;
> > - a discussion about how we'll address it - who will do what.
>
> I believe Intel's Jian Xu would be able to give us a brief
> introduction to the IPU3 hardware architecture, and possibly to
> upcoming hardware generations as well.

That would be great!

> My experience with existing generations of ISPs from other vendors is
> that the main principles of operation are very similar to the model
> represented by IPU3 and very much different from the OMAP3 example
> mentioned by Mauro below.

I commented further on it below.

> > Comments? Suggestions?
> >
> > Is anyone else planning to either be there physically or via
> > Google Hangouts?
> >
> > Tomasz,
> >
> > Do you have any limit on the number of people that could join us
> > via Google Hangouts?

> Technically, Hangouts should be able to work with really huge
> multi-party conferences. There is obviously some limitation on the
> client side, since thumbnails of participants need to be decoded in
> real time, so even if the resolution is low, if the client is very
> slow, there might be some really bad frame dropping happening on the
> client side.
>
> However, I often have meetings with around 8 parties and it tends to
> work fine. We can also disable video for all participants who don't
> need to present anything at the moment, and the problem would go away
> completely.

Ok, good!

> > Regards,
> > Mauro
> >
> > ---
> >
> > 1. Introduction
> > ===============
> >
> > 1.1 V4L2 Kernel aspects
> > -----------------------
> >
> > The media subsystem supports two types of devices:
> >
> > - "Traditional" media hardware, supported via the V4L2 API. On such
> >   hardware, opening a single device node (usually /dev/video0) is enough
> >   to control the entire device. We call these devnode-based devices.
> >   An application sometimes may need to use multiple video nodes with
> >   devnode-based drivers to capture multiple streams in parallel
> >   (when the hardware allows it, of course). That's quite common for
> >   analog TV devices, where both /dev/video0 and /dev/vbi0 are opened
> >   at the same time.
> >
> > - Media-controller based devices. On those devices, there are typically
> >   several /dev/video? nodes and several /dev/v4l2-subdev? nodes, plus
> >   a media controller device node (usually /dev/media0).
> >   We call these mc-based devices. Controlling the hardware requires
> >   opening the media device (/dev/media0), setting up the pipeline and
> >   adjusting the sub-devices via /dev/v4l2-subdev?. Only streaming is
> >   controlled by /dev/video?.
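Just to make the difference concrete: below is a minimal sketch, in C
against the media controller UAPI, of the kind of setup an application
has to do on an mc-based device before any streaming can start. The
entity IDs and pad indexes are made up for the example; a real
application discovers them by walking the graph first.

/*
 * Minimal sketch: enumerate the entities of an mc-based device and
 * enable one link. Error handling is reduced to perror() for brevity.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>

int main(void)
{
	struct media_entity_desc ent;
	struct media_link_desc link;
	int fd = open("/dev/media0", O_RDWR);

	if (fd < 0) {
		perror("open /dev/media0");
		return 1;
	}

	/* Walk the graph: MEDIA_ENT_ID_FLAG_NEXT asks the kernel for
	 * the entity following the given id. */
	memset(&ent, 0, sizeof(ent));
	ent.id = MEDIA_ENT_ID_FLAG_NEXT;
	while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &ent) == 0) {
		printf("entity %u: %s (%u pads, %u links)\n",
		       ent.id, ent.name, ent.pads, ent.links);
		ent.id |= MEDIA_ENT_ID_FLAG_NEXT;
	}

	/* Enable one source->sink link; ids/pads are placeholders. */
	memset(&link, 0, sizeof(link));
	link.source.entity = 16;	/* e.g. the ISP resizer */
	link.source.index = 1;		/* its source pad */
	link.sink.entity = 5;		/* e.g. a capture video node */
	link.sink.index = 0;		/* its sink pad */
	link.flags = MEDIA_LNK_FL_ENABLED;
	if (ioctl(fd, MEDIA_IOC_SETUP_LINK, &link) < 0)
		perror("MEDIA_IOC_SETUP_LINK");

	return 0;
}

On a devnode-based device, none of this is needed: the application just
opens /dev/video0 and starts streaming.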
> >
> > In other words, both configuration and streaming go through the video
> > device node on devnode-based drivers, while video device nodes are
> > only used for streaming on mc-based drivers.
> >
> > "Standard" media applications, including open source ones (Camorama,
> > Cheese, Xawtv, Firefox, Chromium, ...) and closed source ones (Skype,
> > Chrome, ...), support devnode-based devices[1]. Also, when just one
> > media device is connected, the streaming/control device is typically
> > /dev/video0.
> >
> > [1] It should be noted that closed-source applications tend to have
> > various bugs that prevent them from working properly on many
> > devnode-based devices. Due to that, some additional blocks were
> > required in libv4l to support some of them. Skype is a good example,
> > as we had to include a software scaler in libv4l to make it happy. So
> > in practice not everything works smoothly with closed-source
> > applications on devnode-based drivers. A few such adjustments were
> > also made to some drivers and/or libv4l, in order to fulfill some
> > open-source app requirements.
> >
> > Support for mc-based devices currently requires a specialized
> > application in order to prepare the device for its usage (set up
> > pipelines, adjust hardware controls, etc). Once the pipeline is set,
> > the streaming goes via /dev/video?, although usually some
> > /dev/v4l2-subdev? devnodes should also be opened, in order to
> > implement algorithms designed to make video quality reasonable.

> To further complicate the problem, on many modern imaging subsystems
> (Intel IPU3, Rockchip RKISP1), there is more than one video output
> (CAPTURE device), for example:
> 1) full resolution capture stream and
> 2) downscaled preview stream.

OMAP3 also has both full-res and downscaled streams for previews, on
separate /dev/video nodes. For a "simple" use case (like PC apps and
videoconferencing), just one /dev/video is enough. For devices used to
take static pictures, both streams are important.

> Moreover, many ISPs also produce per-frame metadata (statistics) for
> 3A algorithms, which then produce per-frame metadata (parameters) for
> the processing of the next frame. These would also be exposed as
> /dev/video? nodes with respective V4L2_BUF_TYPE_META_* queues.

True, but the metadata frames don't need to go to the application, as
they will be consumed by the 3A logic.
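For reference, such statistics nodes are regular V4L2 capture devices,
just using a META buffer type, so the 3A logic would consume them
roughly as sketched below. The device path is only an example, and the
payload layout is driver-specific, so it is not parsed here.

/*
 * Rough sketch: read one buffer of per-frame 3A statistics from a
 * V4L2_BUF_TYPE_META_CAPTURE node.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

int main(void)
{
	struct v4l2_requestbuffers req;
	struct v4l2_format fmt;
	struct v4l2_buffer buf;
	enum v4l2_buf_type type = V4L2_BUF_TYPE_META_CAPTURE;
	void *stats;
	int fd = open("/dev/video2", O_RDWR);	/* example stats node */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&fmt, 0, sizeof(fmt));
	fmt.type = V4L2_BUF_TYPE_META_CAPTURE;
	if (ioctl(fd, VIDIOC_G_FMT, &fmt) < 0) {
		perror("VIDIOC_G_FMT");
		return 1;
	}
	printf("meta buffer size: %u bytes\n", fmt.fmt.meta.buffersize);

	memset(&req, 0, sizeof(req));
	req.count = 1;		/* one buffer is enough for the sketch */
	req.type = V4L2_BUF_TYPE_META_CAPTURE;
	req.memory = V4L2_MEMORY_MMAP;
	if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0) {
		perror("VIDIOC_REQBUFS");
		return 1;
	}

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_META_CAPTURE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = 0;
	if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0) {
		perror("VIDIOC_QUERYBUF");
		return 1;
	}
	stats = mmap(NULL, buf.length, PROT_READ, MAP_SHARED,
		     fd, buf.m.offset);
	if (stats == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	ioctl(fd, VIDIOC_QBUF, &buf);
	ioctl(fd, VIDIOC_STREAMON, &type);

	/* Each dequeued buffer carries the statistics of one frame;
	 * a 3A implementation would use them to compute the ISP
	 * parameters for the next frame. */
	if (ioctl(fd, VIDIOC_DQBUF, &buf) == 0)
		printf("frame %u: %u bytes of statistics\n",
		       buf.sequence, buf.bytesused);

	munmap(stats, buf.length);
	return 0;
}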
> It is complicated even more on systems with separate input (e.g. CSI2)
> and processing (ISP) hardware, such as Intel IPU3. In such a case, the
> raw frames captured directly from the CSI2 interface are not usable
> for end-user applications. This means that some component in userspace
> needs to forward the raw frames to the ISP, and only the output of the
> ISP can be passed to the application.

Yes. Nowadays, several devices do the same.

> > On such devices, it is not uncommon for the device node used by the
> > application to be a random number (on the OMAP3 driver, it is
> > typically either /dev/video4 or /dev/video6).
> >
> > One example of such hardware is this OMAP3-based board:
> >
> > http://www.infradead.org/~mchehab/mc-next-gen/omap3-igepv2-with-tvp5150.png
> >
> > In the picture, there's a graph with the hardware blocks in blue/dark
> > blue and the corresponding devnode interfaces in yellow.
> >
> > The mc-based approach was taken when support for the Nokia N9/N900
> > cameras was added (which have an OMAP3 SoC). It is required because the
> > camera hardware on the SoC comes with a media processor (ISP), which
> > does a lot more than just capturing, allowing complex algorithms to
> > enhance image quality at runtime. Those algorithms are known as 3A -
> > an acronym for 3 other acronyms:
> >
> > - AE (Auto Exposure);
> > - AF (Auto Focus);
> > - AWB (Auto White Balance).
> >
> > The main reason that drove the MC design is that the 3A algorithms
> > (that is, the 3A control loop, and sometimes part of the image
> > processing itself) often need to run, at least partially, on the CPU.
> > As a kernel-space implementation wasn't possible, we needed a
> > lower-level UAPI.
> >
> > Setting up a camera with such ISPs is harder because the pipelines to
> > be set actually depend on the requirements for those 3A algorithms to
> > run. Also, usually, the 3A algorithms use some chipset-specific
> > userspace API, which exports some image properties, calculated by the
> > ISP, to speed up the convergence of those algorithms.
> >
> > Btw, usually, the 3A algorithms are IP-protected, provided by vendors
> > as binary-only blobs, although there are a few OSS implementations.
> >
> > Part of the problem is that, so far, there isn't a proper userspace
> > API to implement 3A libraries. Once we have a userspace camera stack,
> > we hope that we'll gradually increase the number and quality of
> > open-source 3A stacks.
>
> [snip]
>
> > 2.2 Modern hardware is starting to come with "complex" camera ISPs
> > ------------------------------------------------------------------
> >
> > While mc-based devices were limited to SoCs, it was easy to
> > "delegate" the task of talking with the hardware to the
> > embedded hardware designers.
> >
> > However, this is changing. The Dell Latitude 5285 laptop is a standard
> > PC with a Core i3, i5 or i7 CPU, which comes with the Intel IMU3 ISP
> > hardware[2].
>
> IPU3 :)

Yeah, I noticed the typo too late. I actually fixed it in the
announcement at linuxtv.org.

> >
> > [2] https://www.spinics.net/lists/linux-usb/msg167478.html
> >
> > There, instead of a USB camera, the hardware is equipped with an
> > MC-based ISP, connected to its camera. Currently, despite having
> > a kernel driver for it, the camera doesn't work with any
> > userspace application.
> >
> > I'm also aware of other projects that are considering the usage of
> > mc-based devices for non-dedicated hardware.
>
> [snip]
>
> > 3.2 libv4l2 support for 3A algorithms
> > =====================================
> >
> > The 3A algorithm handling is highly dependent on the hardware. The
> > idea here is to allow libv4l to have a set of 3A algorithms that
> > will be specific to certain mc-based hardware.
> >
> > One requirement, if we want vendor stacks to use our solution, is that
> > it should allow external closed-source algorithms to run as well.
> >
> > The 3A library API must be standardized, to allow the closed-source
> > vendor implementation to be replaced by an open-source implementation
> > should someone have the time and energy (and qualifications) to write
> > one.
> >
> > Sandboxed execution of the 3A library must be possible, as
> > closed-source code can't always be blindly trusted. This includes the
> > ability to wrap the library in a daemon, should the platform's
> > multimedia stack wish so, and to avoid any direct access to the kernel
> > devices by the 3A library itself (all accesses should be marshaled by
> > the camera stack).
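To make the requirement more concrete, here is a purely hypothetical
sketch, in C, of the shape such a standardized 3A library interface
could take. None of these names exist today; the point is only that
the library sees opaque statistics, produces opaque parameters, and
never opens a kernel device node itself.

/*
 * Hypothetical 3A library interface: everything below is illustrative
 * only. The library never performs I/O; the camera stack feeds it the
 * statistics metadata and queues the resulting parameters to the ISP.
 */
#include <stddef.h>
#include <stdint.h>

struct cam3a_stats {
	const void *data;	/* raw statistics from the META queue */
	size_t size;
	uint32_t sequence;	/* frame the statistics belong to */
};

struct cam3a_params {
	void *data;		/* ISP parameters for the next frame */
	size_t size;
};

struct cam3a_ops {
	/* Called once; tuning_data is vendor/sensor specific. */
	void *(*init)(const void *tuning_data, size_t tuning_size);
	/* One 3A iteration: statistics in, parameters out. No file
	 * descriptors and no ioctl()s, so it can run sandboxed. */
	int (*process)(void *ctx, const struct cam3a_stats *stats,
		       struct cam3a_params *params);
	void (*close)(void *ctx);
};

/* A vendor binary would export a single well-known symbol: */
const struct cam3a_ops *cam3a_get_ops(void);

Whether process() then runs in-process or behind an IPC boundary
becomes an integration detail, invisible to the library itself.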
> >
> > Please note that this daemon is *not* a camera daemon that would
> > communicate with the V4L2 driver through a custom back channel.
> >
> > The decision to run the 3A library in a sandboxed process or to call
> > it directly from the camera stack should be left to the camera stack
> > and to the platform integrator, and should not be visible to the 3A
> > library.
> >
> > The 3A library must be usable on major Linux-based camera stacks (the
> > Android and Chrome OS camera HALs are certainly important targets,
> > more can be added) unmodified, which will allow usage of the vendor
> > binary provided for Chrome OS or Android on regular Linux systems.
>
> This is quite an interesting idea and it would be really useful if it
> could be done. I'm kind of worried, though, about Android in
> particular, since the execution environment in Android differs
> significantly from regular Linux distributions (including Chrome OS,
> which is not so far from such), namely:
> - different libc (bionic) and dynamic linker - I guess this could be
> solved by static linking?

Static linking is one possible solution. IMHO, we should try to make it
use just a C library (if possible) and be sure that it will also compile
with bionic/uClibc, in order to make it easier to be used by Android and
other embedded distros.

> - dedicated toolchains - perhaps not much of a problem if the per-arch
> ABI is the same?

Depending on the library dependencies, we could likely make it work with
more than one toolchain. I guess acconfig works with Android, right? If
so, it could auto-adjust to the different toolchains everywhere.

> > It would make sense to design a modular camera stack, and try to make
> > most components as platform-independent as possible. This should include:
> >
> > - the kernel drivers (V4L2-compliant and usable without any
> >   closed-source userspace component);
> > - the 3A library;
> > - any other component that could be shared (for instance a possible
> >   request API library).
> >
> > The rest of the code will mostly be glue around those components to
> > integrate them in a particular camera stack, and should be as
> > platform-agnostic as possible.
> >
> > In the case of the Android camera HAL, ideally it would be glue that
> > could be used with different camera vendors (probably with some kind of
> > vendor-specific configuration, or possibly with a separate
> > vendor-specific component to handle pipeline configuration).
> >
> > 4 Complex camera workshop
> > =========================
> >
> > The workshop will be happening in Tokyo, Japan, on Jun 19, at the
> > Google offices. The location is:
> >
> > 〒106-6126 Tokyo, Minato, Roppongi, 6 Chome−10−1 Roppongi Hills Mori Tower 44F
>
> Nearest station exits:
> - Hibiya line Roppongi station exit 1c (recommended)
> - Oedo line Roppongi station exit 3 (and a few minutes' walk)
>
> >
> > 4.1 Physical Attendees
> > ======================
> >
> > Tomasz Figa <tfiga@xxxxxxxxxx>
> > Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
> > Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx>
> > Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
> > Niklas Söderlund <niklas.soderlund@xxxxxxxxxxxx>
> > Jian Xu Zheng <jian.xu.zheng@xxxxxxxxx>
> >
> > Anyone else?
>
> Looking at the latest reply in this thread:
>
> jacopo mondi <jacopo@xxxxxxxxxx>
>
> Anyone else, please tell me beforehand (at least 1-2 days before), as
> I need to take care of building access, since it's a multi-tenant
> office building. I'll contact each attendee separately with further
> details by email.
I'll contact each attendee separately with further > details by email. > > > > > 4.2. Attendees Via Google Hangouts > > ================================== > > > > Hans Verkuil <hverkuil@xxxxxxxxx> - Via Google Hangouts - maybe only on afternoon > > Javier Martinez Canillas <javier@xxxxxxxxxxxx> - Via Google Hangouts - only on reasonable TZ-compatible-hours > > What time zone would that be? I guess we could try to tweak the agenda > to take this into account. Thanks, Mauro