On Mon, Oct 24, 2016 at 08:53:04AM +0200, Daniel Vetter wrote: > On Mon, Oct 24, 2016 at 11:58:00AM +0530, Archit Taneja wrote: > > On 10/22/2016 03:25 PM, Russell King - ARM Linux wrote: > > > Looking at drm_bridge_disable() and drm_bridge_enable(), the control > > > model appears to be: > > > > > > crtc -> encoder -> connector > > > `-> bridge > > > `-> bridge > > > `-> bridge > > > > > > Bridges are always attached to an encoder, and there can be multiple > > > bridges attached to one encoder. Bridges can't be attached to the > > > connector. > > In helpers connectors are no-op objects. We _never_ call any connector > callbacks when doing a modeset. Connectors are only used to probe output > state, and as the userspace-visisble endpoint representation. Hence the > real graph is > > crtc -> encoder [ -> bridge [ -> bridge [...]]] -> connector > > with the last bridge owning the connector. And that last bridge probably > needs to store a pointer to its connector(s). That model can't work for TDA998x if the TDA998x is followed by another "bridge" (eg, to convert the TDMS signals to something else) unless there's some way to tell a bridge that it isn't the last in the chain. However, my graph is accurate as it's reflecting the software modelling - it reflects how the various objects are bound together in DRM. The DRM encoder has pointers to the DRM bridge, which has a pointer to the next DRM bridge. The DRM connector doesn't have any pointers to the connector, only to the DRM encoder. So, DRM bridges are childs of the encoder, and the encoder (and attached encoder bridge chain) can be selected by the DRM connector. However, you are correct that for different "tasks" like mode setting, or output probing, the representation is somewhat different - that's not really what I was talking about though, and I certainly was not talking about the userspace representation. What I'm 100% concerned about is how this stuff looks in kernel space and what the driver(s) should look like. > > > So, in the case of TDA998x, we end up with the TDA998x providing a > > > connector, because it has connector functionality, and providing a > > > bridge. The encoder is left to the KMS driver, which adds additional > > > complexity (100+ lines) to each and every KMS driver, requiring the > > > KMS driver to have much more knowledge of what's attached to the > > > "CRTC", so it can create these encoders itself. I still think this > > > is a backwards step - maybe one step forwards, two backwards. > > We've stubbed out everything that's in an encoder, you definitely don't > need hundreds of lines to write one any more. If there's still a callback > left around drm_encoder which has not yet suitable default handling, then > that's a bug. Sorry, but I do need exactly what I've written above, I can talk rather definitively because I've actually got the code in front of me. Most of the additional lines is due to the complexity added to the KMS driver to locate (actually for a third time) all the components in the system, specifically parsing the DT tree to find the "encoders" (or rather the TDA998x in this case), creating the DRM encoder objects, and binding the TDA998x bridge. Here's the _exact_ diffstat for the hacky conversion so far (including something like the 10 patches I posted last weekend, which haven't had any comments yet): drivers/gpu/drm/armada/armada_drv.c | 125 +++++ drivers/gpu/drm/i2c/tda998x_drv.c | 904 +++++++++++++++++------------------- 2 files changed, 560 insertions(+), 469 deletions(-) The actual bridge conversion on its own is: drivers/gpu/drm/armada/armada_drv.c | 125 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/i2c/tda998x_drv.c | 139 ++++++++++++++---------------------- 2 files changed, 180 insertions(+), 84 deletions(-) > Note: Only applies to atomic though, I'm not going to bother with old > legacy crtc helpers. I guess armada needs to switch to atomic, otherwise > encoders are indeed a bit a pain. That's not going to happen - and you know exactly why that's not going to happen - I've tried to do it and it failed misterably with all sorts of problems. The idea that it can be done piecemeal as per your guide is a falacy - it can't be. There is no progressive way to do a conversion. It seems that KMS drivers need to be rewritten from scratch, and that means there is a high risk of introducing lots of new bugs. I'm just not prepared to go through that - I'd much rather have a stable kernel driver that actually works than spend the next six months rewriting and debugging stuff just for the latest ideas about how stuff should be done. _OR_ there could be more help from DRM to ease the transition pain from non-atomic to atomic KMS drivers, so that it can be done in appropriately sized steps, so that the driver can be adequately tested to ensure that things don't totally fall apart... you know, like imx-drm has gone from being a stable driver to keep on falling apart now that it's been converted to atomic modeset. > Imo encoders should be that part which is baked into your core ip. If > there's nothing, then you're perfectly fine with a no-op encoder. >From my point of view, the TDA998x _is_ an encoder - it takes RGB and sync signals, and encodes them into the TDMS format for DVI or HDMI. I guess what I call an encoder is not what DRM calls an encoder though. What's in the Dove is effectively a pair of CRTCs, some muxes, a set of VGA DACs and a parallel RGB bus with pixel clock and sync signals. Apart from the VGA DACs (which aren't used in the TDA998x path) it's pretty hard to imagine what piece of hardware could be called an encoder. So what does the DRM encoder represent, hardware wise in this case? As I say, in my mind, the TDA998x _is_ the encoder. > Maybe we > could do a helper for creating those, if the few lines are copypasted too > often. Then all the external IP should be bridges (and chained). And with > chains either you need another bridge, or you're the last bridge, and then > you're supposed to register the connector as the final endpoint. Let me repeat: the "DRM connector" is part of the TDA998x - the TDA998x provides the EDID reading capabilities, and the connection detection capabilities. It also provides the CEC communication capabilities as well, but that's not too relevant to this discussion, apart from illustrating that it's an all-in-one single chip solution to providing a full HDMI source implementation. The TDA998x is not a stand-alone "bridge" which just _encodes_ a parallel RGB bus to TDMS signals, it's much more than that. That's why I'm saying we can't separate out the connector functionality from the encoder functionality. > > I do agree that it's a step backward that we now have to search for > > a corresponding bridge, which we didn't have to do when the chip > > was represented as an encoder. > > You can still do the exact same thing with bridges as with encoders using > the component framework. Should not be a step back at all. Sorry, no you can't at the moment. As I've already said, grep for "bridge_list". Read the code in drivers/gpu/drm/drm_bridge.c, and notice that there's two places that this list is accessed: 1. inside drm_bridge_add() 2. inside of_drm_find_bridge() which is only available when CONFIG_OF is enabled, and requires a DT struct device_node pointer to perform the lookup. struct device_node's do not exist without DT. > > > There's another issue with TDA998x - we now have audio support in > > > TDA998x, and converting TDA998x to be a DRM bridge introduces some > > > fundamental (and as yet unsolved) races between the ASoC code and > > > the attachment of the DRM bridge to the DRM encoder, and the detachment > > > when the DRM bridge/connector gets cleaned up. Right now, there's no > > > bridge callback when the encoder or drm_device goes away, so doing > > > stuff like: > > > > > > static int tda998x_audio_get_eld(struct device *dev, void *data, > > > uint8_t *buf, size_t len) > > > { > > > struct tda998x_priv *priv = dev_get_drvdata(dev); > > > struct drm_mode_config *config; > > > struct drm_connector *connector; > > > int ret = -ENODEV; > > > > > > /* FIXME: This is racy */ > > > if (!priv->bridge.encoder || !priv->bridge.encoder->dev) > > > return ret; > > > > > > config = &priv->bridge.encoder->dev->mode_config; > > > > > > mutex_lock(&config->mutex); > > > list_for_each_entry(connector, &config->connector_list, head) { > > > if (priv->bridge.encoder == connector->encoder) { > > > memcpy(buf, connector->eld, > > > min(sizeof(connector->eld), len)); > > > ret = 0; > > > } > > > } > > > mutex_unlock(&config->mutex); > > > > > > feels very unsafe - nothing really guarantees the validity of the > > > priv->bridge.encoder or priv->bridge.encoder->dev pointers. The > > > config->mutex lock does nothing to solve this. The same problem > > > also exists in tda998x_audio_hw_params(). > > > > Maybe we could ensure that the bridge attachment/detachment is > > contained within drm_encoder_init/cleanup funcs, so that their > > life is tied to the encoder drm_mode_object. It wouldn't be as > > straightforward, since the drm_bridges create connectors too. > > Will look more into this. > > I don't see any issue with the above at all. Or well, if there is one > there's a larger issue: You can't reach this code if you unregister your > driver's interface _before_ you tear down anything. This is fixed by > getting rid of the load/unload callbacks. And for additional interfaces > there's new register/unregister callbacks on connectors (which the bridge > also should own). That's easy to say if you're into the "lets rewrite everything all at the same time" mentality, which from your response I think sums up your position on everything from atomic mode set to this problem. Sorry, I really hate the rewrite mentality, that's not good programming practice, especially when existing implementations work. What's instead required are a series of incremental steps to effect the full outcome, especially when multiple drivers are involved. If you look at the problems surrounding the removal of the drm_connector_register() from TDA998x, you'll see why this is important: it's not the drivers _with_ the mid-layer that's a problem here, but those which were converted prematurely, or written without using the mid-layer that are blocking the removal of drm_connector_register(). And the removal of drm_connector_register() from TDA998x blocks the removal of the mid-layer from armada, because removing the mid-layer _now_ causes the kernel to WARN - I know, I've tried it already: [ 1.933854] WARNING: CPU: 0 PID: 13 at /home/rmk/git/linux-cubox/lib/kobject.c:244 kobject_add_internal+0xfc/0x2d8 [ 1.944286] kobject_add_internal failed for card0-HDMI-A-1 (error: -2 parent: card0) But... the mid-layer issue you raise is a complete red herring, the race has absolutely nothing to do with that. What causes the race is that during the KMS driver's probing, we get to the point where tda998x_bind() is called. This registers the DRM bridge so that the KMS driver can later find and attach to the bridge. However, just before creating the DRM bridge, it also creates a platform device for the audio codec side. As soon as that platform device is registered, ASoC is free to bind the audio subsystem and make it available to userspace. This means that any of the tda998x_audio_* functions are able to be called from the point that this platform device is registered. At this point, priv->bridge.encoder will be NULL, which means that some of the tda998x_audio_* functions should fail due to that. KMS driver initialisation can continue, writing the various pointers, and that happens without locking - but at that stage, it should only be going from NULL pointers to non-NULL pointers pointing at valid memory. However, there are no barriers to ensure that the various writes occur in the expected order (we're talking about writes in the KMS driver being visible to reads in the TDA998x audio side, possibly by another CPU, so locking isn't the answer - I can't see any way such a lock could be shared between TDA998x and various KMS drivers, or even some generic dummy DRM encoder helper. Barriers may be the answer, we need to ensure that encoder->dev is always valid before bridge->encoder is valid.) However, we need to also consider the initialisation failure and error clean up paths, assuming we have got this far - and that's where the worry is. drm_encoder_cleanup() memsets the entire encoder to zero. So, from the above, a compiler is perfectly at liberty to re-read the priv->bridge.encoder->dev pointer between these two statements: if (!priv->bridge.encoder || !priv->bridge.encoder->dev) return ret; config = &priv->bridge.encoder->dev->mode_config; and if such a re-read co-incides with the memset() in drm_encoder_cleanup() becoming visible, this is a possible oops waiting to happen. It gets worse if the KMS driver is responsible for freeing the DRM encoder that it created to attach to the TDA998x - if it frees that memory before tda998x_unbind() has been called, the audio subsystem will still be visible to userspace, and creates a potential use-after-free. So, none of this has anything what so ever to do with "is the KMS driver mid-layered or not" - this problem can exist irrespective of whether I have armada mid-layered or de-mid-layered. NB. It doesn't actually exist with armada, because armada is not used with the audio stuff on the cubox, we feed SPDIF to the TDA998x and let the TDA998x sort itself out, no audio codec is required there, but the point is that the complexities here are spread between TDA998x and associated KMS drivers - both have to be doing the right things for there not to be any subtle bugs here, and that is a really bad model. As I've already said, the problem does not exist as the driver stands in mainline today, only once it is converted to drm bridge, and it's purely down to the way the bridge code works. It is solvable, provided the connector remains part of TDA998x. So, like everything, we need to go through a series of steps to make these changes, and these steps need to happen in the right order, not as one huge great big lets-change-everything-at-once kind of approach. It's either going to take time, feeding changes into the kernel slowly, or it's going to need a lot of co-operation between different device driver authors, and sharing of stable commits between different git trees. Right now, the drm_connector_register() thing is basically blocking everything, and that needs to be handled in a way that's acceptable to all parties. The drm bridge conversion is something that can only happen once all the ducks are properly aligned - iow, drm_connector_register() gone, audio problems solved (eg, via the 10 patch series) and we have a way to convert TDA998x to a bridge without requiring every KMS user of TDA998x to simultaneously grow its own drm encoders. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel