Re: [PATCH v3 100/108] gpio: mvebu: Make use of devm_pwmchip_alloc() function

Hello,

On Mon, Dec 04, 2023 at 10:28:15PM +0100, Bartosz Golaszewski wrote:
> On Mon, Dec 4, 2023 at 9:27 PM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> >
> > On Sat, Dec 2, 2023 at 1:43 AM Uwe Kleine-König
> > <u.kleine-koenig@xxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Dec 01, 2023 at 11:14:32AM +0100, Bartosz Golaszewski wrote:
> > > > On Tue, Nov 28, 2023 at 10:07 AM Uwe Kleine-König
> > > > <u.kleine-koenig@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > >
> > > > [snip]
> > > >
> > > > >
> > > > > > I see the
> > > > > > chip->operational field that is set to false on release. In my
> > > > > > version, we just use a NULL-pointer to carry the same information.
> > > > >
> > > > > Yup, sounds obvious. Your usage of "just" sounds as if your variant were
> > > > > better. The alternative view, in which the suggested approach sounds
> > > > > better, would be:
> > > > >
> > > > > You need a pointer and I "just" a bool that even has a name implying its
> > > > > function. You need to dereference the pointer in several places as the
> > > > > needed information is distributed over two structures while it's all
> > > > > together in a single struct for the usual foo_alloc() + foo_register()
> > > > > approach.
> > > > >
> > > >
> > > > There's another reason we do that. I'm no longer sure if I mentioned
> > > > it in my talk (I meant to anyway).
> > > >
> > > > In GPIO we have API functions that may be called from any context -
> > > > thus needing spinlocks for locking - but also driver callbacks that
> > > > may use mutexes internally or otherwise sleep. I don't know if this is
> > > > the case for PWM too but in GPIO we may end up in a situation where if
> > > > we used a spinlock to protect some kind of an "is_operational" field,
> > > > we'd end up sleeping with a spinlock taken and if we used a mutex, we
> > > > couldn't use API functions from atomic contexts.
> > > >
> > > > This is the reason behind locking being so broken in GPIO at the
> > > > moment and why I'm trying to fix it this release cycle.
> > > >
> > > > Splitting the implementation into two structures and protecting the
> > > > pointer to the provider structure with SRCU has the benefit of not
> > > > limiting us in what locks we use underneath.
> > > >
> > > > Every subsystem has its own issues and we need to find something
> > > > generic enough to cover them all (or most of them anyway). I don't
> > > > think having a single structure cuts it.
> > >
> > > I'm convinced it works. I introduced a wrapper pwmchip_lock() that for
> > > now always uses a mutex; once we have fast pwm_chips it will use a mutex
> > > for sleeping pwm_chips and a spinlock for the fast ones.
> > >
> > > That's similar to how struct irq_chip::irq_bus_lock works. For sleeping
> > > chips that callback uses a mutex, for fast chips a spinlock.
> > >
> >
> > Fair enough. I'd love to see a benchmark of what's faster one day
> > though: two structures with dereferencing and SRCU or one structure
> > with mutex/spinlock.

I think until the day comes that we have both an SRCU+two-structs and a
mutex/spinlock+one-struct implementation for the same framework, it's
moot to discuss which one is faster, so I suggest we stop here. (Note
that you can already do mutex/spinlock+two today. That's what I do for
the non-pure PWM drivers in the next iteration. Preview at
https://lore.kernel.org/linux-pwm/20231124215208.616551-4-u.kleine-koenig@xxxxxxxxxxxxxx/T/#u)
To me it's not at all clear that SRCU is the winner. The answer might
also vary depending on questions like:

 - How many PWM (or GPIO) lines does the chip in question expose?
 - Does the implementation of the callbacks need serialisation (because
   the bits for different lines are in common registers)?
 - What is the usage pattern (and thus how contended are the locks)?

(But I stand by my suggestion to stop here :-))
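For reference, the wrapper I mentioned above looks roughly like this.
It's a simplified sketch; the "atomic" flag and the two lock members
stand in for what the fast-chip support would add and are not final
names:

static void pwmchip_lock(struct pwm_chip *chip)
{
	if (chip->atomic)
		spin_lock(&chip->atomic_lock);
	else
		mutex_lock(&chip->nonatomic_lock);
}

static void pwmchip_unlock(struct pwm_chip *chip)
{
	if (chip->atomic)
		spin_unlock(&chip->atomic_lock);
	else
		mutex_unlock(&chip->nonatomic_lock);
}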

> Actually there is one thing that - while not technically wrong - makes
> the split solution better. In case of your abstracted lock, you find
> yourself in a very all-or-nothing locking situation, where all of the
> structure is locked or none is. With SRCU protecting just the pointer
> to implementation, we can easily factor that part out and leave
> whatever fine-grained locking is required to the subsystem.
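If I understand that scheme correctly, the read side is roughly this
(my reconstruction with made-up names and members, not the actual
gpiolib code):

struct gpio_device {
	struct srcu_struct srcu;
	struct gpio_chip __rcu *chip;	/* NULL once the driver is unbound */
};

static int gpio_device_get_value(struct gpio_device *gdev,
				 unsigned int offset)
{
	struct gpio_chip *gc;
	int idx, ret;

	idx = srcu_read_lock(&gdev->srcu);

	gc = srcu_dereference(gdev->chip, &gdev->srcu);
	if (gc)
		ret = gc->get(gc, offset);
	else
		ret = -ENODEV;

	srcu_read_unlock(&gdev->srcu, idx);

	return ret;
}

with the removal path doing rcu_assign_pointer(gdev->chip, NULL)
followed by synchronize_srcu(&gdev->srcu) to wait for in-flight
callbacks.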

The granularity of the locking scheme isn't fixed by my approach either.

In fact, you could simply not take up the offer to keep the framework
struct and the driver's private data in a single memory chunk (and/or
stop relying on the knowledge that it is, or can be, a single chunk).
Then the two approaches are not fundamentally different and you can use
the same locking mechanisms; see the probe sketch below.

The biggest difference between our approaches is that I handle
allocation of the framework struct and its registration as two separate
steps in the drivers, while you do both in a single one.
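In driver terms my variant looks roughly like this (a sketch; the foo
names and the number of PWM lines are made up):

struct foo_pwm {
	void __iomem *base;
};

static int foo_pwm_probe(struct platform_device *pdev)
{
	struct pwm_chip *chip;
	struct foo_pwm *priv;

	/* Step 1: allocation; the private data lives in the same chunk.
	 * Passing 0 and allocating the private data separately works
	 * just as well, if a driver prefers that. */
	chip = devm_pwmchip_alloc(&pdev->dev, 4, sizeof(*priv));
	if (IS_ERR(chip))
		return PTR_ERR(chip);

	priv = pwmchip_get_drvdata(chip);
	priv->base = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(priv->base))
		return PTR_ERR(priv->base);

	/* The driver fills in the framework struct directly. */
	chip->ops = &foo_pwm_ops;

	/* Step 2: registration. */
	return devm_pwmchip_add(&pdev->dev, chip);
}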

My approach has the advantage that the private data can live in the
same allocation and that the driver can fill in the framework struct
directly, without copying or pointer dereferencing when the framework
needs the driver-provided information. Yours has the advantage that
drivers see less of the framework internals and so are better separated
from the core.

How you weigh these advantages against each other is very subjective.

So if we rule out the subjective metrics we're left with: both
approaches solve the technical challenge at hand (i.e. ensuring that
unloading a driver doesn't crash the kernel while a character device is
still open), and my approach already exists for PWM.

> Additionally: the pointer to implementation has many readers but only
> one writer. I believe this to be the same for your "operational"
> field. I don't know the PWM code very well but I can only guess that
> the situation is similar, where subsystem data structures are read
> more often than they are modified and multiple readers could access
> the structure at the same time, lowering latencies.

The lock serves to serialize access to .operational and ensures that the
driver doesn't go away until all callbacks have completed. Is this
serialized in your approach, too?
(If it isn't, I wonder if it should be. And if it is, I think this use
case better matches what spinlocks and mutexes are optimized for,
compared to SRCU.)
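To make explicit what I mean by that, the call path is roughly
(simplified, not the actual pwm core code):

static int pwm_apply_locked(struct pwm_chip *chip, struct pwm_device *pwm,
			    const struct pwm_state *state)
{
	int ret;

	pwmchip_lock(chip);

	/* .operational is only cleared (and the driver only released)
	 * with the lock held, so the callback cannot race with unbind. */
	if (!chip->operational)
		ret = -ENODEV;
	else
		ret = chip->ops->apply(chip, pwm, state);

	pwmchip_unlock(chip);

	return ret;
}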

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |
