Re: [PATCH 1/2][v3] gpiolib: allow simultaneous setting of multiple GPIO outputs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 27, 2014 at 1:14 PM, Rojhalat Ibrahim <imr@xxxxxxxxxxx> wrote:

> Introduce a new function gpiod_set_raw_array to the consumer interface which
> allows setting multiple outputs with just one function call. Also add an
> optional set_multiple function to the driver interface. Without an
> implementation of that function in the chip driver outputs are set
> sequentially.
>
> Implementing the set_multiple function in a chip driver allows for:
> - Improved performance for certain use cases. The original motivation for this
>   was the task of configuring an FPGA. In that specific case, where 9 GPIO
>   lines have to be set many times, configuration time goes down from 48 s to
>   19 s when using the new function.
> - Simultaneous glitch-free setting of multiple pins on any kind of parallel
>   bus attached to GPIOs provided they all reside on the same chip and bank.
>
> Limitations:
>   - Performance is only improved for normal high-low outputs. Open drain and
>     open source outputs are always set separately from each other. Those kinds
>     of outputs could probably be accelerated in a similar way if we could
>     forgo the error checking when setting GPIO directions.
>   - There is no gpiod_set_array function that regards the ACTIVE_LOW bits.
>     The raw function should be sufficient for many use cases. So I avoided
>     the code duplication the other functions would require.
>
> Signed-off-by: Rojhalat Ibrahim <imr@xxxxxxxxxxx>
> ---
> Change log:
>   v3: - add documentation
>       - change commit message
>   v2: - use descriptor interface
>       - allow arbitrary groups of GPIOs spanning multiple chips

This concept has been brought up before. Please keep Grant Likley in
the loop on these postings as he has some history with this.

As this is a speed optimization basically, do you have performance
numbers? Like how much quicker is your code in some critical
usecase as measured by perf or throughput whatever before and
after doing this?

Notice that the SPI bitbang driver in drivers/spi/spi-gpio.c was one
of the most important consumers of this API IIRC. So I'm CC:ing
Mark Brown.

> +The following two functions set the raw output values of an array of GPIOs:
> +
> +       void gpiod_set_raw_array(unsigned int array_size,
> +                                struct gpio_desc **desc_array,
> +                                int *value_array)
> +       void gpiod_set_raw_array_cansleep(unsigned int array_size,
> +                                         struct gpio_desc **desc_array,
> +                                         int *value_array)

So this interface looks quite nice.

I don't see why the complex loop in the raw setter
below cannot actually also handle non-raw (cooked? hehe)
values with some single-liner to invert the value if it
is active low.

> +static void _gpio_chip_set_multiple(struct gpio_chip *chip,

I hate _underscored function names, just call it
gpio_chip_set_multiple_offsets() or  whatever.

> +                                   u32 mask[ARCH_NR_GPIOS/32],
> +                                   u32 bits[ARCH_NR_GPIOS/32])
> +{

First: NO, you NEVER tie any code into ARCH_NR_GPIOS
because that is something we want to get rid of. Just use
regular pointers, add a count argument if need be.

> +       if (chip->set_multiple) {
> +               chip->set_multiple(chip, mask, bits);

Simple and quick OK. But even this is making a quite
nasty assumption...

> +       } else {
> +               int i;
> +               for (i = 0; i < ARCH_NR_GPIOS; i++) {

As I said you cannot depend on ARCH_NR_GPIOS and not
introduce more dependencies to it. Figure out something else.
Like using chip->ngpio to iterate.

> +                       if (i > chip->ngpio - 1)
> +                               break;
> +                       if (mask[i/32] == 0) {
> +                               /* skip ahead */
> +                               i = (i/32 + 1) * 32 - 1;
> +                               continue;
> +                       }
> +                       if (mask[i/32] & (1 << (i % 32))) {
> +                               chip->set(chip, i, (bits[i/32] >> (i % 32)) & 1);
> +                               mask[i/32] &= ~(1 << (i % 32));
> +                       }
> +               }
> +       }
> +}

This code is a bit convoluted but I *guess* the idea is to iterate
over the 32 bits in the argument word by word. Add some
comments describing what's going on.

Some chips have 8bit or 16bit registers actually but AFAICT
that is not what this is about, rather about the fact that the
function takes 32bit arguments.

> +static void _gpiod_set_raw_array(unsigned int array_size,

No underscored function name.

> +                                struct gpio_desc **desc_array,
> +                                int *value_array)
> +{
> +       struct gpio_chip *chip = desc_array[0]->chip;
> +       u32 mask[ARCH_NR_GPIOS/32];
> +       u32 bits[ARCH_NR_GPIOS/32];

Again dependency to ARCH_NR_GPIOS. Don't do it.

> +       int count = 0;
> +       int i;
> +
> +       memset(mask, 0, sizeof(mask));
> +       for (i = 0; i < array_size; i++) {
> +               struct gpio_desc *desc = desc_array[i];
> +               int hwgpio = gpio_chip_hwgpio(desc);
> +               int value = value_array[i];
> +
> +               /* another chip; push collected bits to outputs */
> +               if (desc->chip != chip) {

Clever logic!

> +                       if (count != 0) {
> +                               _gpio_chip_set_multiple(chip, mask, bits);
> +                               memset(mask, 0, sizeof(mask));
> +                               count = 0;
> +                       }
> +                       chip = desc->chip;
> +               }
> +               /* collect all normal outputs belonging to the same chip */
> +               /* open drain and open source outputs are set individually */
> +               trace_gpio_value(desc_to_gpio(desc), 0, value);
> +               if (test_bit(FLAG_OPEN_DRAIN, &desc->flags)) {
> +                       _gpio_set_open_drain_value(desc, value);
> +               } else if (test_bit(FLAG_OPEN_SOURCE, &desc->flags)) {
> +                       _gpio_set_open_source_value(desc, value);
> +               } else {
> +                       mask[hwgpio/32] |= (1 << (hwgpio % 32));
> +                       if (value) {
> +                               bits[hwgpio/32] |= (1 << (hwgpio % 32));
> +                       } else {
> +                               bits[hwgpio/32] &= ~(1 << (hwgpio % 32));
> +                       }
> +                       count++;
> +               }

This is also real smart and nice code.

The assumption is that this iteration hotloop will be quicker
to run, and the driver can take shortcuts improving performance.
As mentione above we need some proof of that as it complicates
the code a lot for this performance improvement.

> +       }
> +       if (count != 0) {
> +               _gpio_chip_set_multiple(chip, mask, bits);
> +       }
> +}
> +

(...)

>  static inline int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce)
>  {
> diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
> index 1827b43..d7968c8 100644
> --- a/include/linux/gpio/driver.h
> +++ b/include/linux/gpio/driver.h
> @@ -32,6 +32,7 @@ struct seq_file;
>   * @get: returns value for signal "offset"; for output signals this
>   *     returns either the value actually sensed, or zero
>   * @set: assigns output value for signal "offset"
> + * @set_multiple: assigns output values for multiple signals defined by "mask"
>   * @set_debounce: optional hook for setting debounce time for specified gpio in
>   *      interrupt triggered gpio chips
>   * @to_irq: optional hook supporting non-static gpio_to_irq() mappings;
> @@ -84,6 +85,8 @@ struct gpio_chip {
>                                                 unsigned offset);
>         void                    (*set)(struct gpio_chip *chip,
>                                                 unsigned offset, int value);
> +       void                    (*set_multiple)(struct gpio_chip *chip,
> +                                               u32 *mask, u32 *bits);

This sets the chunk size to 32bits which means that any GPIO
controller with say 8bit registers will get a complex demuxing
scheme.

But thinking of it I think it's a reasonable compromise given that
so many GPIO drivers are actually 32bit.

However some of the real slow I2C expander GPIOs are 8bit.

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SPI]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux