Re: [bug report] cgroup: use irqsave in cgroup_rstat_flush_locked().

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023-08-04 10:28:28 [+0300], Dan Carpenter wrote:
> Hello Sebastian Andrzej Siewior,
Hi Dan,

> The patch b1e2c8df0f00: "cgroup: use irqsave in
> cgroup_rstat_flush_locked()." from Mar 23, 2022 (linux-next), leads
> to the following Smatch static checker warning:
> 
> 	kernel/cgroup/rstat.c:212 cgroup_rstat_flush_locked()
> 	warn: mixing irqsave and irq
> 
> kernel/cgroup/rstat.c
>     174 static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
>     175         __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
>     176 {
>     177         int cpu;
>     178 
>     179         lockdep_assert_held(&cgroup_rstat_lock);
>     180 
>     181         for_each_possible_cpu(cpu) {
>     182                 raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock,
>     183                                                        cpu);
>     184                 struct cgroup *pos = NULL;
>     185                 unsigned long flags;
>     186 
>     187                 /*
>     188                  * The _irqsave() is needed because cgroup_rstat_lock is
>     189                  * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
>     190                  * this lock with the _irq() suffix only disables interrupts on
>     191                  * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
>     192                  * interrupts on both configurations. The _irqsave() ensures
>     193                  * that interrupts are always disabled and later restored.
>     194                  */
>     195                 raw_spin_lock_irqsave(cpu_lock, flags);
> 
> There is obviously a giant comment explaining why irqsave is required
> instead of irq.

You have to notice that the caller used spinlock_t while here
raw_spinlock_t is used. Different lock types, same implementation/
behaviour on !PREEMPT_RT. With PREEMPT_RT enabled those two use a
different implementation and therefore lead to different behaviour.
Therefore the comment and the explanation.

>     196                 while ((pos = cgroup_rstat_cpu_pop_updated(pos, cgrp, cpu))) {
>     197                         struct cgroup_subsys_state *css;
>     198 
>     199                         cgroup_base_stat_flush(pos, cpu);
>     200                         bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
>     201 
>     202                         rcu_read_lock();
>     203                         list_for_each_entry_rcu(css, &pos->rstat_css_list,
>     204                                                 rstat_css_node)
>     205                                 css->ss->css_rstat_flush(css, cpu);
>     206                         rcu_read_unlock();
>     207                 }
>     208                 raw_spin_unlock_irqrestore(cpu_lock, flags);
>     209 
>     210                 /* play nice and yield if necessary */
>     211                 if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
> --> 212                         spin_unlock_irq(&cgroup_rstat_lock);
> 
> But it's sort of confusing that irqsave isn't used here.  It's so
> weird that irq doesn't disable interrupts on some configs where _irqsave
> does.  That seems like the naming is bad.

There are two lock types. Each type match lock/unlock-irq and
save/restore.

To explain/ demonstrate it in this context:
- !RT
  spin_lock_irq()		/* disables interrupts */
  raw_spin_lock_irqsave() 	/* keeps interrupts disabled */
  raw_spin_unlock_irqrestore() 	/* keeps interrupts disabled */
  spin_unlock_irq()		/* enabled interrupts */

- RT
  spin_lock_irq()		/* does not touch interrupts */
  raw_pin_lock_irqsave() 	/* disables interrupts, saves previous state */
  raw_spin_unlock_irqrestore() 	/* enables interrupts, restores previous state */
  spin_unlock_irq()		/* does not touch interrupts */

So the raw_* type needs to save/restore the state because it is used
within spin_lock.*.

The version:
  spin_lock_irq()
  raw_pin_lock()
  raw_spin_unlock()
  spin_unlock_irq()

would be fine _unless_ the inner raw_spinlock_t is used in IRQ context
(which is the case here).

Not sure if you want to take any of this to update the checker. It would
probably make sense to distinguish between spinlock_t and raw_spinlock_t
because they are different.
Also, something like
  spin_lock_irqsave()
  spin_unlock()
  local_irq_restore()

and
  spin_lock_irqsave()
  raw_spin_lock()
  spin_unlock();
  raw_spin_unlock_irqrestore()

is both invalid on a PREEMPT_RT enabled config. 

If you want to replace spin_lock_irq.*() with something else because it
does not disable interrupts on PREEMPT_RT be aware that this discussion
can trigger a longer debate for the need and the right naming. Also, as
of -rc4 we have over 20k user of spin_lock_irq.*. So the whole process
could exceed my best case lifespan ;)

>     213                         if (!cond_resched())
>     214                                 cpu_relax();
>     215                         spin_lock_irq(&cgroup_rstat_lock);
>     216                 }
>     217         }
>     218 }
> 
> regards,
> dan carpenter

Sebastian



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux