Re: [PATCH 1/3] bpf: Use spinlock_t in bpf_lru_list

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Fri, 12 Apr 2019 18:14:07 +0200

On 2019-04-10 21:44:22 [+0200], Daniel Borkmann wrote:
> > Ah. I checked one or two of those and it looked like it was raw since
> > the beginning. Anyway, it would be nice to Cc: the RT developer while
> 
> The later ones like from LRU may have just been adapted to use the
> same after the conversion from ac00881f9221, but I presume there might
> unfortunately be little to no testing on RT kernels currently. Hopefully
> we'll get there such that at min bots would yell if something is off
> so it can be fixed.

Thanks. The boot part is what made me look at lpm_trie.

> > fiddling with something that only concerns RT.
> 
> I think that was the case back then, see discussion / Cc list:
> 
> https://lore.kernel.org/netdev/1446243386-26582-1-git-send-email-yang.shi@xxxxxxxxxx/T/

So there was a discussion and I somehow missed it. Fair enough.

What about the memory allocation under the lock? Is that new, or was it
not noticed back then?

> > That usage pattern that is mentioned in ac00881f9221, is it true for all
> > data structure algorithms? In bpf_lru_list I was concerned about the
> > list loops. However hashtab and lpm_trie may perform memory allocations
> > while holding the lock and this isn't going to work.
> 
> Do you have some document or guide for such typical patterns and how to
> make them behave better for RT?

Let me put something simple together. Once I have the pieces in
lockdep, I hope there will also be a document explaining things in
more detail.
For now: try to keep your code preemptible.
- lock nesting: spin_lock() -> raw_spin_lock() (taking a raw lock while
  holding a spinlock_t) is correct, but raw_spin_lock() -> spin_lock()
  is not, because on -RT spin_lock() may sleep.

- interrupt handlers run threaded (as with the "threadirqs" command line
  option). Most code therefore never really disables interrupts; this
  includes spin_lock_irq().
  Therefore local_irq_disable() + spin_lock() != spin_lock_irq(): the
  former tries to acquire a sleeping lock with interrupts disabled.

- preempt_disable(), local_irq_disable() and raw_spin_lock() enter atomic
  context on -RT, in which scheduling is impossible.

- in atomic context

  - memory allocations are not possible (including GFP_ATOMIC).

  - a spin_lock() can be neither acquired nor released.

  - unbounded loops add to the task's maximum latency and should be
    avoided.

- the architecture's core code runs with interrupts disabled, and the
  aim is to keep that part short. This even extends to hrtimer
  callbacks, which on -RT are not invoked with interrupts disabled.

- core code uses raw_spin_lock() where it needs to protect something. If
  you think you need such a lock, try to measure the worst case with
  cyclictest and see how it behaves. If the impact is visible, a
  different design should probably be used, shrinking the atomic
  section so that it becomes more deterministic.

- debugging code often increases latency (lockdep even by a few ms).
  If an unbounded atomic section you introduce is intended only for
  debugging, please document that.

> Thanks,
> Daniel

Sebastian