On 2/15/22 1:43 PM, Bob Pearson wrote:
On 2/14/22 14:48, Bart Van Assche wrote:
On 2/14/22 12:25, Bob Pearson wrote:
It helps. I am trying to run blktests -q srp but I need to install
xfs first it seems. Do I need two nodes or can I run it with just
one?
XFS? All SRP tests use the null_blk driver if I remember correctly and hence don't need any physical block device. Some tests outside the SRP directory require xfstools but the SRP tests do not. If blktests are run as follows, XFS should not be required:
./check -q srp
Thanks,
Bart.
I am now able to reproduce what I think is the same trace you are seeing.
The first error is the warning:
[ 1808.574513] WARNING: CPU: 7 PID: 3887 at kernel/softirq.c:363 __local_bh_enable_ip+0xac/0x100
which is raised in __local_bh_enable_ip()
void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
{
	WARN_ON_ONCE(in_irq());
	lockdep_assert_irqs_enabled();
#ifdef CONFIG_TRACE_IRQFLAGS
	local_irq_disable();
#endif
in lockdep_assert_irqs_enabled()
and this is in turn called from __rxe_add_index() which looks like
int __rxe_add_index(struct rxe_pool_elem *elem)
{
	struct rxe_pool *pool = elem->pool;
	int err;

	write_lock_bh(&pool->pool_lock);
	err = __rxe_add_index_locked(elem);
	write_unlock_bh(&pool->pool_lock);

	return err;
}
in the write_unlock_bh() call. This appears to complain if hardirqs are not enabled on the current CPU.
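The check described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `model_*` name, the two state variables, and `run_scenario()` are hypothetical stand-ins invented for illustration, not kernel APIs. Unlike real lockdep, the model counts every violation rather than warning once.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for per-CPU kernel state. */
static bool hardirqs_enabled = true;	/* models the CPU's hard-irq flag */
static int bh_disable_depth;		/* models the softirq-disable count */
static int lockdep_warnings;		/* counts simulated lockdep warnings */

static void model_local_irq_disable(void) { hardirqs_enabled = false; }
static void model_local_irq_enable(void)  { hardirqs_enabled = true; }

/* Models the bh-disabling half of write_lock_bh(). */
static void model_lock_bh(void)
{
	bh_disable_depth++;
}

/*
 * Models the bh-enabling half of write_unlock_bh(): like
 * lockdep_assert_irqs_enabled(), it complains when bottom halves
 * are re-enabled while hard irqs are off.
 */
static void model_unlock_bh(void)
{
	if (!hardirqs_enabled) {
		lockdep_warnings++;
		fprintf(stderr, "WARNING: bh re-enabled with hardirqs off\n");
	}
	bh_disable_depth--;
}

/* Same shape as __rxe_add_index(): take and drop a _bh lock. */
static int model_add_index(void)
{
	model_lock_bh();
	/* ... index allocation would happen here ... */
	model_unlock_bh();
	return 0;
}

/* Calls the function once with irqs on and once with irqs off. */
static int run_scenario(void)
{
	lockdep_warnings = 0;
	model_add_index();		/* irqs on: no warning */
	model_local_irq_disable();
	model_add_index();		/* irqs off: one warning */
	model_local_irq_enable();
	return lockdep_warnings;
}
```

In this model the function itself is identical in both calls. Only the caller's irq state differs, which is consistent with the observation that the warning depends on the context __rxe_add_index() is entered from.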
Suppose only the NIC is involved at the moment: once the NIC driver has
switched to NAPI, which means no hard irq is enabled, is that possible?
This only happens if CONFIG_PROVE_LOCKING=y. The problem with all this is that pool->pool_lock is never held by anyone
other than __rxe_add_index() when the first error occurs. Perhaps someone else has disabled hard irqs and then lets us gain
control of this CPU. If we use _irqsave locks instead of _bh locks in rxe_pool.c, which was the case a while ago,
the test is different and passes. If you don't set CONFIG_PROVE_LOCKING, this error does not happen.
Simply using _irqsave locks because it makes this error vanish doesn't seem right. There should be a root
cause that makes sense.
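A rough user-space sketch of why the _irqsave variant would not trip the same check: the save/restore pair puts back whatever irq state the caller had and never goes through the bottom-half enable path. All names here (`model_irq_save`, `model_irq_restore`, `model_add_index_irqsave`, `state_preserved`) are hypothetical and only model the semantics; this is not the rxe code.

```c
#include <stdbool.h>

static bool irqs_on = true;	/* hypothetical model of the hard-irq flag */

/* Models local_irq_save(): remember the current state, then disable. */
static bool model_irq_save(void)
{
	bool flags = irqs_on;

	irqs_on = false;
	return flags;
}

/* Models local_irq_restore(): put back whatever the caller saw. */
static void model_irq_restore(bool flags)
{
	irqs_on = flags;
}

/* Critical section in the style of an _irqsave/_irqrestore lock pair. */
static bool model_add_index_irqsave(void)
{
	bool flags = model_irq_save();

	/* ... index allocation would happen here ... */
	model_irq_restore(flags);
	return irqs_on;		/* state the caller observes afterwards */
}

/* True if the caller's irq state is the same before and after. */
static bool state_preserved(bool initial)
{
	irqs_on = initial;
	return model_add_index_irqsave() == initial;
}
```

The model behaves the same whether irqs were on or off at entry, so it would hide the symptom without saying who disabled hard irqs in the first place, which is the open question here.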
At least I can find two similar fixes, just FYI.
4956b9eaad45 io_uring: rsrc ref lock needs to be IRQ safe
2800aadc18a6 iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()
Thanks,
Guoqing