Re: [PATCH] blk-mq: Fix races between iterating over requests and freeing requests

On 3/23/21 5:34 AM, John Garry wrote:
> Do we have any performance figures to say that the effect is negligible?

Jens has been so kind as to run a quick performance test (thanks Jens!).

> FYI, I did some testing on this, and it looks to solve problems in all
> known scenarios, including interactions of blk_mq_tagset_busy_iter(),
> and blk_mq_queue_tag_busy_iter().

Thanks for the testing!
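
To recap for the archives, the approach being tested boils down to
wrapping tag iteration in an SRCU read-side critical section and
inserting an SRCU grace period before requests are freed. A minimal
sketch (the call sites are simplified; the SRCU primitives and
blk_mq_tagset_busy_iter() are the standard kernel APIs):

/* Iteration side: keep requests alive while iterating over them. */
int idx = srcu_read_lock(&blk_sched_srcu);

blk_mq_tagset_busy_iter(set, fn, data);
srcu_read_unlock(&blk_sched_srcu, idx);

/* Freeing side: wait until all pending iterations have finished. */
synchronize_srcu(&blk_sched_srcu);
/* It is now safe to free the requests. */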

>>   +DEFINE_SRCU(blk_sched_srcu);
> 
> out of interest, any reason that this is global and not per tagset?

That's a great question. Making it global was the easiest way to
evaluate and test an SRCU-based solution. I can change the approach to
one SRCU struct per tag set, but that would increase the size of each
tag set. The size of an srcu_struct is significant, and in addition to
this structure SRCU allocates memory per CPU:

struct srcu_struct {
	struct srcu_node node[NUM_RCU_NODES];	/* Combining tree. */
	struct srcu_node *level[RCU_NUM_LVLS + 1];
						/* First node at each level. */
	struct mutex srcu_cb_mutex;		/* Serialize CB preparation. */
	spinlock_t __private lock;		/* Protect counters */
	struct mutex srcu_gp_mutex;		/* Serialize GP work. */
	unsigned int srcu_idx;			/* Current rdr array element. */
	unsigned long srcu_gp_seq;		/* Grace-period seq #. */
	unsigned long srcu_gp_seq_needed;	/* Latest gp_seq needed. */
	unsigned long srcu_gp_seq_needed_exp;	/* Furthest future exp GP. */
	unsigned long srcu_last_gp_end;		/* Last GP end timestamp (ns) */
	struct srcu_data __percpu *sda;		/* Per-CPU srcu_data array. */
	unsigned long srcu_barrier_seq;		/* srcu_barrier seq #. */
	struct mutex srcu_barrier_mutex;	/* Serialize barrier ops. */
	struct completion srcu_barrier_completion;
						/* Awaken barrier rq at end. */
	atomic_t srcu_barrier_cpu_cnt;		/* # CPUs not yet posting a */
						/*  callback for the barrier */
						/*  operation. */
	struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};

See also the alloc_percpu() call in init_srcu_struct_fields(). Does
everyone agree with growing each tag set by a data structure whose size
is proportional to the number of CPUs?
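
For reference, a per-tag-set variant would look roughly like this (a
sketch, not part of the posted patch; error handling omitted):

struct blk_mq_tag_set {
	/* ... existing members ... */
	struct srcu_struct srcu;	/* Protects request iteration. */
};

/* In blk_mq_alloc_tag_set(): */
ret = init_srcu_struct(&set->srcu);

/* In blk_mq_free_tag_set(): */
cleanup_srcu_struct(&set->srcu);

init_srcu_struct() is what performs the alloc_percpu(struct srcu_data)
allocation mentioned above, so every tag set would pay that per-CPU
cost.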

Thanks,

Bart.
