On 2/8/25 4:11 PM, Ming Lei wrote: > On Thu, Feb 06, 2025 at 07:24:02PM +0530, Nilay Shroff wrote: >> >> >> On 2/5/25 9:23 PM, Christoph Hellwig wrote: >>> On Wed, Feb 05, 2025 at 08:14:48PM +0530, Nilay Shroff wrote: >>>> The sysfs attributes are already protected with sysfs/kernfs internal >>>> locking. So acquiring q->sysfs_lock is not needed while accessing sysfs >>>> attribute files. So this change helps avoid holding q->sysfs_lock while >>>> accessing sysfs attribute files. >>> >>> the sysfs/kernfs locking only protects against other accesses using >>> sysfs. But that's not really the most interesting part here. We >>> also want to make sure nothing changes underneath in a way that >>> could cause crashes (and maybe even torn information). >>> >>> We'll really need to audit what is accessed in each method and figure >>> out what protects it. Chances are that sysfs_lock provides that >>> protection in some case right now, and chances are also very high >>> that a lot of this is pretty broken. >>> >> Yes that's possible and so I audited all sysfs attributes which are >> currently protected using q->sysfs_lock and I found some interesting >> facts. Please find below: >> >> 1. io_poll: >> Write to this attribute is ignored. So, we don't need q->sysfs_lock. >> >> 2. io_poll_delay: >> Write to this attribute is NOP, so we don't need q->sysfs_lock. >> >> 3. io_timeout: >> Write to this attribute updates q->rq_timeout and read of this attribute >> returns the value stored in q->rq_timeout Moreover, the q->rq_timeout is >> set only once when we init the queue (under blk_mq_init_allocated_queue()) >> even before disk is added. So that means that we may not need to protect >> it with q->sysfs_lock. >> >> 4. nomerges: >> Write to this attribute file updates two q->flags : QUEUE_FLAG_NOMERGES >> and QUEUE_FLAG_NOXMERGES. These flags are accessed during bio-merge which >> anyways doesn't run with q->sysfs_lock held. Moreover, the q->flags are >> updated/accessed with bitops which are atomic. So, I believe, protecting >> it with q->sysfs_lock is not necessary. >> >> 5. nr_requests: >> Write to this attribute updates the tag sets and this could potentially >> race with __blk_mq_update_nr_hw_queues(). So I think we should really >> protect it with q->tag_set->tag_list_lock instead of q->sysfs_lock. >> >> 6. read_ahead_kb: >> Write to this attribute file updates disk->bdi->ra_pages. The disk->bdi-> >> ra_pages is also updated under queue_limits_commit_update() which runs >> holding q->limits_lock; so I think this attribute file should be protected >> with q->limits_lock and protecting it with q->sysfs_lock is not necessary. >> Maybe we should move it under the same sets of attribute files which today >> runs with q->limits_lock held. >> >> 7. rq_affinity: >> Write to this attribute file makes atomic updates to q->flags: QUEUE_FLAG_SAME_COMP >> and QUEUE_FLAG_SAME_FORCE. These flags are also accessed from blk_mq_complete_need_ipi() >> using test_bit macro. As read/write to q->flags uses bitops which are atomic, >> protecting it with q->stsys_lock is not necessary. >> >> 8. scheduler: >> Write to this attribute actually updates q->elevator and the elevator change/switch >> code expects that the q->sysfs_lock is held while we update the iosched to protect >> against the simultaneous __blk_mq_update_nr_hw_queues update. So yes, this field needs >> q->sysfs_lock protection. >> >> However if we're thinking of protecting sched change/update using q->tag_sets-> >> tag_list_lock (as discussed in another thread), then we may use q->tag_set-> >> tag_list_lock instead of q->sysfs_lock here while reading/writing to this attribute >> file. > > This is one misuse of tag_list_lock, which is supposed to cover host > wide change, and shouldn't be used for request queue level protection, > which is exactly provided by q->sysfs_lock. > Yes I think Christoph was also pointed about the same but then assuming schedule/elevator update would be a rare operation it may not cause a lot of contention. Having said that, I'm also fine creating another lock just to protect elevator changes and removing ->sysfs_lock from elevator code. > Not mention it will cause ABBA deadlock over freeze lock, please see > blk_mq_update_nr_hw_queues(). And it can't be used for protecting > 'nr_requests' too. I don't know how this might cause ABBA deadlock. The proposal here's to use ->tag_list_lock (instead of ->sysfs_lock) while updating scheduler attribute from sysfs as well as while we update the elevator through __blk_mq_update_nr_hw_queues(). In each code path (either from sysfs attribute update or from nr_hw_queues update), we first acquire ->tag_list_lock and then freeze-lock. Do you see any code path where the above order might not be followed? Thanks, --Nilay