On Wed, Jan 15, 2025 at 08:05:21PM +0800, Jinliang Zheng wrote:
> On Wed, 15 Jan 2025 11:28:54 +1100, Dave Chinner wrote:
> > On Fri, Dec 20, 2024 at 01:16:29AM +0800, Jinliang Zheng wrote:
> > > xfs_buf uses a semaphore for mutual exclusion, and its count value
> > > is initialized to 1, which is equivalent to a mutex.
> > >
> > > However, mutex->owner can provide more information when analyzing
> > > vmcore, making it easier for us to identify which task currently
> > > holds the lock.
> >
> > However, the buffer lock also protects the buffer state and contents
> > whilst IO is being performed, and it *is not owned by any task*.
> >
> > A single lock cycle for a buffer can pass through multiple tasks
> > before being unlocked in a different task to that which locked it:
> >
> > p0                        <intr>                    <kworker>
> > xfs_buf_lock()
> > ...
> > <submitted for async io>
> > <wait for IO completion>
> > .....
> >                           <io completion>
> >                           queued to workqueue
> >                                                     .....
> >                                                     perform IO completion
> >                                                     xfs_buf_unlock()
> >
> >
> > IOWs, the buffer lock here prevents any other task from accessing
> > and modifying the contents/state of the buffer until the IO in
> > flight is completed. i.e. the buffer contents are guaranteed to be
> > stable during write IO, and unreadable when uninitialised during
> > read IO....
>
> Yes.
>
> > i.e. the locking model used by xfs_buf objects is incompatible with
> > the single-owner-task critical section model implemented by
> > mutexes...
>
> Yes, from a model perspective.
>
> This patch is proposed for two reasons:
> 1. The maximum count of xfs_buf->b_sema is 1, which means that only one
>    kernel code path can hold it at a time. From this perspective, changing
>    it to a mutex will not have any functional impact.
> 2. When troubleshooting an XFS hung task, it is sometimes necessary to
>    identify who acquired the lock. Although, as you said, xfs_buf->b_sema
>    flows to other kernel code paths after down() is called, it is still
>    helpful to know which kernel code path locked it first.
>
> Haha, that's just my thought. If you think there is really no need to know
> who called down() on xfs_buf->b_sema, please just ignore this patch.

We are rejecting the patch because it's fundamentally broken, not
because we don't want debugging visibility.

If you want to track which task locked a semaphore, then that should
be added to the semaphore implementation. Changing the XFS locking
implementation is not the solution to the problem you are trying to
solve.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
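
As a minimal illustration of the ownership difference discussed in the thread
above, here is a userspace analogue (not the XFS or kernel code; the thread
and variable names are invented for the example). A POSIX semaphore, like the
kernel's struct semaphore, has no owner and may be released by a different
thread than the one that acquired it, whereas an error-checking pthread mutex,
like the kernel mutex's single-owner model, rejects an unlock from a non-owner:

/*
 * Userspace analogue of the cross-task lock cycle described above.
 * Build with: cc -pthread lock_ownership.c
 *
 * "main" plays the role of p0 taking the buffer lock before async IO;
 * "completion_thread" plays the role of the IO completion workqueue
 * releasing it from a different task.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

static sem_t buf_sema;              /* analogue of xfs_buf->b_sema        */
static pthread_mutex_t buf_mutex;   /* hypothetical mutex replacement     */

static void *completion_thread(void *arg)
{
	(void)arg;

	/* Releasing a semaphore from a non-acquiring thread is well defined. */
	sem_post(&buf_sema);

	/* An error-checking mutex refuses unlock by a non-owner (EPERM). */
	int err = pthread_mutex_unlock(&buf_mutex);
	printf("mutex unlock by non-owner: %s\n",
	       err ? strerror(err) : "ok");
	return NULL;
}

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_t t;

	sem_init(&buf_sema, 0, 1);
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&buf_mutex, &attr);

	/* "p0" takes both locks before handing the buffer off for async IO. */
	sem_wait(&buf_sema);
	pthread_mutex_lock(&buf_mutex);

	/* "IO completion" runs in a different thread and releases them. */
	pthread_create(&t, NULL, completion_thread, NULL);
	pthread_join(t, NULL);
	return 0;
}

With a normal (non-error-checking) pthread mutex the same cross-thread unlock
is simply undefined behaviour rather than a reported error, which mirrors the
constraint that makes a kernel mutex unsuitable for xfs_buf's lock/IO-completion
hand-off pattern.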