On Wed, Aug 21, 2024 at 05:47:31PM +0200, Christian Brauner wrote:
> The i_state member is an unsigned long so that it can be used with the
> wait bit infrastructure which expects unsigned long. This wastes 4 bytes
> which we're unlikely to ever use. Switch to using the var event wait
> mechanism using the address of the bit. Thanks to Linus for the address
> idea.
> 
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> ---
>  fs/inode.c         | 10 ++++++++++
>  include/linux/fs.h | 16 ++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 154f8689457f..f2a2f6351ec3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -472,6 +472,16 @@ static void __inode_add_lru(struct inode *inode, bool rotate)
>  		inode->i_state |= I_REFERENCED;
>  }
>  
> +struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
> +					    struct inode *inode, u32 bit)
> +{
> +	void *bit_address;
> +
> +	bit_address = inode_state_wait_address(inode, bit);
> +	init_wait_var_entry(wqe, bit_address, 0);
> +	return __var_waitqueue(bit_address);
> +}
> +
>  /*
>   * Add inode to LRU if needed (inode is unused and clean).
>   *
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 23e7d46b818a..a5b036714d74 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -744,6 +744,22 @@ struct inode {
>  	void	*i_private; /* fs or device private pointer */
>  } __randomize_layout;
>  
> +/*
> + * Get bit address from inode->i_state to use with wait_var_event()
> + * infrastructre.
> + */
> +#define inode_state_wait_address(inode, bit) ((char *)&(inode)->i_state + (bit))
> +
> +struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
> +					    struct inode *inode, u32 bit);
> +
> +static inline void inode_wake_up_bit(struct inode *inode, u32 bit)
> +{
> +	/* Ensure @bit will be seen cleared/set when caller is woken up. */
> +	smp_mb();
> +	wake_up_var(inode_state_wait_address(inode, bit));
> +}

Why is this memory barrier necessary?
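[For context, the waiter side that pairs with inode_wake_up_bit() follows the generic var-wait pattern. A sketch of a hypothetical caller (not code from the patch; the helper name wait_on_inode_bit() and the locking details are illustrative):

```c
/* Sketch: wait for an i_state bit (e.g. __I_SYNC) to clear, using the
 * inode_bit_waitqueue() helper added by the patch above. */
static void wait_on_inode_bit(struct inode *inode, u32 bit)
{
	struct wait_bit_queue_entry wqe;
	struct wait_queue_head *wq_head;

	wq_head = inode_bit_waitqueue(&wqe, inode, bit);
	for (;;) {
		prepare_to_wait_event(wq_head, &wqe.wq_entry,
				      TASK_UNINTERRUPTIBLE);
		/* real callers check i_state under inode->i_lock */
		spin_lock(&inode->i_lock);
		if (!(inode->i_state & (1 << bit))) {
			spin_unlock(&inode->i_lock);
			break;
		}
		spin_unlock(&inode->i_lock);
		schedule();
	}
	finish_wait(wq_head, &wqe.wq_entry);
}
```

prepare_to_wait_event() and finish_wait() both take the wait queue head spinlock, which is the property the reply below relies on.]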
wake_up_var() takes the wait queue head spinlock, as does
prepare_to_wait() before the waiter condition check and the subsequent
finish_wait() when the wait is done. Hence by the time functions like
inode_wait_for_writeback() return to the caller, there has been a full
unlock->lock cycle on the wait queue head lock between the bit being
set and the waiter waking and checking it.

i.e. the use of the generic waitqueue infrastructure implies a full
memory barrier between the bit being set, the wakeup being issued and
the waiter checking the condition bit.

Hence I'm not sure what race condition this memory barrier is
preventing - is this just a historical artifact that predates the
current wait/wake implementation?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx