On Mon, Nov 23, 2020 at 10:49:38PM -0800, Chris Goldsworthy wrote:
> +static void __evict_bh_lru(void *arg)
> +{
> +	struct bh_lru *b = &get_cpu_var(bh_lrus);
> +	struct buffer_head *bh = arg;
> +	int i;
> +
> +	for (i = 0; i < BH_LRU_SIZE; i++) {
> +		if (b->bhs[i] == bh) {
> +			brelse(b->bhs[i]);
> +			b->bhs[i] = NULL;
> +			goto out;

That's an odd way to spell 'break' ...

> +		}
> +	}
> +out:
> +	put_cpu_var(bh_lrus);
> +}

...

> @@ -3245,8 +3281,15 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
>
>  	bh = head;
>  	do {
> -		if (buffer_busy(bh))
> -			goto failed;
> +		if (buffer_busy(bh)) {
> +			/*
> +			 * Check if the busy failure was due to an
> +			 * outstanding LRU reference
> +			 */
> +			evict_bh_lrus(bh);
> +			if (buffer_busy(bh))
> +				goto failed;

Do you see any performance problems with this?  I'm concerned that we
need to call all CPUs for each buffer on a page, so with a 4kB page and
512-byte blocks, we'd call each CPU eight times (with a 64kB page size
and 4kB blocks, we'd call each CPU 16 times!).

We might be better off just calling invalidate_bh_lrus() -- we'd flush
the entire LRU, but we'd only need to do it once, not once per buffer
head.

Alternatively, we could have a more complex 'evict' that iterates over
each busy buffer on a page, transforming:

for_each_buffer
	for_each_cpu
		for_each_lru_entry

to:

for_each_cpu
	for_each_buffer
		for_each_lru_entry

(and I suggest that order because it's more expensive to iterate the
buffers than it is to iterate the LRU entries)
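
To make the call-count argument concrete, here is a rough userspace model
of the two loop orders.  It is only a sketch, not kernel code: the struct
layouts are cut down to what the example needs, NR_CPUS is an arbitrary
value, a plain counter stands in for the cross-CPU calls, decrementing
b_count stands in for brelse(), and evict_per_buffer(), evict_per_page(),
evict_one_on_cpu(), evict_page_on_cpu() and model_setup() are all
hypothetical names that do not appear in the patch.

```c
/* Userspace model only; none of this is the kernel's real definition. */
#include <assert.h>
#include <stddef.h>

#define NR_CPUS     4	/* arbitrary CPU count for the model */
#define BH_LRU_SIZE 16	/* per-CPU LRU slots (model value) */

struct buffer_head {
	struct buffer_head *b_this_page;	/* circular list of a page's buffers */
	int b_count;				/* non-zero means "busy" in this model */
};

struct bh_lru {
	struct buffer_head *bhs[BH_LRU_SIZE];
};

static struct bh_lru bh_lrus[NR_CPUS];
static unsigned long cross_cpu_calls;	/* stand-in for cross-CPU calls issued */

static int buffer_busy(const struct buffer_head *bh)
{
	return bh->b_count != 0;
}

/* Drop this CPU's LRU reference to bh, if it holds one. */
static void drop_lru_ref(struct bh_lru *b, struct buffer_head *bh)
{
	for (int i = 0; i < BH_LRU_SIZE; i++) {
		if (b->bhs[i] == bh) {
			bh->b_count--;		/* stands in for brelse() */
			b->bhs[i] = NULL;
			break;
		}
	}
}

/* One cross-CPU call that handles a single buffer. */
static void evict_one_on_cpu(int cpu, struct buffer_head *bh)
{
	cross_cpu_calls++;
	drop_lru_ref(&bh_lrus[cpu], bh);
}

/* Order in the patch: for_each_buffer, for_each_cpu, for_each_lru_entry. */
static void evict_per_buffer(struct buffer_head *head)
{
	struct buffer_head *bh = head;

	do {
		if (buffer_busy(bh))
			for (int cpu = 0; cpu < NR_CPUS; cpu++)
				evict_one_on_cpu(cpu, bh);
		bh = bh->b_this_page;
	} while (bh != head);
}

/* One cross-CPU call that walks every busy buffer on the page. */
static void evict_page_on_cpu(int cpu, struct buffer_head *head)
{
	struct buffer_head *bh = head;

	cross_cpu_calls++;
	do {
		if (buffer_busy(bh))
			drop_lru_ref(&bh_lrus[cpu], bh);
		bh = bh->b_this_page;
	} while (bh != head);
}

/* Suggested order: for_each_cpu, for_each_buffer, for_each_lru_entry. */
static void evict_per_page(struct buffer_head *head)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		evict_page_on_cpu(cpu, head);
}

/* Link n buffers into one page; give each a single LRU reference. */
static void model_setup(struct buffer_head *bufs, int n)
{
	assert(n > 0 && n <= NR_CPUS * BH_LRU_SIZE);
	cross_cpu_calls = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int i = 0; i < BH_LRU_SIZE; i++)
			bh_lrus[cpu].bhs[i] = NULL;
	for (int k = 0; k < n; k++) {
		bufs[k].b_this_page = &bufs[(k + 1) % n];
		bufs[k].b_count = 1;
		bh_lrus[k % NR_CPUS].bhs[k / NR_CPUS] = &bufs[k];
	}
}
```

With eight busy buffers on a page (the 4kB page, 512-byte block case),
the per-buffer order issues 8 * NR_CPUS calls in this model, while the
per-page order issues NR_CPUS calls, and both leave every buffer
non-busy.  The model says nothing about the real cost of each call; it
only counts them.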