On Fri, Mar 24 2023 at 3:34P -0400,
Jens Axboe <axboe@xxxxxxxxx> wrote:

> Just some random drive-by comments.
>
> > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> > index 1de1bdcda1ce..a58f8ac3ba75 100644
> > --- a/drivers/md/dm-bufio.c
> > +++ b/drivers/md/dm-bufio.c
> > +static void lru_destroy(struct lru *lru)
> > +{
> > +	BUG_ON(lru->cursor);
> > +	BUG_ON(!list_empty(&lru->iterators));
> > +}
>
> Ehm no, WARN_ON_ONCE() for these presumably.

Yeah, I raised concern about the BUG_ONs with Joe. Will try to rid the
code of BUG_ONs as follow-on work.

> > @@ -116,9 +366,579 @@ struct dm_buffer {
> >  #endif
> >  };
> >
> > +/*--------------------------------------------------------------*/
> > +
> > +/*
> > + * The buffer cache manages buffers, particularly:
> > + * - inc/dec of holder count
> > + * - setting the last_accessed field
> > + * - maintains clean/dirty state along with lru
> > + * - selecting buffers that match predicates
> > + *
> > + * It does *not* handle:
> > + * - allocation/freeing of buffers.
> > + * - IO
> > + * - Eviction or cache sizing.
> > + *
> > + * cache_get() and cache_put() are threadsafe, you do not need to
> > + * protect these calls with a surrounding mutex. All the other
> > + * methods are not threadsafe; they do use locking primitives, but
> > + * only enough to ensure get/put are threadsafe.
> > + */
> > +
> > +#define NR_LOCKS 64
> > +#define LOCKS_MASK (NR_LOCKS - 1)
> > +
> > +struct tree_lock {
> > +	struct rw_semaphore lock;
> > +} ____cacheline_aligned_in_smp;
> > +
> > +struct dm_buffer_cache {
> > +	/*
> > +	 * We spread entries across multiple trees to reduce contention
> > +	 * on the locks.
> > +	 */
> > +	struct tree_lock locks[NR_LOCKS];
> > +	struct rb_root roots[NR_LOCKS];
> > +	struct lru lru[LIST_SIZE];
> > +};
>
> This:
>
> struct foo_tree {
> 	struct rw_semaphore lock;
> 	struct rb_root root;
> 	struct lru lru;
> } ____cacheline_aligned_in_smp;
>
> would be a lot better.
>
> And where does this NR_LOCKS come from? Don't make up magic values out
> of thin air. Should this be per-cpu? per-node? N per node? I'll bet you
> that 64 is way too much for most use cases, and too little for others.

I cannot speak to the 64 magic value (other than I think it worked well
for Joe's testbed). But the point of this exercise is to split the lock
to avoid contention. Using 64 accomplishes this; having more or fewer
isn't _that_ critical. The hash to get the region index isn't a
function of cpu. We aren't after lockless here.

Now that said, will certainly discuss further with Joe and see if we
can be smarter here.

Your suggestion to combine members certainly makes a lot of sense. I
ran with it relative to the bio-prison-v1 (patch 9) changes, which have
the same layout.
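To illustrate, here's roughly the combined layout and region lookup I
mean. This is only a sketch: the struct and member names follow the
bio-prison-v1 rework below, but the lock_index() helper and the exact
hash used to pick a region are illustrative, not the final code.

#include <linux/spinlock.h>
#include <linux/rbtree.h>
#include <linux/cache.h>
#include <linux/types.h>

#define NR_LOCKS 64
#define LOCKS_MASK (NR_LOCKS - 1)

/* The lock and the tree it protects share a cacheline. */
struct prison_region {
	spinlock_t lock;
	struct rb_root cell;
} ____cacheline_aligned_in_smp;

struct dm_bio_prison {
	struct prison_region regions[NR_LOCKS];
};

/*
 * Picking the region is purely a function of the block being locked,
 * never of the cpu, so all holders of a given block agree on which
 * lock protects it. NR_LOCKS is a power of two, so masking is an
 * exact modulo.
 */
static unsigned int lock_index(sector_t block)
{
	return (unsigned int)(block & LOCKS_MASK);
}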
Definitely a win on in-core memory as well as avoiding cacheline
thrashing while accessing the lock and then the rb_root members (as is
always expected):

Before:

# pahole -C dm_bio_prison drivers/md/dm-bio-prison.ko
struct dm_bio_prison {
	struct lock          lock[64] __attribute__((__aligned__(64))); /*     0  4096 */
	/* --- cacheline 64 boundary (4096 bytes) --- */
	struct rb_root       cells[64];        /*  4096   512 */
	/* --- cacheline 72 boundary (4608 bytes) --- */
	mempool_t            cell_pool;        /*  4608    72 */

	/* size: 4736, cachelines: 74, members: 3 */
	/* padding: 56 */
	/* forced alignments: 1 */
} __attribute__((__aligned__(64)));

After:

# pahole -C prison_region drivers/md/dm-bio-prison.ko
struct prison_region {
	spinlock_t           lock;             /*     0     4 */

	/* XXX 4 bytes hole, try to pack */

	struct rb_root       cell;             /*     8     8 */

	/* size: 64, cachelines: 1, members: 2 */
	/* sum members: 12, holes: 1, sum holes: 4 */
	/* padding: 48 */
} __attribute__((__aligned__(64)));

# pahole -C dm_bio_prison drivers/md/dm-bio-prison.ko
struct dm_bio_prison {
	struct prison_region regions[64] __attribute__((__aligned__(64))); /*     0  4096 */
	/* --- cacheline 64 boundary (4096 bytes) --- */
	mempool_t            cell_pool;        /*  4096    72 */

	/* size: 4224, cachelines: 66, members: 2 */
	/* padding: 56 */
	/* forced alignments: 1 */
} __attribute__((__aligned__(64)));

> > @@ -1141,7 +1904,6 @@ static void *new_read(struct dm_bufio_client *c, sector_t block,
> >  	}
> >
> >  	*bp = b;
> > -
> >  	return b->data;
> >  }
> >
>
> Unrelated change. There are a bunch of these.

I knocked those out, and also the various bracing issues.

> I stopped reading here, the patch is just too long. Surely this could be
> split up?
>
>  1 file changed, 1292 insertions(+), 477 deletions(-)
>
> That's not a patch, that's a patch series.

I don't want to upset you or the community by saying this, but: in this
narrow instance, where a sizable percentage of the file got rewritten,
to properly review this work you need to look at the full scope of the
changes in combination. The effort required by the developer to split
the code to ease mail-client-based review wasn't worth it to me.
Mikulas and I bear the burden of review on dm-bufio.c, so I gave Joe a
pass on splitting because it is make-work to appease the "thou shalt
spoon-feed reviewers on a mailing list" rule (which, again, I argue
isn't applicable for more elaborate changes that are in the end
intertwined).

It hardly seems like time well spent to now go back and rewrite the
code in terms of layered patches. I'd much rather we spend that time on
eliminating the abuse of BUG_ONs, etc.

(I stopped short of saying this to you in chat because I wasn't
prepared to get into a tight loop with you interactively at that
particular moment.. or maybe ever ;)

But again, like I said in chat: I'll look closer, and of course discuss
with Joe, to see if splitting is worth the investment.

Thanks for your review, very much appreciated.

Mike
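p.s. for the BUG_ON cleanup, the shape I have in mind is simply what
you suggested. A sketch, not the final patch:

static void lru_destroy(struct lru *lru)
{
	/*
	 * Scream once, but don't take the machine down, if the lru
	 * is torn down while still in use.
	 */
	WARN_ON_ONCE(lru->cursor);
	WARN_ON_ONCE(!list_empty(&lru->iterators));
}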