Re: problem w/ read caching..

On Mon, Oct 01, 2012 at 08:56:21PM +0000, Brad Walker wrote:
> Kent Overstreet <koverstreet@...> writes:
> 
> > 
> > On Mon, Oct 01, 2012 at 08:05:14PM +0000, Brad Walker wrote:
> > > Kent Overstreet <koverstreet@...> writes:
> > > 
> > > > 
> > > > What about cache_bypass_hits, cache_bypass_misses?
> > > > 
> > > 
> > > cache_bypass_hits = 0
> > > cache_bypass_misses = 0
> > 
> > I should've just asked you for all the stats - what about
> > cache_miss_collision?

So cache_miss_collisions, cache_read_races are 0...

----

I was just browsing around the code, and I bet I know what it is -
btree_insert_check_key() is failing because the btree node is full.

The way the code works is on cache miss, we can't just blindly insert
that data into the cache because if a write happens to the same location
after the cache miss but before the data from the cache miss gets
inserted, we'd overwrite the write with stale data.

So btree_insert_check_key() inserts a fake key atomically with the cache
miss - we don't need that key to be persisted so we can skip
journalling and all the normal btree insert code, which is how we can
insert this fake key atomically.

Then, when we go to insert the real key that points to the data from
the cache miss, we check if the fake key we inserted is still present
and fail the insert if it's not.

It's cmpxchg(), but for the btree.
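To make the cmpxchg() analogy concrete, here's a rough userspace sketch of
the idea. None of this is the real bcache code: the toy_* names, the flat
array standing in for a btree node, and the int standing in for a pointer
to cached data are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one btree node: a small fixed-size array of keys. */
#define TOY_NODE_CAPACITY 4

enum toy_kind { TOY_CHECK, TOY_DATA };

struct toy_key {
	unsigned long offset;	/* location on the backing device */
	enum toy_kind kind;	/* placeholder ("fake key") or real data */
	int data;		/* stands in for a pointer to cached data */
};

struct toy_node {
	struct toy_key keys[TOY_NODE_CAPACITY];
	size_t nr;
};

static struct toy_key *toy_find(struct toy_node *n, unsigned long offset)
{
	for (size_t i = 0; i < n->nr; i++)
		if (n->keys[i].offset == offset)
			return &n->keys[i];
	return NULL;
}

/* Cache miss: insert a placeholder. There is no split path here, so a
 * full node simply fails the insert. */
static bool toy_insert_check_key(struct toy_node *n, unsigned long offset)
{
	if (toy_find(n, offset) || n->nr == TOY_NODE_CAPACITY)
		return false;
	n->keys[n->nr++] = (struct toy_key){ offset, TOY_CHECK, 0 };
	return true;
}

/* A write always wins and overwrites any placeholder at that offset. */
static bool toy_write(struct toy_node *n, unsigned long offset, int data)
{
	struct toy_key *k = toy_find(n, offset);

	if (!k) {
		if (n->nr == TOY_NODE_CAPACITY)
			return false;	/* a real insert would split the node */
		k = &n->keys[n->nr++];
		k->offset = offset;
	}
	k->kind = TOY_DATA;
	k->data = data;
	return true;
}

/* Finish the cache miss: succeeds only if our placeholder is still there,
 * i.e. nobody wrote to this offset in the meantime. */
static bool toy_insert_replace(struct toy_node *n, unsigned long offset,
			       int data)
{
	struct toy_key *k = toy_find(n, offset);

	if (!k || k->kind != TOY_CHECK)
		return false;
	k->kind = TOY_DATA;
	k->data = data;
	return true;
}
```

If toy_write() lands between toy_insert_check_key() and
toy_insert_replace(), the replace fails and the newer data from the write
stays in place, which is the whole point of the placeholder.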

Anyways... since we're skipping all the normal btree_insert() code,
btree_insert_check_key() can't split the btree node if it's full - if
the node is full, the insert just fails.

This'd be perfectly fine in any normal workload where you've got some
mix of reads and writes... if the btree node is full, a write will come
along to split it.

But the synthetic workload is a bit of a pathological case here :)
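A very crude model of why a read-only workload gets stuck, assuming my
theory is right. This is not how the real btree behaves in detail (the
toy_* names, the capacity of 4, and "a split halves the key count" are
all invented for the sketch); it only shows that without writes nothing
ever makes room in a full node.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 4	/* hypothetical keys-per-node limit */

struct toy_node {
	size_t nr_keys;
	size_t nr_splits;
};

/* Fast path used on cache miss: cannot split, so a full node fails. */
static bool toy_check_key_insert(struct toy_node *n)
{
	if (n->nr_keys >= TOY_CAPACITY)
		return false;
	n->nr_keys++;
	return true;
}

/* Normal insert path used by writes: split a full node, then insert. */
static void toy_write_insert(struct toy_node *n)
{
	if (n->nr_keys >= TOY_CAPACITY) {
		n->nr_keys /= 2;	/* model: half the keys move to a sibling */
		n->nr_splits++;
	}
	n->nr_keys++;
}

/* Run nr_ops operations; every write_every'th one is a write (0 means
 * never), the rest are cache misses. Returns how many misses failed to
 * insert their placeholder key. */
static size_t toy_run(size_t nr_ops, size_t write_every)
{
	struct toy_node n = { 0, 0 };
	size_t failed = 0;

	for (size_t i = 1; i <= nr_ops; i++) {
		if (write_every && i % write_every == 0)
			toy_write_insert(&n);
		else if (!toy_check_key_insert(&n))
			failed++;
	}
	return failed;
}
```

With no writes at all, toy_run() fails every miss once the node fills up;
with an occasional write mixed in, the node keeps getting split and most
misses go through.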

But, we should confirm this really is what's going on...  Can you apply
this patch and rerun to test my theory? See if the number of times the
printk fires lines up with the number of cache misses.


diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 4102267..d5c5313 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1875,9 +1875,13 @@ bool bch_btree_insert_check_key(struct btree *b, struct btree_op *op,
 	rw_unlock(false, b);
 	rw_lock(true, b, b->level);
 
+	if (should_split(b)) {
+		printk(KERN_DEBUG "bcache: bch_btree_insert_check_key() failed because btree node full\n");
+		goto out;
+	}
+
 	if (b->key.ptr[0] != btree_ptr ||
-	    b->seq != seq + 1 ||
-	    should_split(b))
+	    b->seq != seq + 1)
 		goto out;
 
 	op->replace = KEY(op->inode, bio_end(bio), bio_sectors(bio));