Re: Quick bcache benchmark

Kent Overstreet <koverstreet@xxxxxxxxxx> · Fri, 16 Dec 2011 10:52:55 -0800

That's what you'd expect in writethrough mode when you aren't getting
any cache hits - try flipping on writeback and see what happens.
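
For reference, a minimal sketch of flipping the cache mode from the shell. It assumes the backing device exposes a `cache_mode` attribute under /sys/block/<dev>/bcache/, which is what mainline bcache documents. The 2011 tree discussed in this thread may name the knob differently, so check the device's sysfs directory first. `set_cache_mode` and `SYSFS_ROOT` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: switch a bcache backing device to another cache mode.
# Assumption: a `cache_mode` attribute exists (mainline bcache layout).
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a scratch directory instead of the real /sys.
set_cache_mode() {
    dev=$1     # e.g. bcache0
    mode=$2    # e.g. writeback
    attr="${SYSFS_ROOT:-/sys}/block/$dev/bcache/cache_mode"
    if [ ! -w "$attr" ]; then
        echo "no writable cache_mode attribute at $attr" >&2
        return 1
    fi
    printf '%s\n' "$mode" > "$attr"
}
```

Run as root, this would be called as `set_cache_mode bcache0 writeback` before repeating the fio run.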

On Fri, Dec 16, 2011 at 10:49 AM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
> Actually I think this IS user error. I ran a benchmark with FIO, and
> the results were practically identical with and without bcache.  I
> applied the 3.1.4 kernel patch on top of your 3.1 tree; even though it
> applied cleanly, I'm guessing that wiped something out. Here are my
> stats after running the benchmark on bcache, and also included is the
> fio config.
>
> bypassed 32.1G
> cache_bypass_hits 5482
> cache_bypass_misses 194862
> cache_hit_ratio 3
> cache_hits 786
> cache_miss_collisions 206
> cache_misses 19447
> cache_readaheads 0
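
A quick check of how cache_hit_ratio relates to the other counters. This is a sketch under the assumption that the ratio is hits * 100 / (hits + misses) with integer truncation. That formula reproduces both this dump and the Dec 10 stats_day dump quoted further down, but it is inferred from the numbers, not taken from the code. `hit_ratio` is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed formula: cache_hit_ratio = hits * 100 / (hits + misses),
# using integer (truncating) division.
hit_ratio() {
    hits=$1
    misses=$2
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 0          # avoid dividing by zero on a fresh cache
        return
    fi
    echo $((hits * 100 / total))
}

hit_ratio 786 19447    # counters from the fio run: prints 3
hit_ratio 410 80545    # counters from the Dec 10 dump: prints 0
```

Under that assumption, 786 hits against 19447 misses is about 3.9%, which truncates to the reported 3.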
>
> [global]
> ioengine=libaio
> iodepth=4
> invalidate=1 #make sure we're not cached locally
> direct=1 #don't use buffers during test (test without local caches)
> thread
> ramp_time=20
> time_based
> runtime=180
>
> [8RandomReadWriters]
> rw=randrw
> numjobs=8
> blocksize=4k
> size=1G
>
> [2SequentialReadWriters]
> rw=rw
> numjobs=2
> size=4G
> blocksize_range=64k-1M
>
>
> On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
>> Thanks! I'll put it through some more tests. I kind of figured that
>> something more real-world would help.
>>
>> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@xxxxxxxxxx> wrote:
>>> Sorry, I was thinking about that issue for a while and then I got distracted...
>>>
>>> It's not user error, it's an irritating corner case. Basically, it's
>>> the result of a workaround for a particularly obscure data corruption
>>> bug.
>>>
>>> If a write bypasses the cache, it has to invalidate that region of the
>>> cache; the null key it leaves in the cache will block cache misses
>>> from adding that data to the cache until the btree node fills up (and
>>> possibly splits).
>>>
>>> It hasn't been an issue for us in normal operation, but when you're
>>> just testing - i.e. you don't have much load - that node split may not
>>> happen for a long time, and so if for some reason a bunch of data
>>> bypassed the cache... well, you see what happens.
>>>
>>> Unfortunately a better solution to the original race is not going to
>>> be simple, so it's probably not going to be done in the very near
>>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>>> rather lose performance than corrupt data.
>>>
>>> But the good news is if you put normal server-ish load on it the issue
>>> should go away in steady state operation.
>>>
>>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
>>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>>
>>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
>>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>>> write performance.
>>>>>
>>>>> BEFORE:
>>>>> [root@sansrv2-10 stats_day]# cat *
>>>>> 27.6M
>>>>> 83
>>>>> 3500
>>>>> 0
>>>>> 166
>>>>> 24380
>>>>> 40660
>>>>> 0
>>>>>
>>>>> ...benchmarking...
>>>>>
>>>>> AFTER:
>>>>>
>>>>> [root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i; done 2>/dev/null
>>>>> bypassed 27.6M
>>>>> cache_bypass_hits 83
>>>>> cache_bypass_misses 3500
>>>>> cache_hit_ratio 0
>>>>> cache_hits 410
>>>>> cache_miss_collisions 48879
>>>>> cache_misses 80545
>>>>> cache_readaheads 0
>>>>>
>>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>>
>>>>> average_key_size 0
>>>>> block_size 2.0k
>>>>> btree_cache_size 3.2M
>>>>> bucket_size 1.0M
>>>>> cache_available_percent 100
>>>>> clear_stats
>>>>> congested 0
>>>>> congested_threshold_us 0
>>>>> dirty_data 0
>>>>> io_error_halflife 0
>>>>> io_error_limit 8
>>>>> root_usage_percent 0
>>>>> synchronous 1
>>>>> tree_depth 1
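
The `echo -n` / `cat` loop used for these dumps prints nothing after a name when `cat` fails, so the next attribute ends up on the same line. A slightly more defensive sketch is below. It assumes that failure comes from attributes that cannot be read, which is a guess based on the output and not something confirmed in the thread. `dump_attrs` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "name value" for every regular file in a sysfs-style directory.
# Attributes that fail to read are marked instead of silently skipped.
dump_attrs() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip anything that is not a regular file
        name=${f##*/}                # strip the directory part
        if value=$(cat "$f" 2>/dev/null); then
            printf '%s %s\n' "$name" "$value"
        else
            printf '%s (unreadable)\n' "$name"
        fi
    done
}
```

For example, `dump_attrs /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d` would list the cache set attributes one per line.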
>>>>>
>>>>>
>>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>>> <kent.overstreet@xxxxxxxxx> wrote:
>>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>>
>>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>>
>>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>>> and seeing if that fixes it?
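
A sketch of that step, assuming cache sets appear as UUID-named directories under /sys/fs/bcache, as in the listing elsewhere in this thread. `zero_congested_threshold` and `BCACHE_SYSFS` are hypothetical names. The override only exists so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Write 0 to congested_threshold_us in every registered cache set
# and echo the value back so the change can be confirmed.
zero_congested_threshold() {
    root=${BCACHE_SYSFS:-/sys/fs/bcache}
    status=1                                  # stays 1 if no cache set was found
    for attr in "$root"/*/congested_threshold_us; do
        [ -w "$attr" ] || continue            # unmatched glob or read-only file
        echo 0 > "$attr"
        printf '%s: %s\n' "${attr%/*}" "$(cat "$attr")"
        status=0
    done
    return "$status"
}
```

Run as root, then repeat the random-write test and watch whether `bypassed` still grows.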
>>>>>>
>>>>>>> There also seems to be some work needed with clean-up. Since I'm
>>>>>>> unfamiliar with how bcache works, I attempted to make-bcache twice,
>>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>>> already registered I was unable to re-register my newly formatted
>>>>>>> cache dev, and got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>>> don't try to register things with the same name in the same
>>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>>> file in /sys/fs/bcache would help. I also tried rmmod'ing bcache to
>>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>>>> you reformat.
>>>>>>
>>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>>> exclusively on 3.1. I'll have a look...
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html