Re: Quick bcache benchmark

Marcus Sorensen <shadowsor@xxxxxxxxx> · Fri, 16 Dec 2011 11:49:48 -0700

Actually I think this IS user error. I ran a benchmark with fio, and
the results were practically identical with and without bcache. I
applied the 3.1.4 kernel patch on top of your 3.1 tree; even though it
applied cleanly, I'm guessing that wiped something out. Here are my
stats after running the benchmark on bcache, followed by the fio
config.

bypassed 32.1G
cache_bypass_hits 5482
cache_bypass_misses 194862
cache_hit_ratio 3
cache_hits 786
cache_miss_collisions 206
cache_misses 19447
cache_readaheads 0
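
As a sanity check on the counters above, here is a minimal Python sketch.
It assumes cache_hit_ratio is cache_hits expressed as an integer
percentage of cache_hits + cache_misses; that formula is an assumption,
not something taken from the bcache source. The helper names parse_stats
and hit_percent are made up for illustration.

```python
def parse_stats(text):
    """Parse 'name value' lines, as pasted above, into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = parts[1]
    return stats


def hit_percent(hits, misses):
    """Integer hit percentage (assumed formula); 0 if nothing was counted."""
    total = hits + misses
    return hits * 100 // total if total else 0


# Counters copied verbatim from the stats listing above.
PASTED = """\
bypassed 32.1G
cache_bypass_hits 5482
cache_bypass_misses 194862
cache_hit_ratio 3
cache_hits 786
cache_miss_collisions 206
cache_misses 19447
cache_readaheads 0
"""

stats = parse_stats(PASTED)
cached = hit_percent(int(stats["cache_hits"]), int(stats["cache_misses"]))
bypass = hit_percent(int(stats["cache_bypass_hits"]),
                     int(stats["cache_bypass_misses"]))
print("cached hit %:", cached)
print("bypass hit %:", bypass)
```

Under that assumption the cached-path figure comes out to 3, which agrees
with the reported cache_hit_ratio. The counters also show that roughly ten
times as many requests were counted on the bypass path (200344) as on the
cached path (20233).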

[global]
ioengine=libaio
iodepth=4
invalidate=1 #make sure we're not cached locally
direct=1 #don't use buffers during test (test without local caches)
thread
ramp_time=20
time_based
runtime=180

[8RandomReadWriters]
rw=randrw
numjobs=8
blocksize=4k
size=1G

[2SequentialReadWriters]
rw=rw
numjobs=2
size=4G
blocksize_range=64k-1M

On Thu, Dec 15, 2011 at 9:28 PM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
> Thanks! I'll put it through some more tests. I kind of figured that
> something more real-world would help.
>
> On Thu, Dec 15, 2011 at 7:17 PM, Kent Overstreet <koverstreet@xxxxxxxxxx> wrote:
>> Sorry, I was thinking about that issue for a while and then I got distracted...
>>
>> It's not user error, it's an irritating corner case. Basically, it's
>> the result of a workaround for a particularly obscure data corruption
>> bug.
>>
>> If a write bypasses the cache, it has to invalidate that region of the
>> cache; the null key it leaves in the cache will block cache misses
>> from adding that data to the cache until the btree node fills up (and
>> possibly splits).
>>
>> It hasn't been an issue for us in normal operation, but when you're
>> just testing - i.e. you don't have much load - that node split may not
>> happen for a long time, and so if for some reason a bunch of data
>> bypassed the cache... well, you see what happens.
>>
>> Unfortunately a better solution to the original race is not going to
>> be simple, so it's probably not going to be done in the very near
>> future. It's a _very_ difficult race to hit, but in the meantime I'd
>> rather lose performance than corrupt data.
>>
>> But the good news is if you put normal server-ish load on it the issue
>> should go away in steady state operation.
>>
>> On Thu, Dec 15, 2011 at 3:40 PM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
>>> Any ideas on this? Do you think it's a bug, or am I just holding it wrong? :-)
>>>
>>> On Sat, Dec 10, 2011 at 8:02 AM, Marcus Sorensen <shadowsor@xxxxxxxxx> wrote:
>>>> That keeps the 'bypassed' value from increasing, but it doesn't change
>>>> write performance.
>>>>
>>>> BEFORE:
>>>> [root@sansrv2-10 stats_day]# cat *
>>>> 27.6M
>>>> 83
>>>> 3500
>>>> 0
>>>> 166
>>>> 24380
>>>> 40660
>>>> 0
>>>>
>>>> ...benchmarking...
>>>>
>>>> AFTER:
>>>>
>>>> [root@sansrv2-10 stats_day]# for i in `ls`; do echo -n "$i "; cat $i; done 2>/dev/null
>>>> bypassed 27.6M
>>>> cache_bypass_hits 83
>>>> cache_bypass_misses 3500
>>>> cache_hit_ratio 0
>>>> cache_hits 410
>>>> cache_miss_collisions 48879
>>>> cache_misses 80545
>>>> cache_readaheads 0
>>>>
>>>> /sys/fs/bcache/60da061c-d646-4ebe-931a-d8580add411d
>>>>
>>>> average_key_size 0
>>>> block_size 2.0k
>>>> btree_cache_size 3.2M
>>>> bucket_size 1.0M
>>>> cache_available_percent 100
>>>> clear_stats congested 0
>>>> congested_threshold_us 0
>>>> dirty_data 0
>>>> io_error_halflife 0
>>>> io_error_limit 8
>>>> root_usage_percent 0
>>>> synchronous 1
>>>> tree_depth 1
>>>>
>>>>
>>>> On Fri, Dec 9, 2011 at 11:33 PM, Kent Overstreet
>>>> <kent.overstreet@xxxxxxxxx> wrote:
>>>>> On Fri, Dec 09, 2011 at 10:09:55AM -0700, Marcus Sorensen wrote:
>>>>>> Here's some more info. I'm running kernel 3.1.4. When I do random
>>>>>> writes, the 'bypassed' number increases in stats. Now I'm random
>>>>>> writing direct to /dev/bcache0 and get the same result.
>>>>>
>>>>> Weird. From what you're describing it sounds like throttling is screwed
>>>>> up (and it was recently), but I can't reproduce it now.
>>>>>
>>>>> Can you try echoing 0 to congested_threshold_us in the cache set dir,
>>>>> and seeing if that fixes it?
>>>>>
>>>>>> There also seems to be some work needed with clean-up. Since I'm
>>>>>> unfamiliar with how bcache works, I ran make-bcache twice,
>>>>>> thinking I'd start over. That worked, but because my cache device was
>>>>>> already registered I was unable to re-register my newly formatted
>>>>>> cache dev, and got "kobject_add_internal failed for bcache with -EEXIST,
>>>>>> don't try to register things with the same name in the same
>>>>>> directory." I was still able to use my cache device via the old uuid,
>>>>>> but this will probably cause problems on reboot. Perhaps an unregister
>>>>>> file in /sys/fs/bcache would help. I also tried rmmod'ing bcache to
>>>>>> see if I could clear /sys/fs/bcache, but no luck. make-bcache should
>>>>>> perhaps check for an existing superblock, ask for confirmation, and
>>>>>> give some sort of instruction on how to unregister, or do it for you if
>>>>>> you reformat.
>>>>>
>>>>> Yeah, I think for some reason bcache isn't opening the devices
>>>>> exclusively on 3.1. I'll have a look...
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html