Re: bcache not working on large files?

Hi Matthew,

> > no, I was explicitly testing random I/O writes of 4k blocks, 
> > no sequential writing. With a file of 1000 GB it does work, but
> > if I use a 10000 GB file, it seems to fail. I would expect, that
> > the size should not really matter here, at least until the cache
> > is filled up.
> 
> 
> When the SSD gets too slow it will get by-passed by bcache. The tunable is in the document. Though if memory serves it's a latency of 20ms for writes which is probably way too short since SSDs can easily take 1-5 seconds when they have to resort to heavy lifting.

Ah, that could explain this effect. But I cannot find an appropriate
parameter in the documentation. Do you have a hint?
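
My best guess so far, as a minimal sketch: I assume the tunable you mean
is exposed per cache set in sysfs as congested_read_threshold_us and
congested_write_threshold_us under /sys/fs/bcache/<cache set UUID>/. The
attribute names, the path layout and the helper name below are my
assumptions and not something I have confirmed on this system, so please
correct me if a different parameter is meant.

```python
import glob
import os


def read_congestion_thresholds(base="/sys/fs/bcache"):
    """Return {cache_set_dir: {attribute: value_in_us}} for each cache set found.

    Assumption: cache sets appear as UUID-named directories below `base`
    and carry the two congested_*_threshold_us attributes.
    """
    names = ("congested_read_threshold_us", "congested_write_threshold_us")
    result = {}
    # UUID-named directories contain four dashes, e.g. 8-4-4-4-12 hex digits.
    for cset in sorted(glob.glob(os.path.join(base, "*-*-*-*-*"))):
        values = {}
        for name in names:
            path = os.path.join(cset, name)
            if os.path.isfile(path):
                with open(path) as fh:
                    values[name] = int(fh.read().strip())
        if values:
            result[cset] = values
    return result


if __name__ == "__main__":
    for cset, values in read_congestion_thresholds().items():
        print(cset, values)
```

If these are the right attributes, a write threshold of 20000 would match
the 20 ms you remember.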

> What you should do is turn off RAID controller READ caching entirely. And turn OFF writeBACK-caching for the SSD-based LUN(s) at said controller that are being used as bcache caching devices.

Hmm, yes, a read cache in front of SSDs is just overhead. You mean this
may have an impact, too? Actually I am not sure about the settings; they
are likely the defaults...

However, it worked with smaller files like 1000 GB. Nothing changed in
the random I/O test except the size of the file I am using.

> It would be helpful if you elaborated as to the HBA and SSD drives you are using. BTW doing RAID across the SSDs being used for cache is rather pointless IMO. You're shortening their life, adding unnecessary write IOPs. It's a cache. It's supposed to fail. Bcache will(?) properly handle a busted SSD.

The documentation says no and recommends a RAID. That is the reason
why writethrough mode is used by default. Probably a RAID 5/6 is not
a good idea for a caching SSD RAID, especially when the HDDs form a
RAID 5/6 on the same controller, too.

Probably a RAID 10 would be better...

> Cache is only useful for absorbing sudden spikes in IOPs or for highly localized and frequently re-used blocks. It's not intended to magically improve the underlying storage by 10-100x under a load that can't be sustained by said layer. Big $$$ SANs have lots of cache but at some point you will reach the saturation point and everything slows to HDD speed.

Yes, I was just testing random I/O. And I had expected that bcache
would work with large files, too.

So 8 threads doing random I/O with 4k blocks in a 1000 GB file
result in about 20,000 IOPS. The same setup with a 10,000 GB file
results in about 400 IOPS. That is about the same rate the HDDs
reach without bcache. This is something I did not expect. It looks
like bcache stopped working altogether...
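
To put these two rates into perspective, here is a small sketch of the
arithmetic. It assumes that "4k" means 4096 bytes; the function name is
only for illustration.

```python
BLOCK_SIZE = 4 * 1024  # assumption: "4k" blocks are 4096 bytes


def throughput_mib_s(iops, block_size=BLOCK_SIZE):
    """Convert an IOPS figure into MiB/s for a fixed block size."""
    return iops * block_size / (1024 * 1024)


fast = throughput_mib_s(20_000)  # 1000 GB file, bcache active
slow = throughput_mib_s(400)     # 10,000 GB file
slowdown = 20_000 / 400

print(f"1000 GB file:   {fast:.3f} MiB/s")
print(f"10,000 GB file: {slow:.4f} MiB/s")
print(f"slowdown:       {slowdown:.0f}x")
```

So the larger file drops from roughly 78 MiB/s to about 1.6 MiB/s of
random writes, a factor of 50.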

Maybe your first comment explains this: the SSDs get too slow and
are bypassed. Maybe this is caused by the RAID controller being busy
with the HDDs.

> I also wouldn't futz with the EXT4 settings you had posted previously while doing  benchmark runs because I expect it gets in the way and makes performance worse than if the block stream was less chunky. Only once you have defined a realistic sustained workload and know how often and how fast journaled-to-SSD writes (bcache write-back) can reasonably be de-staged would I revisit those tuning parameters and see if there is any merit to them.

Oh, that must be another thread, I am using XFS... ;-)

> Personally I put NVRAM boards in my servers to be filesystem journals and MD mirror maps. They're incredibly cheap.

I am just curious where the limits are and in which ranges one can
use bcache. But I did not expect this result, which is why I asked
for a hint. I would have expected it to work as before. The random
writes would fit entirely in the SSD cache even if the HDDs slept
all the time...

And yes, I am not convinced that a RAID 5/6 for SSD caches is really a
good idea. Hmm, maybe I can use a RAID 10 instead...

Best regards

Dirk

-- 
+----------------------------------------------------------------------+
| Dr. Dirk Geschke       / Plankensteinweg 61    / 85435 Erding        |
| Telefon: 08122-559448  / Mobil: 0176-96906350 / Fax: 08122-9818106   |
| dirk@xxxxxxxxxxxxxxxxx / dirk@xxxxxxxxxxxxx  / kontakt@xxxxxxxxxxxxx |
+----------------------------------------------------------------------+



