I totally defer to the good Mr. Nelson about the internals; nobody groks them like he does. One appeal of OpenCAS would be the ability to throw the WAL+DB there and have any remaining capacity used to cache payload data reads/writes. Unfortunately it's a pitch and a half to get installed.

> On Jul 2, 2024, at 16:10, Mark Nelson <mark.nelson@xxxxxxxxx> wrote:
>
> We've had multiple conversations with Intel in the past around OpenCAS. They had some pretty impressive performance numbers, and it was fairly flexible (you could do things like pin block regions into the cache). I had always hoped that they would get it mainlined into the kernel, but ultimately that didn't happen. There's been some more userland-focused stuff coming out of the drive vendors, though, that might be interesting.
>
> I still would like to get back to my idea at some point of allowing bluestore overwrite extents to live on the fast device, with large (maybe not full) object writes of the most fragmented objects going to the slow device. The idea here is that once an overwrite extent is written to the fast device, we no longer need to create a new extent on the slow device (causing fragmentation) but can instead write it in place. Potentially, if you have compressed data, you can accumulate lots of overwrite extents on the fast device and then re-compress in one go. I know Adam has a slightly different approach he would like to take here, but the gist of it is that COW as we do it in bluestore is painful when you have flash at your disposal.
>
> Mark
>
>
> On 7/2/24 15:00, Dan van der Ster wrote:
>> Totally: OpenCAS, bcache, etc. could also help here.
>>
>> The important thing for me, if we use local device caches, is to make sure our internal OSD data integrity is still sound: deep-scrubbing should read through, for example.
>>
>> Cheers, Dan
>>
>> On Tue, Jul 2, 2024 at 12:56 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>>> Thoughts on how finding a way to implement OpenCAS might address this?
>>>
>>> On Jul 2, 2024, at 15:44, Dan van der Ster <dan.vanderster@xxxxxxxxx> wrote:
>>>
>>> A better approach could be to read around misses, using the new faster EC partial read to respond to the client quickly, and then asynchronously promote the whole object into the cache pool.
>>> Partial writes are trickier, probably best handled entirely via a WAL.
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list -- dev@xxxxxxx
>>> To unsubscribe send an email to dev-leave@xxxxxxx
>
> --
> Best Regards,
> Mark Nelson
> Head of Research and Development
>
> Clyso GmbH
> p: +49 89 21552391 12 | a: Minnesota, USA
> w: https://clyso.com | e: mark.nelson@xxxxxxxxx
>
> We are hiring: https://www.clyso.com/jobs/

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
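[Editor's note] The OpenCAS layout sketched at the top of the thread (WAL+DB on dedicated NVMe partitions, leftover NVMe capacity caching the payload device) could look roughly like the following. This is a sketch only: the device paths and partition layout are hypothetical, and cache-mode choice (write-back here) is a deployment decision, not a recommendation from the thread.

```shell
# Hypothetical layout: /dev/nvme0n1p1 and p2 reserved for BlueStore DB/WAL,
# /dev/nvme0n1p3 (the remaining capacity) used as an OpenCAS cache for /dev/sdb.

# Start a write-back cache instance (id 1) on the spare NVMe partition.
casadm --start-cache --cache-device /dev/nvme0n1p3 --cache-id 1 --cache-mode wb

# Add the HDD as a core device; cached payload I/O then goes via /dev/cas1-1.
casadm --add-core --cache-id 1 --core-device /dev/sdb
```

The OSD would then be built on the exposed cached device, with the DB/WAL pointed at the reserved partitions, e.g. `ceph-volume lvm create --data /dev/cas1-1 --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2` (again, paths illustrative).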
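[Editor's note] Dan's "read around misses, then promote asynchronously" idea can be sketched in a few lines of pseudocode-style Python. Everything here is illustrative: the class, the dict-backed "pools", and the thread-based promotion are stand-ins, not Ceph APIs; the partial read from the backing store stands in for the new EC partial read.

```python
import threading

class ReadAroundCache:
    """Toy model: serve a cache miss with a partial read straight from the
    backing (EC) store, and promote the whole object in the background."""

    def __init__(self, backing):
        self.backing = backing   # object name -> bytes (stand-in for the EC pool)
        self.cache = {}          # object name -> bytes (stand-in for the cache pool)
        self.lock = threading.Lock()

    def read(self, name, offset, length):
        with self.lock:
            if name in self.cache:
                # Cache hit: serve from the fast tier.
                return self.cache[name][offset:offset + length]
        # Cache miss: answer the client immediately with a partial read
        # from the backing store (the "EC partial read" in the thread)...
        data = self.backing[name][offset:offset + length]
        # ...and promote the whole object asynchronously.
        t = threading.Thread(target=self._promote, args=(name,))
        t.start()
        t.join()  # joined here only to make the toy example deterministic
        return data

    def _promote(self, name):
        with self.lock:
            self.cache[name] = self.backing[name]
```

The key property of the sketch: the client's latency is bounded by one partial read of the backing store, never by a whole-object promotion, which is exactly the contrast with promote-on-miss cache tiering.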