Re: Introduction and project proposal on erasure codes.

Sage Weil <sage@xxxxxxxxxxxx> · Mon, 25 Jun 2018 12:00:27 +0000 (UTC)

On Mon, 25 Jun 2018, Aaditya M Nair wrote:
> Hey guys,
> Aayush and I are new to the Ceph project and are very interested in
> contributing to it.

Great!

> Specifically, we want to implement a particular example of Reed-Solomon codes
> described in https://arxiv.org/abs/1501.06683.
> This paper introduces a new repair scheme for Reed-Solomon codes that improves
> the network bandwidth usage at the cost of disk usage. We think that it could
> be very useful to Ceph as an erasure code plugin.
> We would love any and all assistance regarding this project.

I encourage you to look at the existing LRC plugin and at the Clay code 
PR that is pending and should merge soon

	https://github.com/ceph/ceph/pull/14300

to make sure this new code is adding a useful new point in the design 
space.  I'm not an EC expert and have trouble identifying the key 
properties and tradeoffs between the different coding schemes.

> Also, we have some more specific questions about the erasure code plugins
> themselves. Forgive me if these questions are already answered somewhere.
> 1.   While writing a new plugin, how do you guys load it dynamically when
> testing your code? The closest I found was recompiling every time we change
> code, but I am really hoping that there is a better way.

By default a bunch of files need to be relinked every time the git commit 
changes.  When developing, you can reduce the impact by building only a 
minimal set of targets to run vstart (make vstart-base or make vstart).  
You can also disable this behavior entirely by passing 
-DENABLE_GIT_VERSION=OFF to cmake or do_cmake.sh.

> 2.   How do you test if the data is split correctly or if it is being
> reconstructed correctly?

In the simplest case you can test creating a pool with your EC code 
and then writing and reading back an object.  Our integration testing does 
this extensively (while mixing in node failures and so on).  There is 
also a range of unit tests for the erasure codes in src/test/erasure-code 
(probably the place to start!), and ceph-erasure-code-corpus, which is 
just a pile of EC configurations and data samples driven by scripts in 
qa/workunits/erasure-code.
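
The "create a pool, write an object, read it back" check could be scripted 
roughly as below.  This is only a sketch under some assumptions: a fresh dev 
cluster is already running, the ceph and rados binaries are on PATH, and the 
pool, profile, and object names are made up for illustration.  The 
crush-failure-domain=osd setting is an assumption for a single-host vstart 
cluster, and the PG count of 8 is arbitrary.

```python
import filecmp
import os
import subprocess
import tempfile


def roundtrip_commands(pool, profile, plugin, k, m, obj, src, dst):
    """Build the CLI calls for one write/read-back check (nothing runs here)."""
    return [
        # Define an EC profile that selects the plugin under test.
        # crush-failure-domain=osd is an assumption for a single-host cluster.
        ["ceph", "osd", "erasure-code-profile", "set", profile,
         f"plugin={plugin}", f"k={k}", f"m={m}",
         "crush-failure-domain=osd"],
        # Create an erasure-coded pool from that profile (8 PGs, arbitrary).
        ["ceph", "osd", "pool", "create", pool, "8", "8", "erasure", profile],
        # Write the local file as an object, then read it back to a new file.
        ["rados", "-p", pool, "put", obj, src],
        ["rados", "-p", pool, "get", obj, dst],
    ]


def check_roundtrip(plugin, k, m, size=1 << 20, run=subprocess.check_call):
    """Return True if the object read back matches the bytes written.

    `run` executes one command given as a list of arguments; it is a
    parameter so the flow can be exercised without a real cluster.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.bin")
        dst = os.path.join(tmp, "out.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(size))  # random payload of `size` bytes
        # Hypothetical pool/profile/object names, chosen for this sketch.
        for cmd in roundtrip_commands("ectest", "ectest-profile",
                                      plugin, k, m, "obj0", src, dst):
            run(cmd)
        # Byte-for-byte comparison of what was written and what came back.
        return filecmp.cmp(src, dst, shallow=False)
```

Against a running cluster, check_roundtrip("jerasure", 2, 1) would exercise 
the existing jerasure plugin; a new plugin's name would go in its place.  
This only covers the healthy read path.  Reconstruction after failures is 
what the integration tests and the unit tests in src/test/erasure-code are 
for.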

sage