On Wed, Jul 6, 2016 at 12:24 AM, Shyam <srangana@xxxxxxxxxx> wrote:
On 07/01/2016 01:45 AM, B.K.Raghuram wrote:
I have not gone through this implementation, nor the new iSCSI implementation being worked on for 3.9, but I thought I'd share the design behind a distributed iSCSI implementation that we worked on some time back, based on the istgt code with a libgfapi hook.
The implementation used the idea of one file representing one block (of a chosen size), allowing us to use gluster as the backend to store these files while presenting a single block device of effectively unlimited size. We used a fixed file naming convention based on the block number, which allowed the system to determine which file(s) needed to be operated on for the requested byte offset. This gave us the advantage of automatically reusing all of gluster's file-based functionality underneath to provide a fully distributed iSCSI implementation.
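To make the offset-to-file mapping concrete, here is a minimal sketch in C of how a byte offset on the exported device could be translated into a backing file name plus an offset within that file. The 4 MiB block size and the "block.<number>" naming convention are only illustrative; the actual convention and block size in the implementation may differ.

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative block size: one gluster file backs each block. */
    #define BLOCK_SIZE (4UL * 1024 * 1024)   /* 4 MiB */

    /* Map a byte offset on the exported device to the backing file that
     * holds it, plus the offset within that file.  The file name follows
     * an assumed "block.<number>" convention. */
    static void map_offset(uint64_t dev_offset, char *name, size_t name_len,
                           uint64_t *file_offset)
    {
        uint64_t block_no = dev_offset / BLOCK_SIZE;
        *file_offset      = dev_offset % BLOCK_SIZE;
        snprintf(name, name_len, "block.%012llu",
                 (unsigned long long)block_no);
    }

    int main(void)
    {
        char     name[64];
        uint64_t off;

        /* Offset 10 GiB falls in block 2560 when blocks are 4 MiB. */
        map_offset(10ULL * 1024 * 1024 * 1024, name, sizeof(name), &off);
        printf("%s @ %llu\n", name, (unsigned long long)off);
        return 0;
    }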
Would this be similar to the new iSCSI implementation that's being worked on for 3.9?
<will let others correct me here, but...>
Ultimately the idea would be to use sharding, as part of the gluster volume graph, to distribute (or rather shard) the blocks, instead of having the whole disk image on one distribute subvolume, and hence scale disk sizes to the size of the cluster. Further, sharding should work well here, as this is a single-client access case (or are we past that hurdle already?).
Not yet; we need a common transaction framework in place to reduce the latency of synchronization.
What this achieves is similar to the iSCSI implementation that you talk about, but with gluster doing the block splitting, and hence the distribution, rather than the iSCSI implementation (istgt) doing it.
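For reference, enabling sharding on an existing volume is just a couple of volume options (the volume name "blockvol" and the 64MB shard size below are only example values):

    # gluster volume set blockvol features.shard on
    # gluster volume set blockvol features.shard-block-size 64MB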
< I did a cursory check on the blog post, but did not find a shard reference, so maybe others could pitch in here, if they know about the direction>
There are two directions which will eventually converge.
1) A granular data self-heal implementation, so that taking a snapshot becomes as simple as a reflink (see the small reflink sketch after this list).
2) Bring in snapshots of files with shards - this is a bit more involved than the approach above.
Once 2) is also complete, we will have 1) + 2) combined, so that data self-heal heals the exact blocks inside each shard.
If users are not worried about snapshots, 2) is the best option.
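For anyone not familiar with the reflink semantics that 1) refers to: on a local filesystem with reflink support (e.g. XFS or Btrfs on a reasonably recent kernel), cloning a whole file is a single ioctl that shares extents copy-on-write, which is the kind of cheapness we would like snapshots to approach. A minimal sketch:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>          /* FICLONE */

    int main(void)
    {
        int src = open("disk.img", O_RDONLY);
        int dst = open("disk.img.snap", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (src < 0 || dst < 0) {
            perror("open");
            return 1;
        }
        /* Share the source file's extents with the destination
         * (copy-on-write); no data is copied, so the "snapshot"
         * completes almost instantly. */
        if (ioctl(dst, FICLONE, src) < 0) {
            perror("ioctl(FICLONE)");
            return 1;
        }
        return 0;
    }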
Further, in your original proposal, how do you maintain device properties, such as the size of the device and used/free blocks? I ask about used and free because that is an overhead to compute if each block is maintained as a separate file, and because it is difficult to keep the size and block updates consistent (as they are separate operations). Just curious.
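To make that consistency concern concrete, here is a rough sketch of the two separate steps involved; the paths, the xattr name and the idea of keeping a counter on a metadata file are purely hypothetical, and in the real target these operations would of course go through gfapi rather than the local filesystem:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(void)
    {
        char zeros[4096] = {0};

        /* Step 1: allocate/write a new block file. */
        int fd = open("block.000000002560", O_WRONLY | O_CREAT, 0644);
        if (fd < 0 || pwrite(fd, zeros, sizeof(zeros), 0) < 0) {
            perror("block write");
            return 1;
        }
        close(fd);

        /* Step 2: bump a hypothetical used-block counter kept as an
         * xattr on a metadata file.  A crash between step 1 and step 2
         * leaves this counter stale, which is the consistency problem
         * raised above. */
        const char *used = "2561";
        if (setxattr("lun.meta", "user.used_blocks",
                     used, strlen(used), 0) < 0) {
            perror("setxattr");
            return 1;
        }
        return 0;
    }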
--
Pranith