Re: Sharding - Inode write fops - recoverability from failures - design

Krutika Dhananjay <kdhananj@xxxxxxxxxx> · Tue, 24 Feb 2015 01:49:01 -0500 (EST)

From: "Vijay Bellur" <vbellur@xxxxxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Tuesday, February 24, 2015 11:35:28 AM
Subject: Re: Sharding - Inode write fops - recoverability from failures - design

On 02/24/2015 10:36 AM, Krutika Dhananjay wrote:
>
>
> ------------------------------------------------------------------------
>
>     *From: *"Vijay Bellur" <vbellur@xxxxxxxxxx>
>     *To: *"Krutika Dhananjay" <kdhananj@xxxxxxxxxx>, "Gluster Devel"
>     <gluster-devel@xxxxxxxxxxx>
>     *Sent: *Monday, February 23, 2015 5:25:57 PM
>     *Subject: *Re: [Gluster-devel] Sharding - Inode write fops -
>     recoverability from failures - design
>
>     On 02/22/2015 06:08 PM, Krutika Dhananjay wrote:
>      > Hi,
>      >
>      > Please find the design doc for one of the problems in sharding which
>      > Pranith and I are trying to solve and its solution @
>      > http://review.gluster.org/#/c/9723/1.
>      > Reviews and feedback are much appreciated.
>      >
>
>     Can this feature be made optional? I think there are use cases like
>     virtual machine image storage, hdfs etc. where the number of metadata
>     queries might not be very high. It would be an acceptable tradeoff in
>     such cases to not be very efficient for answering metadata queries but
>     be very efficient for data operations.
>
>     IOW, can we have two possible modes of operation for the sharding
>     translator to answer metadata queries?
>
>     1. One that behaves like a regular filesystem where we expect a mix of
>     data and metadata operations. Your document seems to cover that part
>     well. We can look at optimizing behavior for multi-threaded single
>     writer use cases after an initial implementation is in place.
>     Techniques like eager locking can be applied here.
>
>     2. Another mode where we do not expect a lot of metadata queries. In
>     this mode, we can visit all nodes where we have shards to answer these
>     queries.
>
> But for the sharding translator to be able to visit all shards, it
> needs to know the last shard number. Without this, it will never know
> when to stop looking up the different shards. For this to happen, we
> still need to maintain the size attribute for each file.
>

Wouldn't maintaining the total number of shards in the metadata shard be
sufficient?

Maintaining the correctness of the "total number of shards" would incur
the same cost as maintaining size or any other metadata attribute: a
client or brick can crash in the middle of a write fop, before the
attribute is committed to disk.
In other words, we would again need to maintain a "dirty" and a
"committed" copy of shard_count to ensure its correctness.
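
To make the cost concrete, here is a rough sketch of what keeping a
"dirty" and a "committed" copy of shard_count could look like. It is
only an illustration, not the design in the review: the class
ShardedFile, its fields, and the crash parameter are hypothetical names,
the dict stands in for shards on the bricks, and it assumes shards are
appended contiguously.

```python
class ShardedFile:
    """Toy in-memory model; all names here are hypothetical."""

    def __init__(self):
        self.shards = {}          # shard number -> data, stands in for the bricks
        self.dirty_count = 0      # intent, recorded before the write starts
        self.committed_count = 0  # recorded only after the write has completed

    def append_shard(self, data, crash=None):
        """Append one shard; `crash` simulates a failure at a given step."""
        new_count = self.committed_count + 1
        self.dirty_count = new_count        # step 1: record the intent
        if crash == "after_intent":
            return
        self.shards[new_count - 1] = data   # step 2: write the shard itself
        if crash == "after_write":
            return
        self.committed_count = new_count    # step 3: commit the new count

    def recover(self):
        """Reconcile the two copies after a crash.

        Only the shards numbered from committed_count up to dirty_count
        are in doubt, so only those need to be looked up.
        """
        count = self.committed_count
        while count < self.dirty_count and count in self.shards:
            count += 1
        self.committed_count = self.dirty_count = count
        return count


f = ShardedFile()
f.append_shard(b"a")
f.append_shard(b"b", crash="after_write")   # shard written, count not committed
print(f.committed_count, f.dirty_count)     # the two copies now disagree
print(f.recover())                          # recovery finds the second shard
```

The point of the sketch is that every write that can add a shard pays
for two extra metadata updates (steps 1 and 3), which is the same
overhead as keeping the size attribute correct.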

-Krutika

-Vijay

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel