Re: Sharding - Inode write fops - recoverability from failures - design

On 02/24/2015 12:19 PM, Krutika Dhananjay wrote:


------------------------------------------------------------------------

    *From: *"Vijay Bellur" <vbellur@xxxxxxxxxx>
    *To: *"Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
    *Cc: *"Gluster Devel" <gluster-devel@xxxxxxxxxxx>
    *Sent: *Tuesday, February 24, 2015 11:35:28 AM
    *Subject: *Re:  Sharding - Inode write fops -
    recoverability from failures - design

    On 02/24/2015 10:36 AM, Krutika Dhananjay wrote:
     >
     >
     >
    ------------------------------------------------------------------------
     >
     >     *From: *"Vijay Bellur" <vbellur@xxxxxxxxxx>
     >     *To: *"Krutika Dhananjay" <kdhananj@xxxxxxxxxx>, "Gluster Devel"
     >     <gluster-devel@xxxxxxxxxxx>
     >     *Sent: *Monday, February 23, 2015 5:25:57 PM
     >     *Subject: *Re:  Sharding - Inode write fops -
     >     recoverability from failures - design
     >
     >     On 02/22/2015 06:08 PM, Krutika Dhananjay wrote:
     >      > Hi,
     >      >
     >      > Please find the design doc for one of the problems in sharding which
     >      > Pranith and I are trying to solve and its solution @
     >      > http://review.gluster.org/#/c/9723/1.
     >      > Reviews and feedback are much appreciated.
     >      >
     >
     >     Can this feature be made optional? I think there are use cases like
     >     virtual machine image storage, hdfs etc. where the number of metadata
     >     queries might not be very high. It would be an acceptable tradeoff in
     >     such cases to not be very efficient for answering metadata queries but
     >     be very efficient for data operations.
     >
     >     IOW, can we have two possible modes of operation for the sharding
     >     translator to answer metadata queries?
     >
     >     1. One that behaves like a regular filesystem where we expect a mix of
     >     data and metadata operations. Your document seems to cover that part
     >     well. We can look at optimizing behavior for multi-threaded single
     >     writer use cases after an initial implementation is in place.
     >     Techniques like eager locking can be applied here.
     >
     >     2. Another mode where we do not expect a lot of metadata queries. In
     >     this mode, we can visit all nodes where we have shards to answer these
     >     queries.
     >
     > But for sharding translator to be able to visit all shards, it is
     > required to know the last shard number.
     > Without this, it will never know when to stop looking up the different
     > shards. For this to happen, we still need to maintain the size
     > attribute for each file.
     >

    Wouldn't maintaining the total number of shards in the metadata shard be
    sufficient?

Maintaining the correctness of "total number of shards" would again
incur the same cost as maintaining size or any other metadata attribute
if a client/brick crashes in the middle of a write fop before the
attribute is committed to disk.
In other words, we will again need to maintain a "dirty" and "committed"
copy of the shard_count to ensure its correctness.


I think maintaining the "total number of shards" is not as expensive as
maintaining size or any other metadata attribute. The shard count needs
to be updated only when an extending operation creates a new shard or
when a truncate operation removes a shard. Maintaining other metadata
attributes would need a 5-phase transaction for every write operation.
Isn't that the case?

-Vijay
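
The argument above, that the shard count changes only when a write or
truncate crosses a shard boundary, can be illustrated with a rough sketch.
This is not code from the sharding translator: the function names, the
4 MiB default block size, and the simplification that every block up to the
file size has a shard (no sparse files) are all assumptions made for
illustration.

```python
# Hypothetical illustration only; names and the block size are assumptions.
SHARD_BLOCK_SIZE = 4 * 1024 * 1024  # assumed shard block size in bytes


def shard_count(size, block_size=SHARD_BLOCK_SIZE):
    """Blocks needed to hold `size` bytes, assuming a dense (non-sparse) file."""
    return (size + block_size - 1) // block_size


def write_changes_shard_count(old_size, offset, length,
                              block_size=SHARD_BLOCK_SIZE):
    """True if a write of `length` bytes at `offset` adds at least one shard."""
    new_size = max(old_size, offset + length)
    return shard_count(new_size, block_size) != shard_count(old_size, block_size)


def truncate_changes_shard_count(old_size, new_size,
                                 block_size=SHARD_BLOCK_SIZE):
    """True if truncating from `old_size` to `new_size` changes the shard count."""
    return shard_count(new_size, block_size) != shard_count(old_size, block_size)
```

Under these assumptions, an extending write that stays within the last
existing shard changes the file size but leaves the count untouched, so only
boundary-crossing writes and truncates would need to update the count.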
