Hi Pranith,

For the EC encoding/decoding algorithm, could we design a plug-in
mechanism so that users can choose their own algorithm, or use a
third-party library, as Ceph does? I am also curious why the IDA
algorithm was chosen originally, instead of the more commonly used
Reed-Solomon algorithm.

Best Regards,
Fang Huang

> On Monday, 14 September 2015, 16:30, Pranith Kumar Karampuri
> <pkarampu@xxxxxxxxxx> wrote:
>
> hi,
>      Here is a list of common improvements for both ec and afr
> planned over the next few months:
>
> 1) Granular entry self-heals.
>      At the moment both afr and ec do a lot of readdirs and lookups
> to figure out the differences between directories when performing
> heals. Kritika, Ravi, Anuradha and I are discussing how to prevent
> this. The basic algorithm is to store only the names that need heal
> in .glusterfs/indices/entry-changes/<parent-dir-gfid>/ on the
> bricks, as links to the base file in
> .glusterfs/indices/entry-changes, so only the names that need to be
> healed go through name heals.
>      We definitely want to complete this for 3.8.
>
> 2) Granular data self-heals.
>      At the moment, even if a single byte changes in a file, afr
> and ec read the entire file to fix the problem. We are thinking of
> preventing this by remembering in extended attributes where the
> changes happened on the file. A new extended attribute on the file
> will hold a bitmap of the changes, where each bit represents a range
> that needs healing. This extended attribute has a maximum size it
> can represent; the extra chunks will be represented like shards, as
> empty files named .glusterfs/indices/data-changes/<gfid>-<block-num>
> whose extended attributes store the ranges that need heals.
>
>      For example: if the maximum extended attribute value size is
> 4KB and each bit represents 128KB (i.e. the first bit represents
> changes to offsets 0-128KB, the 2nd bit to 128KB+1-256KB, etc.),
> then a single extended attribute can record changes to the first
> 4GB of a file. (We are thinking of dynamically increasing the size
> represented by each bit, from say 4KB to 128KB, but this is still
> in design.) Changes to offsets 4GB+1 - 8GB will be stored in the
> extended attribute of .glusterfs/indices/data-changes/<gfid>-1,
> changes to offsets 8GB+1 - 12GB in the extended attribute of
> .glusterfs/indices/data-changes/<gfid>-2, and so on (note that
> these files are empty; they only carry extended attributes). A
> rough sketch of this mapping follows below.
>      We want to complete this for 3.8 (stretch goal).
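> To make the arithmetic concrete, the mapping from a written byte
> range to a block-index file and a bit inside its bitmap would look
> something like this (illustrative sketch only, not actual gluster
> code; the names are invented for this mail):
>
>     #include <inttypes.h>
>     #include <stdint.h>
>     #include <stdio.h>
>
>     #define BITMAP_BYTES  4096ULL            /* max xattr value: 4KB  */
>     #define BITS_PER_MAP  (BITMAP_BYTES * 8) /* 32768 bits per bitmap */
>     #define RANGE_PER_BIT (128ULL * 1024)    /* each bit covers 128KB */
>
>     /* For every 128KB range touched by a write, print which bitmap
>      * holds its dirty bit (block 0 is the xattr on the file itself,
>      * block N is the xattr on the empty file
>      * .glusterfs/indices/data-changes/<gfid>-N) and where the bit
>      * sits inside that 4KB bitmap. */
>     static void
>     mark_dirty (uint64_t offset, uint64_t len)
>     {
>             uint64_t first = offset / RANGE_PER_BIT;
>             uint64_t last  = (offset + len - 1) / RANGE_PER_BIT;
>
>             for (uint64_t bit = first; bit <= last; bit++) {
>                     uint64_t block = bit / BITS_PER_MAP;
>                     uint64_t pos   = bit % BITS_PER_MAP;
>
>                     printf ("block %" PRIu64 ", byte %" PRIu64
>                             ", mask 0x%02x\n",
>                             block, pos / 8, 1u << (pos % 8));
>             }
>     }
>
>     int
>     main (void)
>     {
>             mark_dirty (0, 300 * 1024);    /* bits 0-2 of the file's
>                                               own xattr             */
>             mark_dirty (4ULL << 30, 4096); /* spills into <gfid>-1  */
>             return 0;
>     }
>
> Self-heal would then only read and rewrite the 128KB ranges whose
> bits are set, instead of crawling the whole file.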
> 3) Performance & throttling improvements for self-heal:
>      We are also looking into Richard's multi-threaded self-heal
> daemon patch for inclusion in 3.8. We are waiting for Raghavendra
> G's discussions on QoS to conclude before coming to any decisions
> on throttling.
>
> After we have compound fops:
>      The goal here is to use compound fops to prevent unnecessary
> round trips.
>
> 4) Transaction latency improvements:
>      On afr:
>      The unoptimized version of the transaction has: 1) lock,
> 2) pre-op, 3) op, 4) post-op, 5) unlock.
>      We will have: 1) lock, 2) pre-op + op, 3) post-op + unlock.
>      This reduces round trips from 5 to 3 in the unoptimized
> afr transaction.
>
>      On ec:
>      The unoptimized version (worst case: an unaligned write) has:
> 1) lock, 2) get version and size xattrs, 3) reads of the pre and
> post unaligned chunks, 4) op, 5) update version and size, 6) unlock.
>      We will have: 1) lock + get version and size xattrs + reads of
> the pre and post unaligned chunks, 2) op, 3) update version and
> size + unlock.
>      This reduces round trips from 6 to 3 in the unoptimized
> ec transaction.
>
> 5) Entry self-heal per-name latency improvements:
>      Before: 1) lock, 2) lookup to determine whether the file needs
> to be deleted/created, 3) create/delete, 4) unlock.
>      After: 1) lock + lookup, 2) delete/create + unlock.
>
> Roadmap that applies only to EC, for 3.8:
> - Use SSE2/AVX/NEON extensions when available to speed up Galois
> Field calculations.
> - Use a systematic matrix to improve encoding performance (it will
> also improve decoding performance when all bricks are healthy).
> - Implement a new algorithm able to detect and repair chunks of
> data on the fly.
>
> Roadmap that applies only to AFR:
> 1) Once granular entry/data heals and throttling are in, we can
> look at generalizing Richard's lazy replication patch for
> near-synchronous replication between data centers, and possibly
> just between bricks; I haven't looked into the patch myself.
>
> We will send out more mails as soon as the design for each of these
> items is complete. We are eagerly waiting for Xavi to come back, to
> get his comments as well on how EC will be impacted by the common
> changes. Feedback on this plan is very welcome!
>
> Pranith

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel