Has the 32-group limit been fixed yet? If not, how about that? :) https://bugzilla.redhat.com/show_bug.cgi?id=789961
On Thu, Mar 13, 2014 at 8:01 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
> I am a little bit impressed by the lack of action on this topic. I hate to be
> "that guy", specially being new here, but it has to be done.
> If I've got this right, we have here a chance of developing Gluster even
> further, sponsored by Google, with a dedicated programmer for the summer.
> In other words, if we play our cards right, we can get a free programmer and
> at least a good start/advance on this fantastic.
Welcome, Carlos. I think it's great that you're taking initiative here.
However, it's also important to set proper expectations for what a GSoC intern
could reasonably be expected to achieve. I've seen some amazing stuff out of
GSoC, but if we set the bar too high then we end up with incomplete code and
the student doesn't learn much except frustration.
GlusterFS consists of 430K lines of code in the core project alone. Most of
it's written in a style that is generally hard for newcomers to pick up -
both callback-oriented and highly concurrent, often using our own "unique"
interpretation of standard concepts. It's also in an area (storage) that is
not well taught in most universities. Given those facts and the short
duration of GSoC, it's important to focus on projects that don't require deep
knowledge of existing code, to keep the learning curve short and productive
time correspondingly high. With that in mind, let's look at some of your
suggestions.
> I think it would be nice to listen to the COMMUNITY (yes, that means YOU),
> for either suggestions, or at least a vote.
It certainly would have been nice to have you at the community IRC meeting
yesterday, at which we discussed release content for 3.6 based on the
feature proposals here:
http://www.gluster.org/community/documentation/index.php/Planning36
The results are here:
http://titanpad.com/glusterfs-3-6-planning
> My opinion, being also my vote, in order of PERSONAL preference:
> 1) There is a project going on ( https://forge.gluster.org/disperse ), that
> consists on re-writing the stripe module on gluster. This is specially
> important because it has a HUGE impact on Total Cost of Implementation
> (customer side), Total Cost of Ownership, and also matching what the
> competition has to offer. Among other things, it would allow gluster to
> implement a RAIDZ/RAID5 type of fault tolerance, much more efficient, and
> would, as far as I understand, allow you to use 3 nodes as a minimum
> stripe+replication. This means 25% less money in computer hardware, with
> increased data safety/resilience.
This was decided as a core feature for 3.6. I'll let Xavier (the feature
owner) answer w.r.t. whether there's any part of it that would be
appropriate for GSoC.
> 2) We have a recurring issue with split-brain solution. There is an entry on
> trello asking/suggesting a mechanism that arbitrates this resolution
> automatically. I pretty much think this could come together with another
> solution that is file replication consistency check.
This is also core for 3.6 under the name "policy based split brain
resolution":
http://www.gluster.org/community/documentation/index.php/Features/pbspbr
Implementing this feature requires significant knowledge of AFR, which both
causes split brain and would be involved in its repair. Because it's also
one of our most complicated components, and the person who just rewrote it
won't be around to offer help, I don't think this project *as a whole*
would be a good fit for GSoC. On the other hand, there might be specific
pieces of the policy implementation (not execution) that would be a good
fit.
> 3) Accelerator node project. Some storage solutions out there offer an
> "accelerator node", which is, in short, a, extra node with a lot of RAM,
> eventually fast disks (SSD), and that works like a proxy to the regular
> volumes. active chunks of files are moved there, logs (ZIL style) are
> recorded on fast media, among other things. There is NO active project for
> this, or trello entry, because it is something I started discussing with a
> few fellows just a couple of days ago. I thought of starting to play with
> RAM disks (tmpfs) as scratch disks, but, since we have an opportunity to do
> something more efficient, or at the very least start it, why not ?
Looks like somebody has read the Isilon marketing materials. ;)
A full production-level implementation of this, with cache consistency and
so on, would be a major project. However, a non-consistent prototype good
for specific use cases - especially Hadoop, as Jay mentions - would be
pretty easy to build. Having a GlusterFS server (for the real clients)
also be a GlusterFS client (to the real cluster) is pretty straightforward.
Testing performance would also be a significant component of this, and IMO
that's something more developers should learn about early in their careers.
I encourage you to keep thinking about how this could be turned into a real
GSoC proposal.
Keep the ideas coming!
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users