Re: Storage cluster advice, anybody?




Hi Valeri


On Fri, Apr 22, 2016 at 10:24 PM, Digimer <lists@xxxxxxxxxx> wrote:
> On 22/04/16 03:18 PM, Valeri Galtsev wrote:
>> Dear Experts,
>>
>> I would like to ask everybody: what would you advise to use as a storage
>> cluster, or as a distributed filesystem.
>>
>> I did my own research into what I could do, but I hit a snag with my
>> seemingly best choice, so in the end I decided to stay away from it and
>> ask clever people what they would use.
>>
>> My requirements are:
>>
>> 1. I would like to have one big (say, comparable to a petabyte) filesystem,
>> accessible on more than one machine, composed of leftover disk space on a
>> bunch of machines with 1 gigabit per second ethernet connections
>
> This sounds like you want cloud-type storage, like ceph or gluster. I
> don't use either, so I can't speak to them in detail.
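
If gluster does end up on your short list, the general shape of a
distributed (optionally replicated) volume built out of existing space on
several boxes is roughly the following. This is only a sketch from the
docs, not something I run myself; hostnames, brick paths and the replica
count are placeholders:

  # on node1, after installing glusterfs-server on every box
  gluster peer probe node2
  gluster peer probe node3
  gluster peer probe node4
  # replica 2 across four bricks gives a distributed-replicated volume
  gluster volume create backups replica 2 \
      node1:/bricks/backups node2:/bricks/backups \
      node3:/bricks/backups node4:/bricks/backups
  gluster volume start backups
  # any machine with the fuse client installed can then mount it
  mount -t glusterfs node1:/backups /mnt/backups
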
>
>> 2. It can be a bit slow, as one would expect of a filesystem used for
>> backups (say, using bacula or bareos), and/or for long term storage of
>> large datasets, portions of which can be copied over to faster storage for
>> processing if necessary. I would be thinking of 1-2 TB of data written to
>> it daily.
>>
>> 3. It would be great to have it resilient to a single machine failure or reboot
>
> HA solutions put a priority on resilience, not resource utilization
> efficiency, so you need to pick your priority. If you put a priority on
> resilience and availability, you'll want to do something like create two
> machines with equal storage, configure them as a single-primary DRBD pair,
> and use a floating IP to export the space over NFS or similar.
>
> Then you would use pacemaker to manage the floating IP, fence (stonith)
> a lost node, and promote drbd->mount FS->start nfsd->start floating IP.
>
> This is not efficient, but it is very resilient. All of this is 100%
> open source.
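
For what it's worth, the ordering Digimer describes maps onto pcs roughly
like this (CentOS 7 syntax; resource names, the DRBD resource/device, the
mount point and the IP are all placeholders, and fencing still has to be
set up for your particular hardware):

  # DRBD resource, promoted on exactly one node at a time
  pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
      op monitor interval=30s
  pcs resource master ms_drbd_r0 drbd_r0 master-max=1 master-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true
  # filesystem -> nfsd -> floating IP, started as a group in that order
  pcs resource create fs_export ocf:heartbeat:Filesystem \
      device=/dev/drbd0 directory=/export fstype=xfs
  pcs resource create nfsd systemd:nfs-server
  pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24
  pcs resource group add g_nfs fs_export nfsd vip
  # the group only runs where DRBD is primary, and only after promotion
  pcs constraint colocation add g_nfs with master ms_drbd_r0 INFINITY
  pcs constraint order promote ms_drbd_r0 then start g_nfs
  # plus stonith devices, e.g. pcs stonith create ... for IPMI/PDU fencing
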
>
>> 4. metadata machines should be redundant (or at least a backup metadata
>> host should be manually convertible into the master metadata host if the
>> master fails fatally or its data becomes corrupted)
>>

Sounds like Hadoop HDFS might be worth looking into:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Overview
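
Block replication (3x by default) covers single-machine failures, and
NameNode HA (an active/standby pair backed by JournalNodes) covers the
redundant-metadata requirement. It is Java-based, which I realize you said
you are not fond of. Once a cluster is up, day-to-day use looks roughly
like this (paths and the NameNode id below are placeholders):

  # push a backup into the cluster and check how the space is spread out
  hdfs dfs -mkdir -p /backups/$(hostname -s)
  hdfs dfs -put /var/backups/dump.tar /backups/$(hostname -s)/
  hdfs dfsadmin -report                # per-datanode capacity and usage
  hdfs haadmin -getServiceState nn1    # active/standby state of a NameNode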



>>
>> What I would like to avoid/exclude:
>>
>> 1. Proprietary commercial solutions, because:
>>
>> a. I would like to stay on as minimal a budget as possible
>> b. I want to be able to predict that it will exist for a long time, and I
>> have had better experience with my predictions of this sort about open
>> source projects than about proprietary ones
>>
>> 2. Open source solutions using portions of proprietary closed source
>> binaries/libraries (e.g., I would like to stay away from Google's
>> proprietary code/binaries/libraries/modules)
>>
>> 3. Kernel level modifications. I really would like to have this be as
>> independent of the OS as I can, or rather available on multiple OSes
>> (though I do not like Java based things - just my personal experience with
>> some of them). I have a bunch of Linux boxes and a bunch of FreeBSD boxes,
>> and I do not want to exclude either of them if possible. Also, the need to
>> have a custom Linux kernel specifically scares me: Linux kernels get
>> critical updates often, and having customizations lag behind a needed
>> critical update is as unpleasant as having to reboot the machine for a
>> kernel update.
>>
>> I'm not too scared of "split nature" projects: proprietary projects
>> having an open source satellite. I have mixed experience with those (using
>> the open source satellite, I mean). Some of them are indeed not neglected,
>> and even though you may be missing some features the commercial
>> counterpart has, some are really great: they are just missing commercial
>> support, and maybe have somewhat sparse documentation, thus making you
>> invest more effort into making them work, which I don't mind: I can earn
>> my sysadmin's salary here. I would say I have more often had good
>> experience with those than bad (and I have a list of early indications of
>> a potentially bad outcome, so I can more or less predict my future with
>> this kind of project).
>>
>> <rant>
>> I really didn't mean to write this, but I figure it will probably surface
>> once I start getting your advice, so here it is. I did my research with my
>> requirements in mind and came up with a solution: moosefs. It is not
>> reviewed much, there are no critical reviews at all, and there is not much
>> you can (or rather I could) find in the way of howtos about customization,
>> performance tuning etc. It installs without a hitch. It runs well, until
>> you start stress-writing a lot to it in parallel; then it started
>> performing exponentially worse for me. That is where extensive attempts to
>> find performance tuning documentation came up empty. What made me decide
>> never to use it again was the following. I started migrating data back
>> from moosefs to a local UFS filesystem (that is a FreeBSD box) using the
>> rsync command. What I observed was: source files, after they had been
>> touched by rsync, changed their timestamps, as if instead of the creation
>> timestamp it is an access timestamp on moosefs. This renders rsync from
>> moosefs useless, as you cannot re-run a failed rsync, and you obliterate
>> some of the metadata of the source (the "creation" timestamp). I wrote an
>> e-mail to the sourceforge moosefs mailing list, mentioning all this and
>> the fact that I am using open source moosefs. The next day they replied
>> asking whether I use version 3."this" or version 3."that", as they wanted
>> to know in which of them they have the bug, whereas the latest open source
>> version they have everywhere, including sourceforge, is an older version:
>> 2.0.88.
>> Basically, my decision was made. Sorry for venting here, but I figured it
>> would come out at some point once I got your advice.
>> </rant>
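
On the timestamp issue: a quick way to pin it down (and to rule out the
client side) is to stat a sample file on the MooseFS mount before and
after a read-only pass; the mount point and file below are just
placeholders:

  stat /mnt/mfs/sample.dat              # note mtime
  rsync -a /mnt/mfs/sample.dat /tmp/    # read-only as far as the source goes
  stat /mnt/mfs/sample.dat              # mtime should be unchanged; at most
                                        # atime moves on a normal filesystem
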
>>
>> Thanks a lot for all your advice!
>>
>> Valeri
>
> Before you go any further, you need to decide what your priority is. If
> you need resilience, prepare to invest in the back-end hardware. If it
> is more important to scrape unused resources from everywhere, then
> resilience is not going to be as good.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> CentOS mailing list
> CentOS@xxxxxxxxxx
> https://lists.centos.org/mailman/listinfo/centos
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos


