Re: Erasure code failure

Yes, I am trying it on Luminous.

Well, the bug has been open for 8 months and the fix hasn't been merged yet. I don't know if that is what's preventing me from making it work. Tomorrow I will try testing it again.


On 19/10/2017 at 23:00, David Turner wrote:
Running clusters on various versions of Hammer and Jewel, I haven't had any problems.  I haven't upgraded to Luminous quite yet, but I'd be surprised if there were that severe a regression, especially given how many improvements were made to erasure coding.

On Thu, Oct 19, 2017 at 4:59 PM Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:

Well, I tried it a few days ago and it didn't work for me.

Maybe because of these:

http://tracker.ceph.com/issues/18749

https://github.com/ceph/ceph/pull/17619

I don't know whether it's actually working now.


On 19/10/2017 at 22:55, David Turner wrote:
In a 3 node cluster with EC k=2 m=1, you can turn off one of the nodes and the cluster will still operate normally.  If you lose a disk during this state or another server goes offline, then you lose access to your data.  But assuming that you bring up the third node and let it finish backfilling/recovering before restarting any other nodes, then you're fine.
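
One thing that may be worth checking is the pool's min_size. If I remember right, Luminous defaults erasure-coded pools to min_size = k+1, so with k=2 m=1 a PG that is missing any chunk stops accepting I/O even though the data is still readable from the surviving chunks. A rough sketch of checking and (at your own risk) lowering it, assuming a pool named ecpool:

    ceph osd pool get ecpool min_size    # with this profile the default should be k+1 = 3
    ceph osd pool set ecpool min_size 2  # accept I/O with only 2 of the 3 chunks available

Running at min_size = k means writes land with no surviving redundancy, so another failure while degraded can lose data; it is a stopgap, not a fix.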

On Thu, Oct 19, 2017 at 4:49 PM Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:
Imagine we have a 3-OSD cluster and I create an erasure-coded pool with k=2, m=1.
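
For concreteness, I mean something along these lines (the profile and pool names are just examples):

    ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=osd
    ceph osd pool create ecpool 32 32 erasure ec21

With k=2 m=1 each object is cut into 2 data chunks plus 1 parity chunk, and crush-failure-domain=osd places one chunk on each of the 3 OSDs (for a 3-node setup, host would be the usual failure domain instead).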

If an OSD fails, we can rebuild the data, but (I think) the whole cluster won't be able to perform I/O.

Wouldn't it be possible to make the cluster work in a degraded mode?
I think it would be a good idea to let the cluster keep working in degraded mode and promise to rebalance/rebuild whenever a third OSD comes back alive.
On reads, it could serve the data from the live chunks, rebuilding the missing ones on the fly if necessary (using CPU to recompute the data before serving, with 0 RTA), or it could rebuild the missing parts up front so that the 2 data chunks actually sit on the 2 live OSDs (with some RTA and space usage), or even do both things at the same time (with high network, CPU and storage cost).
On writes, it could write the 2 data chunks to the live OSDs, and whenever the third OSD comes up, the cluster could rebalance: rebuild the parity chunk and reposition the chunks so all OSDs carry the same amount of data/work.

Would this be possible?


Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


--

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


--

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
