On 17/12/2023 14:52, Joe Julian wrote:
>> From what I've been told (by experts) it's really hard to make it happen. Even more so if proper redundancy of MON and MDS daemons is implemented on quality HW.
> LSI isn't exactly crap hardware. But when a flaw causes it to drop drives under heavy load, the rebalance from the dropped drives can itself generate that heavy load, causing a cascading failure. When the journal is never idle long enough to checkpoint, it fills the partition and ends up corrupted and unrecoverable.
Good to know. Better to add a monitoring service that stops everything
when the log is too full.
That also applies to Gluster, BTW, though with less severe
consequences: sometimes "peer files" got lost due to /var filling up,
and glusterd wouldn't come up after a reboot.
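A minimal sketch of such a watchdog, assuming a systemd-managed glusterd and a 90% threshold (both the threshold and the unit name are assumptions, adjust for your setup):

```shell
#!/bin/sh
# Hypothetical watchdog: stop Gluster before /var fills up completely,
# to avoid losing peer files or corrupting logs/journals.
THRESHOLD=90   # assumed percentage; tune to taste

check_var_usage() {
    # df -P gives stable POSIX output; field 5 is "Use%" for the mount
    usage=$(df -P /var | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$usage" -ge "$THRESHOLD" ]; then
        echo "/var at ${usage}% - stopping glusterd before it corrupts state"
        # systemctl stop glusterd   # uncomment on a real node
    fi
}

check_var_usage
```

Run it from cron or a systemd timer every minute or so; stopping the daemon cleanly while there is still space left is what prevents the "won't come up after reboot" situation.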
>> Neither Gluster nor Ceph are "backup solutions", so if the data is not easily replaceable it's better to have it elsewhere. Better if offline.
> It's a nice idea but when you're dealing in petabytes of data, streaming in as fast as your storage will allow, it's just not physically possible.
Well, it will have to stop sometimes, or you'd need infinite storage,
no? :) Usually data from experiments comes in bursts, with (often large)
intervals when you can process/archive it.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users