OSD journal sizing

I've been researching OSD journals lately, and just realized something that's not particularly nice.

SSDs are getting bigger and bigger. It's typical for customers to use, e.g., a 500GB SSD for their journals. If the OSDs themselves are on spinners, there is no point in having journals bigger than about 10GB, because the rotational speed (typically 7200 RPM) imposes a hard ceiling on the throughput a spinner can sustain, and the journal only needs to absorb a couple of sync intervals' worth of writes before they are flushed to the data disk.
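For context, the FileStore docs give a rule of thumb: osd journal size = 2 * (expected throughput * filestore max sync interval). A quick back-of-the-envelope sketch (the throughput figure is an assumption for a typical 7200 RPM spinner, not a measurement):

    # Rule of thumb from the Ceph docs:
    #   osd journal size = 2 * (expected throughput * filestore max sync interval)
    expected_throughput_mb_s = 150      # assumed sustained sequential write of a spinner
    filestore_max_sync_interval_s = 5   # Ceph default
    journal_size_mb = 2 * expected_throughput_mb_s * filestore_max_sync_interval_s
    print(journal_size_mb, "MB")        # -> 1500 MB, so 10GB is already generous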

Now, SSDs need wear-leveling to avoid premature failure. If only a small region of the SSD is partitioned/used, users may fear (regardless of the reality, whatever it may be) that this small region will be "pummeled to death" by Ceph and cause the expensive SSD to fail prematurely.

I *thought* this could be addressed by creating journal partitions large enough to fill the entire disk, and then using the "osd journal size" parameter to limit how much of each partition is actually used for journaling. But I just noticed that, per the documentation, "osd journal size" "is ignored if the journal is a block device, and the entire block device is used."
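In other words, a configuration like the following (values illustrative) only behaves as expected when "osd journal" points at a file; with a raw partition the size setting is silently ignored:

    [osd]
    # size is in MB; honored only for file-backed journals,
    # ignored when "osd journal" points at a block device
    osd journal = /var/lib/ceph/osd/$cluster-$id/journal
    osd journal size = 10240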

And while working on http://tracker.ceph.com/issues/16878, it occurred to me that large journals are not getting much test coverage. Is that a valid assumption?

Next week I plan to write a reproducer for the bug and then try to come up with a patch to fix it. Any ideas/pointers, either here or in the tracker, would be appreciated.

--
Nathan Cutler
Software Engineer, Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037