Re: Bounding OSD memory requirements during peering/recovery

On 09/02/15 15:31, Gregory Farnum wrote:

> So, memory usage of an OSD is usually linear in the number of PGs it
> hosts. However, that memory can also grow based on at least one other
> thing: the number of OSD Maps required to go through peering. It
> *looks* to me like this is what you're running into, not growth in
> the number of state machines. In particular, those past_intervals you
> mentioned. ;)

Hi Greg,

Right, that sounds entirely plausible, and is very helpful.

In practice, that means I'll need to be careful to avoid this situation arising in production; but given that it's unlikely to occur except in cases of non-trivial neglect, I don't think I need be particularly concerned.

(Happily, I'm in the situation that my existing cluster is purely for testing purposes; the data is expendable.)

That said, for my own peace of mind, it would be valuable to have a procedure that can be used to recover from this state, even if it's unlikely to occur in practice.

I'm currently running an experiment in which I augment the RAM of each OSD node with a 10GB swapfile on each spinning OSD disk, so that there's a large enough backing store to complete log reconstruction.
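
Roughly, that's something along these lines per OSD (the path is purely illustrative; adjust for wherever your OSD data filesystems are mounted):

    # Create and enable a 10GB swapfile on the filesystem backing an OSD.
    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/swapfile bs=1M count=10240
    chmod 600 /var/lib/ceph/osd/ceph-0/swapfile
    mkswap /var/lib/ceph/osd/ceph-0/swapfile
    swapon /var/lib/ceph/osd/ceph-0/swapfile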

(You obviously wouldn't want to operate in this manner during normal production: the loss of a single drive would cause a hard machine crash, and performance will be fairly diabolical, particularly if you allow client workloads to carry on in the background.)

I did try enabling zswap on the Utopic LTS kernel supplied as an option in Ubuntu 14.04; however, the kernel was not stable in that configuration, and several machines crashed under memory pressure.
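
(For anyone wanting to try the same: zswap is normally switched on via kernel boot parameters, roughly as below; the pool percentage shown is just the upstream default, not a tuned value.)

    # Append to the kernel command line, e.g. GRUB_CMDLINE_LINUX_DEFAULT in
    # /etc/default/grub, then run update-grub and reboot:
    zswap.enabled=1 zswap.max_pool_percent=20

    # It can also be toggled at runtime:
    echo 1 > /sys/module/zswap/parameters/enabled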

I do have OSDs committing suicide periodically, probably because they're insufficiently responsive to heartbeats as they start to hit swap. This is before experimenting with the various OSD tuning dials for timeouts, so some improvement may be possible.
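
The dials I have in mind are the heartbeat and thread-suicide timeouts, i.e. something along the following lines in ceph.conf; the values are purely illustrative, and I haven't yet verified which, if any, actually help:

    [osd]
        # Give peers longer before they report this OSD down for missing
        # heartbeats (illustrative value; the default is much lower).
        osd heartbeat grace = 60
        # Give internal worker threads longer before the OSD declares them
        # hung and aborts (again, illustrative values only).
        osd op thread suicide timeout = 600
        filestore op thread suicide timeout = 600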

In the meantime, I've configured the ceph-osd Upstart jobs with a post-stop `sleep 3600` to reduce the rate at which they're respawned.
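
Concretely, that's a post-stop stanza in an Upstart override, along these lines (assuming the stock ceph-osd job shipped with the Ubuntu packages):

    # /etc/init/ceph-osd.override
    # Sleep for an hour after each exit, so a crashing OSD is respawned at
    # most once an hour rather than immediately.
    post-stop exec sleep 3600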

So far, the resulting configuration seems to be making progress, albeit moderately slowly.

Cheers,
David
--
David McBride <dwm37@xxxxxxxxx>
Unix Specialist, University Information Services