Re: osd down question

The OSDs will heartbeat each other, and report back to the monitors if
any other OSD fails to respond.

An OSD that fails to respond is effectively down, since it's not doing
the things that it's supposed to do.  It is possible for this process
to cause problems.  For example, I've had some OSDs on an overloaded
node mark all of the other OSDs in the cluster down, because the
overloaded node wasn't processing the heartbeat responses quickly
enough.  The solution there was to adjust "mon osd min down reporters"
and "mon osd min down reports" so that a single node can't do that.
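
To make the effect of those two settings concrete, here is a toy model
of the tally in Python.  It is only a sketch of the idea, not Ceph's
actual monitor code: the class and method names are made up for
illustration, and I'm assuming the monitor wants both a minimum number
of distinct reporting OSDs and a minimum total number of reports
before it marks a target down.

```python
from collections import defaultdict


class DownReportTracker:
    """Toy model of a monitor tallying failure reports (hypothetical, not Ceph code)."""

    def __init__(self, min_reporters=2, min_reports=3):
        # Stand-ins for "mon osd min down reporters" and "mon osd min down reports".
        self.min_reporters = min_reporters
        self.min_reports = min_reports
        # target OSD id -> {reporter OSD id -> number of reports received}
        self.reports = defaultdict(lambda: defaultdict(int))

    def report_failure(self, reporter, target):
        """Record one failure report; return True if target should now be marked down."""
        self.reports[target][reporter] += 1
        by_reporter = self.reports[target]
        distinct_reporters = len(by_reporter)
        total_reports = sum(by_reporter.values())
        return (distinct_reporters >= self.min_reporters
                and total_reports >= self.min_reports)


tracker = DownReportTracker(min_reporters=3, min_reports=3)

# One overloaded OSD (id 0) complaining repeatedly about osd 5 is not enough.
single_source = [tracker.report_failure(0, 5) for _ in range(10)]

# Reports from two more distinct OSDs push it over the threshold.
second = tracker.report_failure(1, 5)
third = tracker.report_failure(2, 5)
```

In this model, the reporters are individual OSDs.  If one node hosts
several OSDs, the reporter threshold presumably needs to be larger
than the number of OSDs on that node for a single node to be unable to
mark others down.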

You might see something like "epoch XXX wrongly marked me down",
followed by the OSD rejoining the cluster.  That's a sign that the OSD
was overloaded, but not down.  Once it was kicked out of the cluster,
it caught up with the backlog and was able to rejoin.  This shouldn't
cause a chain reaction though.
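
A quick way to check for this is to count how often that message
shows up in an OSD's log.  The Python below is a minimal sketch: it
only does a substring match on the phrase quoted above, and the sample
lines are invented for illustration rather than copied from a real
log.

```python
def count_wrongly_marked_down(log_lines, needle="wrongly marked me down"):
    """Count lines containing the 'wrongly marked me down' message."""
    return sum(1 for line in log_lines if needle in line)


# Hypothetical sample lines, not real Ceph log output.
sample = [
    "epoch XXX wrongly marked me down",
    "some unrelated log line",
    "epoch XXX wrongly marked me down",
]
hits = count_wrongly_marked_down(sample)
```

To run it against a real log, pass an open file instead of the list,
for example open("/var/log/ceph/ceph-osd.12.log").  That path is an
assumption about your log location and OSD id, so adjust it to match
your setup.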

If you're not seeing that, then the OSD really is unresponsive, and
needs to be restarted.  The other OSDs will start replicating its
data automatically to make the cluster healthy again.  On a healthy
cluster this should not cause a chain reaction.  But if your cluster
is overloaded (very close to running out of CPU, RAM, or disk IO),
then a failed OSD can cause a chain reaction as other OSDs pick up
the failed OSD's workload.



On Mon, Nov 3, 2014 at 11:12 PM, 飞 <duron800@xxxxxx> wrote:
> Hello, I have been running ceph v0.87 for one week. During this week
> many OSDs have been marked down, but when I run "ps -ef | grep osd" I
> can still see the OSD processes, so the OSDs are not really down.
> When I check the OSD logs, I see many lines like
> "osd.XX from dead osd.YY,marking down".
> Does 0.87 check the other OSD processes? If some OSD is down, will the
> mon then mark the current OSD as down too?
> This would cause a chain reaction, leading to failure of the entire
> cluster. Is it a bug?
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>




