Re: Cluster down

Hi Jorge,

I am referring to an entirely separate network.  Redundant switches are fine
to protect against physical failure, but they can get congested too.  See
section 5.8.1 of the Proxmox admin guide:

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy

In our setup, ring0 runs on the fast switches in its own VLAN.  ring1 runs
on cheap 1Gb switches and takes over if needed.
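
For illustration only, here is a minimal Python sketch of that idea: it
checks that the two ring networks do not overlap and prints a corosync
nodelist entry carrying both addresses. The node name, node ID, and every
address are made-up placeholders, not values from any real cluster; the
ring0_addr/ring1_addr keys follow the nodelist layout shown in the Proxmox
guide linked above.

```python
import ipaddress


def rings_are_separate(ring0_cidr: str, ring1_cidr: str) -> bool:
    """Return True when the two ring networks do not overlap."""
    net0 = ipaddress.ip_network(ring0_cidr, strict=False)
    net1 = ipaddress.ip_network(ring1_cidr, strict=False)
    return not net0.overlaps(net1)


def render_node(name: str, nodeid: int, ring0_addr: str, ring1_addr: str) -> str:
    """Render one corosync.conf nodelist entry with two ring addresses."""
    # Validate the addresses early; ip_address raises ValueError on bad input.
    ipaddress.ip_address(ring0_addr)
    ipaddress.ip_address(ring1_addr)
    return (
        "  node {\n"
        f"    name: {name}\n"
        f"    nodeid: {nodeid}\n"
        "    quorum_votes: 1\n"
        f"    ring0_addr: {ring0_addr}\n"
        f"    ring1_addr: {ring1_addr}\n"
        "  }"
    )


if __name__ == "__main__":
    # Hypothetical networks: ring0 on the fast VLAN, ring1 on the 1Gb switches.
    assert rings_are_separate("10.10.10.0/24", "10.20.20.0/24")
    print(render_node("pve1", 1, "10.10.10.1", "10.20.20.1"))
```

Non-overlapping subnets are only a sanity check; the second ring still has
to be cabled to physically different switches to help during congestion.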

--
Alex Gorbachev
ISS - Storcium



On Tue, Oct 19, 2021 at 6:57 AM Jorge JP <jorgejp@xxxxxxxxxx> wrote:

> Hello Alex,
>
> I don't understand the second part of your response.
>
> All my Ceph nodes are connected to two 40G switches in a bond. If I lose
> one switch, everything should keep working without problems. Is this what
> you are talking about?
>
> Anyway, I think your comments about corosync are correct.
>
> Thank you.
>
> ------------------------------
> *From:* Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
> *Sent:* Thursday, October 14, 2021 7:21
> *To:* Jorge JP <jorgejp@xxxxxxxxxx>
> *Cc:* Marc <Marc@xxxxxxxxxxxxxxxxx>; ceph-users@xxxxxxx <
> ceph-users@xxxxxxx>
> *Subject:* Re:  Re: Cluster down
>
> Hi Jorge,
>
> This looks like a corosync problem to me.  If corosync loses connectivity,
> the Proxmox nodes would fence and reboot.  Ideally, you'd have a second
> ring on different switch(es); even a cheap 1Gb switch will do.
>
> --
> Alex Gorbachev
> ISS - Storcium
>
>
>
> On Wed, Oct 13, 2021 at 7:07 AM Jorge JP <jorgejp@xxxxxxxxxx> wrote:
>
> Hello Marc,
>
> To add a node to a Ceph cluster with Proxmox, I first have to install
> Proxmox, hehe. That is not the problem.
>
> The configuration file has been reviewed and is correct. I understand your
> point, but this is not a configuration problem.
>
> I can understand that a cluster can have problems if any servers are not
> configured correctly or ports on the switches are not configured correctly.
> But this server never became a member of the cluster.
>
> I extracted part of the log file from when Ceph went down.
>
> A few weeks ago, I had a problem with a port configuration: I removed MTU
> 9216 and several hypervisors of the Proxmox cluster rebooted. But today the
> server is not related to the Ceph cluster. It only has public and private
> IPs in the same network, but the ports are not configured.
>
> ________________________________
> From: Marc <Marc@xxxxxxxxxxxxxxxxx>
> Sent: Wednesday, October 13, 2021 12:49
> To: Jorge JP <jorgejp@xxxxxxxxxx>; ceph-users@xxxxxxx <
> ceph-users@xxxxxxx>
> Subject: RE: Cluster down
>
> >
> > We currently have a ceph cluster in Proxmox, with 5 ceph nodes with the
> > public and private network correctly configured and without problems.
> > The state of ceph was optimal.
> >
> > We had prepared a new server to add to the ceph cluster. We did the
> > first step of installing Proxmox with the same version. I was at the
> > point where I was setting up the network.
>
> I am not using proxmox, just libvirt. But I would say the most important
> part is your ceph cluster. So before doing anything I would make sure to
> add the ceph node first and then install other things.
>
> > For this step, what I did was connect by SSH to the new server and copy
> > the network configuration of one of the ceph nodes to this new one. Of
> > course, changing the ip addresses.
>
> I would not copy at all. Just change the files manually. If you do not
> edit one file correctly, or the server reboots before you change the IP
> addresses, you can get into all kinds of problems.
>
> > What happened when restarting the network service is that I lost access
> > to the cluster. I couldn't access any of the 5 servers that are part of
> > the ceph cluster. Also, 2 of 3 hypervisors that we have in the proxmox
> > cluster were restarted directly.
>
> So now you know: first configure networking, then Ceph, and then Proxmox.
> Take your time adding a server. I guess the main reason you are in the
> current situation is that you tried to do it too quickly.
>
> > Why has this happened if the new server is not yet inside the ceph
> > cluster on the proxmox cluster and I don't even have the ports
> > configured on my switch?
>
> Without logs nobody is able to tell.
>
> > Do you have any idea?
> >
> > I do not understand, if now I go and take any server and configure an IP
> > of the cluster network and even if the ports are not even configured,
> > will the cluster knock me down?
>
> Nothing should happen if you install an OS and use ip addresses in the
> same space as your cluster/client network. Do this first.
>
> > I recovered the cluster by physically removing the cables from the new
> > server.
>
> So wipe it, and start over.
>
> > Thanks a lot and sorry for my English...
>
> No worries, your English is much better than my Spanish ;)
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>



