> On the storage, are we ok if a node goes down? ie, does it spread it
> over all the storage nodes/raid? Or is it just in one place and you are
> dead if that node dies?
For storage, we maintain 3 replicas of the data, spread across 3 nodes. However, the nodes are not equally resourced: we have 2 large nodes and 1 much smaller one, so more of the replicas end up on the 2 larger nodes. We can likely afford to lose one of the large physical nodes while still maintaining data integrity.
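For reference, a rough sketch of how we could check the replica layout and pool health (assuming the OCS/rook-ceph toolbox pod is enabled; the namespace and label below are the OCS defaults):

    oc -n openshift-storage get cephcluster                       # overall cluster health
    TOOLS=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
    oc -n openshift-storage rsh "$TOOLS" ceph status              # OSD / replica health
    oc -n openshift-storage rsh "$TOOLS" ceph osd pool ls detail  # replicated size per pool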
> Is there any way to backup volumes?
There are ways to clone, extend, take snapshots etc. of these volumes. We've never done it, so it'll be a learning process for us all ;). We should sync up to get a better handle on the requirements for backups. In CentOS CI we've set up backups to S3, and we can certainly reuse some of that, e.g. the backup of etcd, but backing up the volumes managed by OCS may need further investigation. We'll need to do some research here.
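As a starting point, a CSI VolumeSnapshot is probably the first thing to try (untested sketch; the PVC name, namespace and snapshot class below are placeholders):

    oc get volumesnapshotclass

    # volume-snapshot.yaml
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: example-pvc-snap
      namespace: example-app
    spec:
      volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
      source:
        persistentVolumeClaimName: example-pvc

    oc apply -f volume-snapshot.yaml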
> should we make a playbooks/manual/ocp.yml playbook for things like
> - list of clusteradmins
> - list of clustermonitoring
> - anything else we want to manage post install
Sure, yep. As we're finishing up soonish, I'd imagine over the next few weeks we'll all be back focused on the Infra/Releng tasks, tying up loose ends like this and starting the migration of apps.
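For what it's worth, that playbook would mostly just be wrapping commands like these (the group names below are only examples, the roles are the standard OCP ones):

    oc adm policy add-cluster-role-to-group cluster-admin sysadmin-openshift
    oc adm policy add-cluster-role-to-group cluster-monitoring-view sysadmin-monitoring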
> Have we tried a upgrade of the clusters yet? Did everything go ok?
> Do we need any docs on upgrades?
Yes, we've already completed a number of upgrades; the latest is to 4.8.11. We have SOPs for upgrades which we can copy over from the CentOS CI infra, making any updates required in the process.
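For anyone curious, the upgrade itself is driven from the CLI roughly like this (version used purely as an example):

    oc adm upgrade                      # list available updates in the current channel
    oc adm upgrade --to=4.8.11          # start the update to a specific version
    oc get clusterversion -w            # watch progress until the update completes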
> Since the control plane are vm's I assume we need to drain them one at
> a time to reboot the virthosts they are on?
If we are rebooting a single vmhost/control plane VM at a time, yes, that should be fine. If we are doing more than 1 at the same time, we should do a full graceful cluster shutdown and then a graceful cluster startup. We have SOPs for this in CentOS CI too; we'll get those added here along with any content updates needed.
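For a single control plane VM, the per-node part of that looks roughly like this (NODE is a placeholder for the node running on the virthost being rebooted):

    oc adm cordon "$NODE"
    oc adm drain "$NODE" --ignore-daemonsets --delete-emptydir-data
    # reboot the virthost, wait for the node to report Ready again
    oc adm uncordon "$NODE"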
> * Should we now delete the kubeadmin user? In 3.x I know they advise to
> do that after auth is setup.
We can delete it, as we have system:admin available from the os-control01 node, and best practice might suggest we do. We can also give the cluster-admin role to all users in the sysadmin-main and sysadmin-openshift groups.
I'm in two minds about deleting it, though; I was hoping to wait until we have a solution that syncs IPA groups/users to Openshift. There is an officially supported solution for syncing from LDAP (do you think that will work?).
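If we do go that route, the moving pieces would roughly be these (the sync config file is something we'd still have to write, so treat this as a sketch):

    oc adm groups sync --sync-config=ldap-group-sync.yaml --confirm
    # once we're happy with auth, removing kubeadmin is just deleting its secret
    oc delete secret kubeadmin -n kube-system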
> * Right now the api is only internal. Is it worth getting a forward
> setup to allow folks to use oc locally on their machines? It would
> expose that api to the world, but of course it would still need auth.
We'd love to expose it, but all interaction with the clusters up to this point has also only been done via Ansible, so if it turns out we can't expose the API like this, we're OK with that. With minor changes to the playbook we should be able to at least replicate the current 3.11 experience.
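If we do end up exposing it, the "oc from your laptop" flow would just be the usual token login (the cluster domain below is a placeholder):

    oc login --token=<token from the web console> --server=https://api.<cluster-domain>:6443
    oc whoami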
>> That's what we decided to do for the CentOS CI ocp setup, and so CI
>> tenants can use oc from their laptop/infra. As long as cert exposed for
>> default ingress has it added in the SAN, it works fine :
>>
>> X509v3 Subject Alternative Name:
>> DNS:*.apps.ocp.ci.centos.org, DNS:api.ocp.ci.centos.org,
>> DNS:apps.ocp.ci.centos.org
> Yeah, thats all fine, but to make it work for our setup, I would need to
> get RHIT to nat in port 6443 to proxy01/10 from the internet. At least I
> think thats the case. Openshift 3 could just use https, but alas, I fear
> OCP4 needs that 6443 port.
Yep, I think you're right on that.
> Do we want to try and enable http/2 ingress?
https://docs.openshift.com/container-platform/4.5/networking/ingress-operator.html#nw-http2-haproxy_configuring-ingress
We can take a look and see if we can figure it out!
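From that doc, it looks like it's a single annotation on the default IngressController (there's also a cluster-wide variant on the Ingress config object):

    oc -n openshift-ingress-operator annotate ingresscontrollers/default \
        ingress.operator.openshift.io/default-enable-http2=true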
> We will want to enable kubevirt/whatever it's called...
We definitely want to make this available, but we will have to set quotas on usage. We should enable it on staging, but should we enable it on production?
On the CentOS CI OCP4 cluster we have Openshift Virtualization / kubevirt installed, but I don't think anyone is actually using it. We have several tenants with elevated permissions who access KVM directly to bring up VMs on the Openshift nodes; this is something we want to avoid, as we can't effectively set quotas on that type of usage.
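When we do enable it, a per-tenant ResourceQuota is probably how we'd cap things (the namespace and numbers below are placeholders, just to illustrate):

    # vm-quota.yaml
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: vm-quota
      namespace: example-tenant
    spec:
      hard:
        requests.cpu: "8"
        requests.memory: 32Gi
        requests.storage: 200Gi

    oc apply -f vm-quota.yaml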
On Fri, 24 Sept 2021 at 07:24, Kevin Fenzi <kevin@xxxxxxxxx> wrote:
On Thu, Sep 23, 2021 at 07:44:49AM +0200, Fabian Arrotin wrote:
> On 23/09/2021 02:55, Neal Gompa wrote:
> > On Wed, Sep 22, 2021 at 7:12 PM Kevin Fenzi <kevin@xxxxxxxxx> wrote:
> <snip>
>
> >>
> >> * Since the control plane are vm's I assume we need to drain them one at
> >> a time to reboot the virthosts they are on?
>
> Correct
>
> >>
> >> * Should we now delete the kubeadmin user? In 3.x I know they advise to
> >> do that after auth is setup.
> >>
> >
> > I'm not sure that's a good idea. I'm not even certain that was a good
> > idea in the OCP 3.x days, because eliminating the kubeadmin user means
> > you lose your failsafe login if all else fails.
>
> +1 here : the reason why we decided to still keep kubeadmin on the other
> OCP clusters used for CentOS CI and Stream is exactly for that reason :
> still be able to login, if there is a problem with the oauth setup, and
> troubleshoot issues if (for example) ipsilon or IPA have troubles ... :-)
We can keep it if folks like. I'd really prefer we don't use it except
for emergency though. Having people do things as their user will make it
way easier to see who did what. ;)
> >> * Right now the api is only internal. Is it worth getting a forward
> >> setup to allow folks to use oc locally on their machines? It would
> >> expose that api to the world, but of course it would still need auth.
>
> That's what we decided to do for the CentOS CI ocp setup, and so CI
> tenants can use oc from their laptop/infra. As long as cert exposed for
> default ingress has it added in the SAN, it works fine :
>
> X509v3 Subject Alternative Name:
> DNS:*.apps.ocp.ci.centos.org, DNS:api.ocp.ci.centos.org,
> DNS:apps.ocp.ci.centos.org
Yeah, thats all fine, but to make it work for our setup, I would need to
get RHIT to nat in port 6443 to proxy01/10 from the internet. At least I
think thats the case. Openshift 3 could just use https, but alas, I fear
OCP4 needs that 6443 port.
kevin
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
--
David Kirwan
Software Engineer
Community Platform Engineering @ Red Hat
T: +(353) 86-8624108 IM: @dkirwan