Re: Openshift 4 SOP PR review

On Thu, Sep 23, 2021 at 11:02 PM David Kirwan <dkirwan@xxxxxxxxxx> wrote:
>
> > On the storage, are we ok if a node goes down? ie, does it spread it
> > over all the storage nodes/raid? Or is it just in one place and you are
> > dead if that node dies?
> For storage, we maintain 3 replicas of the data, spread across 3 nodes. However, the nodes are not equally resourced: we have 2 large nodes and 1 much smaller one, so more of the replicas end up on the 2 larger nodes. We can likely afford to lose one of the large physical nodes while still maintaining data integrity.
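
For context, the replica count and failure domain for an OCS/Ceph-backed pool are visible in the CephBlockPool resource. A sketch of what that looks like (the pool name here is a placeholder, not the actual cluster's):

```yaml
# Hypothetical CephBlockPool showing a 3-replica pool (names are placeholders).
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: example-blockpool          # placeholder name
  namespace: openshift-storage
spec:
  failureDomain: host              # spread replicas across distinct nodes
  replicated:
    size: 3                        # keep 3 copies of each object
```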
>
> > Is there any way to backup volumes?
> There are ways to clone, extend, take snapshots, etc., of these volumes. We've never done it, so it'll be a learning process for us all ;). We should sync up to get a better handle on the requirements for backups. In CentOS CI we've set up backups to S3, and we can certainly reuse some of that, e.g. the backup of etcd, but backing up the volumes managed by OCS may need further investigation. Will need to do some research here.
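
As a starting point for the volume side, CSI snapshots can be taken declaratively. A sketch, assuming a snapshot class is available (the PVC name is a placeholder; the class name is the usual OCS default but should be verified on the cluster):

```yaml
# Hypothetical CSI VolumeSnapshot of an existing PVC (names are placeholders).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-app-snapshot
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass  # assumed class
  source:
    persistentVolumeClaimName: example-app-data                    # PVC to snapshot
```

A snapshot alone stays on the same Ceph cluster, so shipping data to S3 (as with the etcd backups) would still need a separate tool on top.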
>
> > should we make a playbooks/manual/ocp.yml playbook for things like
> > - list of clusteradmins
> > - list of clustermoniting
> > - anything else we want to manage post install
> Sure, yep. As we're finishing up soonish, I'd imagine over the next few weeks we'll all be back focused on the Infra/Releng tasks, tying up any loose ends like this and starting the migration of apps.
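
A playbooks/manual/ocp.yml along those lines might look like the following sketch (host and group names are assumptions; the roles themselves — cluster-admin and cluster-monitoring-view — are standard OpenShift cluster roles):

```yaml
# Hypothetical playbooks/manual/ocp.yml (host/group names are placeholders).
- name: Manage post-install OCP cluster configuration
  hosts: os_control
  tasks:
    - name: Grant cluster-admin to the openshift admins group
      ansible.builtin.command:
        cmd: oc adm policy add-cluster-role-to-group cluster-admin sysadmin-openshift
      changed_when: false

    - name: Grant monitoring read access to a monitoring group
      ansible.builtin.command:
        cmd: oc adm policy add-cluster-role-to-group cluster-monitoring-view example-monitoring-group
      changed_when: false
```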
>
> > Have we tried a upgrade of the clusters yet? Did everything go ok?
> > Do we need any docs on upgrades?
> Yes, we've already completed a number of upgrades, the latest being to 4.8.11. We have SOPs for upgrades which we can copy over from the CentOS CI infra, and we'll make any updates required in the process.
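
For the SOP, upgrades are driven by the ClusterVersion resource: the channel determines which updates are offered, and `oc adm upgrade --to=<version>` triggers one. An illustrative fragment of the relevant fields (values shown are examples):

```yaml
# Illustrative ClusterVersion fragment; the channel controls which
# updates `oc adm upgrade` will offer. Values are examples only.
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  channel: stable-4.8    # fast-4.8 / candidate-4.8 channels also exist
```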
>
>
> > Since the control plane are vm's I assume we need to drain them one at
> > a time to reboot the virthosts they are on?
> If we are rebooting a single virthost/control plane VM at a time, yes, that should be fine. If we are doing more than one at the same time, we should do a full graceful cluster shutdown followed by a graceful cluster startup. We have SOPs for this in CentOS CI as well; we'll get those added here along with any content updates needed.
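
The one-node-at-a-time case could be sketched as Ansible tasks like the following (host and node names are placeholders; the drain flags are the usual `oc adm drain` options and should be checked against the cluster's oc version):

```yaml
# Hypothetical drain/reboot/uncordon sequence for a single control plane VM.
- name: Drain one node, reboot its virthost, then uncordon
  hosts: os_control
  vars:
    target_node: example-control01      # placeholder node name
  tasks:
    - name: Cordon and drain the node
      ansible.builtin.command:
        cmd: >-
          oc adm drain {{ target_node }}
          --ignore-daemonsets --delete-emptydir-data --timeout=300s

    # ... reboot the underlying virthost here, wait for the node to rejoin ...

    - name: Mark the node schedulable again
      ansible.builtin.command:
        cmd: oc adm uncordon {{ target_node }}
```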
>
> > * Should we now delete the kubeadmin user? In 3.x I know they advise to
> > do that after auth is setup.
> We can delete it, as we have system:admin available from the os-control01 node. Best practices might suggest we do. We can also give cluster-admin role to all users in the sysadmin-main and sysadmin-openshift groups.
> I'm in two minds about deleting it; I was hoping to wait until we get a solution that syncs IPA groups/users to Openshift. There is an officially supported solution for syncing LDAP (I think that will work?).
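
The supported LDAP sync is driven by an LDAPSyncConfig file fed to `oc adm groups sync`. A sketch of what that could look like against IPA — the server URL, base DNs, and schema choice here are all assumptions to be adapted:

```yaml
# Hypothetical LDAPSyncConfig for `oc adm groups sync`
# (URL and DNs are placeholders; IPA-style rfc2307 layout assumed).
kind: LDAPSyncConfig
apiVersion: v1
url: ldaps://ipa.example.org
rfc2307:
  groupsQuery:
    baseDN: "cn=groups,cn=accounts,dc=example,dc=org"
    scope: sub
    filter: (objectClass=groupOfNames)
  groupUIDAttribute: dn
  groupNameAttributes: [cn]
  groupMembershipAttributes: [member]
  usersQuery:
    baseDN: "cn=users,cn=accounts,dc=example,dc=org"
    scope: sub
  userUIDAttribute: dn
  userNameAttributes: [uid]
```

Running `oc adm groups sync --sync-config=<file> --confirm` then creates/updates the matching OpenShift groups.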
>
> > * Right now the api is only internal. Is it worth getting a forward
> > setup to allow folks to use oc locally on their machines? It would
> > expose that api to the world, but of course it would still need auth.
> We'd love to expose it, but... all interaction with the clusters up to this point has also only been done via Ansible, so if it turns out we can't expose the API like this, we're OK with that. With minor changes to the playbook we should be able to at least replicate the current 3.11 experience.
>
> >> That's what we decided to do for the CentOS CI ocp setup, and so CI
> >> tenants can use oc from their laptop/infra. As long as cert exposed for
> >> default ingress has it added in the SAN, it works fine :
> >>
> >> X509v3 Subject Alternative Name:
> >>                 DNS:*.apps.ocp.ci.centos.org, DNS:api.ocp.ci.centos.org,
> >> DNS:apps.ocp.ci.centos.org
>
> > Yeah, that's all fine, but to make it work for our setup, I would need to
> > get RHIT to NAT in port 6443 to proxy01/10 from the internet. At least I
> > think that's the case. Openshift 3 could just use https, but alas, I fear
> > OCP4 needs that 6443 port.
> Yep, I think you're right on that.
>
>
> >Do we want to try and enable http/2 ingress?
> https://docs.openshift.com/container-platform/4.5/networking/ingress-operator.html#nw-http2-haproxy_configuring-ingress
> We can take a look and see if we can figure it out!
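
Per the linked doc, HTTP/2 is enabled via an annotation, either on an individual IngressController or cluster-wide on the Ingress config. An illustrative fragment for the default IngressController:

```yaml
# Illustrative fragment: enabling HTTP/2 on the default IngressController
# via the annotation described in the linked documentation.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
  annotations:
    ingress.operator.openshift.io/default-enable-http2: "true"
```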
>
>
> > We will want to enable kubevirt/whatever it's called...
> We definitely want to make this available, but we will have to set quotas on usage. We should enable it on staging, but should we enable it in production?
>
> On the CentOS CI OCP4 cluster, we have Openshift Virtualization / kubevirt installed, but I don't think anyone is actually using it. We have several tenants which have elevated permissions and are then accessing KVM directly to bring up VMs on the Openshift nodes; this is something we want to avoid, as we can't effectively set quotas on that type of usage.
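
On the quota point: since kubevirt VMs are ordinary pods under the hood, a standard per-namespace ResourceQuota caps what a tenant's VMs (and everything else in the namespace) can consume, which is exactly what direct KVM access bypasses. A sketch with illustrative values:

```yaml
# Hypothetical per-tenant ResourceQuota; namespace and values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-tenant-quota
  namespace: example-tenant
spec:
  hard:
    requests.cpu: "8"        # total CPU requested across all pods/VMs
    requests.memory: 32Gi
    limits.cpu: "16"
    limits.memory: 64Gi
```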
>

I believe the Hyperscale SIG is using it. Or if they're not, it's
because they don't know it's there.



-- 
真実はいつも一つ!/ Always, there's only one truth!
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
