On Fri, Sep 24, 2021 at 12:01:27PM +0900, David Kirwan wrote:
> > On the storage, are we ok if a node goes down? ie, does it spread it
> > over all the storage nodes/raid? Or is it just in one place and you are
> > dead if that node dies?
> For storage, we maintain 3 replicas of the data, spread across 3 nodes.
> However, not all nodes are equally resourced: we have 2 large nodes and 1
> much smaller one, so more replicas end up on the 2 larger nodes. We can
> likely afford to lose one of the large physical nodes while still
> maintaining data integrity.

ok. Fair enough.

> > Is there any way to back up volumes?
> There are ways to clone, extend, take snapshots, etc. of these volumes.
> We've never done it, so it'll be a learning process for us all ;). We
> should sync up to get a better handle on the requirements for backups. In
> CentOS CI we've set up backups to S3, and we can certainly reuse some of
> that, e.g. backup of etcd, but backing up the volumes managed by OCS may
> need further investigation. Will need to do some research here.

Yeah, backups of etcd would be nice, but mostly I was thinking of
applications that have persistent data. Right now we have those on netapp
NFS volumes, where it keeps snapshots and mirrors to another site.

I suppose we could just keep using NFS for data that has to persist and
just use local storage for other things, but it's sure nice to have it
dynamically provisioned. ;)

> > should we make a playbooks/manual/ocp.yml playbook for things like
> > - list of clusteradmins
> > - list of clustermonitoring
> > - anything else we want to manage post install
> Sure, yep. As we're finishing up soonish, I'd imagine the next few weeks
> we'll all be back focused on the Infra/Releng tasks, tying up any loose
> ends like this, and starting migration of apps.

ok.

> > Have we tried an upgrade of the clusters yet? Did everything go ok?
> > Do we need any docs on upgrades?
> Yes, we've already completed a number of upgrades; the latest is to
> 4.8.11. We have SOPs for upgrades which we can copy over from the CentOS
> CI infra, and we'll make any updates required in the process.

Great.

> > Since the control plane nodes are VMs, I assume we need to drain them
> > one at a time to reboot the virthosts they are on?
> If we are rebooting a single vmhost/control plane VM at a time, yes, that
> should be fine. If we are doing more than 1 at the same time, we should do
> a full graceful cluster shutdown and then a graceful cluster startup. We
> have SOPs for this in CentOS CI also; we'll get those added here and make
> any content updates needed.
>
> > * Should we now delete the kubeadmin user? In 3.x I know they advise
> > doing that after auth is set up.
> We can delete it, as we have system:admin available from the os-control01
> node. Best practices might suggest we do. We can also give the
> cluster-admin role to all users in the sysadmin-main and
> sysadmin-openshift groups.

Yeah, we should put this in the playbook so it's very clear who has this
and when it was added, etc.

> I'm in two minds about deleting it; I was hoping to wait until we get a
> solution that syncs IPA groups/users to Openshift. There is an official
> supported solution for syncing LDAP (think that will work?).

Yeah, needs investigation.
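For whenever we pick that up, here's roughly what I think the moving parts
look like on the CLI. This is just a sketch cribbed from the upstream docs,
untested against our clusters, and the LDAP url/baseDNs below are
placeholders rather than our real IPA values:

# the supported LDAP group sync is driven by an LDAPSyncConfig file, roughly:
cat > ldap-sync.yaml <<'EOF'
kind: LDAPSyncConfig
apiVersion: v1
url: ldaps://ipa.example.org      # placeholder, not our real IPA host
rfc2307:
  groupsQuery:
    baseDN: "cn=groups,cn=accounts,dc=example,dc=org"
    scope: sub
    derefAliases: never
    filter: "(objectClass=groupOfNames)"
  groupUIDAttribute: dn
  groupNameAttributes: [ cn ]
  groupMembershipAttributes: [ member ]
  usersQuery:
    baseDN: "cn=users,cn=accounts,dc=example,dc=org"
    scope: sub
    derefAliases: never
  userUIDAttribute: dn
  userNameAttributes: [ uid ]
EOF

# dry run first, then --confirm to actually create/update the Groups
oc adm groups sync --sync-config=ldap-sync.yaml
oc adm groups sync --sync-config=ldap-sync.yaml --confirm

# grant cluster-admin to the synced groups
oc adm policy add-cluster-role-to-group cluster-admin sysadmin-main
oc adm policy add-cluster-role-to-group cluster-admin sysadmin-openshift

# and last, remove the bootstrap kubeadmin user (IIRC the documented way
# in 4.x is to delete its secret)
oc delete secret kubeadmin -n kube-system

If that pans out, it could all live in the playbooks/manual/ocp.yml
playbook mentioned above, so it stays obvious who got cluster-admin and
when it was added.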
> > * Right now the api is only internal. Is it worth getting a forward
> > setup to allow folks to use oc locally on their machines? It would
> > expose that api to the world, but of course it would still need auth.
> We'd love to expose it, but... all interaction with the clusters up to
> this point has also only been done via Ansible, so if it turns out we
> can't expose the API like this, we're ok with that. With minor changes to
> the playbook we should be able to at least replicate the current 3.11
> experience.

Sure, but currently app owners can use oc on their local machines to view
logs, debug, etc. I think that's a nice thing to keep working.

> >> That's what we decided to do for the CentOS CI ocp setup, and so CI
> >> tenants can use oc from their laptop/infra. As long as the cert exposed
> >> for the default ingress has it added in the SAN, it works fine:
> >>
> >> X509v3 Subject Alternative Name:
> >>   DNS:*.apps.ocp.ci.centos.org, DNS:api.ocp.ci.centos.org,
> >>   DNS:apps.ocp.ci.centos.org
>
> > Yeah, that's all fine, but to make it work for our setup, I would need
> > to get RHIT to nat in port 6443 to proxy01/10 from the internet. At
> > least I think that's the case. Openshift 3 could just use https, but
> > alas, I fear OCP4 needs that 6443 port.
> Yep, I think you're right on that. I can put in a request for this.

> > Do we want to try and enable http/2 ingress?
> https://docs.openshift.com/container-platform/4.5/networking/ingress-operator.html#nw-http2-haproxy_configuring-ingress
> We can take a look and see if we can figure it out!

ok.

> > We will want to enable kubevirt/whatever it's called...
> We definitely want to make this available, but we will have to set quotas
> on usage. We should enable it on staging, but should we enable it on
> production?

We can start with staging, test, and see what usage might be before we go
to prod.

> On the CentOS CI OCP4 cluster, we have Openshift Virtualization / kubevirt
> installed, but I don't think anyone is actually using it. We have several
> tenants which have elevated permissions and are then accessing KVM
> directly to bring up VMs on the Openshift nodes. This is something we want
> to avoid, as we can't effectively set quotas on that type of usage.

Yeah, I think the FCOS folks are a user there? They might be migrated to
our new cluster, so they would likely need the same perms here. ;(

Thanks!

kevin
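P.S. On the http/2 ingress question: if I'm reading that doc page right,
it's just an annotation on the IngressController (or on the cluster-wide
ingress config). Something like the below; I haven't tried it on our
clusters yet, so treat it as a sketch to verify against the docs:

# enable http/2 on the default ingress controller
oc -n openshift-ingress-operator annotate ingresscontrollers/default \
    ingress.operator.openshift.io/default-enable-http2=true

# or cluster wide:
# oc annotate ingresses.config/cluster \
#     ingress.operator.openshift.io/default-enable-http2=true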