============================================
#fedora-meeting: Infrastructure (2011-08-04)
============================================

Meeting started by nirik at 19:00:02 UTC. The full logs are available at
http://meetbot.fedoraproject.org/fedora-meeting/2011-08-04/infrastructure.2011-08-04-19.00.log.html

Meeting summary
---------------
* Robot Roll Call (nirik, 19:00:02)

* New folks introductions and apprentice tasks/feedback (nirik, 19:03:02)

* F16 Alpha Freeze reminder and tickets (nirik, 19:05:46)
  * LINK: https://fedorahosted.org/fedora-infrastructure/browser/architecture/Environments.png (nirik, 19:06:22)

* Upcoming Tasks/Items (nirik, 19:10:37)

* List items / random info (nirik, 19:16:21)
  * there was a short unplanned outage yesterday. Sent details to list. (nirik, 19:23:48)
  * infra-docs is live and ready for SOPs to be converted to it. (nirik, 19:24:04)
  * DNS glue records are now fixed. (nirik, 19:25:18)
  * backup03 sees its tape drive, so we can set it up now. (nirik, 19:25:34)
  * new wildcard cert is ready to go (28 days to spare) (nirik, 19:25:52)
  * new ibiblio02 machine should be ready soon. (nirik, 19:27:04)

* Meeting tagged tickets: (nirik, 19:27:50)
  * LINK: https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority (nirik, 19:27:50)

* Open Floor (nirik, 19:28:51)
  * need to update IMM/RSA on machines, as well as reset it on 4 of them. (nirik, 19:34:48)
  * LINK: https://fedorahosted.org/fedora-infrastructure/ticket/2836 (nirik, 19:53:57)

Meeting ended at 20:03:57 UTC.

Action Items
------------

Action Items, by person
-----------------------
* **UNASSIGNED**
  * (none)

People Present (lines said)
---------------------------
* nirik (121)
* skvidal (48)
* abadger1999 (37)
* smooge (27)
* zodbot (9)
* lmacken (2)
* CodeBlock (2)
* jsmith (2)
* Klainn (1)
* ricky (0)
* codeblock (0)

--
19:00:02 <nirik> #startmeeting Infrastructure (2011-08-04)
19:00:02 <zodbot> Meeting started Thu Aug 4 19:00:02 2011 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:02 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:02 <nirik> #meetingname infrastructure
19:00:02 <zodbot> The meeting name has been set to 'infrastructure'
19:00:02 <nirik> #topic Robot Roll Call
19:00:03 <nirik> #chair smooge skvidal codeblock ricky nirik abadger1999
19:00:03 <zodbot> Current chairs: abadger1999 codeblock nirik ricky skvidal smooge
19:00:11 <smooge> here
19:00:17 <Klainn> giggity
19:00:18 * abadger1999 here
19:00:18 <nirik> morning smooge
19:00:39 * nirik waves to all
19:01:26 * nirik will start the meeting at :03
19:01:45 * CodeBlock waves
19:02:44 <smooge> oh I thought I was late again
19:03:02 <nirik> #topic New folks introductions and apprentice tasks/feedback
19:03:06 <nirik> smooge: not at all. ;)
19:03:28 <nirik> so, any new folks like to introduce themselevs? any apprentice folks like to talk about specific items or questions?
19:04:16 <nirik> I added another apprentice / easyfix ticket yesterday...
19:04:24 <nirik> move/convert SOP's over from wiki to git.
19:04:53 <nirik> I've also gotten several replies to my aug fi-apprentice ping email. A number of people had busy summers but hope to dig back in soon.
19:05:18 <nirik> I'll be doing the group cleanup next week.
19:05:46 <nirik> #topic F16 Alpha Freeze reminder and tickets
19:05:57 <nirik> Reminder that we are in a pre-release freeze right now.
19:06:22 <nirik> https://fedorahosted.org/fedora-infrastructure/browser/architecture/Environments.png
19:06:28 <nirik> lists whats included and whats not.
19:06:40 <nirik> Anything thats included, you MUST post to the list and get 2 +1's on.
19:06:56 <smooge> It looks like we will slip 1 or more weeks if I read the email correct
19:07:06 <nirik> yeah, seeming likely. ;(
19:07:08 <jsmith> It's entirely possible
19:07:18 <nirik> we also have f16 alpha tickets all filed:
19:07:20 <jsmith> Not for sure yet, but somewhat likely, given the late TC
19:07:27 <nirik> .ticket 2894
19:07:28 <zodbot> nirik: #2894 (F16Alpha: websites) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2894
19:07:33 <nirik> .ticket 2895
19:07:36 <zodbot> nirik: #2895 (F16Alpha: Verify mirror space) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2895
19:07:40 <nirik> .ticket 2896
19:07:41 <zodbot> nirik: #2896 (F16Alpha: Release day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2896
19:07:42 <nirik> .ticket 2897
19:07:45 <zodbot> nirik: #2897 (F16Alpha: Verify mirror permissions) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2897
19:07:47 <nirik> .ticket 2898
19:07:50 <zodbot> nirik: #2898 (F16Alpha: Verify mirrormanager redirects) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2898
19:08:08 <nirik> so, we should make sure we have these under control before Alpha.
19:08:43 <skvidal> sorry, sorry
19:08:45 * skvidal is here
19:08:50 <nirik> hey skvidal. No worries.
19:09:12 <nirik> so, does anyone wish to take on any of those alpha tickets for their very own? ;)
19:09:47 <nirik> in any case we will make sure they get done before alpha.
19:10:19 <nirik> Anything more on alpha tickets?
19:10:37 <nirik> #topic Upcoming Tasks/Items
19:10:50 <nirik> Anyone have upcoming items they wish to plan/schedule or discuss?
19:11:09 <nirik> We can't affect any of the machines in the freeze, but we can work on other machines and also plan/document things. ;)
19:11:52 <nirik> I'm planning on sending out a straw man plan for upgrading hosted for people to poke holes in.
19:12:37 <abadger1999> I'm working on a little web app for ambassadors to be able to run a raffle.
19:12:52 <abadger1999> Plan to deploy it to production after alpha freeze.
19:13:04 <abadger1999> Not sure if it'll become a permanent fixture or will be a one-shot.
19:13:07 <nirik> abadger1999: cool. ;)
19:13:30 <nirik> is that likely to need to follow the dev-> stg-> prod chain? or so simple it can just test in stg?
19:14:07 <abadger1999> nirik: I can test in stg since it's not deployed yet.
19:14:17 <abadger1999> nirik: But I can start in dev/w a dev instance if you'd rather.
19:14:22 <abadger1999> up to you :-)
19:14:23 <smooge> I will take mirror space and permissions
19:14:47 <smooge> I am planning one two things that I will need +1 for
19:14:54 <nirik> abadger1999: don't care too much on a simple app I don't think. If it can be safely tested in stg thats fine. Especially if it doesn't use a different framework, etc.
19:15:04 <abadger1999> <nod>
19:15:05 <nirik> smooge: thanks on the tickets.
19:15:34 <abadger1999> It'll be TG2. I'll plan on testing stg; I'll holler if I need something else b/c it's not safe to test there.
19:15:50 <nirik> ok
19:16:21 <nirik> #topic List items / random info
19:16:33 <nirik> So, I thought I would bring up a few things I posted on list for discussion...
19:16:43 <nirik> but of course replies to the list are fine too.
19:17:12 <nirik> First one was: sysadmin group requirement for sysadmin-qa. I was thinking we might drop that requirement for them since they don't care about sysadmin emails.
19:17:24 <nirik> I don't know if there's some other reason sysadmin-foo groups require sysadmin.
19:18:01 <nirik> Second one was access to log02 for apprentices. ;)
19:18:09 <abadger1999> nirik: one thing about that was that they needed to go through bastion to get to their boxes I think... we could add sysadmin-qa to the list of groups that can shell into bastion, though.
19:18:33 <nirik> abadger1999: I made them a bastion-comm01... so they should be able to use that for access.
19:18:40 <abadger1999> Okay
19:18:43 <abadger1999> That works too :-)
19:18:48 <nirik> does sysadmin get shell on bastion?
19:19:06 <abadger1999> I think that's the way we set it up.
19:19:08 * abadger1999 checks
19:19:31 <nirik> doesn't seem to.
19:19:38 <skvidal> you have to be in sysadmin-noc or above
19:19:40 * nirik thinks thats just the emails
19:19:41 <skvidal> to get into bastion
19:20:15 <abadger1999> ah, looks like we explicitly list all the sysadmin-* groups. Misrecollection on my part.
19:20:40 <nirik> for not sysadmin-qa I think it makes sense... if you are sysadmin-resource you should still be in the loop on commits and outages so you can know changes that affect your resource.
19:21:48 <nirik> anyhow, can see if there's a historical reason and just change it if there's not.
19:23:14 <nirik> so, do chime in on list. ;)
19:23:33 <nirik> some info items:
19:23:48 <nirik> #info there was a short unplanned outage yesterday. Sent details to list.
19:24:04 <nirik> #info infra-docs is live and ready for SOP's to be converted to it.
19:25:18 <nirik> #info DNS glue records are now fixed.
19:25:23 <smooge> 1) I need to update our wildcard certificate. 2) I am going to remove ns1/ns2 from the dns for fedoraproject.org and other zones that have been fixed
19:25:34 <nirik> #info backup03 sees it's take drive, so we can set it up now.
19:25:40 <smooge> actually the files aren't fixed.
I realized that I needed +1 to do so
19:25:52 <nirik> #info new wildcard cert is ready to go (28 days to spare)
19:26:48 * nirik thinks of other things pending.
19:27:04 <nirik> #info new ibiblio02 machine should be ready soon.
19:27:50 <nirik> #topic Meeting tagged tickets:
19:27:50 <nirik> https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
19:28:00 <nirik> any meeting tickets folks would like to note or talk about?
19:28:06 <nirik> or any other tickets for that matter?
19:28:18 <smooge> not me at the moment
19:28:28 <skvidal> nothing leaps to mind
19:28:32 <CodeBlock> nope
19:28:45 <nirik> cool.
19:28:51 <nirik> #topic Open Floor
19:28:55 <nirik> anything for open floor?
19:29:17 <smooge> just waiting for the hardware to be finished racking
19:29:25 <nirik> smooge: any news on that?
19:29:42 <smooge> nothing beyond that it was what caused our outage yesterday :)
19:30:03 <nirik> yeah, I figured. ;(
19:30:28 <nirik> Once those are in place, I'd like to build up the bvirthostwhatever and put a new releng03 on it.
19:30:53 <smooge> ok
19:30:54 <nirik> smooge: oh, can you talk about that IMM/RSA reset thing a bit?
19:31:15 <smooge> ok so for some reason a bunch of our IMM boxes went "dead" to the world after we left pHX2
19:31:17 * skvidal stabs imm/rsa in the face
19:31:26 <skvidal> oh, sorry, bitter
19:31:51 <smooge> the only fix I have found is to install an IBM tool which talks to the hidden controller between the IMM and the box
19:32:00 <nirik> there are 4 machines where the management interface is not working currently.
19:32:08 <smooge> and tell it to give an ip address and reset
19:32:21 <nirik> s/not working/not working at all. no ping, no ssh, no nothing/
19:33:06 <nirik> unfortunately, those machines are contain 'important' guests.
19:33:09 <smooge> the issue is..
all the systems which are down are critical
19:33:19 <smooge> so it can't happen until after the freeze
19:33:46 * nirik nods.
19:34:02 <nirik> Also, many of our machines have older versions of the IMM firmware. Updating that might be a good thing too.
19:34:13 <nirik> not that the new one is too much better. ;)
19:34:34 <abadger1999> So, the boxes are up and the guests are running but the management interface is down?
19:34:48 <nirik> #info need to update IMM/RSA on machines, as well as reset it on 4 of them.
19:34:48 <smooge> correct. if something happens to the box.. we are sol
19:34:50 <nirik> abadger1999: yep
19:34:53 <abadger1999> Okay.
19:35:44 * nirik tries to think of anything else to discuss...
19:35:53 <nirik> any other topics? Or shall we call it a short meeting?
19:36:20 <skvidal> one minor thing
19:36:23 <skvidal> the infra-hosts git repo
19:36:30 <skvidal> if anyone wants to start adding notes to servers
19:36:32 <skvidal> please do so
19:36:44 * nirik nods. Good plan.
19:36:46 <skvidal> hell, anytime you remember something 'odd' that's is quasi-specific to that server, do it
19:36:51 <skvidal> it can be anything
19:37:07 <skvidal> look at log02 for an example
19:37:41 * nirik has an idea. Not sure it will be useful or work tho.
19:38:04 <skvidal> nirik: ?
19:38:05 <nirik> could we put something in that repo to mark what hosts are in which update group? A B C ?
19:38:12 <skvidal> absolutely
19:38:29 <nirik> then, somehow generate func lists or whatever from that...
19:38:31 <skvidal> put it in the 'notes' file
19:38:35 <skvidal> hmmm...
19:38:44 <skvidal> sure
19:38:45 <nirik> or perhaps thats best as seperate groups in func
19:38:46 <skvidal> we could do that
19:38:47 <skvidal> no
19:38:55 <skvidal> I think we could do that
19:39:01 <skvidal> I can write a script to mine that data out
19:39:05 <skvidal> don't put it in 'notes' then
19:39:14 <skvidal> maybe make a 'servertype' item or something like that
19:39:28 <nirik> I'd like a 'func-yum --hosts-from-list=group-a check update' or whatever.
19:39:34 <nirik> yeah, or 'updategroup' or something.
19:39:39 <skvidal> it would probably be
19:39:46 <skvidal> func-yum --hosts=@group-a update
19:39:54 * nirik nods, thats fine.
19:39:56 <skvidal> since func-yum should handle thar group syntax now
19:40:14 <nirik> anyhow, can figure that out out of band...
19:40:25 <skvidal> yep
19:40:55 <abadger1999> app => rhel6; lmacken thinks that fedoracommunity should be pretty easy to fix once he gets the last packages built for EPEL6.
19:41:08 <abadger1999> So that just leaves mediawiki slowness.
19:41:24 <nirik> cool. I keep meaning to look at that, but never get to it. ;)
19:41:29 <abadger1999> Do we want to put out a cattle call to find a new fi-apprentice to look at that?
19:41:31 <lmacken> yeah, I'm working on the moksha EL6 thing... dealing with odd issues with the TG2 stack atm.
19:41:33 <nirik> might see if ricky or ianweller can look at some point.
19:41:41 <nirik> abadger1999: that would be cool too.
19:42:35 <abadger1999> nirik: Do we have a ticket about the slowness issue?
19:42:40 <nirik> abadger1999: once we have a rhel6 app server working, would bapp01 be hard to do? or it's mostly distro independent?
19:42:54 <nirik> nope. I can file one tho...
19:43:36 <abadger1999> nirik: I'll write a call for volunteers; if you get a ticket open with some numbers/testing it'll be a good place for me to send people to get started.
19:43:47 <abadger1999> nirik: I'd say do bapp01 last.
19:44:08 <abadger1999> nirik: bapp01 has a bunch of stuff running that's not on the other app servers.
19:44:14 <nirik> yeah.
19:44:15 <abadger1999> cron jobs and such.
19:44:42 <nirik> ok.
19:44:45 <abadger1999> things that interface with rh bugzilla, koji... not everything on there is easy to test in stg for those reasons :-(
19:45:00 <nirik> ok.
19:45:32 <nirik> I can file a ticket on the mediawiki thing.
19:45:35 <abadger1999> Probably we need to update the other app servers, then look through puppet for what's running on bapp01.
19:45:44 <abadger1999> (and not on the other app servers)
19:45:55 <nirik> does bapp01 need to be in phx2? (for bugzilla access, etc?)
19:46:00 <abadger1999> and the people responsible for those (mdomsch, I, maybe lmacken)
19:46:01 <smooge> yes
19:46:14 <smooge> it needs bugzilla, mounting of the netapps
19:46:16 <abadger1999> site down and make sue all of those work...
19:46:23 <smooge> and various other things
19:46:27 <abadger1999> maybe in production since they might be hard to test.
19:46:42 <abadger1999> (without having side effects on bugzilla/koji/etc)
19:46:49 <smooge> nirik, it is probably the most critical box that needs to be in phx :/
19:47:20 <nirik> ok.
19:48:20 <abadger1999> If we think that multiple small, targetted servers are more scalable than one beefier server, bapp01 might be a good candidate.
19:48:46 <abadger1999> It doesn't truly need to be an app server and it doesn't need to be load balanced.
19:49:08 <nirik> well, the reason I asked if it needs to be in phx2, was thinking that it would be nice if it could be 'floating'... ie, have app server setup in puppet and a bapp thing and we could move bapp to whatever app server we wanted to run those things.
19:49:21 <nirik> but it sounds like thats not possible.
19:49:37 <skvidal> nirik: the mount points make it tricky, I suspect
19:49:43 <nirik> abadger1999: https://fedorahosted.org/fedora-infrastructure/ticket/2908
19:49:54 <skvidal> though I've often wondered about that... is it actually MOVIING or accessing files on those mount points?
19:49:59 <skvidal> or is it mostly acquiring directory indexes?
19:50:42 <smooge> Some of everything I believe
19:51:35 <nirik> not sure.
19:52:10 <abadger1999> I can't think of anything off hand that would be writing to the mount points at least, but bapp01 is very... eclectic so I don't know everything that's running on it.
19:52:40 <skvidal> I guess I was wondering
19:52:44 <abadger1999> nirik: thanks. I'll send a message aboout that.
19:52:49 <skvidal> could we dump the nfs mounts
19:53:00 <skvidal> and use file-indexes of the rpms generated on the boxes
19:53:11 <nirik> skvidal: yeah, I think that might be for bodhi to complete package names...
19:53:11 <skvidal> or even repometadata
19:53:20 <nirik> on the other apps at least
19:53:34 <skvidal> nirik: that 's what I was thinking - I'm sure bodhi can read a list from a file faste than a dir glob.glob()
19:53:56 <skvidal> I'll see about looking at the code for bodhi to see if I can make that work
19:53:57 <nirik> https://fedorahosted.org/fedora-infrastructure/ticket/2836
19:54:09 <nirik> lmacken: ^ is that for package name completion?
19:54:12 <nirik> skvidal: cool.
19:54:23 <nirik> It would be nice to not have to have mounts on the app servers.
19:55:00 <skvidal> indeed
19:55:10 <skvidal> and it would make those boxes less 'special'
19:55:41 <nirik> also, currently we have app05 and app06 that are not in phx2, but they are not in the base load (only backups) I think due to this reason.
19:56:20 <nirik> (well, and possibly db latency)
19:56:40 <smooge> I think the writing is from mirrormanager
19:56:59 <smooge> nirik, a lot of db latency
19:57:32 <skvidal> smooge: mirrormanager is writing to nfs? or do you mean writing to the db?
19:58:01 <smooge> skvidal, I thought there was something in mirrormanager that writes to the disks.. but I could be wrong
19:58:25 <skvidal> smooge: I know it writes out its mirror metalinks files and what-not
19:58:29 <skvidal> but that's not big
19:58:55 <smooge> oh I was thinking you were wondering about ro access versus rw. I misread something
19:59:02 <abadger1999> nirik: db latency was why they were backups originally.
19:59:03 <skvidal> np
19:59:05 <lmacken> nirik, skvidal: it used to be for the build auto-completion, but I think we may not need /mnt/koji on the app servers anymore. I'll look into it and follow up in the ticket.
19:59:13 <skvidal> lmacken: thank you
19:59:17 <nirik> abadger1999: yeah. ;(
19:59:21 <nirik> lmacken: cool. Thanks.
19:59:56 <nirik> in any case I think we all agree on bapp01: a) identify and document the 'specialness' it has and b) try and reduce that so it's less complex/SPOF. ;)
20:00:42 <smooge> +1
20:00:49 <nirik> ok, any last items from anyone? if not will close out soon here...
20:01:24 <skvidal> lmacken: just did some searches through the code
20:01:43 <skvidal> lmacken: looks like it is _fetch_candidate_builds() which does the autocompletion and that looks like direct koji calls to get those lists
20:01:56 <skvidal> lmacken: so - I suspect you are correct about /mnt/koji being a legacy mount
20:02:56 <nirik> cool.
20:03:55 <nirik> ok, thanks for coming everyone!
20:03:57 <nirik> #endmeeting
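
Editor's note on the update-group idea (19:38 to 19:40 in the log): the script skvidal offered to write for mining update groups out of the infra-hosts repo might look roughly like the sketch below. Everything about the layout is an assumption for illustration only: one directory per host, each optionally holding a one-line marker file. The marker name 'updategroup' is the one nirik floated in the meeting; the real repo may be organized quite differently, and nothing here is the actual script.

```python
from collections import defaultdict
from pathlib import Path


def collect_update_groups(repo_root, marker="updategroup"):
    """Map each update group name to a sorted list of host names.

    Assumes a hypothetical layout of <repo_root>/<hostname>/<marker>,
    where the marker file holds a single group name such as "a".
    """
    groups = defaultdict(list)
    for host_dir in sorted(Path(repo_root).iterdir()):
        marker_file = host_dir / marker
        if not host_dir.is_dir() or not marker_file.is_file():
            continue  # skip stray files and hosts with no group assigned
        group = marker_file.read_text().strip()
        if group:
            groups[group].append(host_dir.name)
    return dict(groups)


def format_groups(groups):
    """Render one 'group-<name>: host host ...' line per group."""
    return "\n".join(
        "group-%s: %s" % (name, " ".join(groups[name]))
        for name in sorted(groups)
    )
```

Printing format_groups(collect_update_groups("/path/to/infra-hosts")) would give a plain listing that a person, or a later script, could turn into whatever group definition the 'func-yum --hosts=@group-a update' syntax mentioned above expects. That conversion is left out on purpose, since the group file format was not discussed in the meeting.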