============================================
#fedora-meeting: Infrastructure (2015-03-12)
============================================

Meeting started by nirik at 18:00:06 UTC. The full logs are available at
http://meetbot.fedoraproject.org/fedora-meeting/2015-03-12/infrastructure.2015-03-12-18.00.log.html .

Meeting summary
---------------
* aloha (nirik, 18:00:06)
* New folks introductions / Apprentice feedback (nirik, 18:00:06)
* announcements and information (nirik, 18:11:30)
  * Group effort cleaned up the pkgdb branch script on friday - kevin (nirik, 18:11:30)
  * Good progress made on new cloud (vnc working, copr being tested) - kevin/msuchy (nirik, 18:11:30)
  * Fedora 22 Alpha is out! Freeze is over! - kevin (nirik, 18:11:30)
  * Mass reboots happened yesterday, please report any issues you find - kevin (nirik, 18:11:31)
  * https://register.flocktofedora.org deployed to OpenShift for Flock 2015 Rochester. (Please wait for announcement to register). Need to figure out how to stand https://flocktofedora.org back up. -lmacken (nirik, 18:11:32)
  * VACUUM ANALYZE on datanommer db made a difference. we'll need to investigate why autovacuum isn't running regularly on our postgres dbs - ralph (nirik, 18:11:36)
  * fedmsg+karma commands coming to zodbot soon https://github.com/fedora-infra/supybot-fedora/pull/22 - ralph (nirik, 18:11:39)
  * we have tons of open pull requests this week. any help reviewing is appreciated. http://ambre.pingoured.fr/fedora-infra/ - ralph (nirik, 18:11:42)
* monitoring: let us design something better - kevin (nirik, 18:12:44)
  * LINK: http://linux-ha.org/source-doc/assimilation/html/index.html (nirik, 18:14:00)
* where is our source code? - smooge (nirik, 18:31:05)
  * ACTION: puiterwijk will see if we can generate a list of packages with upstreams being retired to notify the devel list of. (nirik, 18:36:47)
* Mirrormanager2 [how is this coming along?] (nirik, 18:36:53)
* Learn about: collectd (nirik, 18:47:08)
  * LINK: https://admin.fedoraproject.org/collectd/ (nirik, 18:47:31)
* Meeting process (nirik, 18:53:29)
* Open Floor (nirik, 18:57:40)

Meeting ended at 19:01:23 UTC.

Action Items
------------
* puiterwijk will see if we can generate a list of packages with upstreams being retired to notify the devel list of.

Action Items, by person
-----------------------
* puiterwijk
  * puiterwijk will see if we can generate a list of packages with upstreams being retired to notify the devel list of.
* **UNASSIGNED**
  * (none)

People Present (lines said)
---------------------------
* nirik (129)
* puiterwijk (38)
* smooge (25)
* threebean (16)
* oddshocks (15)
* ClockworkOmega (9)
* kushalk124 (8)
* zodbot (5)
* relrod (5)
* mhurron (3)
* andreasch (1)
* Mohamed_Fawzy (1)
* lmacken (1)
* janeznemanic (1)
* abadger1999 (0)
* mdomsch (0)
* pingou (0)
* dgilmore (0)

--
18:00:06 <nirik> #startmeeting Infrastructure (2015-03-12)
18:00:06 <zodbot> Meeting started Thu Mar 12 18:00:06 2015 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:06 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:06 <nirik> #meetingname infrastructure
18:00:06 <zodbot> The meeting name has been set to 'infrastructure'
18:00:06 <nirik> #topic aloha
18:00:06 <nirik> #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk
18:00:06 <zodbot> Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pingou puiterwijk relrod smooge threebean
18:00:06 <nirik> #topic New folks introductions / Apprentice feedback
18:00:12 <relrod> here
18:00:17 <andreasch> hi
18:00:17 * puiterwijk is here
18:00:22 * threebean is here
18:00:42 <Mohamed_Fawzy> hi
18:00:44 <ClockworkOmega> here
18:01:04 <relrod> Oh, no roll-call section anymore? sorry
18:01:05 <smooge> here
18:01:23 <janeznemanic> hi
18:01:35 <smooge> there is always rolecall
18:01:43 <nirik> relrod: we can, perhaps we can add it to this same topic.
18:02:03 <nirik> seems like a waste to just have several minutes where we just say hi and then ask for freeback/new people.
18:03:27 <nirik> anyhow, any new folks like to introduce themselves? or apprentices with questions?
18:03:41 <ClockworkOmega> I'm new actually.
18:03:41 <kushalk124> Hey,
18:04:20 <kushalk124> So I started with some things, made my first package, which has been reviewed and I am looking for a sponsor,
18:04:34 <kushalk124> And next I would like to contribute to some apps, have been looking at datanommer
18:04:41 * oddshocks here
18:05:14 <nirik> ClockworkOmega: welcome. care to give us a one line intro? are you more interested in sysadmin or application devel stuff?
18:05:25 <nirik> kushalk124: cool. datanommer can always use some work...
18:05:48 <kushalk124> nirik, I would be happy to help out :)
18:06:35 <ClockworkOmega> Thanks. Yes I'm an spiring sysadmin looking to get professionally into Linux but in the mean time I want to volunteer with Fedora.
18:06:40 <ClockworkOmega> a*
18:07:44 <nirik> ClockworkOmega: welcome. We can give you some pointers on where to start after the meeting over in #fedora-admin. ;)
18:08:15 <nirik> ok, get ready for info dump...
18:08:16 <kushalk124> nirik, Is there something where we can put some machine learning /data analysis / visualization , I would love to do something of that sort as well :D
18:08:21 <ClockworkOmega> Thanks. Maybe my spelling will improve by then :P
18:08:58 <nirik> kushalk124: not sure what you mean, you're welcome (and encouraged) to mine data for interesting things...
18:09:36 <relrod> kushalk124: probably some opportunities there with fedmsg -- suggest talking to threebean
18:10:14 <kushalk124> nirik, ah thanks :) Yes I can think of interesting things to work on with data
18:10:33 <nirik> with fedmsg we have a lot of data... making sense of it would be great. ;)
18:10:42 <kushalk124> relrod, Thanks :) fedmsg would be helpful , I will also have a word with threebean
18:10:59 <threebean> kushalk124: cool :)
18:11:04 <kushalk124> nirik, Yes, today I was exploring how the badges are given , using the data from fedmsg :D
18:11:26 <nirik> excellent.
18:11:30 <nirik> #topic announcements and information
18:11:30 <nirik> #info Group effort cleaned up the pkgdb branch script on friday - kevin
18:11:30 <nirik> #info Good progress made on new cloud (vnc working, copr being tested) - kevin/msuchy
18:11:30 <nirik> #info Fedora 22 Alpha is out! Freeze is over! - kevin
18:11:31 <nirik> #info Mass reboots happened yesterday, please report any issues you find - kevin
18:11:32 <nirik> #info https://register.flocktofedora.org deployed to OpenShift for Flock 2015 Rochester. (Please wait for announcement to register). Need to figure out how to stand https://flocktofedora.org back up. -lmacken
18:11:36 <nirik> #info VACUUM ANALYZE on datanommer db made a difference. we'll need to investigate why autovacuum isn't running regularly on our postgres dbs - ralph
18:11:39 <nirik> #info fedmsg+karma commands coming to zodbot soon https://github.com/fedora-infra/supybot-fedora/pull/22 - ralph
18:11:42 <nirik> #info we have tons of open pull requests this week. any help reviewing is appreciated. http://ambre.pingoured.fr/fedora-infra/ - ralph
18:11:45 <nirik> so, theres a big dump of info. ;)
18:12:14 <nirik> on to discussion topics
18:12:44 <nirik> #topic monitoring: let us design something better - kevin
18:13:04 <nirik> so, we get a lot of alerts, and they kind of aren't all that useful much of the time.
18:13:24 <nirik> I'd like to look at some alternatives.
18:13:44 <nirik> One of them is that we should try out assimilation
18:13:48 <mhurron> complete alternatives to nagios or just a different way to configure it?
18:14:00 <nirik> http://linux-ha.org/source-doc/assimilation/html/index.html
18:14:04 <nirik> both. ;)
18:14:14 <nirik> I think we can try assimilation out in our cloud network
18:14:32 <nirik> and we could look at redesigning our nagios setup if that proves easy/possible to do shorter term
18:15:21 <nirik> I'll look at floating some ideas on the list.
18:15:34 <nirik> if anyone is interested in helping out, they could chime in there too. ;)
18:16:10 <threebean> I'm interested in seeing it happen.. but I don't know much about alternatives.
18:16:17 <nirik> Things I want this to fix:
18:16:38 <threebean> I was hoping to get into automating much of our existing nagios config (so it's derived from ansible host and group vars..). but switching systems: I hadn't considered.
18:16:40 <oddshocks> Whatever the resolution may be, the alerts system could definitely be improved
18:16:44 <nirik> * alerts should start out just going to irc, then if still happening, email, then pager.
18:16:52 <oddshocks> +1
18:17:13 <oddshocks> too many emails
18:17:14 <nirik> * alerts for things that aren't user/customer impacting should never go to pagers.
18:18:32 <nirik> I'd really like to have alerts be a special event, not a 'oh no, there goes the pager again'
18:18:44 <nirik> anyhow, we can discuss more on list
18:19:03 * threebean nods
18:19:07 <puiterwijk> maybe also prioritizing services. while tagger is user-facing, I'm not sure it's as critical as distgit or koji being down
18:19:12 <nirik> the nice thing about Assimilation is that it just autodetects. You don't need to configure what it monitors.
18:19:21 <nirik> puiterwijk: also good idea.
18:19:30 <threebean> there's also a lot of app-specific errors that mostly go to developers, but not a broader monitoring thing.
18:19:47 <threebean> lmacken just noticed a bunch of internal errors from the badges backend. it needed a restart, but nagios didn't know about it.
18:20:13 <nirik> yeah, and we often forget to add things to nagios when we make new ones
18:20:25 <nirik> and staging alerts should not be the same as production
18:20:49 <puiterwijk> I would say staging should only (maybe) get IRC alerts, never email or pager
18:20:52 <ClockworkOmega> How could someone help in improving the system?
18:21:19 <nirik> puiterwijk: email still might be handy to see if something is down for a long time.
18:21:39 <nirik> ClockworkOmega: well, chime in on the mailing list post I am going to make I guess. ;) and/or look at our current setup in ansible git.
18:21:42 <lmacken> threebean: ah
18:21:42 <puiterwijk> nirik: not sure. for "long time", the monitoring should have a log of itself
18:21:59 <puiterwijk> (just my opinion)
18:22:01 <nirik> perhaps.
18:22:12 <nirik> also, currently nagios alerts go to 'sysadmin-members'
18:22:24 <nirik> I would suspect 99% of them just filter them into the trash. ;)
18:22:40 <nirik> well, perhaps 95%
18:23:16 <puiterwijk> right, but I guess that's caused by the fact that it sends so much email
18:23:35 <puiterwijk> so if we'd fix the signal/noise ratio, that percentage should hopefully go down
18:23:43 <threebean> yeah. I rely excusively on irc for nagios alerts.
18:23:43 <nirik> well, that and a number of sysadmin members aren't very active or have no idea how to fix something or have access to do so
18:24:13 <smooge> my phone is my alerter
18:24:20 <smooge> when its charged
18:24:27 <smooge> unlike right now
18:24:28 <nirik> a case I often hit: something goes down like a proxy or something, and so theres 20-30 alerts, then 20-30emails or whatever.
18:24:37 <nirik> but I saw them on irc and fixed it.
18:24:51 <nirik> so I hit 'delete all' on my phone and 'catch up all' in my nagios folder
18:24:58 <nirik> all those pages/emails are... completely overhead
18:25:29 <nirik> anyhow, will post to the list we can brainstorm a plan there. :)
18:25:50 <nirik> anything else on monitoring?
18:26:15 <puiterwijk> yeah, I think "no monitoring" is not a solution, even though it may solve the "too many alerts" problem :)
18:26:35 <nirik> agreed. we want to see problems before our users do.
18:26:40 <nirik> #topic where is our source code? - smooge
18:26:41 <nirik> Google code and gitorious are going away.. what projects there we might rely on?
18:26:43 <threebean> eh, if we're moving monitoring around, it might be nice to get a flashier collectd frontend (or replacement). there are nice, modern open source frontends out there
18:26:44 * oddshocks digs nirik's idea of IRC -> email -> pager, with those other exceptions/rules mentioned along with it
18:26:45 <nirik> smooge: you added this?
18:26:45 <ClockworkOmega> What about a different system for filtering?
18:27:03 <ClockworkOmega> Or rather a different methodology for it?
18:27:13 <nirik> oops. Didn't mean to cut off everyone there on monitoring. ;)
18:27:13 <puiterwijk> I think we only have code in fedorahosted and github in infra
18:27:16 <nirik> #undo
18:27:16 <zodbot> Removing item from minutes: <MeetBot.items.Topic object at 0xfa7b3d0>
18:27:42 <nirik> threebean: graphite was suggested... it's Django tho and bigger...
18:27:49 <nirik> ClockworkOmega: filtering where?
18:28:53 * relrod has played with graphite before. It's _extremely_ modular, but that also means that setting it up has a _lot_ of little components to maintain and set up and learn.
18:29:44 <ClockworkOmega> I wasn't speaking so much about direction.
18:29:57 <nirik> relrod: and it seems heavy to me, but perhaps it's worth it. ;)
18:31:05 <nirik> #topic where is our source code? - smooge
18:31:08 <smooge> puiterwijk, my questions was about a bit bigger picture.. do we rely on tools which are hosted there and do we know where they will be after they close down.
18:31:18 <nirik> anyhow, I don't think we have any code there... anyone know of any?
18:31:23 <puiterwijk> smooge: yes, we will be.
18:31:32 <puiterwijk> but that's all Fedora packaged as far as I know
18:31:50 <puiterwijk> nirik: as said, we don't have any code there, but smooge is worried about stuff we depend on
18:32:10 <nirik> sure, it's good to ask everyone... in case we missed something.
18:32:13 <puiterwijk> and I personally think that's the problem of the EPEL apckage maintainers
18:32:14 * threebean doesn't know of any
18:32:16 <smooge> it is more of a 'something we need to be aware of if it all goes away'
18:32:39 <puiterwijk> smooge: yeah, makes sense. though I guess most upstreams that are still active will find another host, and then package maintainers should follow that
18:32:43 <puiterwijk> (just my 2 cents)
18:33:00 <puiterwijk> or rather, my opinion
18:33:02 <smooge> still active...
18:33:10 <smooge> that was where I start to get itchy
18:33:12 <nirik> it's just like when berlios went away. ;)
18:33:32 <smooge> our planet uses a forked venus which doesn't match what current venus is
18:33:45 <puiterwijk> smooge: well, non-active upstreams have always been a problem, regardless of the place where the code is
18:34:04 <puiterwijk> so yes, I see the problem there, but that's not especially related to gitorious/gcode shutting down
18:34:29 <nirik> well, it's just more at once.
18:34:33 <threebean> it might be worth trying to script something that goes through our packages searching for links to these soon-to-be-gone services. Look at SourceN fields, look at the 'upstream url' in pkgdb..
18:34:36 <smooge> ok never mind.
18:34:44 <threebean> ... and generate a list to send to the devel list.
18:34:49 <nirik> threebean: could anytia do that?
18:35:05 <nirik> or the using it's db I guess
18:35:11 <puiterwijk> nirik: well, it has a list of upstream URLs, yes.
18:35:14 <puiterwijk> but not for all source files
18:35:15 <threebean> yeah. would take a little scripting.
18:35:37 <nirik> sure, it would never be 100%
18:35:43 <nirik> but could find the ones that are obvious
18:35:57 <puiterwijk> I can take a throw at that after the meeting
18:36:24 <nirik> cool. :)
18:36:47 <nirik> #action puiterwijk will see if we can generate a list of packages with upstreams being retired to notify the devel list of.
18:36:53 <nirik> #topic Mirrormanager2 [how is this coming along?]
18:37:05 <nirik> smooge: this was your question? or ?
18:37:11 <puiterwijk> For this, it's too bad that pingou is gone today.
18:37:19 <nirik> well, I can give some info. :)
18:37:26 <puiterwijk> ah, sure
18:37:30 <smooge> nirik, it was brought up about bapp02 ooms
18:37:43 <smooge> and I thought that was the area to put questions like that.
18:38:05 <puiterwijk> smooge: yeah, I think that's a right place indeed.
18:38:09 <nirik> we have 1 mirrorlist server thats on mm2. The mirrormanager2-mirrorlist rpm needs some work tho (it doesn't come up right on boot). I'll work with pingou to fix that next week.
18:38:24 <nirik> sure, absolutely right. ;)
18:38:43 <nirik> once mirrormanager2-mirrorlist is set, we should convert the rest of the mirrorlist servers to use it.
18:39:11 <nirik> oddshocks was looking into seeing if we could validate the data those use...
18:39:27 <nirik> so we avoid pushing out bad data to them.
18:39:46 * oddshocks nods
18:40:03 <nirik> On the other parts, we have staging versions of: backend, crawler, frontend. We need to finish some fedmsg work on them... then make production ones and switch.
18:40:27 <nirik> I don't know for sure if fedmsg integration is the last bit they need or if there was something more pingou was waiting on doing
18:40:55 <nirik> for bapp02 in the mean time the only thing we could possibly do is decrease the number of crawlers I guess.
18:40:59 <threebean> hm. did we do that already? can't recall. will have a look.
18:41:20 <nirik> threebean: it still spews crons about fedmsg things missing... might just need adding in the playbook(s)
18:41:41 <threebean> ah, cool. I'll poke it after the meeting.
18:41:53 <oddshocks> On my end... I'm really clueless as to which parts of the pickle data is the critical data that determines if the pickle is good or bad. I have pingou's script to compare two pickles and have used that as a jumping-off point to write a validate_pickle.py script, but I'm still pretty clueless. So if anyone has any other info on the rather-large amount of content in these pickles, it'd be appreciated
18:41:59 <nirik> It would be cool if we could finish rolling this out before beta, but not sure if thats being too pushy
18:42:47 <nirik> oddshocks: yeah, the only thing I have is that traceback from the mm2-mirrorlist. The old mirrorlists don't show any error they just suck up all memory and fall over.
18:43:00 <oddshocks> At the least, I could probably use maybe 2 more good pickles as examples, so I can compare 3 good pickles and see what they have, that the bad pickle doesn't. threebean was kind enough to get me one good pickle to compare to the bad one, but I'm not sure where that was taken from. But I could probably use a couple more for comparitive purposes
18:43:15 <oddshocks> Oh, yeah, I have that traceback you sent me, too, nirik :)
18:43:37 <nirik> sure. I can get you some more good ones...
18:43:52 <nirik> it makes them hourly
18:44:02 <oddshocks> Feel free, anyone, to tell me that I'm going about this less-than-optimally. :P
18:44:07 <oddshocks> nirik: cool, thanks :)
18:45:03 <nirik> smooge: did that answer the question? anything more on mm2 (without pingou around)
18:45:20 * oddshocks wasn't sure if other people knew more than he did about what causes the pickle to be bad
18:45:21 <smooge> well other than "we are moving to it next week"
18:45:38 <oddshocks> when does pingou get back again?
18:45:38 <puiterwijk> oddshocks: I have some notes about bad pickles. will look them up for you
18:45:39 <nirik> next week.
18:45:43 <oddshocks> puiterwijk: _awesome_, thanks
18:45:47 <oddshocks> nirik: cool
18:45:56 <nirik> I don't know if we can move to it before beta, but it would be nice.
18:45:58 <smooge> oddshocks, they are magic to me
18:46:12 <nirik> we could also retire bapp02, app01.stg from this, so that would be all good.
18:46:43 <smooge> yay!
18:46:46 <smooge> ok that is all I needed
18:47:00 <nirik> ok, I didn't have anyone signed up to tell us all about an app, so hey, I guess I will randomly pick one to talk about... how about collectd! :)
18:47:08 <nirik> #topic Learn about: collectd
18:47:18 <smooge> ah man.. I was about to leave too.
18:47:28 <nirik> ha ha ha.
18:47:31 <nirik> https://admin.fedoraproject.org/collectd/
18:47:37 <nirik> giving a dns error. neat. ;)
18:47:49 <puiterwijk> demo effect
18:47:55 <puiterwijk> but works for me, so it's proxy-local
18:48:19 <nirik> ok, fixed.
18:48:26 <nirik> its log01's vpn. ;)
18:48:51 <nirik> anyhow, we have this application called collectd. It runs a agent on various machines and reports back to a central version on log01.
18:49:12 <nirik> that version takes the data in as rrdtool files and then can display it on the above web page.
18:49:31 <nirik> it's kinda clunky, but it can give bit picture graphs of things.
18:50:05 <nirik> anytime you see a host in the list that is NOT a fully qualified domain name, it's an old host still in puppet/rhel6
18:50:39 <nirik> you can zoom on graphs pretty close, it takes quite a lot of readings.
18:51:01 <nirik> there are plugins for various things, including some we have written ourselves.
18:51:23 <nirik> threebean has http://threebean.org/fedmsg-health.html which uses some of these.
18:52:09 <nirik> any questions or comments on collectd? ;)
18:52:36 <mhurron> it is pretty ugly isn't it :P
18:52:46 <nirik> yeah, it's not a winner on the interface. ;)
18:53:05 <puiterwijk> it's a well-designed admin tool - it shows what it needs to, in a concise interface :)_
18:53:23 <nirik> heh
18:53:29 <nirik> #topic Meeting process
18:53:43 <nirik> just real quickly, do we want to keep doing the gobby document meeting process?
18:53:45 <mhurron> well ... if it showed what it needed to wouldn't it show abnormalities on the front page?
18:54:03 <nirik> mhurron: well, it doesn't know what abnormal is. It only reports the news. ;)
18:55:07 <nirik> anyhow, I know gobby has issues, but I like the shared document thing... but I'm happy to go back to the old meeting format if people prefer.
18:55:07 <puiterwijk> nirik: I think this process works fine
18:55:33 <smooge> now that i have gobby working on my laptop I do to
18:56:48 <nirik> yeah, having to have a special client is a pain. As is to me having to reenter the password and reconnect everytime I reboot or move networks, and it's autosave doesn't work right accross reboots of the server.
18:56:51 <nirik> otherwise it's fine. ;)
18:56:57 <threebean> yeah, I like the new process still. I was late to update the gobby document this week, fwiw.. but hope to do it earlier in subsequent weeks.
18:57:18 <nirik> I might look into *pads again and see if we can find a web based on we can actually package and deploy
18:57:40 <nirik> #topic Open Floor
18:57:45 <nirik> anyone have items for open floor?
18:57:47 <puiterwijk> I have a very quick thing
18:58:02 <puiterwijk> I just ran a first test run of the gcode/gitorious script, and found 90 projects so far
18:58:09 <puiterwijk> (of the 5000 or so in anitya)
18:58:44 <smooge> puiterwijk, did you do that in awk?
18:58:47 <smooge> :)
18:58:48 <nirik> quite a few
18:58:49 <puiterwijk> smooge: yup :)
18:59:08 <puiterwijk> nirik: yeah, but not a lot of "important" ones to us
18:59:13 * smooge goes off to corrupt more people so his mod_awk httpd module will be used
18:59:24 <puiterwijk> but I'll refine it a bit and make a more complete list
18:59:30 <puiterwijk> smooge: mod_awk? tell me more :-)
18:59:36 <nirik> ha.
18:59:43 <nirik> ok, if nothing else will close out in a minute or so
18:59:45 <smooge> puiterwijk, ok thanks. pastebin me the code
19:00:17 <puiterwijk> smooge: you mean for mod_awk (which I don't have... yet)? or the gcode/gitorous stuff?
19:00:29 <relrod> hah
19:00:29 <smooge> gcode/gitorious
19:01:04 <nirik> thanks for coming everyone!
19:01:06 <puiterwijk> smooge: yeah, will send it out when I get more stuff added (I want to also check the distgit repos)
19:01:12 <smooge> its like mod_perl but even creakier.. /* I started on this as a joke back in 1997.. please don't make me look anymore */
19:01:23 <nirik> #endmeeting
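
Editor's note on the action item: the scan discussed under "where is our source code?" (go through upstream URLs and flag those on Google Code or Gitorious) could look roughly like the Python sketch below. This is not the script puiterwijk ran (that one was reportedly written in awk against anitya data). The domain list, the function names, and the sample packages are all assumptions made for illustration, and a real run would need the actual upstream URL list as input.

```python
from urllib.parse import urlparse

# Hosts mentioned in the meeting as going away. The exact domain list is an
# assumption: Google Code served projects from code.google.com and
# <project>.googlecode.com, and Gitorious from gitorious.org.
RETIRING_DOMAINS = ("code.google.com", "googlecode.com", "gitorious.org")


def on_retiring_host(url):
    """Return True if the URL's host is a retiring domain or a subdomain of one.

    URLs without a scheme have no parsed hostname and are reported as False.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in RETIRING_DOMAINS)


def affected_packages(upstream_urls):
    """Return the sorted package names whose upstream URL is on a retiring host."""
    return sorted(
        name for name, url in upstream_urls.items() if on_retiring_host(url)
    )


# Hypothetical sample data, standing in for a real package -> upstream URL map.
SAMPLE = {
    "foo": "https://code.google.com/p/foo/",
    "bar": "http://bar.googlecode.com/files/bar-1.0.tar.gz",
    "baz": "https://gitorious.org/baz/baz",
    "qux": "https://github.com/example/qux",
}

if __name__ == "__main__":
    for name in affected_packages(SAMPLE):
        print(name)
```

Matching on the parsed hostname rather than on a plain substring avoids flagging unrelated hosts whose names merely contain one of the domains. As noted in the meeting, a list built this way would never be 100% complete, since it only sees the URLs it is given.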