Summary/Minutes from today's Fedora Infrastructure meeting (2013-03-07)

============================================
#fedora-meeting: Infrastructure (2013-03-07)
============================================


Meeting started by nirik at 18:59:59 UTC. The full logs are available at
http://meetbot.fedoraproject.org/fedora-meeting/2013-03-07/infrastructure.2013-03-07-18.59.log.html
.



Meeting summary
---------------
* welcome everyone  (nirik, 18:59:59)

* New folks introductions and Apprentice tasks  (nirik, 19:02:28)
  * new folks would be advised to look into ansible.  (nirik, 19:11:09)

* Applications status / discussion  (nirik, 19:11:16)
  * fas-openid is now in production. Thanks puiterwijk. :)  (nirik,
    19:11:42)
  * fedmsg starting to be enabled for secondary arch compose processes.
    (nirik, 19:12:30)
  * fedmsg-notify blog post out there, hopefully more consumers  (nirik,
    19:14:03)
  * LINK: https://admin.fedoraproject.org/haproxy/proxy01   (lmacken,
    19:14:18)
  * LINK: http://lewk.org/blog/fedmsg-notify.html   (pingou, 19:14:19)

* Sysadmin status / discussion  (nirik, 19:24:17)
  * ssmoogen reinstalled proxy01 as x86_64...  (nirik, 19:25:18)
  * smooge is waiting to find out if physical memory arrived in PHX2 so
    we can give it to systems  (ssmoogen, 19:30:38)

* Private Cloud status update / discussion  (nirik, 19:34:17)
  * moving things into openstack cloudlet  (nirik, 19:36:13)
  * will move some compute nodes from the other cloudlet over to it, and
    then have 2 nodes to continue testing other things with.  (nirik,
    19:36:53)

* Upcoming Tasks/Items  (nirik, 19:39:33)
  * 2013-03-07 remove inactive apprentices.  (nirik, 19:39:38)
  * 2013-03-12 to 2013-03-21 pycon  (nirik, 19:39:39)
  * 2013-03-19 to 2013-03-26 - koji update  (nirik, 19:39:39)
  * 2013-03-29 - spring holiday.  (nirik, 19:39:39)
  * 2013-04-02 to 2013-04-16 ALPHA infrastructure freeze  (nirik,
    19:39:39)
  * 2013-04-16 F19 alpha release  (nirik, 19:39:40)
  * 2013-05-07 to 2013-05-21 BETA infrastructure freeze  (nirik,
    19:39:41)
  * 2013-05-21 F19 beta release  (nirik, 19:39:43)
  * 2013-05-31 end of 1st quarter  (nirik, 19:39:45)
  * 2013-06-11 to 2013-06-25 FINAL infrastructure freeze.  (nirik,
    19:39:47)
  * 2013-06-25 F19 FINAL release  (nirik, 19:39:49)

* Open Floor  (nirik, 19:40:38)
  * idea: do some vfads and focus on specific areas to get them done.
    (nirik, 19:42:41)

Meeting ended at 20:03:41 UTC.




Action Items
------------





Action Items, by person
-----------------------
* **UNASSIGNED**
  * (none)




People Present (lines said)
---------------------------
* nirik (117)
* skvidal (98)
* abadger1999 (38)
* lmacken (30)
* pingou (24)
* threebean (14)
* blackdeerranger (9)
* KasumiNinja (7)
* ssmoogen (5)
* zodbot (4)
* Adran (4)
* mdomsch (3)
* cyberworm54 (3)
* maayke (2)
* samkottler (1)
* SmootherFrOgZ (1)
* puiterwijk (1)
* smooge (0)
* ricky (0)
* dgilmore (0)
* CodeBlock (0)
--
18:59:59 <nirik> #startmeeting Infrastructure (2013-03-07)
18:59:59 <zodbot> Meeting started Thu Mar  7 18:59:59 2013 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:59:59 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:59:59 <nirik> #meetingname infrastructure
18:59:59 <zodbot> The meeting name has been set to 'infrastructure'
18:59:59 <nirik> #topic welcome everyone
18:59:59 <nirik> #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
19:00:00 <zodbot> Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
19:00:21 <nirik> who all is around for a infrastructure meeting?
19:00:23 * lmacken 
19:00:35 * cyberworm54 here
19:00:37 * abadger1999 here
19:00:40 <blackdeerranger> here
19:00:41 * maayke here
19:00:48 <KasumiNinja> KasumiNinja here
19:01:11 * threebean is here
19:01:24 * nirik will wait another min for folks to wander in.
19:02:03 * pingou 
19:02:22 <nirik> ok, I guess we can go ahead and dive in. ;)
19:02:28 <nirik> #topic New folks introductions and Apprentice tasks
19:02:42 <nirik> any new folks want to introduce themselves? or apprentices with questions?
19:02:50 <blackdeerranger> me
19:02:52 <KasumiNinja> I'm new
19:03:06 <ssmoogen> here
19:03:12 * SmootherFrOgZ is here
19:03:21 <mdomsch> hola
19:03:27 <KasumiNinja> I work as  a sysadmin and like to help fedora with sysadmin tasks
19:03:34 <nirik> blackdeerranger / KasumiNinja: welcome. ;)
19:03:39 <blackdeerranger> :)
19:03:47 <KasumiNinja> :-)
19:04:03 <nirik> blackdeerranger: are you also interested in sysadmin side of things? or more application development?
19:04:07 <KasumiNinja> I have no programming experience
19:04:21 <blackdeerranger> I typed my introduction here: http://paste.fedoraproject.org/4574/80565136/
19:04:45 <blackdeerranger> I´m interested in sysadmin tasks
19:04:59 <nirik> one thing I'd like to mention here... I sent out my regular fi-apprentice feedback email and got some email back that highlighted how self driven we require people to be...
19:05:02 <blackdeerranger> I´m not a software developer
19:05:37 <nirik> cool.
19:06:13 <cyberworm54> http://paste.fedoraproject.org/4574/80565136/
19:06:19 <cyberworm54> oops sorry :)
19:06:22 <nirik> I'm wondering if we want to try and change that focus any, or if there are ways to better communicate it to new folks so they don't get confused when no one is assigning them specific tasks and asking for updates all the time.
19:07:16 <nirik> perhaps I will start a discussion on the list around that, just wanted to mention it here.
19:07:18 <blackdeerranger> As I unterstand it is a good staring point to join "fi-apprentice "
19:07:22 <maayke> nirik: I will send you email tomorrow :)
19:07:28 <nirik> blackdeerranger: yeah.
19:07:33 <blackdeerranger> good
19:08:09 <nirik> the thing is that we are not setup to do heavy mentoring... so joining usually means the new person has to be very focused on just going and doing things and bugging people.
19:08:22 <KasumiNinja> works fine for me
19:08:25 * samkottler is just getting here
19:09:00 <skvidal> something worth mentioning
19:09:06 <nirik> anyhow, will post a thread to the list about it. ;) just brainstorming.
19:09:08 <skvidal> given where fedora infra is going with config mgmt and c&c
19:09:15 <skvidal> anyone interested in the sysadmin side of things
19:09:27 <skvidal> would be best served by going to ansible.cc
19:09:35 <skvidal> and trying it out for themselves to be familiar
19:09:44 <nirik> absolutely. ;)
19:09:54 <skvidal> we have lots of examples at:  http://infrastructure.fedoraproject.org/cgit/ansible.git/tree/
19:10:08 <skvidal> but our examples are going to increasingly get more complicated, I suspect :)
19:10:28 <nirik> they always do.
19:10:29 <skvidal> so KasumiNinja, blackdeerranger: probably worth familiarizing yourself with this tool
19:10:55 <blackdeerranger> sounds interesting - thanks
19:11:09 <nirik> #info new folks would be advised to look into ansible.
19:11:16 <nirik> #topic Applications status / discussion
19:11:29 <KasumiNinja> great I'll look into it
19:11:29 <nirik> any exciting application side news this week or upcoming?
19:11:38 * pingou has mainly be bothering threebean
19:11:42 <nirik> #info fas-openid is now in production. Thanks puiterwijk. :)
19:11:43 <pingou> been*
19:11:51 <skvidal> KasumiNinja, blackdeerranger: great
19:11:53 <pingou> \ó/ well done puiterwijk
19:12:06 <threebean> we *just* got our first fedmsg message out of the secondary arch compose process
19:12:10 <abadger1999> Three cheers for puiterwijk :-)
19:12:16 <threebean> puiterwijk++
19:12:30 <nirik> #info fedmsg starting to be enabled for secondary arch compose processes.
19:12:48 <abadger1999> puiterwijk and I also discussed oauth to some length.  I'm organizing that into a message to send to the list.
19:12:57 <lmacken> I announced the fedmsg-notify today, so we'll hopefully see a lot more consumers
19:13:08 <abadger1999> An oauth server is what I think his next project is going to be.
19:13:19 <lmacken> I also fixed some long-standing push ordering bugs in the bodhi masher last week
19:13:36 <nirik> lmacken: yeah. Great post. ;)
19:13:41 <lmacken> thanks ☺
19:13:48 * Adran is here (late)
19:14:03 <nirik> #info fedmsg-notify blog post out there, hopefully more consumers
19:14:07 <pingou> nice lmacken
19:14:12 <nirik> lmacken: do we have any way to tell how many consumers there are?
19:14:18 <lmacken> https://admin.fedoraproject.org/haproxy/proxy01
19:14:19 <lmacken> and 02
19:14:19 <pingou> #link http://lewk.org/blog/fedmsg-notify.html
19:14:28 <lmacken> fedmsg-raw-zmq-outbound
19:14:29 <nirik> cool.
19:14:48 <pingou> lmacken: where is your blog plugin?
19:14:56 <lmacken> pingou: hrm?
19:15:02 <nirik> abadger1999: related. Have we given any thought about further expanding our 2 factor stuff to web apps or other places? Or haven't really explored that yet.
19:15:07 <abadger1999> mdomsch: Question -- is python-GeoIP still needed for mirrormanager?  I noticed that it was orphaned in Fedora/EPEL.
19:15:11 <pingou> lmacken: I saw tour twitter feed on your blog and I went... :)
19:15:45 <abadger1999> nirik: yes -- SmootherFrOgZ has been working up a patch to do 2-factor login to fas and support in python-fedora to take advantage of that.
19:15:53 <mdomsch> abadger1999: yep - largely used. I'll have to go adopt it
19:15:58 <abadger1999> nirik: I think that'll give us the first step.
19:16:28 <nirik> abadger1999: ah ha. I was confused as to what that work was about. Good.
19:16:38 <abadger1999> mdomsch: Cool.  If you need help, feel free to add me as a comaintainer and I'll put it on my once a cycle, look for updates list.
19:16:43 <nirik> note that also fas-openid has that PAPE plugin we could use for openid consuming applications.
19:16:49 <abadger1999> nirik: His work will be a first step.
19:17:03 <abadger1999> There's lots of other things that need to happen before it's "real"
19:17:21 <abadger1999> ie: at first we'll have both the single factor and 2-factor login.
19:17:28 <nirik> sure, yeah
19:17:36 <abadger1999> so you could circumvent by simply going to the other login page.
19:17:58 <abadger1999> But first we get it working, then we make it mandatory for some people.
19:18:08 * nirik nods.
19:18:16 <abadger1999> So -- continuing to make progress :-)
19:18:20 <lmacken> not sure if it was talked about last week, but we hit yet an issue with our tg1 apps where they would just silently block all requests. It happened after some dns change, but hopefully this change will fix it in the future. https://github.com/fedora-infra/python-fedora/pull/18
19:18:20 <nirik> also, that would only be yubikey?
19:18:49 <abadger1999> nirik: yeah -- since fas only knows about yubikey at this point.
19:18:54 <nirik> lmacken: ah yes, thanks for finding that. ;) We should try and get that fix rolled out to production...
19:18:58 <abadger1999> It's another thing we'll have to do to get this good to go.
19:19:13 <Adran> abadger1999: I might poke you later, I'd be interested in seeing if Google Authenticator could work, maybe I can work on something.
19:19:44 <abadger1999> Adran: cool.  It should.  We just have to integrate it more tightly with fas (we have basically a separate google authenticator setup right now)
19:19:54 <Adran> Ah.
19:20:01 <skvidal> Adran: could work where? we have support for google auth in totpcgi - it's available - just not integrated to fas
19:20:05 <skvidal> Adran: yes - what abadger1999 said :)
19:20:22 <nirik> so, next week is fudcon... how many of you will be out there?
19:20:24 <Adran> skvidal: Right. Maybe it can be integrated? :)
19:20:51 <threebean> nirik: s/fudcon/pycon/
19:20:58 <threebean> I'll be there :)
19:21:00 <nirik> yeah, sorry, brain failure. ;)
19:21:01 <lmacken> me too
19:21:50 <nirik> so, suggestion: we may want to be cautious about application changes tomorrow/the weekend... just so things are not possibly unstable with all you folks gone. ;)
19:22:37 <nirik> and hopefully you all have safe travels. :)
19:22:55 <nirik> any other application news?
19:23:10 <threebean> pingou started a really good conversation with hughsie about integrating tagger with AppStream
19:23:30 <threebean> and he's already written some code for it today.
19:23:38 <threebean> Pretty exciting.  :)
19:23:40 <nirik> cool. I saw some of that.
19:23:46 <nirik> I haven't looked at appstream yet much...
19:24:17 <nirik> #topic Sysadmin status / discussion
19:24:25 <abadger1999> I'll be there
19:25:03 <nirik> so, lets see... not too much on the sysadmin side that I can recall in detail. ;)
19:25:14 <skvidal> nirik: a few things
19:25:18 <nirik> #info ssmoogen reinstalled proxy01 as x86_64...
19:25:27 <nirik> (which is good, since it now matches all the other proxies)
19:25:32 <ssmoogen> and I think it worked
19:25:32 <skvidal> 1. we've been moving ahead on the ansible conversion
19:25:46 * mdomsch is going to need mmbapp01 to be x86_64 too due to s3cmd memory usage
19:25:59 <skvidal> 2. I've added a path lookup plugin to ansible that will allow us to have lookups for staging then production like we do now in puppet
19:26:19 <threebean> nice!
19:26:31 <pingou> very nice
19:26:36 <nirik> skvidal: yep. :) we need to test out some workflow there/make some simple playbooks for simple hosts to test things out.
19:26:48 <nirik> mdomsch: :( oh well... we can do that.
19:26:51 <skvidal> 3. more things moving into openstack and we're already running into limits  of our available resources
19:27:11 <pingou> oh :(
19:27:14 <skvidal> I have 4 more instances to finish transitioning and then we can start moving systems over to increase the available resources
19:27:30 <skvidal> which is great b/c that should double our available resources
19:28:01 <skvidal> nirik: for later - we might consider tinkering with multiple availability zones and osuosl02
19:28:04 <nirik> and add cinder volumes from each of the compute nodes.
19:28:15 <nirik> yep. That would be great.
19:29:02 <nirik> so, that bug that lmacken mentioned earlier... I was wondering, should we make app servers depend on their local proxy ?
19:29:19 <nirik> right now they hit admin.fedoraproject.org, which is dns round robin for all of them.
19:29:28 <nirik> but if they talked to their local proxy it might be faster.
19:29:36 <nirik> and also if that proxy is down, then likely they are too.
19:29:59 <lmacken> so appX->proxyX?
19:30:34 <nirik> yeah.
19:30:38 <ssmoogen> #info smooge is waiting to find out if physical memory arrived in PHX2 so we can give it to systesm
19:30:42 <nirik> app01/02/03/04 would hit proxy01
19:31:32 <nirik> anyhow, it's a thought. I don't think it's urgent.
19:31:46 <skvidal> nirik: query on this
19:32:04 <skvidal> would it make more sense for us to pursue the above? or for us to pursue breaking all the apps out?
19:32:36 <nirik> well, the above is pretty trivial if we want to do it. ;) Breaking apps out is still a longer term thing we really must do, IMHO.
19:32:54 <skvidal> nirik: fair enough..
19:33:02 <skvidal> if the proxies were running next to the app servers
19:33:07 <skvidal> like on the same hw
19:33:10 <skvidal> I';m inclined to say yes
19:33:22 <nirik> and I think breaking apps out kinda wants ansible to be ready to handle those apps.
19:33:25 <skvidal> but we don't really want an outage on proxy01 to kill the app servers
19:33:35 <nirik> yeah, true.
19:33:50 <nirik> well, lets leave it for now...
19:34:17 <nirik> #topic Private Cloud status update / discussion
19:34:20 <abadger1999> About the private cloud -- does this mean we're going to just use openstack going forward?
19:34:24 <nirik> we already hit on some of this above...
19:34:33 <skvidal> abadger1999: so right now here is what we have discussed
19:34:43 <skvidal> abadger1999: 1. moving the instances we have over to openstack
19:34:54 <skvidal> 2. moving 2 of the compute nodes over to openstack for additional resources
19:35:04 <skvidal> 3. taking the remaining 2 machines for other prototyping/testing
19:35:22 <skvidal> whether 3 is of openstack or of $something_else is really up to later discussion/decision
19:35:32 <abadger1999> <nod>
19:36:03 <ssmoogen> would adding another proxy to phx2 help?
19:36:07 <abadger1999> Okay so it seems like our "production" services are moving openstack but we're still testing out the alternatives ?
19:36:13 <nirik> #info moving things into openstack cloudlet
19:36:53 <nirik> #info will move some compute nodes from the other cloudlet over to it, and then have 2 nodes to continue testing other things with.
19:37:08 <nirik> abadger1999: yeah, or possibly we will use the other 2 to test the next openstack version...
19:37:14 <abadger1999> <nod>
19:38:08 <nirik> I think we can knock a bunch of things off our 'need to figure out before production' list now too.
19:38:24 <nirik> I'll look at cleaning up the wiki page on that, since I think we solved or decided many of them
19:38:50 <nirik> anything else on cloudlets?
19:39:04 <skvidal> no - but I had something for openfloor when that happens
19:39:28 <nirik> ok
19:39:33 <nirik> #topic Upcoming Tasks/Items
19:39:38 <nirik> #info 2013-03-07 remove inactive apprentices.
19:39:39 <nirik> #info 2013-03-12 to 2013-03-21 pycon
19:39:39 <nirik> #info 2013-03-19 to 2013-03-26 - koji update
19:39:39 <nirik> #info 2013-03-29 - spring holiday.
19:39:39 <nirik> #info 2013-04-02 to 2013-04-16 ALPHA infrastructure freeze
19:39:40 <nirik> #info 2013-04-16 F19 alpha release
19:39:41 <nirik> #info 2013-05-07 to 2013-05-21 BETA infrastructure freeze
19:39:43 <nirik> #info 2013-05-21 F19 beta release
19:39:45 <nirik> #info 2013-05-31 end of 1st quarter
19:39:47 <nirik> #info 2013-06-11 to 2013-06-25 FINAL infrastructure freeze.
19:39:49 <nirik> #info 2013-06-25 F19 FINAL release
19:39:51 <nirik> anything anyone would like to schedule or note?
19:40:11 <nirik> we have a little less than a month until alpha freeze.
19:40:38 <nirik> #topic Open Floor
19:40:42 <nirik> skvidal: you had something?
19:40:45 <skvidal> yah
19:40:53 <skvidal> something I was thinking about
19:40:53 <pingou> I sent somethings about jenkins, feedbacks welcome :)
19:41:01 <skvidal> in any given week we're all working on a billion things
19:41:06 <nirik> pingou: yeah, I have it marked to reply to. ;) Thanks.
19:41:15 <skvidal> and I was wondering if there was any thought to doing something like a virtual fad week
19:41:26 <skvidal> where we intend to focus on a few tasks and get them done
19:41:31 <nirik> skvidal: focusing on one area?
19:41:33 <pingou> sounds nice
19:41:34 <nirik> yeah.
19:41:35 <skvidal> right
19:41:39 <skvidal> we all are on irc
19:41:42 <skvidal> and most of the time all day long
19:41:55 <nirik> I think that kind of thing is very effective if we plan exactly what we want to try and do.
19:41:55 <skvidal> it seems like it would be very possible to schedulea vfad
19:41:57 <pingou> and most of us in a close timezone :)
19:42:00 <nirik> and make sure the needed people are available
19:42:09 <skvidal> hell, tie us toigether using a google hangout if need be
19:42:18 * threebean nods
19:42:20 <skvidal> but announce that we will be focused on one thing
19:42:28 <skvidal> and unavailable for random-ass pings
19:42:41 <nirik> #info idea: do some vfads and focus on specific areas to get them done.
19:42:48 <skvidal> I liked that we were able to knock out a specific problem last year for the 2fa fad
19:42:57 <skvidal> and I think we  should try it w/o the relocation
19:43:10 <skvidal> so what subjects would be things we could knock out?
19:43:20 <nirik> application logging?
19:43:27 * lmacken was just about to say that :P
19:43:30 <pingou> 2fa app wise
19:43:30 <threebean> perfect
19:43:53 <lmacken> breaking apps out
19:43:53 <skvidal> what else?
19:44:13 <threebean> ansible migration
19:44:43 <pingou> CLI logins for our web-app
19:44:47 <nirik> fedorahosted-ng
19:44:58 <pingou> but that comes back to the discussion abadger1999 wants to start :)
19:45:15 <lmacken> IDS
19:45:26 <pingou> IDS?
19:45:31 <lmacken> intrusion detection system
19:45:46 <pingou> ah, cool
19:45:48 * skvidal is making a list
19:45:51 <nirik> identify and plan how to get rid of our SPOF.
19:45:56 <skvidal> the db?
19:46:04 <skvidal> db-replication
19:46:11 <nirik> db's are a big one.
19:46:18 <pingou> spof?
19:46:19 <nirik> there might be other things tho.
19:46:24 <skvidal> single point of failure
19:46:26 <abadger1999> <nod>  I'd love to have some FADs on 2-fa and oauth.
19:46:27 <nirik> sorry, single point of failure.
19:46:34 <abadger1999> oauth needs some discussion first.
19:47:08 <abadger1999> 2-fa is closer to having a solid plan where a FAD type setting would really help
19:47:42 <skvidal> okay that's a good list start
19:47:43 <abadger1999> lmacken, for app logging -- do we know how to fix the problems we have?
19:48:04 * abadger1999 still hasn't found any reason we aren't getting all tracebacks in our logs.
19:48:35 <lmacken> abadger1999: I didn't know tracebacks were not appearing. Could be a few things.
19:48:49 <lmacken> I'd love to experiment with https://github.com/ryanpetrello/canary
19:49:12 <abadger1999> lmacken: that's like my number 1 problem with app logging since 2007 or so :-)
19:49:18 <pingou> spof?
19:49:20 <pingou> sorry
19:49:25 <lmacken> hmm, probably a simple ini or wsgi config tweak honestly
19:49:36 <nirik> getting all the data is a good first step, then to fix/reduce so it only tells us about real errors...
19:49:36 <ssmoogen> single point of failure
19:49:41 <skvidal> pingou: places where if that server/service dies then everything dies
19:49:52 <skvidal> nirik: I'll add another item to our list
19:49:53 <skvidal> nirik: nagios
19:50:01 <nirik> I know some apps really send a ton of stuff... I think fas sends to error_log on every successfull login.
19:50:18 <nirik> yeah. I have done some work on nagios, but there is more to plan out and do.
19:50:25 <pingou> skvidal: I hit arrow up/enter in the wrong window, sorry for the noise
19:50:37 <skvidal> pingou: oh - I thought you were still wondering what that means
19:50:57 <abadger1999> lmacken: I would think so... but I know you've looked at it a ton of times nad we've only succeeded in over logging things that we don't care about :-/
19:51:47 <nirik> I'd love to get to the point where someone says "hey, I just hit $app and got a 500" and we can easily see a traceback to attach to a ticket about it. ;)
19:52:02 <skvidal> so
19:52:05 <lmacken> abadger1999: potential pycon hackfest item :)
19:52:05 <skvidal> wrt app logging
19:52:10 <skvidal> it seems, to me, somewhat obvious
19:52:14 <skvidal> that if we break out app servers
19:52:18 <skvidal> logging becomes A LOT simpler
19:52:21 <nirik> yep.
19:52:22 <skvidal> doesn't it?
19:52:34 <nirik> if the log is from foobarapp01 it's likely caused by foobarapp
19:52:34 <threebean> yup
19:52:35 <skvidal> since isolating the logs for tagger won't involve sifting through a bunch of other logs
19:52:41 <abadger1999> skvidal: for some definition of a lot.
19:52:44 <abadger1999> yeah
19:52:45 <skvidal> (tagger was just an example)
19:52:51 <abadger1999> that's about 1/3 of the problem i think.
19:53:09 <skvidal> abadger1999: what are the other 2/3rds?
19:53:12 <lmacken> centralizing
19:53:13 <lmacken> analyzing
19:53:30 <abadger1999> another 1/3 is getting logs consolidated per service rather than per host.
19:53:32 <lmacken> realtime notifications of Bad Errors
19:53:46 <lmacken> trending
19:53:53 <abadger1999> ie: fas is its on boxes but it's still hard to search for the traceback because it could be on fas1,2,3
19:54:08 <abadger1999> *on its own boxes
19:54:20 <skvidal> okay - a couple of things to note - with our existing logging configuration on log02
19:54:26 <skvidal> right now we have 2 major logging groups
19:54:27 <pingou> load balancing makes it harder for sure
19:54:27 <skvidal> per-host
19:54:28 <skvidal> and merged
19:54:38 <skvidal> there is nothing stopping us from grouping those logs, too
19:54:43 <skvidal> ie: fas
19:54:46 * nirik nods
19:54:47 <skvidal> apps
19:55:02 <lmacken> hookup up the SyslogHandler for each app is still on the TODO
19:55:04 <skvidal> so you'd end up with consolidated syslogs/app logs int /var/log/groups/fas/
19:55:49 <abadger1999> skvidal: It's syslog based for the apache logs?  So we can have a single log file for a service?  Because that would be really nice.
19:55:59 <skvidal> abadger1999: well we have the app log now
19:56:34 <skvidal> which is only for apps which are using it
19:56:42 <nirik> which isn't many
19:56:42 <lmacken> bodhi in stg atm, iirc
19:56:43 <skvidal> istr it is local4
19:56:54 <skvidal> (that's the log facility)
19:56:57 * lmacken hasn't had cycles to wrap that up
19:57:09 <skvidal> yes local4
19:57:16 <skvidal> abadger1999: so we have 2 options there
19:57:25 <lmacken> (documented here: https://fedoraproject.org/wiki/Infrastructure/AppBestPractices#Centralized_logging)
19:57:41 <skvidal> 1. setup apache on the app servers to emit all error logs to logger on local4
19:57:52 <skvidal> 2. figure out a nicer way to setup apache/our apps
19:58:00 <skvidal> (or something in between)
19:58:08 <skvidal> I'd like to suggest one more avenue
19:58:11 <skvidal> that will require testing
19:58:13 <skvidal> and discussion
19:58:24 <skvidal> but is this - if there is a non-syslog mechanism for getting apache logs off of systems
19:58:33 <skvidal> let's hear about it
19:59:05 <nirik> we could also work on reducing our syslogs....
19:59:21 <skvidal> for example if there is a way to use 0mq to emit logs sanely
19:59:27 <skvidal> and to integrate it at the apache layer
19:59:37 <skvidal> I'd be inclined to pay attention to it, personally.
19:59:50 <skvidal> but
19:59:52 <skvidal> 1. it needs to work
19:59:57 <lmacken> where sanely == reliably w/o risk of losing messages
19:59:58 * puiterwijk is finally home and online
19:59:58 <skvidal> 2. be fairly reliable under load
20:00:05 <skvidal> lmacken: :)
20:00:09 <nirik> yeah.
20:00:23 <skvidal> lmacken: it can lose a few - but ideally the ring buffer that rsyslog uses would be the most desireable
20:01:00 <lmacken> sounds like a logging vfad would be a good idea ☺
20:01:01 <nirik> skvidal: so, can you post your vfad list and thoughts around those to the list and we can look at picking one and scheduling it?
20:01:13 * nirik nods. logging seems popular
20:01:17 * lmacken going to Monitorama after PyCon, so may have more ideas later this month
20:01:22 <skvidal> nirik: yes
20:01:34 <nirik> thanks.
20:01:34 <abadger1999> nirik: careful though -- logging is only popular because it's such a pain :-)
20:01:39 <nirik> indeed.
20:01:55 <skvidal> it's only a pain for you crazy kids and your new-fangled webapps ;)
20:02:00 <skvidal> :)
20:02:00 <nirik> ok, any other open floor items before we close out?
20:02:23 <threebean> hm, I'd put a vote in for splitting appservers first.  It might make fixing logging easier.
20:02:26 <abadger1999> I think for logging -- we should have a plan (like:  reconfigure all apps to log to syslog local4).  Then the vfad concentrates on doing that plan.
20:02:57 <nirik> yeah.
20:02:58 <pingou> +
20:03:38 <nirik> ok, thanks for coming everyone. Do continue on #fedora-admin, #fedora-apps, and #fedora-noc.
20:03:41 <nirik> #endmeeting
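
Note on the logging discussion above: one plan floated in the log (lmacken at 19:55:02, abadger1999 at 20:02:26) is to have each app log to syslog on facility local4. A minimal sketch of what that could look like in a Python app follows. It is only an illustration, not the configuration Fedora Infrastructure actually uses: the helper name make_app_logger, the app name "exampleapp", and the 127.0.0.1:514 UDP address are all assumptions made for the example.

```python
import logging
import logging.handlers


def make_app_logger(app_name, address=("127.0.0.1", 514)):
    """Return a logger that sends records to syslog on facility local4.

    The address is a placeholder; a real host might use "/dev/log" instead.
    """
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL4,
    )
    # Prefix every record with the app name so a merged or grouped log
    # can still be filtered per service.
    handler.setFormatter(
        logging.Formatter(app_name + ": %(levelname)s %(message)s")
    )
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_app_logger("exampleapp")
    try:
        1 / 0
    except ZeroDivisionError:
        # logger.exception() appends the traceback to the message.
        log.exception("request failed")
```

Using logger.exception() inside the except block is what gets the traceback into the record, which is the piece the discussion says is currently missing. Whether the central log host then groups local4 by service is a separate rsyslog-side decision.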


_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
