18:00:00 <nirik> #startmeeting Infrastructure (2015-06-25)
18:00:00 <zodbot> Meeting started Thu Jun 25 18:00:00 2015 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:00 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:00 <nirik> #meetingname infrastructure
18:00:00 <nirik> #topic aloha
18:00:00 <nirik> #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk pbrobinson
18:00:00 <zodbot> The meeting name has been set to 'infrastructure'
18:00:00 <zodbot> Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:01 <nirik> #topic New folks introductions / Apprentice feedback
18:00:13 * relrod here
18:00:15 * lmacken 
18:00:37 <nirik> any new folks like to introduce themselves?
18:00:43 <nirik> or apprentices with questions or comments?
18:00:46 <ardian> here o/
18:00:52 * abompard here
18:01:38 <nyazdani> well I'm Nate, the new RH intern, but I believe that I've met most of you  before
18:01:50 <nirik> hey nyazdani. welcome.
18:01:53 * threebean 
18:01:59 <nirik> #topic GSoC student update - kushal
18:02:05 <nirik> any GSoC updates?
18:02:48 <sonalkr132> Kushal doesn't seem to be here. Which students are here?
18:03:00 <AnuradhaW> Hi, I'm here. I'm working on integrating the styles I have coded for askFedora main and Q/A pages.
18:03:02 * Corey84 .
18:03:18 <Shad0w_Crux> << GSoC
18:03:33 <Corey84> welcome Shad0w_Crux
18:03:37 <Shad0w_Crux> Currently working on implementing the UI this week. (For Rolekit)
18:03:58 <Shad0w_Crux> Been going back and forth on different approaches, still working at it.
18:04:03 <AnuradhaW> I have run into some problems with the Django framework, as I'm quite new to it. So I'm researching the exact process I need to follow for the integration.
18:04:10 <nirik> Shad0w_Crux: thanks for the update.
18:04:23 <nirik> AnuradhaW: ok. Thanks.
18:04:24 <sonalkr132> As for me, this week I added the ability to add members to projects
18:04:46 <nirik> sonalkr132: great.
18:04:51 <sshagarwal> Hi
18:04:54 <nirik> any other students?
18:04:57 <sonalkr132> This was a long-awaited feature and quite a major one.
18:05:08 <sshagarwal> I am left
18:05:22 <Corey84> any updates sshagarwal
18:05:41 <sshagarwal> I have gone through the pieces required in implementing the ptotocol
18:05:47 <sshagarwal> *protocol
18:05:58 <prth> hi
18:06:06 <sshagarwal> And I am done with the incoming server
18:06:15 <nirik> reminder: do post blog posts about your progress. :) That's a nice way to explain in more detail what you are working on.
18:06:33 <nirik> thanks sshagarwal. Any other students?
18:06:42 <sshagarwal> I am on the msg store (the way the messages will be stored on disk) implementayion
18:06:55 <sshagarwal> *implementation
18:07:04 <prth> i have implemented cropping & downloading of wallpaper
18:07:52 <threebean> #link https://prthp.wordpress.com/2015/06/25/crop-complete/
18:07:56 <prth> also added the global resize buttons & UI tweaks
18:08:02 <nirik> great. ;) thanks prth
18:08:15 <nirik> is that all the GSoC folks who are present?
18:08:35 * nirik will move on to announcements/info then...
18:08:39 <sshagarwal> Sshagarwal@blogspot.com
18:08:39 <prth> thanks threebean nirik
18:09:14 <nirik> #topic announcements and information
18:09:14 <nirik> #info Large outage and issues last thursday-saturday, great work fixing things - everyone
18:09:15 <nirik> #info work moving forward on people01 replacement for people03 - kevin
18:09:15 <nirik> #info Mailman3 migrations started - abompard
18:09:15 <nirik> https://fedoraproject.org/wiki/Mailman3_Migration
18:09:16 <nirik> https://fedoraproject.org/wiki/User:Abompard/HyperKittyDeploymentPlan
18:09:17 <nirik> #info packaging fedmsg for python3 underway (for the mailman plugin) - ralph
18:09:19 <nirik> #info koschei now in production - koschei team
18:09:21 <nirik> #info kevin will be out from 2015-06-27 to 2015-07-05 - kevin
18:09:23 <nirik> #info fedora-tagger performance problems fixed.  Should be usable again.  - ralph
18:09:25 <nirik> #info umdl has been (and is still) running with --delete and freeing 40MB in the database - adrian
18:09:27 <nirik> #info new MM2 release fixes bug which disabled (admin_active=false) mirrors
18:09:29 <nirik> #link https://apps.fedoraproject.org/tagger
18:09:33 <nirik> #info Fedora Infra cloud now at https://fedorainfracloud.org/ - Patrick
18:09:35 <nirik> #info Please ping puiterwijk if you don't have your password by tomorrow and expected one - Patrick
18:09:37 <nirik> bunches of info... ;)
18:09:59 <abompard> any preferred order?
18:10:05 <puiterwijk> #info Patrick will be out from 2015-06-26 to 2015-06-26
18:10:06 <nirik> ok, anything in there anyone like to discuss further or note.
18:10:20 <abompard> yep, I could start
18:10:21 <puiterwijk> err, to 2015-06-28
18:10:52 <nirik> abompard: ok, the process we have (look at the gobby doc) is to put all the informational/status stuff into a section.
18:11:02 <nirik> then have discussions next based on what people want to discuss.
18:11:23 <abompard> ah, yeah, sorry, gobby first
18:11:42 <nirik> https://fedoraproject.org/wiki/Gobby has access info.
18:11:58 <nirik> we can add your discussion items to the end of the discussion list. :)
18:12:00 <nirik> #topic Jenkins migration to the new cloud - mizdebsk
18:12:02 <abompard> yeah, did it last time, forgot this time :-)
18:12:09 <nirik> mizdebsk: you wanted to bring this up?
18:12:17 <mizdebsk> so, we need to migrate jenkins to the new cloud
18:12:22 <nirik> yep.
18:12:39 <mizdebsk> i thought it would be a good opportunity to also start using our packaged jenkins from fedora repos
18:12:42 <nirik> right now the ones in the old cloud are a el6 master, and el7/f20/el6 slaves
18:13:01 <mizdebsk> there may be some missing plugins, but we can resolve that
18:13:04 <nirik> http://jenkins.cloud.fedoraproject.org/
18:13:24 <nirik> so, can we still have el builders with a fedora master?
18:13:35 <mizdebsk> packaged jenkins must be installed only on master - slaves will download and run code from master
18:13:49 <nirik> ok, and so master we would want to do f22 probably?
18:13:58 <mizdebsk> so we can have master running, lets say f22, and slaves can be el6/7 or anything
18:14:21 <nirik> sounds reasonable to me. ;)
18:14:34 <mizdebsk> later we can decide what to do about el7 master
18:14:41 <mizdebsk> (epel or scl or something else)
18:15:07 <nirik> I don't mind fedora as the master as long as we have people willing to upgrade it and keep it working on newer.
18:15:09 <puiterwijk> mizdebsk: just curious, but can Jenkins also spin up Openstack instances when needed? Just thinking it would be interesting if it did, as we would have as many builders as we need at any moment and none more
18:15:30 <mizdebsk> so i would like to create two new instances in the new cloud (one for master and one for slave) and try deploying packaged jenkins
18:15:35 <nirik> puiterwijk: i think I saw a plugin... not sure tho
18:15:52 <mizdebsk> puiterwijk: jenkins has hundreds of plugins, i'm pretty sure there is some for openstack
18:16:18 <nirik> I'm fine with this plan. Any objections?
18:16:30 <puiterwijk> none from me, sounds good to me
18:16:41 <mizdebsk> i can volunteer to work on this (unless this is urgent and someone else wants to take this)
18:17:04 <nirik> I don't think it's super urgent... and that would be great if you wanted to work on it. ;)
18:17:20 <mizdebsk> great, i will post more details on the mailing list
18:17:23 <nirik> it should be pretty easy to setup. We have our persistent cloud playbooks working pretty well now.
18:17:25 <puiterwijk> sure, much appreciated. I would be glad to help with it
18:17:26 <threebean> I like it.. ;)
18:17:44 <nirik> thanks mizdebsk!
18:17:55 <nirik> #topic - mdomsch - Retire MM 1.4.4 from Fedora and EPEL repos
18:18:01 <nirik> so we deferred this from last week.
18:18:11 <nirik> pingou: are you around ? or adrianr ?
18:18:23 <nirik> any thoughts on this? IMHO we should take over the package and push mm2 to it.
18:18:32 <Corey84> +1  for volunteering (plus learning openstack)
18:19:11 <smooge> nirik, what do you mean by push mm2 to it?
18:19:14 <Corey84> push mm2 to what ?
18:19:24 <Corey84> the repos?
18:19:28 <nirik> the package in fedora/epel.
18:19:34 <puiterwijk> smooge: push mm2 code to the mirrormanager package in Fedora/EPEL
18:19:39 <nirik> except leave the epel6 one alone
18:19:49 <smooge> ah ok
18:19:50 <nirik> and the fedora 21/22 ones. just push to rawhide and epel7
18:20:04 <smooge> I don't think pingou is around
18:20:07 <nirik> I'll just mail them about this
18:20:23 <nirik> #info nirik to mail involved parties.
18:20:26 <nirik> #topic - Mailman3 / HyperKitty migration started.
18:20:31 <nirik> abompard: you're up. ;)
18:20:50 <abompard> thanks :-) I've written a status report in the Gobby doc
18:21:04 <abompard> I don't think there's much discussion to have, it's more of an FYI
18:21:05 <nirik> we can just dump it here if you like...
18:21:21 <abompard> I'll summarize
18:21:26 <nirik> ok
18:21:35 <abompard> A first batch of automated lists were migrated
18:21:44 <abompard> but I hit a couple of bugs, one is blocking
18:21:58 <abompard> there's a missing feature in mailman3: topic subscriptions
18:22:14 <abompard> it's not widely used but it's heavily used on the package-announce list
18:22:20 <abompard> so I rolled it back
18:22:44 <abompard> Further migration depends on fixing two things
18:23:10 <abompard> this missing feature, and a bug in Postorius that will cause a 500 error if you try to link your address to an existing one
18:23:18 <abompard> it's more of a missing feature too really
18:23:37 <nirik> it would be very nice to have tho. ;)
18:23:53 <abompard> Also, the migration of the first lists was not properly announced, I'll let you know when I've fixed those problems and am ready to move more lists
18:24:01 <nirik> well, actually I think we have to have it because people will login with user@fedoraproject.org but likely won't have lists under that address.
18:24:13 <nirik> (or some people won't anyhow)
18:24:16 <abompard> nirik: my thinking too
18:24:38 <abompard> At least I've already made it so the postorius login page is the same as HyperKitty's
18:24:44 <nirik> thanks abompard. :) keep us posted.
18:24:45 <abompard> so you get the nice Fedora login button
18:24:50 <nirik> ah good.
18:24:51 <abompard> sure
18:25:03 <nirik> anything else on this?
18:25:06 <abompard> nope
18:25:07 <abompard> thanks
18:25:15 <smooge> abompard, if someone needs to set this up for another project... how hard is it to change login pages and such?
18:25:15 <nirik> #topic leader for next week's meeting - kevin
18:25:20 <nirik> oops.
18:25:22 <nirik> #undo
18:25:22 <zodbot> Removing item from minutes: <MeetBot.items.Topic object at 0x2c39e850>
18:25:26 <smooge> sorry
18:25:27 <nirik> go ahead. ;)
18:25:35 * Corey84 is in and out for next  ~10 mins
18:25:42 <abompard> smooge: it should be a simple change in the config file
18:26:06 <abompard> smooge: actually Django has a mechanism for that problem but neither Postorius nor HyperKitty were using it properly
18:26:07 <smooge> ok thanks. I got asked to see about setting it up for a couple of projects so wanted to get an idea of what I need to do
18:26:20 <abompard> smooge: also, I need to send those pull requests
18:26:24 <abompard> and get it accepted
18:26:39 <smooge> ok so will talk with you about it in a couple of weeks?
18:26:49 <abompard> I'm trying to keep the fedora-specific bits to a minimum
18:26:57 <nirik> excellent.
18:26:58 <abompard> sometimes that means things take a bit longer
18:27:04 <smooge> yeah understood. thanks for the info abompard
18:27:16 <abompard> smooge: sure, feel free to hit me up when you need
18:27:27 <nirik> #topic leader for next week's meeting - kevin
18:27:38 <nirik> ok, I am out from this saturday to next saturday.
18:27:45 <nirik> Would someone like to run the meeting next week? :)
18:27:52 <smooge> I was planning on being Al Haig and run the meeting
18:27:53 <puiterwijk> I can run it
18:28:02 <puiterwijk> oh, go ahead smooge
18:28:09 <smooge> or I can let puiterwijk do so and I can be Dick Cheney
18:28:12 <nirik> ok, thanks much smooge
18:28:16 <nirik> or puiterwijk. ;)
18:28:23 <nirik> you two can duel for it.
18:28:39 <Corey84> lol
18:28:42 <puiterwijk> heh. We'll fight it out, but we got it covered I think :)
18:28:51 <smooge> I guess puiterwijk will have to choose the weapons.. I expect it will be fsck.ext2 at 20 paces
18:29:03 <threebean> smooge++
18:29:05 <nirik> :)
18:29:17 <nirik> ok, as long as one of you does it. great.
18:29:21 <nirik> #topic Learn about: Nagios
18:29:28 <nirik> smooge: you wanted to talk to us about nagios today?
18:29:42 <smooge> Hi everyone. I have some items I wrote up about nagios that I will paste in channel
18:29:55 <smooge> I will pause after every paragraph and will answer questions at the end.
18:30:05 <smooge> Our monitoring solution has been Nagios for at least the last 6
18:30:06 <smooge> years. We have tried a couple of other ones, but found that they were
18:30:06 <smooge> lacking some of the script-ability and the freedom from needing a database
18:30:06 <smooge> backend that Nagios gave us.
18:30:16 <smooge> We have 2 Nagios servers, one in our central PHX2 location and one
18:30:17 <smooge> exterior at Ibiblio. The internal one monitors services that can only
18:30:17 <smooge> be seen inside the network and exterior one tries to see things as a
18:30:17 <smooge> 'consumer' of Fedora would see things. This can lead to 'Why am I
18:30:17 <smooge> getting alerts?' when everything looks fine from inside of Fedora but
18:30:18 <smooge> by reading the alert you can see some come from noc01 (internal
18:30:19 <smooge> http://admin.fedoraproject.org/nagios/ oauth required) or from noc02
18:30:21 <smooge> (external http://admin.fedoraproject.org/nagios-external)
18:30:27 <smooge> Our Nagios setup uses the Nagios Remote Plugin Executor (nrpe) for
18:30:29 <smooge> most system checks on servers that are being monitored. This is done
18:30:31 <smooge> instead of SNMP to try and cut down the number of services required on each
18:30:33 <smooge> server and possible security issues with SNMP. In general a box
18:30:35 <smooge> registered in nagios is checked for an 'excessive' number of
18:30:37 <smooge> processes, excessive amounts of disk space used, and some other
18:30:39 <smooge> general items. Particular servers will have more localized checks like
18:30:41 <smooge> 'is httpd running?' 'does the webpage work', 'is our metadata
18:30:45 <smooge> correct?' etc
18:30:47 <smooge> Anyone who has worked in Fedora Infrastructure will have noticed that
18:30:49 <smooge> the number of alerts has gone down astronomically in the last 3
18:30:51 <smooge> months. This was due to a HUGE effort by Kevin F to change who gets
18:30:53 <smooge> alerts and when. So now instead of getting an alert anytime a slow
18:30:55 <smooge> httpd restart happens, we only get emails and pages if it lasts longer
18:30:57 <smooge> than X minutes. This has reduced pager fatigue quite a lot.
18:31:07 <smooge> Configuration of nagios is done via ansible in our public
18:31:08 <smooge> repository. Anyone who wants to see how we are doing things (or not
18:31:08 <smooge> doing things) can view it from our git repository
18:31:08 <smooge> https://infrastructure.fedoraproject.org/cgit/ansible.git
18:31:16 <smooge> ....
18:31:25 <smooge> it looked so much better in my emacs window
18:31:38 <smooge> sorry about that..
18:32:09 <nirik> to expand on the notification changes a while back, the first alert just goes to irc in #fedora-noc. If the problem persists for 10min the next alerts go to email/pagers/and irc, and do so every hour after that until acked or recovered.
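The escalation nirik describes can be modelled in a few lines. This is only a sketch of the policy as stated in the meeting (first alert to IRC, email and pagers once the problem has lasted 10 minutes, then hourly until acked or recovered); it is not the actual Nagios configuration, which lives in the ansible repository. The function names are hypothetical, and it assumes "every hour" is counted from the first escalated alert.

```python
from datetime import timedelta

# Thresholds taken from the description in the meeting.
ESCALATE_AFTER = timedelta(minutes=10)
REPEAT_EVERY = timedelta(hours=1)


def channels_for_alert(elapsed, acked=False, recovered=False):
    """Return the channels notified for a problem that has lasted `elapsed`."""
    if acked or recovered:
        return []  # notifications stop once someone acks or the check recovers
    if elapsed < ESCALATE_AFTER:
        return ["irc"]  # the first alert only goes to #fedora-noc
    return ["irc", "email", "pager"]


def notification_times(duration):
    """Offsets from problem start at which notifications fire.

    Assumption: hourly repeats are counted from the first escalated alert.
    """
    times = [timedelta(0)]  # initial IRC-only alert
    t = ESCALATE_AFTER
    while t <= duration:
        times.append(t)
        t += REPEAT_EVERY
    return times
```

For a problem that lasts 2 hours 15 minutes and is never acked, `notification_times` yields the initial IRC alert plus escalated alerts at 10, 70 and 130 minutes.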
18:32:58 <smooge> Any questions from people on nagios. Also suggestions on how I can present this better in the future?
18:33:51 * nirik thinks that all looks good from a high level.
18:33:54 <threebean> hm.  I'll hazard a statement:  writing nagios checks is a good opportunity for new contributors who are sysadmin-types but want to get into dev or are dev-types that want to get into sysadmin.
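For anyone curious what writing a check involves, here is a minimal sketch of a Nagios-style plugin in Python. It relies only on the standard plugin convention (one line of status output, exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The thresholds and function names are illustrative assumptions, not something taken from the Fedora Infrastructure repository.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a (status, message) pair."""
    if percent_used >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % percent_used
    if percent_used >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % percent_used
    return OK, "DISK OK - %.1f%% used" % percent_used


def check_disk(path="/", warn=80.0, crit=90.0):
    """Check disk usage for `path`; thresholds are hypothetical defaults."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything we cannot measure is reported as UNKNOWN, not OK.
        return UNKNOWN, "DISK UNKNOWN - %s" % exc
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - zero-size filesystem"
    return classify(100.0 * usage.used / usage.total, warn, crit)
```

To run it as a real plugin, a script would print the message returned by `check_disk()` and call `sys.exit()` with the status code.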
18:34:09 <jcvicelli> Is there a wiki?
18:34:28 <randomuser> turn off column width in $editor, use newlines organically :P
18:35:09 <nirik> threebean: yeah, agreed. They can be complex, but if you use 'git grep' and look at a specific host you can see the places you would need to add a new one.
18:35:20 <nirik> jcvicelli: on nagios config?
18:35:26 <nirik> randomuser: you docs person you
18:35:33 <smooge> randomuser, well I did that because it pasted everything as one huge line when I tested in another channel. It looked even worse. but I will experiment on fixing it
18:35:36 <jcvicelli> Yes
18:35:45 <randomuser> just teasing, smooge :)
18:36:08 <randomuser> jcvicelli, https://infrastructure.fedoraproject.org/infra/docs/nagios.rst is a good start
18:36:20 <jcvicelli> Cool
18:36:42 * smooge has 'learn to write rst' on his afternoon work list
18:37:19 <mizdebsk> are generic checks (like free mem or disk storage) performed automatically for every machine known to nagios? only host-specific checks need to be added explicitly?
18:37:39 <nirik> mizdebsk: yeah, there's a 'servers' group with a bunch of 'standard' checks in it...
18:38:07 <randomuser> I've seen some political-type reasons to use a nagios fork instead of Nagios proper, has there been any evaluation of or discussion about them ?
18:38:31 <nirik> randomuser: we are using the version in epel, no one has landed any of the forks. ;)
18:38:34 <nirik> if they did we could.
18:38:40 <mizdebsk> can someone who is neither sysadmin-main nor sysadmin-noc (like me) acknowledge an alert?
18:38:49 <randomuser> fair enough
18:39:26 <nirik> mizdebsk: there's a list I think.
18:39:26 <smooge> mizdebsk, I thought it was just sysadmin-noc but that was a while ago
18:39:31 <puiterwijk> mizdebsk: no, only the ones on the nagios list can do so
18:39:40 <puiterwijk> the list is in ansible
18:40:06 <nirik> https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/nagios_server/files/nagios/cgi.cfg
18:40:24 <mizdebsk> lets say i want to schedule an outage, how can i disable nagios checks for given hosts in advance?
18:40:55 <nirik> mizdebsk: you would need to get added there, then you can schedule 'downtime'
18:40:56 <puiterwijk> you can ping one of the people on that list
18:41:16 <nirik> you can also do it in ansible playbooks, we have some examples where it sets downtime for a host or service before doing something.
18:41:36 <mizdebsk> ok, thx for answers, i have no more questions
18:41:55 <nirik> for example the playbooks/groups/notifs-backend.yml
18:41:56 <nirik> playbook
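Under the hood, scheduling downtime comes down to sending Nagios a SCHEDULE_HOST_DOWNTIME external command. The sketch below only builds that command line, to show its shape; how the Fedora playbooks actually deliver it is not covered in this meeting. The helper name and the example host are hypothetical, and it assumes a fixed (not flexible) downtime window with no trigger.

```python
def host_downtime_command(host, start, minutes, author, comment, now=None):
    """Build a Nagios SCHEDULE_HOST_DOWNTIME external command line.

    `start` and `now` are Unix timestamps; `now` defaults to `start`.
    Field order: host;start;end;fixed;trigger_id;duration;author;comment
    """
    now = start if now is None else now
    duration = minutes * 60
    end = start + duration
    return "[%d] SCHEDULE_HOST_DOWNTIME;%s;%d;%d;1;0;%d;%s;%s" % (
        now, host, start, end, duration, author, comment
    )
```

A line like this is normally written to the Nagios command file on the monitoring server, which is why you need to be on the access list (or ask someone who is) to schedule downtime.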
18:42:19 <smooge> we also were pretty generous of putting people in sysadmin-noc in the past
18:42:47 <nirik> yeah, it's kind of the next step up from apprentice for sysadmin stuff.
18:42:50 <smooge> it was sort of our 'apprentice' group at one point
18:43:05 <nirik> yeah, that too
18:43:23 <smooge> so should people with rbac access also be in it?
18:43:53 <nirik> in which, the nagios cgi list?
18:44:19 <smooge> be in sysadmin-noc
18:44:30 <smooge> sorry I will talk after meeting ..
18:44:43 <nirik> sure, not sure I am following, but also out of coffee. ;)
18:44:43 <smooge> random brain firing in the middle of the meeting doesn't keep on target
18:45:39 <nirik> any other nagios questions from anyone for smooge ?
18:45:52 <smooge> not from me :)?
18:46:27 <nirik> thanks smooge!
18:46:34 <nirik> #topic Open Floor
18:46:38 <nirik> any items for open floor?
18:46:54 * tflink has one thing he forgot about until a few minutes ago
18:47:30 <randomuser> smooge++
18:47:30 <zodbot> randomuser: Karma for smooge changed to 5:  https://badges.fedoraproject.org/tags/cookie/any
18:47:30 <randomuser> thanks!
18:47:34 <nirik> sure, fire away
18:47:37 <tflink> as we move our phabricator instance from the old cloud to infra machines, I'm debating making it less qa specific and opening it up to other fedora groups
18:47:54 <tflink> so instead of using the current qadevel.fp.o hostname, it'd be something like phab.fp.o
18:48:07 <nirik> sure, we could do that if you like.
18:48:16 <tflink> just curious if there were any thoughts on if that's a good/bad idea
18:48:33 <nirik> well, it might mean you have more support burden... if it's popular, etc.
18:48:39 * tflink is still looking into how much work it'd be to make that happen
18:48:47 <nirik> but I have no idea how much it would be really
18:49:08 * tflink suspects that it wouldn't be a problem unless it got popular to the point where one machine couldn't keep it all
18:49:16 <randomuser> tflink, I need to make time to catch up with you on a portion of that... buildbot packaging and ansible stuff
18:49:30 <nirik> yeah
18:49:39 <jcvicelli> Just fyi guys, I'm looking for easy fixes to work on, but if anyone needs a hand, I can help, I have some free time
18:49:59 <pingou> jcvicelli: someone was just asking for a script to port trac tickets to pagure :)
18:50:08 <nirik> jcvicelli: cool. :) I keep meaning to file more easyfixes, but never get around to it. perhaps I will try this week
18:50:16 <nirik> pingou: oh, nice... yeah
18:50:16 <tflink> like I said, I'm still getting a better idea of how much time I'd have to put into packaging etc. to make that happen but mostly wanted to see if there were objections before I went farther
18:50:35 * nirik doesn't have any objections really.
18:50:35 <pingou> tflink: sounds cool to me :)
18:50:46 <nyazdani> i think it's a really good idea
18:51:07 <tflink> randomuser: let me know when you have time
18:51:28 <nirik> cool. any other items? if not will close out in a minute here...
18:51:34 <randomuser> tflink, will do, busy in the short term but I did want to at least register the intent :)
18:52:14 <tflink> randomuser: no worries
18:53:06 <nirik> ok, thanks for coming everyone!
18:53:09 <nirik> #endmeeting