18:00:00 #startmeeting Infrastructure (2015-06-25)
18:00:00 Meeting started Thu Jun 25 18:00:00 2015 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:00 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:00 #meetingname infrastructure
18:00:00 #topic aloha
18:00:00 #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk pbrobinson
18:00:00 The meeting name has been set to 'infrastructure'
18:00:00 Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:01 #topic New folks introductions / Apprentice feedback
18:00:13 * relrod here
18:00:15 * lmacken
18:00:37 any new folks like to introduce themselves?
18:00:43 or apprentices with questions or comments?
18:00:46 here o/
18:00:52 * abompard here
18:01:38 well I'm Nate, the new RH intern, but I believe that I've met most of you before
18:01:50 hey nyazdani. welcome.
18:01:53 * threebean
18:01:59 #topic GSoC student update - kushal
18:02:05 any GSoC updates?
18:02:48 Kushal doesn't seem to be here. Who is here among the students?
18:03:00 Hi, I'm here. I'm working on integrating the styles I have coded for askFedora main and Q/A pages.
18:03:02 * Corey84 .
18:03:18 << GSoC
18:03:33 welcome Shad0w_Crux
18:03:37 Currently working on implementing the UI this week. (For Rolekit)
18:03:58 Been going back and forth on different approaches, still working at it.
18:04:03 I have met with some problems with the Django framework as I'm quite new to this framework. So, I'm researching more on the exact process I need to follow with integration.
18:04:10 Shad0w_Crux: thanks for the update.
18:04:23 AnuradhaW: ok. Thanks.
18:04:24 As for me, this week I added the ability to add members to projects
18:04:46 sonalkr132: great.
18:04:51 Hi
18:04:54 any other students?
18:04:57 This was a long-awaited feature and quite a major one.
18:05:08 I am left
18:05:22 any updates sshagarwal
18:05:41 I have gone through the pieces required in implementing the ptotocol
18:05:47 *protocol
18:05:58 hi
18:06:06 And I am done with the incoming server
18:06:15 reminder: do post blog posts about your progress. :) That's a nice way to explain in more detail what you are working on.
18:06:33 thanks sshagarwal. Any other students?
18:06:42 I am on the msg store (the way the messages will be stored on disk) implementayion
18:06:55 *implementation
18:07:04 i have implemented cropping & downloading of wallpaper
18:07:52 #link https://prthp.wordpress.com/2015/06/25/crop-complete/
18:07:56 also added the global resize buttons & UI tweaks
18:08:02 great. ;) thanks prth
18:08:15 is that all the GSoC folks who are present?
18:08:35 * nirik will move on to announcements/info then...
18:08:39 Sshagarwal@blogspot.com
18:08:39 thanks threebean nirik
18:09:14 #topic announcements and information
18:09:14 #info Large outage and issues last thursday-saturday, great work fixing things - everyone
18:09:15 #info work moving forward on people01 replacement for people03 - kevin
18:09:15 #info Mailman3 migrations started - abompard
18:09:15 https://fedoraproject.org/wiki/Mailman3_Migration
18:09:16 https://fedoraproject.org/wiki/User:Abompard/HyperKittyDeploymentPlan
18:09:17 #info packaging fedmsg for python3 underway (for the mailman plugin) - ralph
18:09:19 #info koschei now in production - koschei team
18:09:21 #info kevin will be out from 2015-06-27 to 2015-07-05 - kevin
18:09:23 #info fedora-tagger performance problems fixed. Should be usable again. - ralph
18:09:25 #info umdl has been (and is still) running with --delete and freeing 40MB in the database - adrian
18:09:27 #info new MM2 release fixes bug which disabled (admin_active=false) mirrors
18:09:29 #link https://apps.fedoraproject.org/tagger
18:09:33 #info Fedora Infra cloud now at https://fedorainfracloud.org/ - Patrick
18:09:35 #info Please ping puiterwijk if you don't have your password by tomorrow and expected one - Patrick
18:09:37 bunches of info... ;)
18:09:59 any preferred order?
18:10:05 #info Patrick will be out from 2015-06-26 to 2015-06-26
18:10:06 ok, anything in there anyone like to discuss further or note.
18:10:20 yep, I could start
18:10:21 err, to 2015-06-28
18:10:52 abompard: ok, the process we have (look at the gobby doc) is to put all the informational/status stuff into a section.
18:11:02 then have discussions next based on what people want to discuss.
18:11:23 ah, yeah, sorry, gobby first
18:11:42 https://fedoraproject.org/wiki/Gobby has access info.
18:11:58 we can add your discussion items to the end of the discussion list. :)
18:12:00 #topic Jenkins migration to the new cloud - mizdebsk
18:12:02 yeah, did it last time, forgot this time :-)
18:12:09 mizdebsk: you wanted to bring this up?
18:12:17 so, we need to migrate jenkins to the new cloud
18:12:22 yep.
18:12:39 i thought it would be a good opportunity to also start using our packaged jenkins from fedora repos
18:12:42 right now the ones in the old cloud are a el6 master, and el7/f20/el6 slaves
18:13:01 there may be some missing plugins, but we can resolve that
18:13:04 http://jenkins.cloud.fedoraproject.org/
18:13:24 so, can we still have el builders with a fedora master?
18:13:35 packaged jenkins must be installed only on master - slaves will download and run code from master
18:13:49 ok, and so master we would want to do f22 probably?
18:13:58 so we can have master running, let's say f22, and slaves can be el6/7 or anything
18:14:21 sounds reasonable to me.
;)
18:14:34 later we can decide what to do about el7 master
18:14:41 (epel or scl or something else)
18:15:07 I don't mind fedora as the master as long as we have people willing to upgrade it and keep it working on newer releases.
18:15:09 mizdebsk: just curious, but can Jenkins also spin up Openstack instances when needed? Just thinking it would be interesting if it did, as we would have as many builders as we need at any moment and none more
18:15:30 so i would like to create two new instances in the new cloud (one for master and one for slave) and try deploying packaged jenkins
18:15:35 puiterwijk: i think I saw a plugin... not sure tho
18:15:52 puiterwijk: jenkins has hundreds of plugins, i'm pretty sure there is one for openstack
18:16:18 I'm fine with this plan. Any objections?
18:16:30 none from me, sounds good to me
18:16:41 i can volunteer to work on this (unless this is urgent and someone else wants to take this)
18:17:04 I don't think it's super urgent... and that would be great if you wanted to work on it. ;)
18:17:20 great, i will post more details on the mailing list
18:17:23 it should be pretty easy to set up. We have our persistent cloud playbooks working pretty well now.
18:17:25 sure, much appreciated. I would be glad to help with it
18:17:26 I like it.. ;)
18:17:44 thanks mizdebsk!
18:17:55 #topic - mdomsch - Retire MM 1.4.4 from Fedora and EPEL repos
18:18:01 so we deferred this from last week.
18:18:11 pingou: are you around ? or adrianr ?
18:18:23 any thoughts on this? IMHO we should take over the package and push mm2 to it.
18:18:32 +1 for volunteering (plus learning openstack)
18:19:11 nirik, what do you mean by push mm2 to it?
18:19:14 push mm2 to what ?
18:19:24 the repos?
18:19:28 the package in fedora/epel.
18:19:34 smooge: push mm2 code to the mirrormanager package in Fedora/EPEL
18:19:39 except leave the epel6 one alone
18:19:49 ah ok
18:19:50 and the fedora 21/22 ones. just push to rawhide and epel7
18:20:04 I don't think pingou is around
18:20:07 I'll just mail them about this
18:20:23 #info nirik to mail involved parties.
18:20:26 #topic - Mailman3 / HyperKitty migration started.
18:20:31 abompard: you're up. ;)
18:20:50 thanks :-) I've written a status report in the Gobby doc
18:21:04 I don't think there's much discussion to have, it's more of an FYI
18:21:05 we can just dump it here if you like...
18:21:21 I'll summarize
18:21:26 ok
18:21:35 A first batch of automated lists was migrated
18:21:44 but I hit a couple of bugs, one is blocking
18:21:58 there's a missing feature in mailman3: topic subscriptions
18:22:14 it's not widely used but it's heavily used on the package-announce list
18:22:20 so I rolled it back
18:22:44 Further migration depends on fixing two things
18:23:10 this missing feature, and a bug in Postorius that will cause a 500 error if you try to link your address to an existing one
18:23:18 it's more of a missing feature too really
18:23:37 it would be very nice to have tho. ;)
18:23:53 Also, the migration of the first lists was not properly announced, I'll let you know when I've fixed those problems and am ready to move more lists
18:24:01 well, actually I think we have to have it because people will login with user@fedoraproject.org but likely won't have lists under that address.
18:24:13 (or some people won't anyhow)
18:24:16 nirik: my thinking too
18:24:38 At least I've already made it so the postorius login page is the same as HyperKitty's
18:24:44 thanks abompard. :) keep us posted.
18:24:45 so you get the nice Fedora login button
18:24:50 ah good.
18:24:51 sure
18:25:03 anything else on this?
18:25:06 nope
18:25:07 thanks
18:25:15 abompard, if someone needs to set this up for another project... how hard is it to change login pages and such?
18:25:15 #topic leader for next week's meeting - kevin
18:25:20 oops.
18:25:22 #undo
18:25:22 Removing item from minutes:
18:25:26 sorry
18:25:27 go ahead.
;)
18:25:35 * Corey84 is in and out for next ~10 mins
18:25:42 smooge: it should be a simple change in the config file
18:26:06 smooge: actually Django has a mechanism for that problem but neither Postorius nor HyperKitty was using it properly
18:26:07 ok thanks. I got asked to see about setting it up for a couple of projects so wanted to get an idea of what I need to do
18:26:20 smooge: also, I need to send those pull requests
18:26:24 and get them accepted
18:26:39 ok so will talk with you about it in a couple of weeks?
18:26:49 I'm trying to keep the fedora-specific bits to a minimum
18:26:57 excellent.
18:26:58 sometimes that means things take a bit longer
18:27:04 yeah understood. thanks for the info abompard
18:27:16 smooge: sure, feel free to hit me up when you need
18:27:27 #topic leader for next week's meeting - kevin
18:27:38 ok, I am out from this saturday to next saturday.
18:27:45 Would someone like to run the meeting next week? :)
18:27:52 I was planning on being Al Haig and running the meeting
18:27:53 I can run it
18:28:02 oh, go ahead smooge
18:28:09 or I can let puiterwijk do so and I can be Dick Cheney
18:28:12 ok, thanks much smooge
18:28:16 or puiterwijk. ;)
18:28:23 you two can duel for it.
18:28:39 lol
18:28:42 heh. We'll fight it out, but we got it covered I think :)
18:28:51 I guess puiterwijk will have to choose the weapons.. I expect it will be fsck.ext2 at 20 paces
18:29:03 smooge++
18:29:05 :)
18:29:17 ok, as long as one of you does it. great.
18:29:21 #topic Learn about: Nagios
18:29:28 smooge: you wanted to talk to us about nagios today?
18:29:42 Hi everyone. I have some items I wrote up about nagios that I will paste in channel
18:29:55 I will pause after every paragraph and will answer questions at the end.
18:30:05 Our monitoring solution has been Nagios for at least the last 6
18:30:06 years. We have tried a couple of other ones, but found that they were
18:30:06 lacking some of the script-ability and the freedom from a database
18:30:06 backend that Nagios gave us.
18:30:16 We have 2 Nagios servers, one in our central PHX2 location and one
18:30:17 exterior at Ibiblio. The internal one monitors services that can only
18:30:17 be seen inside the network and the exterior one tries to see things as a
18:30:17 'consumer' of Fedora would see things. This can lead to 'Why am I
18:30:17 getting alerts?' when everything looks fine from inside of Fedora but
18:30:18 by reading the alert you can see some come from noc01 (internal
18:30:19 http://admin.fedoraproject.org/nagios/ oauth required) or from noc02
18:30:21 (external http://admin.fedoraproject.org/nagios-external)
18:30:27 Our Nagios setup uses the Nagios Remote Plugin Executor (nrpe) for
18:30:29 most system checks on servers that are being monitored. This is done
18:30:31 in preference to SNMP to try and cut down the number of services required on each
18:30:33 server and possible security issues with SNMP. In general a box
18:30:35 registered in nagios is checked to see that it does not have an 'excessive' number of
18:30:37 processes, excessive amounts of disk space used, and some other
18:30:39 general items. Particular servers will have more localized checks like
18:30:41 'is httpd running?' 'does the webpage work', 'is our metadata
18:30:45 correct?' etc
18:30:47 Anyone who has worked in Fedora Infrastructure will have noticed that
18:30:49 the number of alerts has gone down astronomically in the last 3
18:30:51 months. This was due to a HUGE effort by Kevin F to change who gets
18:30:53 alerts and when. So now instead of getting an alert anytime a slow
18:30:55 httpd restart happens, we only get emails and pages if it lasts longer
18:30:57 than X minutes. This has reduced pager fatigue quite a lot.
18:31:07 Configuration of nagios is done via ansible in our public
18:31:08 repository. Anyone who wants to see how we are doing things (or not
18:31:08 doing things) can view it from our git repository
18:31:08 https://infrastructure.fedoraproject.org/cgit/ansible.git
18:31:16 ....
18:31:25 it looked so much better in my emacs window
18:31:38 sorry about that..
18:32:09 to expand on the notification changes a while back, the first alert just goes to irc in #fedora-noc. If the problem persists for 10min the next alerts go to email/pagers/and irc, and do so every hour after that until acked or recovered.
18:32:58 Any questions from people on nagios? Also, any suggestions on how I can present this better in the future?
18:33:51 * nirik thinks that all looks good from a high level.
18:33:54 hm. I'll hazard a statement: writing nagios checks is a good opportunity for new contributors who are sysadmin-types but want to get into dev or are dev-types that want to get into sysadmin.
18:34:09 Is there a wiki?
18:34:28 turn off column width in $editor, use newlines organically :P
18:35:09 threebean: yeah, agreed. They can be complex, but if you use 'git grep' and look at a specific host you can see the places you would need to add a new one.
18:35:20 jcvicelli: on nagios config?
18:35:26 randomuser: you docs person you
18:35:33 randomuser, well I did that because it pasted everything as one huge line when I tested in another channel. It looked even worse. but I will experiment on fixing it
18:35:36 Yes
18:35:45 just teasing, smooge :)
18:36:08 jcvicelli, https://infrastructure.fedoraproject.org/infra/docs/nagios.rst is a good start
18:36:20 Cool
18:36:42 * smooge has 'learn to write rst' on his afternoon work list
18:37:19 are generic checks (like free mem or disk storage) performed automatically for every machine known to nagios? only host-specific checks need to be added explicitly?
18:37:39 mizdebsk: yeah, there's a 'servers' group with a bunch of 'standard' checks in it...
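[Editor's note] Nagios plugins report their result through the process exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line status message. As a rough illustration of what a 'standard' disk-space check of the kind discussed above might look like, here is a minimal Python sketch. It is not taken from the Fedora ansible repository; the function names and the 80%/90% thresholds are assumptions chosen only for the example.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk usage percentage to an (exit_code, status_line) pair.

    The default thresholds are hypothetical, not Fedora's real values.
    """
    if used_pct >= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn_pct:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Measure usage of the filesystem holding `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Missing path, permission problem, etc.: state cannot be determined.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    return evaluate_disk(used_pct, warn_pct, crit_pct)


def main():
    """Print the status line and return the exit code for the caller."""
    code, line = check_disk("/")
    print(line)
    return code
```

To use this as an actual plugin, the script would end with `sys.exit(main())` so that the exit code reaches Nagios (or NRPE, when the check runs on the monitored host).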
18:38:07 I've seen some political-type reasons to use a nagios fork instead of Nagios proper, has there been any evaluation of or discussion about them ?
18:38:31 randomuser: we are using the version in epel, no one has landed any of the forks. ;)
18:38:34 if they did we could.
18:38:40 can someone who is neither sysadmin-main nor sysadmin-noc (like me) acknowledge an alert?
18:38:49 fair enough
18:39:26 mizdebsk: there's a list I think.
18:39:26 mizdebsk, I thought it was just sysadmin-noc but that was a while ago
18:39:31 mizdebsk: no, only the ones on the nagios list can do so
18:39:40 the list is in ansible
18:40:06 https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/nagios_server/files/nagios/cgi.cfg
18:40:24 let's say i want to schedule an outage, how can i disable nagios checks for given hosts in advance?
18:40:55 mizdebsk: you would need to get added there, then you can schedule 'downtime'
18:40:56 you can ping one of the people on that list
18:41:16 you can also do it in ansible playbooks, we have some examples where it sets downtime for a host or service before doing something.
18:41:36 ok, thx for answers, i have no more questions
18:41:55 for example the playbooks/groups/notifs-backend.yml
18:41:56 playbook
18:42:19 we also were pretty generous about putting people in sysadmin-noc in the past
18:42:47 yeah, it's kind of the next step up from apprentice for sysadmin stuff.
18:42:50 it was sort of our 'apprentice' group at one point
18:43:05 yeah, that too
18:43:23 so should people with rbac access also be in it?
18:43:53 in which, the nagios cgi list?
18:44:19 be in sysadmin-noc
18:44:30 sorry I will talk after meeting ..
18:44:43 sure, not sure I am following, but also out of coffee. ;)
18:44:43 random brain firing in the middle of the meeting doesn't keep on target
18:45:39 any other nagios questions from anyone for smooge ?
18:45:52 not from me :)?
18:46:27 thanks smooge!
18:46:34 #topic Open Floor
18:46:38 any items for open floor?
18:46:54 * tflink has one thing he forgot about until a few minutes ago
18:47:30 smooge++
18:47:30 randomuser: Karma for smooge changed to 5: https://badges.fedoraproject.org/tags/cookie/any
18:47:30 thanks!
18:47:34 sure, fire away
18:47:37 as we move our phabricator instance from the old cloud to infra machines, I'm debating making it less qa specific and opening it up to other fedora groups
18:47:54 so instead of using the current qadevel.fp.o hostname, it'd be something like phab.fp.o
18:48:07 sure, we could do that if you like.
18:48:16 just curious if there were any thoughts on if that's a good/bad idea
18:48:33 well, it might mean you have more support burden... if it's popular, etc.
18:48:39 * tflink is still looking into how much work it'd be to make that happen
18:48:47 but I have no idea how much it would be really
18:49:08 * tflink suspects that it wouldn't be a problem unless it got popular to the point where one machine couldn't keep it all
18:49:16 tflink, I need to make time to catch up with you on a portion of that... buildbot packaging and ansible stuff
18:49:30 yeah
18:49:39 Just fyi guys, I'm looking for easy fixes to work on, but if anyone needs a hand, i can help, i have some time free
18:49:59 jcvicelli: someone was just asking for a script to port trac tickets to pagure :)
18:50:08 jcvicelli: cool. :) I keep meaning to file more easyfixes, but never get around to it. perhaps I will try this week
18:50:16 pingou: oh, nice... yeah
18:50:16 like I said, I'm still getting a better idea of how much time I'd have to put into packaging etc. to make that happen but mostly wanted to see if there were objections before I went farther
18:50:35 * nirik doesn't have any objections really.
18:50:35 tflink: sounds cool to me :)
18:50:46 i think it's a really good idea
18:51:07 randomuser: let me know when you have time
18:51:28 cool. any other items? if not will close out in a minute here...
18:51:34 tflink, will do, busy in the short term but I did want to at least register the intent :)
18:52:14 randomuser: no worries
18:53:06 ok, thanks for coming everyone!
18:53:09 #endmeeting