18:00:00 #startmeeting Infrastructure (2014-10-02)
18:00:00 Meeting started Thu Oct 2 18:00:00 2014 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:00 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:01 #meetingname infrastructure
18:00:01 #topic aloha
18:00:01 #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk
18:00:01 The meeting name has been set to 'infrastructure'
18:00:01 Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pingou puiterwijk relrod smooge threebean
18:00:14 * threebean is here
18:00:14 * pingou
18:00:18 * tflink is here
18:00:19 * relrod here
18:00:26 * lmacken
18:00:28 * danielbruno is here
18:00:46 * oddshocks here
18:01:22 * RogerBTX present @ laundromat
18:01:33 * mpduty is here
18:01:58 neldogz is here
18:02:01 #topic New folks introductions and Apprentice tasks.
18:02:13 any new folks like to introduce themselves today?
18:02:20 or apprentices with questions, comments or ideas?
18:03:24 ok. ;) moving along then...
18:03:38 #topic Applications status / discussion
18:03:45 any application news this week? :)
18:03:52 new pkgdb in prod today
18:04:04 nice. any list of changes?
18:04:11 mgoodmday
18:04:12 https://github.com/fedora-infra/pkgdb2/blob/master/utility/pkgdb2.spec#L114
18:04:23 * nirik wonders if we shouldn't try and mail at least infra list on updates? just a short 'pkgdb2 updated, here's changes'
18:04:41 I added the update_package_info script that was written a while ago
18:05:01 whose goal is to update the package's info (summary and description) from yum's info (thus the rpm)
18:05:20 also, now the user orphaning a package will lose his/her commit and approveacl ACL
18:05:34 ok. I guess if they need it back they can unorphan. ;)
18:05:40 yes :)
18:05:48 the rest is mostly cosmetic changes in the UI
18:06:07 there will be another release tomorrow to fix something in the emails pkgdb send when it has a problem
18:06:16 and I have a new fedocal release coming after that
18:06:23 busy busy. ;)
18:06:32 (w/ for example the possibility to send a reminder to multiple email addresses)
18:06:40 taskotron production is making progress
18:06:52 and anitya is ready in prod, just waiting for the DNS to sync in: http://140.211.169.229/
18:07:06 but I'll call for tester before calling it 1.0 :)
18:07:13 waiting on two bits of code, learning the hard way about bits that need to be monitored after restart etc.
18:07:18 #info pkgdb2 update in prod today, another minor one tomorrow.
18:07:45 * dgilmore is kinda here but really not
18:07:52 #info anitya almost ready for production. needs dns to finish getting setup.
18:08:03 #info taskotron almost ready for production.
18:08:28 tflink: aside from those issues everything looking ok from a infra standpoint for you? I know we need monitoring still.
18:08:54 looking into some vpn issues ATM
18:09:04 * pingou needs to add a nagios check for anitya
18:09:10 monitoring, backup (ticket hasn't been filed yet)
18:09:28 ah yeah I am already backing up the db...
18:09:34 but there's possibly more.
18:09:51 there are some files on taskotron01.qa that will need to be backed up
18:10:00 ok.
18:10:07 * lanica is here for the infra meeting (sorry I'm late!)
18:10:14 welcome lanica
18:10:57 ok, anything else pending on the applications side?
18:11:29 eh, I've been doing lots of monitoring stuff this week and will be pushing out some other changes for anitya later.
18:11:41 cool.
18:12:18 general improvements to fedimg and ostree/atomic stuff. nothing outstanding.
18:12:23 making progress
18:12:37 cool
18:12:46 abompard: whats the current word on hyperkitty (if you happen to be around). I'd be fine moving some lists if there's nothing we are waiting on.
18:12:52 oddshocks: cool.
18:13:19 nirik: actually there's currently lots going on around Mailman's SQLAlchemy port
18:13:26 it may be happening soon
18:13:42 I got bodhi2 test suite back up and running on jenkins this week. Lots of masher hacking as well, but that hasn't hit infra yet. http://jenkins.cloud.fedoraproject.org/job/bodhi/
18:13:54 so I'm focusing on that, the windows of availability for barry to review code are small
18:13:56 abompard: nice. :) do we want to wait for that?
18:14:15 nirik: yes, if it happens as soon as I think it can
18:14:25 lmacken: nice.
18:14:25 abompard: as in warsaw?
18:14:30 oddshocks: yep
18:14:33 abompard: niceee
18:14:48 lmacken: no weird disappearances on jenkins lately? did we ever figure out what was happening with that?
18:15:03 abompard: champagne \ó/
18:15:14 pingou: hell yeah
18:15:33 cool.
18:15:38 nirik: as far as plugins disappearing, I think the ansible playbook overwrites them all, and we don't have the fedmsg plugin in git
18:15:47 #info hyperkitty is working on sqlalchemy changes. Will deploy after those land.
18:15:54 lmacken: ah, that would do it.
18:15:56 as for projects disappearing, I have no idea how that happened to begin with, but it has yet to happen again
18:16:19 #info bodhi2 is in jenkins now and running tests on commits
18:16:21 jenkins is just trying to get out of work. slacker.
18:16:22 yeah the plugins need to be installed via the ansible playbook
18:16:52 not via jenkins' UI (or done in both place, the playbook in addition to the UI)
18:17:30 ok. I think relrod setup the fedmsg plugin there? relrod and pingou: can you see if you can get that in ansible (if it's not)
18:17:56 yeah, we need to ansibleize it
18:18:03 cool
18:18:09 relrod: there is also a problem with the yum repo that was added
18:18:11 (the copr one)
18:18:33 pingou: yeah, I know. Jenkins can't talk to copr for some reason. Short term solution is put the RPM from that copr in the infra repo
18:18:36 yeah, we talked about that...
18:18:37 I just need to actually do it
18:18:44 * nirik nods.
18:18:49 relrod: ok cool
18:19:16 ok, anything else on the applications side?
18:19:43 pingou: oh, side question: does anitya file bugzilla bugs on things? or was that a different layer of our old cnucnu thing?
18:19:44 one fedora mobile related thing I guess:
18:20:01 nirik: it'll be a different layer
18:20:23 pingou: ok, was that something tyll was running? does he know to update? or is that something we want to run?
18:20:30 I am working on a system for pushing out nightly updates to it via the Google Play alpha track. This means we can lose the ugly self-updating code that is in Mobile right now.
18:20:49 relrod: cool.
18:20:51 We were going to do this via Jenkins (https://github.com/fedora-infra/mobile/issues/41) but ran into some issues
18:21:02 nirik: we'll probably want a fedmsg-cnucnu as we have fedmsg-fasclient
18:21:09 so it's probably going to be a separate process that just pulls the latest APK from Jenkins and sends it to Google
18:21:12 which is fine
18:21:20 nirik: I'll coordinate w/ tyll
18:21:49 relrod: ok
18:21:53 pingou: sounds good
18:22:02 (the issue is basically "the APK has to be signed, and we don't really want the signing key for it on Jenkins")
18:22:52 relrod: hum, ok... where would that live then/
18:22:53 ?
18:23:36 or TBD?
18:23:40 nirik: Probably ansible private repo, and the nightly pusher can be a cloud node or something? Doesn't need to be very powerful
18:23:47 ok
18:24:23 If we don't want it in the ansible private repo, open to better suggestions
18:24:40 #action pingou to coordinate with tyll on cnucnu bugzilla filing under the new anitya setup.
18:24:47 I think that would be ok
18:25:05 #info fedora mobile getting setup to be updated from the google play alpha track.
18:25:11 I'm not sure if the alpha signing key can be different than the production track signing key (if anyone has Android experience and knows, poke me? :))
18:25:27 no idea off hand.
18:25:39 * nirik waits for the ffos html 5 version. ;)
18:25:48 :P
18:25:56 anyway, that's all I have.
18:26:00 cool. thanks.
18:26:12 anything else application wise? or shall I move on?
18:26:27 #topic Sysadmin status / discussion
18:26:35 so, we did a mass reboot earlier this week.
18:26:47 there's a few stragglers we still need to do, but mostly everything is done.
18:26:59 #info mass reboot earlier this week. Most things are rebooted/updated
18:27:18 I migrated db04 (rhel6) to db-koji01 (rhel7)
18:27:47 I killed db04 and keys01. We now have 0 guests at telia01
18:28:04 <>
18:28:10 I need to check what specific settings we are putting on the postgresql.conf file we install with the postgresql_server role, as this seems to be closely related to the memory of the host
18:28:20 I'm also going to retire mirrorlist-serverbeach, because it keeps having problems keeping up.
18:28:26 by telia01
18:28:31 bye*
18:28:45 pingou: yeah, I think we should be more dynamic on it.
18:29:03 pingou: also, I think we just copied our old postgresql.conf in... we should get the rhel7 one and adjust it.
18:29:09 make it a template and use host_vars or so
18:29:12 because we might be missing other things the 7 one can do better
18:29:18 also true
18:29:27 if you want to work on that that would be great. ;)
18:29:33 anitya-backend should have the default el7 one
18:29:41 I can at least provide a diff :)
18:29:59 ok
18:30:15 #info kernel01/02 now have a bunch more memory. Thanks smooge for getting that installed.
18:30:33 #info db04 (rhel6) migrated to db-koji01 (rhel7)
18:30:39 well thanks jesse. all I did was watch from afar :)
18:30:43 #info 0 guests left at telia01
18:30:58 nirik: how many guests left in puppet?
18:31:35 78
18:31:45 k
18:31:48 I'm going to try and move some more next week...
18:32:06 a number are virthosts, so we will have to move all the guests, update, move back
18:32:23 that'll be fun
18:32:34 and of course we still have proxies to convert. thats 8 there
18:33:03 but I am going to try and get there before the end of the year if we can.
18:33:28 #info 78 hosts left in puppet
18:33:44 #topic nagios/alerts recap
18:33:56 * nirik digs up url. where's puiterwijk with it handy when you need him. ;)
18:35:21 .tiny https://admin.fedoraproject.org/nagios/cgi-bin//summary.cgi?report=1&displaytype=3&timeperiod=last7days&smon=10&sday=1&syear=2014&shour=0&smin=0&ssec=0&emon=10&eday=2&eyear=2014&ehour=24&emin=0&esec=0&hostgroup=all&servicegroup=all&host=all&alerttypes=3&statetypes=3&hoststates=7&servicestates=120&limit=25
18:35:23 nirik: http://tinyurl.com/pcpaya7
18:35:39 download01.mgmt was fixed, was a IMM going wonky
18:36:08 the datagrepper I think was the datagrepper db migration that threebean did
18:36:27 the collab03 mail queue is due to the way that check is written. We should fix it.
18:36:47 it looks for anything more than 3 or 4 emails in queue, but thats our list server, so sometimes it has a bunch in there it's sending out.
18:36:57 I rebuilt bvirthost10 on the UCS cisco. much swearing and pain
18:37:02 oops sorry
18:37:05 thanks for that smooge
18:37:26 mirrorlist-serverbeach is going away, so that should disappear.
18:38:01 <>
18:38:15 not sure about the bodhi02 ones... lmacken: any errors you have seen from it lately?
18:38:26 might have been the koji outage for the database move?
18:39:29 anything else sysadminy?
18:40:02 #topic Upcoming Tasks/Items
18:40:03 https://apps.fedoraproject.org/calendar/list/infrastructure/
18:40:11 anything upcoming anyone would like to note or schedule?
18:40:33 I'll note we go into Beta freeze 2014-10-14
18:40:45 so do try and get anything done before then thats big/disruptive.
18:41:05 2 weeks from now
18:41:08 yep.
18:41:09 that'll be there soon
18:41:23 I'm going to try and get some more rhel7/ansible migrations done, but we will see
18:41:42 might do an outage and move db01 over.
18:41:47 will have to look.
18:41:58 #topic Open Floor
18:42:15 anyone have items for open floor? ideas, comments, cookie recipes?
18:42:17 I have two cloud servers to rebuild today.
18:42:31 and I will be out tomorrow after the EPEL meeting
18:42:55 smooge: cool. Let me know when 09 is done and I can play with running the setup on it.
18:43:31 will do so
18:43:47 next week I will be focusing on I2 and budgets
18:43:48 If we can get that all working soon we can look at cloud migrating. ;)
18:43:56 or cloud
18:44:09 nirik: no I haven't seen any errors from bodhi02 aside from the koji outage.
18:44:31 I think thats mostly just a matter of telling everyone: you have until xxxx-xx-xx to terminate your instance(s) and bring them up in the new cloud.
18:44:43 and us doing that on the persistent ones we care about.
18:44:54 oh, we still need to get storage working tho.
18:44:58 lmacken: ok
18:45:09 lmacken: just odd that bodhi02 was in the nagios alerts, but not 01...
18:45:32 nirik: weird, I'll take a look
18:46:11 thanks.
18:46:18 ok, if nothing else, will close out in a minute or so...
18:47:41 Thanks for coming everyone! Lets all continue in #fedora-admin, #fedora-apps and #fedora-noc.
18:47:43 #endmeeting