18:02:41 <smooge> #startmeeting Infrastructure (2017-10-11)
18:02:41 <zodbot> Meeting started Thu Oct 12 18:02:41 2017 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:41 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:02:41 <zodbot> The meeting name has been set to 'infrastructure_(2017-10-11)'
18:02:41 <smooge> #meetingname infrastructure
18:02:41 <zodbot> The meeting name has been set to 'infrastructure'
18:02:41 <smooge> #topic aloha
18:02:41 <smooge> #chair smooge relrod nirik abadger1999 dgilmore threebean pingou puiterwijk pbrobinson maxamillio
18:02:41 <zodbot> Current chairs: abadger1999 dgilmore maxamillio nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:02:43 <bowlofeggs> .hello2
18:02:44 <zodbot> bowlofeggs: bowlofeggs 'Randy Barlow' <randy@electronsweatshop.com>
18:02:45 <relrod> here
18:02:53 <clime> oh I thought I had missed the meeting already
18:02:59 <clime> Hello!
18:03:49 <benniej> Hi
18:04:02 <smooge> #topic New folks introductions
18:04:02 <smooge> #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
18:04:13 <smooge> Hello, any new people here this week?
18:04:26 * tflink isn't new, just shows up late
18:04:44 <benniej> 3rd meeting for me
18:04:56 <smooge> no problem. hi everyone
18:05:07 <smooge> #topic announcements and information
18:05:07 <smooge> #info PHX2 Colo Trip, Dec 4th - 9th
18:05:07 <smooge> #info Infrastructure will be down during that trip.
18:05:07 <smooge> #info Final freeze .5 weeks. PLEASE TEST.
18:05:08 <smooge> #info Disaster recovery network stood up. Systems need to be designed/built
18:05:09 <smooge> #info Bodhi-2.12.{0..2} deployed this week.
18:05:11 <smooge> #info Bodhi-2.13.0 is now planned to contain the modular mashing patch
18:05:21 <bowlofeggs> 3D meeting for me
18:05:28 <bowlofeggs> i've got this awesome new 3D IRC client
18:05:33 * cverna waves
18:05:37 <smooge> I want to emphasize for many different people: the Dec 4-9th time will be a lot of downtime
18:05:50 <tflink> bowlofeggs: does that mean that text flies at you?
18:05:51 <nirik> morning. sorry I am late
18:05:51 <cverna> bowlofeggs++
18:05:51 <zodbot> cverna: Karma for bowlofeggs changed to 14 (for the f26 release cycle): https://badges.fedoraproject.org/tags/cookie/any
18:05:56 <bowlofeggs> tflink: hahah yep ☺
18:05:57 <smooge> empanada
18:06:01 <tflink> cool
18:06:01 <cverna> bowlofeggs: did you get a 3D cookie
18:06:13 <bowlofeggs> i sure did and it tasted… digital…
18:06:18 <tflink> do the server folks know about this?
18:06:25 <smooge> tflink, yes
18:06:32 <smooge> they may forget.. but they know :)
18:06:32 <tflink> I think that their separate release of modular server just slipped to the 12th of december
18:07:02 <smooge> it was part of the weird dates they can ship on.
18:07:05 <smooge> aka the 19th
18:07:08 <smooge> vs the 12th
18:07:29 <tflink> ok, just figured I would mention it :)
18:08:03 <smooge> no problem
18:08:19 <smooge> any other announcements?
18:08:20 <nirik> yeah.
18:08:34 <nirik> sorry, I mean yeah to what you were saying, not yeah to announcements. ;)
18:08:54 <bowlofeggs> yeah.
18:09:08 <bowlofeggs> sorry, I mean yeah to what nirik was saying about sorry, I mean yeah to what you were saying, not yeah to announcements. ;)
18:09:12 <smooge> #topic Ticket cleanup
18:09:21 <smooge> #info 5970 Let's put OSTrees on mirrorlist and dl.fpo
18:09:32 <smooge> I just picked a random ticket
18:09:51 <bowlofeggs> we had planned to do that for F26, but as you can see… it didn't happen
18:09:53 <smooge> is this doable? done? time to send to the clearer?
18:09:54 <bowlofeggs> we still want to do it though
18:09:55 <puiterwijk> That needs more work, and is on my todo for post-F27
18:10:05 <smooge> ok
18:10:07 <smooge> well then ..
18:10:19 <bowlofeggs> modularity and CI happened and that went straight to the back burner, on the backup stove, in the guest cabin…
18:10:32 <smooge> #topic ResultsDB Performance/Resources
18:10:32 <smooge> #info there has been some talk about computing resources for production resultsdb, let's discuss more concretely
18:10:54 <tflink> I'm starting to see some errors in Taskotron production and there are a lot of timeouts in resultsdb
18:11:14 <tflink> it would seem that bumping the resources on the db host before freeze would be wise
18:11:16 <smooge> what databases does it use in the backend?
18:11:56 <nirik> There are some resources to update it.
18:12:01 <tflink> postgres
18:12:09 <tflink> or did you mean which host?
18:12:10 <nirik> just need a short outage... just a minute.
18:12:15 <smooge> which host
18:12:17 <tflink> db-qa02.qa
18:12:27 <tflink> it's on virthost-comm03.qa I think
18:12:36 <smooge> ah ok. sounds like nirik has this?
18:12:47 <nirik> I could do it, as long as I knew when. :)
18:13:17 <tflink> when does freeze start? on tuesday?
18:13:55 <smooge> tuesday
18:14:13 <smooge> so today, tomorrow or monday
18:14:18 <smooge> can we do it today?
18:14:28 <tflink> shouldn't be a problem
18:14:47 <tflink> it'll be somewhat unannounced but I don't think many folks will notice
18:15:14 <smooge> ok nirik/tflink how does 19:30 UTC after the meeting sound?
18:15:16 <tflink> pingou: do you think resultsdb going down for 20 minutes would/could cause problems with the ci stuff
18:15:26 <nirik> sure. anytime is fine with me.
18:15:40 <tflink> ok, we can figure out the details outside of the meeting
18:16:31 <smooge> okie dokie
18:16:42 <smooge> #topic Cleaning up old sysadmins
18:16:42 <smooge> #info various sysadmins are just bouncing slowly.
18:17:02 * bowlofeggs reads that as "old" as in "age"
18:17:23 <smooge> I cleaned up several thousand "bounce" emails yesterday to a couple of people on sysadmin
18:17:59 <puiterwijk> I would say that anyone who hasn't been active in one way or another in the last 2 months and is in sysadmin-* groups should get a reminder, and if they don't reply in a week, we remove them. We can easily re-add people if they get back
18:18:10 <smooge> mainly because I was tired of every email that gets sent to the "sysadmin" group from Oct 1st sending me 2-3 bounce notifications
18:18:25 <puiterwijk> "active" being either ansible commits, ssh access, infra list comments, or anything else
18:18:27 <smooge> I sent email to the people and ... got bounce notifications
18:18:49 <bowlofeggs> i always laugh when i get OOO replies on my ansible commits
18:18:52 <puiterwijk> smooge: if they bounce, I'd say kick right away. We need a way to get in touch with anyone having access to our infra
18:19:13 * bowlofeggs hates OOO replies in general. e-mail is already asynchronous, no need to say you are away!
18:19:24 <puiterwijk> As said, if they fix their email and ping us back, it's easy to re-add them
18:19:25 <smooge> that was my initial reaction also but I wanted to get some feedback
18:20:26 <puiterwijk> We really need the account email for sysadmins to work in case of problems. If it doesn't, they need to fix it, and until then they can't be in access-giving groups, in my opinion.
18:20:27 * tflink doesn't have an issue with that
18:20:51 <smooge> puiterwijk, when you said "they should get a reminder" do you mean that in the "this is implemented" or "this is how it should be" way?
18:21:08 <puiterwijk> smooge: "this is how it should be". But that's just dormant accounts.
18:21:30 <smooge> no problem.
I was just going to start looking to see how that was done :)
18:21:51 <puiterwijk> smooge: if they bounce with an error that doesn't sound temporary ("This user got too many emails in the last 60 seconds" vs "this account is disabled"), remove
18:21:59 <smooge> ok anyway I wanted to ask around and get some feedback from the group
18:22:23 <smooge> puiterwijk, well the bounce is "This user got too many emails in the last 60 seconds" for 10 days
18:22:47 <smooge> so it's temporary but not temporary
18:22:51 <puiterwijk> smooge: in that case, if it continues for >1 day, I'd also say drop
18:23:09 <smooge> okie dokie
18:23:14 <puiterwijk> Like, I can understand if you break your mailserver for a day. but if you then don't fix it, we have no way to get in touch.
18:23:21 <puiterwijk> As said, when you fix it, it's easy to re-add
18:23:35 <smooge> I had never seen gmail say it for 10 days
18:23:52 <smooge> and even when I sent them email via gmail.. it gave me the error.
18:23:59 <puiterwijk> Yeah, me neither. That sounds like their email is under heavy DoS.
18:24:16 <smooge> so I am going to remove ours from that DDOS
18:24:18 <smooge> ok
18:24:38 <smooge> #topic Apprentice Open office hours
18:24:51 <smooge> Any outstanding tickets or issues apprentices would like help on?
18:26:32 <smooge> ok next up is something Kevin said he wanted to go over
18:26:35 <smooge> #topic Learn about: OpenShift (by Kevin)
18:27:55 <nirik> oh yeah, I just wanted to give a really high level overview of our openshift setup...
18:28:27 <nirik> so we currently have 2 clusters: one staging and one production. They are both set up the same way... staging is version 3.6 however and prod is still 3.5
18:28:51 <nirik> they each use 6 vm's...
18:29:31 <nirik> 2 are called 'nodes', ie, os-node01 os-node02. These actually run pods of containers for apps.
18:30:07 <nirik> 3 are 'masters': os-master01, 02, 03... etc. These run the config setup and infrastructure.
18:30:51 <nirik> 1 is a 'control' host...
ie, os-control01. These are actually not running openshift; instead they run the openshift-ansible playbook that configures and installs and manages the others.
18:30:52 <cverna> are we using openshift-ansible to configure the cluster ?
18:31:00 <nirik> :)
18:31:04 <cverna> :)
18:31:04 <puiterwijk> cverna: yes
18:31:28 <netcronin> Enrollment barcode or barcodes
18:31:33 <nirik> the reason we have control hosts is because it allows them to just pull/use openshift-ansible without having to mix or import it into our batcave01 setup
18:32:08 <nirik> so batcave goes to the control host which in turn runs ansible again. Ansible inside ansible. ;)
18:32:27 <puiterwijk> ansible-ansible-openshift-ansible is the name of the role if I'm remembering correctly
18:32:36 <tflink> ha, nice
18:32:49 <nirik> For applications we have been using ansible roles that have templates and load them into openshift.
18:32:55 <cverna> haha wonder how we could add an extra ansible in there
18:33:06 <nirik> and then openshift tries to make reality match the database
18:33:18 <nirik> ansible-ansible-openshift-ansible-cowbell.
18:33:43 <nirik> anyhow, so far this seems to be working out ok.
18:33:48 <smooge> so we make a tower container inside of openshift
18:34:07 <smooge> which then runs the batcave ansible
18:34:10 <cverna> Do we have atomic hosts ? or are we planning to use some ?
18:34:29 <nirik> we are not currently.
18:34:31 <puiterwijk> cverna: no, we tried, but full atomic setups are not supported by 3.5 because of the ha layer
18:35:00 <puiterwijk> I think they said it was supposed to be supported in 3.6, but I haven't heard anything about that
18:35:00 <nirik> that and it would take some rework to make our base provisioning ansible work with them
18:35:58 <nirik> Oh, and we are using RHOSP.. we tried with origin, but there were issues and we wanted a pretty stable base to try with from the start.
18:36:10 <puiterwijk> nirik: wrong abbreviation.
18:36:21 <puiterwijk> RHOSP = Red Hat OpenStack Platform
18:36:27 <nirik> I can never keep them straight anymore
18:36:27 <puiterwijk> We use OpenShift Container Platform
18:36:33 <puiterwijk> RHOCP
18:36:35 <bowlofeggs> haha
18:37:00 * puiterwijk adds 1 to the counter of "openshift vs openstack misnamings"
18:37:07 <nirik> let's just call them all bruce. ;)
18:37:23 <cverna> cool, some more acronyms to learn about :)
18:37:31 <netcronin> Are we running Ansible Tower at any other point currently?
18:37:34 * puiterwijk thinks we need an app on os.fp.o for the os vs os counter
18:37:36 <puiterwijk> netcronin: no
18:37:51 <puiterwijk> netcronin: and smooge's statement was not true (I hope?), but a joke
18:38:10 <netcronin> Right, just thinking it would be neat to see it run batcave.
18:38:22 <puiterwijk> netcronin: yes. But let's discuss that in open floor.
18:38:27 <netcronin> okay sorry
18:38:27 <puiterwijk> Kevin was explaining stuff
18:38:27 <smooge> I was trying to answer cverna's question of how to put ansible in one more place
18:38:39 <nirik> well, that was mostly what I wanted to cover.
18:38:44 <smooge> nirik, anything else after we derailed you?
18:38:52 <nirik> just how it was basically set up. We do have a few apps already in it
18:39:01 <nirik> we need to figure out monitoring
18:39:06 <smooge> yeah...
18:39:20 <nirik> I think we should have nagios monitor the endpoints...
18:40:48 <smooge> that sounds appropriate
18:40:54 <smooge> I guess that is on me :)
18:41:08 <smooge> will see how it is done elsewhere and reimplement
18:41:10 <nirik> like the console and each app's outside dns name endpoint
18:42:47 <smooge> https://pagure.io/fedora-infrastructure/issue/6442
18:43:09 <smooge> anything else?
18:43:23 <smooge> #topic Open Floor
18:43:33 <smooge> Anything for the open floor ?
18:43:52 <netcronin> Do you guys have any easyfix or otherwise tickets you need help with?
18:43:53 <pingou> I brought cake :)
18:44:22 <pingou> 🎂
18:44:23 <puiterwijk> netcronin: just to respond to your questions: no, we won't do ansible tower. Yes, we want to deploy AWX (the open source version) at some point, but that's down the line and on a batcontainer or something like that.
18:44:33 <puiterwijk> pingou++ I can really use some cake at the moment
18:44:49 <pingou> (did that look like a cake?)
18:44:57 <netcronin> puiterwijk: Makes sense, thanks.
18:44:59 <puiterwijk> pingou: yes
18:45:08 <tflink> batcontainer?
18:45:10 <pingou> cool, it didn't render here :)
18:45:11 <netcronin> puiterwijk: We could probably run it on one of the Docker hosts.
18:45:18 <puiterwijk> tflink: batcave running containers.
18:45:19 <pingou> tflink: sounds like a batmobile :)
18:45:21 <puiterwijk> netcronin: no.
18:45:23 <tflink> makes me think of something like troll-repellent bat spray
18:45:28 <puiterwijk> netcronin: we want it on an entirely separate server
18:45:42 <tflink> makes the trolls fall off and explode, just like sharks :)
18:45:54 <puiterwijk> tflink: :)
18:45:58 <netcronin> puiterwijk: Okay. Is that because you'd be running playbooks that would be interacting with the docker hosts or some other reason?
18:46:03 * puiterwijk realizes the name might've been unfortunate. The name's just a random idea
18:46:12 <pingou> batmobyle
18:46:20 <nirik> batcomputer
18:46:22 <puiterwijk> netcronin: because the box will have full access to the entire infrastructure. Thus it needs extra protections
18:46:33 <pingou> nirik: robin? :)
18:46:40 <pingou> nirik: or just alfred :D
18:46:45 <netcronin> puiterwijk: Ah, thanks.
18:46:46 <puiterwijk> netcronin: and while containers are made for containerization, they don't always fully contain yet.
18:46:48 <nirik> I don't trust robin with that access. ;)
18:46:57 <pingou> I think Alfred may be best
18:47:00 <netcronin> Sure.
18:47:07 <smooge> I think Joker
18:47:14 * puiterwijk doesn't care about the name.
18:47:16 <nirik> I am eager to get AWX going tho...
18:47:27 <smooge> but stops as it has reached the end
18:47:36 <nirik> I guess we could do a blank vm, install it there and see how it goes. It would have to pass security audit. ;)
18:47:50 <pingou> how would we install it?
18:48:03 <misc> I did a quick verification of AWX, and found nothing :/
18:48:03 <netcronin> That's my question too, I thought they only support installs to containers.
18:48:17 <nirik> right, we would have to install it in containers.
18:48:20 <cverna> would the aws container be built by the LIBS ?
18:48:23 <smooge> I expect the server would have docker like the proxies
18:48:27 <cverna> AWX :)
18:49:00 <nirik> misc: how about the various stuff the container has? Always worried how up to date things like that are...
18:49:40 <pingou> no it's fine nirik they all had shell-shock :)
18:50:05 <smooge> ok do we have more here?
18:50:23 <smooge> if not I can end this and you can all go back to punnery and such in #fedora-admin
18:50:45 <smooge> #endmeeting