18:02:41 <smooge> #startmeeting Infrastructure (2017-10-11)
18:02:41 <zodbot> Meeting started Thu Oct 12 18:02:41 2017 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:41 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:02:41 <zodbot> The meeting name has been set to 'infrastructure_(2017-10-11)'
18:02:41 <smooge> #meetingname infrastructure
18:02:41 <zodbot> The meeting name has been set to 'infrastructure'
18:02:41 <smooge> #topic aloha
18:02:41 <smooge> #chair smooge relrod nirik abadger1999 dgilmore threebean pingou puiterwijk pbrobinson maxamillio
18:02:41 <zodbot> Current chairs: abadger1999 dgilmore maxamillio nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:02:43 <bowlofeggs> .hello2
18:02:44 <zodbot> bowlofeggs: bowlofeggs 'Randy Barlow' <randy@electronsweatshop.com>
18:02:45 <relrod> here
18:02:53 <clime> oh I thought I had missed the meeting already
18:02:59 <clime> Hello!
18:03:49 <benniej> Hi
18:04:02 <smooge> #topic New folks introductions
18:04:02 <smooge> #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
18:04:13 <smooge> Hello, any new people here this week?
18:04:26 * tflink isn't new, just shows up late
18:04:44 <benniej> 3rd meeting for me
18:04:56 <smooge> no problem. hi everyone
18:05:07 <smooge> #topic announcements and information
18:05:07 <smooge> #info PHX2 Colo Trip, Dec 4th - 9th
18:05:07 <smooge> #info Infrastructure will be down during that trip.
18:05:07 <smooge> #info Final freeze .5 weeks. PLEASE TEST.
18:05:08 <smooge> #info Disaster recovery network stood up. Systems need to be designed/built
18:05:09 <smooge> #info Bodhi-2.12.{0..2} deployed this week.
18:05:11 <smooge> #info Bodhi-2.13.0 is now planned to contain the modular mashing patch
18:05:21 <bowlofeggs> 3D meeting for me
18:05:28 <bowlofeggs> i've got this awesome new 3D IRC client
18:05:33 * cverna waves
18:05:37 <smooge> I want to emphasize for many different people: the Dec 4-9th time will be a lot of downtime
18:05:50 <tflink> bowlofeggs: does that mean that text flies at you?
18:05:51 <nirik> morning. sorry I am late
18:05:51 <cverna> bowlofeggs++
18:05:51 <zodbot> cverna: Karma for bowlofeggs changed to 14 (for the f26 release cycle): https://badges.fedoraproject.org/tags/cookie/any
18:05:56 <bowlofeggs> tflink: hahah yep ☺
18:05:57 <smooge> empanada
18:06:01 <tflink> cool
18:06:01 <cverna> bowlofeggs: did you get a 3D cookie
18:06:13 <bowlofeggs> i sure did and it tasted… digital…
18:06:18 <tflink> do the server folks know about this?
18:06:25 <smooge> tflink, yes
18:06:32 <smooge> they may forget.. but they know :)
18:06:32 <tflink> I think that their separate release of modular server just slipped to the 12th of december
18:07:02 <smooge> it was part of the weird dates they can ship on.
18:07:05 <smooge> aka the 19th
18:07:08 <smooge> vs the 12th
18:07:29 <tflink> ok, just figured I would mention it :)
18:08:03 <smooge> no problem
18:08:19 <smooge> any other announcements?
18:08:20 <nirik> yeah.
18:08:34 <nirik> sorry, I mean yeah to what you were saying, not yeah to announcements. ;)
18:08:54 <bowlofeggs> yeah.
18:09:08 <bowlofeggs> sorry, I mean yeah to what nirik was saying about sorry, I mean yeah to what you were saying, not yeah to announcements. ;)
18:09:12 <smooge> #topic Ticket cleanup
18:09:21 <smooge> #info 5970 Let's put OSTrees on mirrorlist and dl.fpo
18:09:32 <smooge> I just picked a random ticket
18:09:51 <bowlofeggs> we had planned to do that for F26, but as you can see… it didn't happen
18:09:53 <smooge> is this doable? done? time to send to the clearer?
18:09:54 <bowlofeggs> we still want to do it though
18:09:55 <puiterwijk> That needs more work, and is on my todo for post-F27
18:10:05 <smooge> ok
18:10:07 <smooge> well then ..
18:10:19 <bowlofeggs> modularity and CI happened and that went straight to the back burner, on the backup stove, in the guest cabin…
18:10:32 <smooge> #topic ResultsDB Performance/Resources
18:10:32 <smooge> #info there has been some talk about computing resources for production resultsdb, let's discuss more concretely
18:10:54 <tflink> I'm starting to see some errors in Taskotron production and there are a lot of timeouts in resultsdb
18:11:14 <tflink> it would seem that bumping the resources on the db host before freeze would be wise
18:11:16 <smooge> what databases does it use in the backend?
18:11:56 <nirik> There are some resources to update it.
18:12:01 <tflink> postgres
18:12:09 <tflink> or did you mean which host?
18:12:10 <nirik> just need a short outage... just a minute.
18:12:15 <smooge> which host
18:12:17 <tflink> db-qa02.qa
18:12:27 <tflink> it's on virthost-comm03.qa I think
18:12:36 <smooge> ah ok. sounds like nirik has this?
18:12:47 <nirik> I could do it, as long as I knew when. :)
18:13:17 <tflink> when does freeze start? on tuesday?
18:13:55 <smooge> tuesday
18:14:13 <smooge> so today, tomorrow or monday
18:14:18 <smooge> can we do it today?
18:14:28 <tflink> shouldn't be a problem
18:14:47 <tflink> it'll be somewhat unannounced but I don't think many folks will notice
18:15:14 <smooge> ok nirik/tflink how does 19:30 UTC after the meeting sound?
18:15:16 <tflink> pingou: do you think resultsdb going down for 20 minutes would/could cause problems with the ci stuff
18:15:26 <nirik> sure. anytime is fine with me.
18:15:40 <tflink> ok, we can figure out the details outside of the meeting
18:16:31 <smooge> okie dokie
18:16:42 <smooge> #topic Cleaning up old sysadmins
18:16:42 <smooge> #info various sysadmins are just bouncing slowly.
18:17:02 * bowlofeggs reads that as "old" as in "age"
18:17:23 <smooge> I cleaned up several thousand "bounce" emails yesterday to a couple of people on sysadmin
18:17:59 <puiterwijk> I would say that anyone who hasn't been active in one way or another in the last 2 months and is in sysadmin-* groups should get a reminder, and if they don't reply in a week, we remove them. We can easily re-add people if they get back
18:18:10 <smooge> mainly because I was tired of every email that gets sent to the "sysadmin" group from Oct 1st sending me 2-3 bounce notifications
18:18:25 <puiterwijk> "active" being either ansible commits, ssh access, infra list comments, or anything else
18:18:27 <smooge> I sent email to the people and ... got bounce notifications
18:18:49 <bowlofeggs> i always laugh when i get OOO replies on my ansible commits
18:18:52 <puiterwijk> smooge: if they bounce, I'd say kick right away. We need a way to get in touch with anyone having access to our infra
18:19:13 * bowlofeggs hates OOO replies in general. e-mail is already asynchronous, no need to say you are away!
18:19:24 <puiterwijk> As said, if they fix their email and ping us back, it's easy to re-add them
18:19:25 <smooge> that was my initial reaction also but I wanted to get some feedback
18:20:26 <puiterwijk> We really need the account email for sysadmins to work in case of problems. If it doesn't, they need to fix it, and until then they can't be in access-giving groups, in my opinion.
18:20:27 * tflink doesn't have an issue with that
18:20:51 <smooge> puiterwijk, when you said "they should get a reminder" do you mean that in the "this is implemented" or "this is how it should be" way?
18:21:08 <puiterwijk> smooge: "this is how it should be". But that's just dormant accounts.
18:21:30 <smooge> no problem.
I was just going to start looking to see how that was done :)
18:21:51 <puiterwijk> smooge: if they bounce with an error that doesn't sound temporary ("This user got too many emails in the last 60 seconds" vs "this account is disabled"), remove
18:21:59 <smooge> ok anyway I wanted to ask around and get some feedback from the group
18:22:23 <smooge> puiterwijk, well the bounce is "This user got too many emails in the last 60 seconds" for 10 days
18:22:47 <smooge> so it's temporary but not temporary
18:22:51 <puiterwijk> smooge: in that case, if it continues for >1 day, I'd also say drop
18:23:09 <smooge> okie dokie
18:23:14 <puiterwijk> Like, I can understand if you break your mailserver for a day. but if you then don't fix it, we have no way to get in touch.
18:23:21 <puiterwijk> As said, when you fix it, it's easy to re-add
18:23:35 <smooge> I had never seen gmail say it for 10 days
18:23:52 <smooge> and even when I sent them email via gmail.. it gave me the error.
18:23:59 <puiterwijk> Yeah, me neither. That sounds like their email is under heavy DoS.
18:24:16 <smooge> so I am going to remove ours from that DDOS
18:24:18 <smooge> ok
18:24:38 <smooge> #topic Apprentice Open office hours
18:24:51 <smooge> Any outstanding tickets or issues apprentices would like help on?
18:26:32 <smooge> ok next up is something Kevin said he wanted to go over
18:26:35 <smooge> #topic Learn about: OpenShift (by Kevin)
18:27:55 <nirik> oh yeah, I just wanted to give a really high level overview of our openshift setup...
18:28:27 <nirik> so we currently have 2 clusters: one staging and one production. They are both set up the same way... staging is version 3.6 however and prod is still 3.5
18:28:51 <nirik> they each use 6 vm's...
18:29:31 <nirik> 2 are called 'nodes', ie, os-node01 os-node02. These actually run pods of containers for apps.
18:30:07 <nirik> 3 are 'masters': os-master01, 02, 03... etc. These run the config setup and infrastructure.
18:30:51 <nirik> 1 is a 'control' host...
ie, os-control01. These are actually not running openshift; instead they run the openshift-ansible playbook that configures and installs and manages the others.
18:30:52 <cverna> are we using openshift-ansible to configure the cluster ?
18:31:00 <nirik> :)
18:31:04 <cverna> :)
18:31:04 <puiterwijk> cverna: yes
18:31:28 <netcronin> Enrollment barcode or barcodes
18:31:33 <nirik> the reason we have control hosts is because it allows them to just pull/use openshift-ansible without having to mix or import it into our batcave01 setup
18:32:08 <nirik> so batcave goes to the control host which in turn runs ansible again. Ansible inside ansible. ;)
18:32:27 <puiterwijk> ansible-ansible-openshift-ansible is the name of the role if I'm remembering correctly
18:32:36 <tflink> ha, nice
18:32:49 <nirik> For applications we have been using ansible roles that have templates and load them into openshift.
18:32:55 <cverna> haha wonder how we could add an extra ansible in there
18:33:06 <nirik> and then openshift tries to make reality match the database
18:33:18 <nirik> ansible-ansible-openshift-ansible-cowbell.
18:33:43 <nirik> anyhow, so far this seems to be working out ok.
18:33:48 <smooge> so we make a tower container inside of openshift
18:34:07 <smooge> which then runs the batcave ansible
18:34:10 <cverna> Do we have atomic hosts ? or are we planning to use some ?
18:34:29 <nirik> we are not currently.
18:34:31 <puiterwijk> cverna: no, we tried, but full atomic setups are not supported by 3.5 because of the ha layer
18:35:00 <puiterwijk> I think they said it was supposed to be supported in 3.6, but I haven't heard anything about that
18:35:00 <nirik> that and it would take some rework to make our base provisioning ansible work with them
18:35:58 <nirik> Oh, and we are using RHOSP.. we tried with origin, but there were issues and we wanted a pretty stable base to try with from the start.
18:36:10 <puiterwijk> nirik: wrong abbreviation.
18:36:21 <puiterwijk> RHOSP = Red Hat OpenStack Platform
18:36:27 <nirik> I can never keep them straight anymore
18:36:27 <puiterwijk> We use OpenShift Container Platform
18:36:33 <puiterwijk> RHOCP
18:36:35 <bowlofeggs> haha
18:37:00 * puiterwijk adds 1 to the counter of "openshift vs openstack misnamings"
18:37:07 <nirik> let's just call them all bruce. ;)
18:37:23 <cverna> cool, some more acronyms to learn about :)
18:37:31 <netcronin> Are we running Ansible Tower at any other point currently?
18:37:34 * puiterwijk thinks we need an app on os.fp.o for the os vs os counter
18:37:36 <puiterwijk> netcronin: no
18:37:51 <puiterwijk> netcronin: and smooge's statement was not true (I hope?), but a joke
18:38:10 <netcronin> Right, just thinking it would be neat to see it run batcave.
18:38:22 <puiterwijk> netcronin: yes. But let's discuss that in open floor.
18:38:27 <netcronin> okay sorry
18:38:27 <puiterwijk> Kevin was explaining stuff
18:38:27 <smooge> I was trying to answer cverna's question of how to put ansible in one more place
18:38:39 <nirik> well, that was mostly what I wanted to cover.
18:38:44 <smooge> nirik, anything else after we derailed you?
18:38:52 <nirik> just how it was basically set up. We do have a few apps already in it
18:39:01 <nirik> we need to figure out monitoring
18:39:06 <smooge> yeah...
18:39:20 <nirik> I think we should have nagios monitor the endpoints...
18:40:48 <smooge> that sounds appropriate
18:40:54 <smooge> I guess that is on me :)
18:41:08 <smooge> will see how it is done elsewhere and reimplement
18:41:10 <nirik> like the console and each app's outside dns name endpoint
18:42:47 <smooge> https://pagure.io/fedora-infrastructure/issue/6442
18:43:09 <smooge> anything else?
18:43:23 <smooge> #topic Open Floor
18:43:33 <smooge> Anything for the open floor ?
18:43:52 <netcronin> Do you guys have any easyfix or otherwise tickets you need help with?
18:43:53 <pingou> I brought cake :)
18:44:22 <pingou> 🎂
18:44:23 <puiterwijk> netcronin: just to respond to your questions: no, we won't do ansible tower. Yes, we want to deploy AWX (the open source version) at some point, but that's down the line and on a batcontainer or something like that.
18:44:33 <puiterwijk> pingou++ I can really use some cake at the moment
18:44:49 <pingou> (did that look like a cake?)
18:44:57 <netcronin> puiterwijk: Makes sense, thanks.
18:44:59 <puiterwijk> pingou: yes
18:45:08 <tflink> batcontainer?
18:45:10 <pingou> cool, it didn't render here :)
18:45:11 <netcronin> puiterwijk: We could probably run it on one of the Docker hosts.
18:45:18 <puiterwijk> tflink: batcave running containers.
18:45:19 <pingou> tflink: sounds like a batmobile :)
18:45:21 <puiterwijk> netcronin: no.
18:45:23 <tflink> makes me think of something like troll-repellent bat spray
18:45:28 <puiterwijk> netcronin: we want it on an entirely separate server
18:45:42 <tflink> makes the trolls fall off and explode, just like sharks :)
18:45:54 <puiterwijk> tflink: :)
18:45:58 <netcronin> puiterwijk: Okay. Is that because you'd be running playbooks that would be interacting with the docker hosts or some other reason?
18:46:03 * puiterwijk realizes the name might've been unfortunate. The name's just a random idea
18:46:12 <pingou> batmobyle
18:46:20 <nirik> batcomputer
18:46:22 <puiterwijk> netcronin: because the box will have full access to the entire infrastructure. Thus it needs extra protections
18:46:33 <pingou> nirik: robin? :)
18:46:40 <pingou> nirik: or just alfred :D
18:46:45 <netcronin> puiterwijk: Ah, thanks.
18:46:46 <puiterwijk> netcronin: and while containers are made for containerization, they don't always fully contain yet.
18:46:48 <nirik> I don't trust robin with that access. ;)
18:46:57 <pingou> I think Alfred may be best
18:47:00 <netcronin> Sure.
18:47:07 <smooge> I think Joker
18:47:14 * puiterwijk doesn't care about the name.
18:47:16 <nirik> I am eager to get AWX going tho...
18:47:27 <smooge> but stops as it has reached the end
18:47:36 <nirik> I guess we could do a blank vm, install it there and see how it goes. It would have to pass security audit. ;)
18:47:50 <pingou> how would we install it?
18:48:03 <misc> I did a quick verification of AWX, and found nothing :/
18:48:03 <netcronin> That's my question too, I thought they only support installs to containers.
18:48:17 <nirik> right, we would have to install it in containers.
18:48:20 <cverna> would the aws container be built by the LIBS ?
18:48:23 <smooge> I expect the server would have docker like the proxies
18:48:27 <cverna> AWX :)
18:49:00 <nirik> misc: how about the various stuff the container has? Always worried how up to date things like that are...
18:49:40 <pingou> no it's fine nirik they all had shell-shock :)
18:50:05 <smooge> ok do we have more here?
18:50:23 <smooge> if not I can end this and you can all go back to punnery and such in #fedora-admin
18:50:45 <smooge> #endmeeting