18:02:41 #startmeeting Infrastructure (2017-10-11)
18:02:41 Meeting started Thu Oct 12 18:02:41 2017 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:41 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:02:41 The meeting name has been set to 'infrastructure_(2017-10-11)'
18:02:41 #meetingname infrastructure
18:02:41 The meeting name has been set to 'infrastructure'
18:02:41 #topic aloha
18:02:41 #chair smooge relrod nirik abadger1999 dgilmore threebean pingou puiterwijk pbrobinson maxamillio
18:02:41 Current chairs: abadger1999 dgilmore maxamillio nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:02:43 .hello2
18:02:44 bowlofeggs: bowlofeggs 'Randy Barlow'
18:02:45 here
18:02:53 oh I thought I had missed the meeting already
18:02:59 Hello!
18:03:49 Hi
18:04:02 #topic New folks introductions
18:04:02 #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
18:04:13 Hello, any new people here this week?
18:04:26 * tflink isn't new, just shows up late
18:04:44 3rd meeting for me
18:04:56 no problem. hi everyone
18:05:07 #topic announcements and information
18:05:07 #info PHX2 Colo Trip, Dec 4th - 9th
18:05:07 #info Infrastructure will be down during that trip.
18:05:07 #info Final freeze .5 weeks. PLEASE TEST.
18:05:08 #info Disaster recovery network stood up. Systems need to be designed/built
18:05:09 #info Bodhi-2.12.{0..2} deployed this week.
18:05:11 #info Bodhi-2.13.0 is now planned to contain the modular mashing patch
18:05:21 3D meeting for me
18:05:28 i've got this awesome new 3D IRC client
18:05:33 * cverna waves
18:05:37 I want to emphasize for many different people: the Dec 4-9th window will be a lot of downtime
18:05:50 bowlofeggs: does that mean that text flies at you?
18:05:51 morning. sorry I am late
18:05:51 bowlofeggs++
18:05:51 cverna: Karma for bowlofeggs changed to 14 (for the f26 release cycle): https://badges.fedoraproject.org/tags/cookie/any
18:05:56 tflink: hahah yep ☺
18:05:57 empanada
18:06:01 cool
18:06:01 bowlofeggs: did you get a 3D cookie
18:06:13 i sure did and it tasted… digital…
18:06:18 do the server folks know about this?
18:06:25 tflink, yes
18:06:32 they may forget.. but they know :)
18:06:32 I think that their separate release of modular server just slipped to the 12th of december
18:07:02 it was part of the weird dates they can ship on.
18:07:05 aka the 19th
18:07:08 vs the 12th
18:07:29 ok, just figured I would mention it :)
18:08:03 no problem
18:08:19 any other announcements?
18:08:20 yeah.
18:08:34 sorry, I mean yeah to what you were saying, not yeah to announcements. ;)
18:08:54 yeah.
18:09:08 sorry, I mean yeah to what nirik was saying about sorry, I mean yeah to what you were saying, not yeah to announcements. ;)
18:09:12 #topic Ticket cleanup
18:09:21 #info 5970 Let's put OSTrees on mirrorlist and dl.fpo
18:09:32 I just picked a random ticket
18:09:51 we had planned to do that for F26, but as you can see… it didn't happen
18:09:53 is this doable? done? time to send to the clearer?
18:09:54 we still want to do it though
18:09:55 That needs more work, and is on my todo for post-F27
18:10:05 ok
18:10:07 well then ..
18:10:19 modularity and CI happened and that went straight to the back burner, on the backup stove, in the guest cabin…
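[Editor's sketch — for ticket 5970 context, a minimal example of what a client could do once trees are published on dl.fpo; the repo URL and ref below are assumptions for illustration, not the final layout:

    # Add a remote for the (hypothetical) dl.fpo repo and pull an Atomic Host ref.
    ostree remote add --set=gpg-verify=true fedora-dl \
        https://dl.fedoraproject.org/ostree/repo
    ostree pull fedora-dl:fedora/27/x86_64/atomic-host
]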
18:10:32 #topic ResultsDB Performance/Resources
18:10:32 #info there has been some talk about computing resources for production resultsdb, let's discuss more concretely
18:10:54 I'm starting to see some errors in Taskotron production and there are a lot of timeouts in resultsdb
18:11:14 it would seem that bumping the resources on the db host before freeze would be wise
18:11:16 what databases does it use in the backend?
18:11:56 There are some resources to update it.
18:12:01 postgres
18:12:09 or did you mean which host?
18:12:10 just need a short outage... just a minute.
18:12:15 which host
18:12:17 db-qa02.qa
18:12:27 it's on virthost-comm03.qa I think
18:12:36 ah ok. sounds like nirik has this?
18:12:47 I could do it, as long as I knew when. :)
18:13:17 when does freeze start? on tuesday?
18:13:55 tuesday
18:14:13 so today, tomorrow or monday
18:14:18 can we do it today?
18:14:28 shouldn't be a problem
18:14:47 it'll be somewhat unannounced but I don't think many folks will notice
18:15:14 ok nirik/tflink how does 19:30 UTC after the meeting sound?
18:15:16 pingou: do you think resultsdb going down for 20 minutes would/could cause problems with the ci stuff
18:15:26 sure. anytime is fine with me.
18:15:40 ok, we can figure out the details outside of the meeting
18:16:31 okie dokie
18:16:42 #topic Cleaning up old sysadmins
18:16:42 #info various sysadmins' emails are just bouncing slowly.
18:17:02 * bowlofeggs reads that as "old" as in "age"
18:17:23 I cleaned up several thousand "bounce" emails yesterday to a couple of people on sysadmin
18:17:59 I would say that anyone who hasn't been active in one way or another in the last 2 months and is in sysadmin-* groups should get a reminder, and if they don't reply in a week, we remove them. We can easily re-add people if they get back
18:18:10 mainly because I was tired of every email sent to the "sysadmin" group since Oct 1st sending me 2-3 bounce notifications
18:18:25 "active" being either ansible commits, ssh access, infra list comments, or anything else
18:18:27 I sent email to the people and ... got bounce notifications
18:18:49 i always laugh when i get OOO replies on my ansible commits
18:18:52 smooge: if they bounce, I'd say kick right away. We need a way to get in touch with anyone having access to our infra
18:19:13 * bowlofeggs hates OOO replies in general. e-mail is already asynchronous, no need to say you are away!
18:19:24 As said, if they fix their email and ping us back, it's easy to re-add them
18:19:25 that was my initial reaction also but I wanted to get some feedback
18:20:26 We really need the account email for sysadmins to work in case of problems. If it doesn't, they need to fix it, and until then they can't be in access-giving groups, in my opinion.
18:20:27 * tflink doesn't have an issue with that
18:20:51 puiterwijk, when you said "they should get a reminder" do you mean that in the "this is implemented" or "this is how it should be" way?
18:21:08 smooge: "this is how it should be". But that's just dormant accounts.
18:21:30 no problem. I was just going to start looking to see how that was done :)
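[Editor's sketch — one way the bounce triage discussed above could be scripted on the mail relay; the log path and format assume a postfix-style maillog, which may differ on the actual hosts:

    # List the recipients that bounce most often, highest count first.
    grep 'status=bounced' /var/log/maillog \
        | grep -o 'to=<[^>]*>' \
        | sort | uniq -c | sort -rn | head
]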
18:21:51 smooge: if they bounce with an error that doesn't sound temporary ("This user got too many emails in the last 60 seconds" vs "this account is disabled"), remove
18:21:59 ok anyway I wanted to ask around and get some feedback from the group
18:22:23 puiterwijk, well the bounce is "This user got too many emails in the last 60 seconds" for 10 days
18:22:47 so its temporary but not temporary
18:22:51 smooge: in that case, if it continues for >1 day, I'd also say drop
18:23:09 okie dokie
18:23:14 Like, I can understand if you break your mailserver for a day. but if you then don't fix it, we have no way to get in touch.
18:23:21 As said, when you fix it, it's easy to re-add
18:23:35 I had never seen gmail say it for 10 days
18:23:52 and even when I sent them email via gmail.. it gave me the error.
18:23:59 Yeah, me neither. That sounds like their email is under heavy DoS.
18:24:16 so I am going to remove ours from that DDoS
18:24:18 ok
18:24:38 #topic Apprentice Open office hours
18:24:51 Any outstanding tickets or issues apprentices would like help on?
18:26:32 ok next up is something Kevin said he wanted to go over
18:26:35 #topic Learn about: OpenShift (by Kevin)
18:27:55 oh yeah, I just wanted to give a really high level overview of our openshift setup...
18:28:27 so we currently have 2 clusters: one staging and one production. They are both set up the same way... staging is version 3.6 however and prod is still 3.5
18:28:51 they each use 6 vm's...
18:29:31 2 are called 'nodes', ie, os-node01 os-node02. These actually run pods of containers for apps.
18:30:07 3 are 'masters': os-master01, 02, 03... etc. These run the config setup and infrastructure.
18:30:51 1 is a 'control' host, ie, os-control01. These are actually not running openshift; instead they run the openshift-ansible playbook that configures and installs and manages the others.
18:30:52 are we using openshift-ansible to configure the cluster ?
18:31:00 :)
18:31:04 :)
18:31:04 cverna: yes
18:31:28 Enrollment barcode or barcodes
18:31:33 the reason we have control hosts is because it allows them to just pull/use openshift-ansible without having to mix or import it in our batcave01 setup
18:32:08 so batcave goes to the control host which in turn runs ansible again. Ansible inside ansible. ;)
18:32:27 ansible-ansible-openshift-ansible is the name of the role if I'm remembering correctly
18:32:36 ha, nice
18:32:49 For applications we have been using ansible roles that have templates and load them into openshift.
18:32:55 haha wonder how we could add an extra ansible in there
18:33:06 and then openshift tries to make reality match the database
18:33:18 ansible-ansible-openshift-ansible-cowbell.
18:33:43 anyhow, so far this seems to be working out ok.
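[Editor's sketch — the two-hop "ansible inside ansible" flow described above; the batcave playbook name is hypothetical, and the openshift-ansible path reflects the 3.x RPM layout as best remembered:

    # Hop 1, from batcave01: the ansible-ansible-openshift-ansible role
    # templates an inventory onto os-control01 and triggers hop 2 there.
    ansible-playbook playbooks/groups/os-cluster.yml   # hypothetical name

    # Hop 2, on os-control01: openshift-ansible configures the masters/nodes.
    ansible-playbook -i /etc/ansible/hosts \
        /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
]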
18:33:48 so we make a tower container inside of openshift
18:34:07 which then runs the batcave ansible
18:34:10 Do we have atomic hosts ? or are we planning to use some ?
18:34:29 we are not currently.
18:34:31 cverna: no, we tried, but full atomic setups are not supported by 3.5 because of the ha layer
18:35:00 I think they said it was supposed to be supported in 3.6, but I haven't heard anything about that
18:35:00 that and it would take some rework to make our base provisioning ansible work with them
18:35:58 Oh, and we are using RHOSP.. we tried with origin, but there were issues and we wanted a pretty stable base to try with from the start.
18:36:10 nirik: wrong abbreviation.
18:36:21 RHOSP = Red Hat OpenStack Platform
18:36:27 I can never keep them straight anymore
18:36:27 We use OpenShift Container Platform
18:36:33 RHOCP
18:36:35 haha
18:37:00 * puiterwijk adds 1 to the counter of "openshift vs openstack misnamings"
18:37:07 lets just call them all bruce. ;)
18:37:23 cool, some more acronyms to learn about :)
18:37:31 Are we running Ansible Tower at any other point currently?
18:37:34 * puiterwijk thinks we need an app on os.fp.o for the os vs os counter
18:37:36 netcronin: no
18:37:51 netcronin: and smooge's statement was not true (I hope?), but a joke
18:38:10 Right, just thinking it would be neat to see it run batcave.
18:38:22 netcronin: yes. But let's discuss that in open floor.
18:38:27 okay sorry
18:38:27 Kevin was explaining stuff
18:38:27 I was trying to answer cverna's question of how to put ansible one more place
18:38:39 well, that was mostly what I wanted to cover.
18:38:44 nirik, anything else after we derailed you?
18:38:52 just how it was basically set up. We do have a few apps already in it
18:39:01 we need to figure out monitoring
18:39:06 yeah...
18:39:20 I think we should have nagios monitor the endpoints...
18:40:48 that sounds appropriate
18:40:54 I guess that is on me :)
18:41:08 will see how it is done elsewhere and reimplement
18:41:10 like the console and each app's outside dns name endpoint
18:42:47 https://pagure.io/fedora-infrastructure/issue/6442
18:43:09 anything else?
18:43:23 #topic Open Floor
18:43:33 Anything for the open floor ?
18:43:52 Do you guys have any easyfix or otherwise tickets you need help with?
18:43:53 I brought cake :)
18:44:22 🎂
18:44:23 netcronin: just to respond to your questions: no, we won't do ansible tower. Yes, we want to deploy AWX (the open source version) at some point, but that's down the line and on a batcontainer or something like that.
18:44:33 pingou++ I can really use some cake at the moment
18:44:49 (does that look like a cake?)
18:44:57 puiterwijk: Makes sense, thanks.
18:44:59 pingou: yes
18:45:08 batcontainer?
18:45:10 cool, it didn't render here :)
18:45:11 puiterwijk: We could probably run it on one of the Docker hosts.
18:45:18 tflink: batcave running containers.
18:45:19 tflink: sounds like a batmobile :)
18:45:21 netcronin: no.
18:45:23 makes me think of something like troll-repellent bat spray
18:45:28 netcronin: we want it on an entirely separate server
18:45:42 makes the trolls fall off and explode, just like sharks :)
18:45:54 tflink: :)
18:45:58 puiterwijk: Okay. Is that because you'd be running playbooks that would be interacting with the docker hosts or some other reason?
18:46:03 * puiterwijk realizes the name might've been unfortunate. The name's just a random idea
18:46:12 batmobyle
18:46:20 batcomputer
18:46:22 netcronin: because the box will have full access to the entire infrastructure. Thus it needs extra protections
18:46:33 nirik: robin? :)
18:46:40 nirik: or just alfred :D
18:46:45 puiterwijk: Ah, thanks.
18:46:46 netcronin: and while containers are made for containerization, they don't always fully contain yet.
18:46:48 I don't trust robin with that access. ;)
18:46:57 I think Alfred may be best
18:47:00 Sure.
18:47:07 I think Joker
18:47:14 * puiterwijk doesn't care about the name.
18:47:16 I am eager to get AWX going tho...
18:47:27 but stops as it has reached the end
18:47:36 I guess we could do a blank vm, install it there and see how it goes. It would have to pass security audit. ;)
18:47:50 how would we install it?
18:48:03 I did a quick verification of AWX, and found nothing :/
18:48:03 That's my question too, I thought they only support installs to containers.
18:48:17 right, we would have to install it in containers.
18:48:20 would the aws container build by the LIBS ?
18:48:23 I expect the server would have docker like the proxies
18:48:27 AWX :)
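[Editor's sketch — the upstream AWX installer at the time (2017) was itself an ansible playbook that launched the web/task/rabbitmq/postgres containers on a docker host; the inventory values would need editing for any real deployment:

    git clone https://github.com/ansible/awx.git
    cd awx/installer
    # edit `inventory` first: postgres credentials, ports, docker host, etc.
    ansible-playbook -i inventory install.yml
]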
18:49:00 misc: how about the various stuff the container has? Always worried how up to date things like that are...
18:49:40 no it's fine nirik they all had shell-shock :)
18:50:05 ok do we have more here?
18:50:23 if not I can end this and you can all go back to punnery and such in #fedora-admin
18:50:45 #endmeeting