16:02:58 #startmeeting Infrastructure (2021-01-28)
16:02:58 Meeting started Thu Jan 28 16:02:58 2021 UTC.
16:02:58 This meeting is logged and archived in a public location.
16:02:58 The chair is pingou. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:58 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:02:58 The meeting name has been set to 'infrastructure_(2021-01-28)'
16:03:00 #meetingname infrastructure
16:03:00 The meeting name has been set to 'infrastructure'
16:03:02 #chair nirik pingou smooge cverna mizdebsk mkonecny abompard siddharthvipul mobrien
16:03:02 Current chairs: abompard cverna mizdebsk mkonecny mobrien nirik pingou siddharthvipul smooge
16:03:04 #info Agenda is at: https://board.net/p/fedora-infra
16:03:06 #info About our team: https://docs.fedoraproject.org/en-US/cpe/
16:03:08 #topic aloha
16:03:10 ó/
16:03:30 morning
16:03:32 .hello zlopez
16:03:32 Zlopez[m]: zlopez 'Michal Konečný'
16:04:14 .hello2
16:04:15 .hello
16:04:15 austinpowered__: Sorry, but you don't exist
16:04:18 mobrien: (hello ) -- Alias for "hellomynameis $1".
16:04:23 hello
16:04:25 .hi
16:04:26 mobrien: mobrien 'Mark O'Brien'
16:05:45 let's move on
16:05:47 #topic New folks introductions
16:05:49 #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
16:05:51 #info Getting Started Guide: https://fedoraproject.org/wiki/Infrastructure/GettingStarted
16:05:55 any new folks who would like to introduce themselves?
16:06:55 looks pretty quiet :)
16:06:58 #topic Next chair
16:06:59 #info magic eight ball says:
16:07:01 #info chair 2021-02-04 - siddharthvipul
16:07:02 .hi
16:07:03 darknao: darknao 'Francois Andrieu'
16:07:07 so next week siddharthvipul will chair
16:07:14 do we have a volunteer for Feb 11th?
16:07:32 I could take it
16:07:41 it's yours!
16:07:47 #info chair 2021-02-11 - zlopez
16:07:47 thanks Zlopez[m]
16:07:56 any volunteer for the 18th?
16:07:59 Updating my TODO list :-)
16:08:01 ok me
16:08:08 #info chair 2021-02-18 - smooge
16:08:09 Zlopez[m], I need a .org training session
16:08:12 thanks smooge
16:08:22 I am on call now I think right?
16:08:29 it's me this week :)
16:08:40 we can keep the 25th for next week :)
16:08:51 let's move to announcements
16:08:53 I thought yours ended today and I started
16:08:55 ok sorry
16:08:56 #topic announcements and information
16:08:58 #info CPE Infra&Releng EU-hours team has a Monday through Friday 30 minute meeting going through tickets at 1030 Europe/Paris in #centos-meeting
16:09:00 #info CPE Infra&Releng NA-hours team has a Monday through Friday 30 minute meeting going through tickets at 1800 UTC in #fedora-admin
16:09:02 #info Datacenter move is over, but some items still need to be done: see https://fedoraproject.org/wiki/Infrastructure/2020-post-datacenter-move-known-issues
16:09:04 #info Anitya (release-monitoring.org) 1.0.0 is now running in production
16:09:18 any other announcement someone would like to make?
16:09:35 #info mass rebuild is progressing along...
16:09:49 any news on the AAA ?
16:10:13 what is AAA ?
16:10:17 newbie here
16:10:19 the new account system
16:10:24 ok
16:10:30 AAA stands for: Account, Authorization and Authentication
16:10:42 hello together
16:11:18 you could just say noggin too. But not sure it would be _less_ confusing. :)
16:11:26 :)
16:11:36 we have been working through ssh and 2fa issues in staging... making good progress I think
16:11:41 AAA is still not quite there but on the way :)
16:11:58 #info the new AAA system is progressing along in stg, still some work to do though
16:12:07 * mboddu is here
16:12:08 There are a few loose ends to tie up with 2fa as well as finishing touches on 2 apps I think
16:12:10 anything else to announce?
16:12:28 A short AAA presentation could be the topic for next week's learning section if it hasn't been already
16:12:38 * pingou puts it on the list
16:13:04 Next week is set for IPA. Is that what AAA is using?
16:13:06 I think it may be best to move it to the week after as it should be more fully formed at that stage
16:13:10 we need to figure out src.stg.fp.o auth...
16:13:23 austinpowered, IPA is part of AAA
16:13:25 austinpowered: it's part of it yes
16:14:10 could be moved next week, just wanted to add it to the agenda
16:14:14 I know IPA can handle many roles. Looking forward to next week.
16:14:24 it's on the ideas box
16:14:29 ;-)
16:14:31 austinpowered: me too
16:14:38 me too!
16:14:46 ok, let's move to oncall and we'll get to the learning topic at the end
16:14:48 #topic Oncall
16:14:50 #info https://fedoraproject.org/wiki/Infrastructure/Oncall
16:14:51 oh wait, I'm giving that one... I better read up. :)
16:14:52 #info smooge is oncall for 2021-01-28 to 2021-02-04
16:15:03 anyone to take up oncall after smooge?
16:15:12 Feb 4th to 11th
16:15:29 what is there to do when oncall?
16:15:36 .takeoncallus
16:15:43 .oncalltakeus
16:15:43 smooge: Error: You don't have the alias.add capability. If you think that you should have this capability, be sure that you are identified before trying again. The 'whoami' command can tell you if you're identified.
16:15:53 the oncall person is the designated person to be interrupted by questions on IRC
16:16:00 I'm happy to take it if no one else...
16:16:05 so other folks can remain focused on their work
16:16:34 if the oncall person cannot answer or doesn't know the answer to the issue, they simply ask that the issue be logged in a ticket
16:16:46 nirik: looks like it'll be yours
16:16:52 #info nirik is oncall for 2021-02-04 to 2021-02-11
16:16:59 anyone for the week after?
16:17:12 i will try it
16:17:17 cool, thanks
16:17:23 #info dtometzki is oncall for 2021-02-11 to 2021-02-18
16:17:27 .oncalltakeus
16:17:27 smooge: Kneel before zod!
16:17:33 #info Summary of last week: (from current oncall )
16:17:36 so that was me
16:17:41 I've got a few pings
16:17:59 one on the releng side about an eln build consistently running out of space on koji
16:18:13 I've sent them to the releng tracker where it got closed as a temp issue
16:18:29 I've also helped a couple of people on IRC
16:18:47 one having an issue pushing to pagure.io over ssh, turned out it was a permission issue on their side on their .ssh folder
16:19:09 and one had a wrong host in .ssh/config to bounce via bastion to an internal host
16:19:16 iirc that's about it
16:19:29 #topic Monitoring discussion [nirik]
16:19:31 #info https://nagios.fedoraproject.org/nagios
16:19:33 #info Go over existing out items and fix
16:19:39 nirik: if you want to take it :)
16:19:43 dtometzki, if you have any questions when on call you can ping me directly.
16:19:51 I don't think there's much change here from last week... let me check tho
16:20:15 I've seen greenwave going above its threshold and then back down a few minutes after
16:20:19 yeah, pretty much exactly the same as last week...
16:20:19 mobrien: many thanks
16:20:27 I wonder if something changed in greenwave that made it a little slower
16:20:46 pingou: well, we added some openqa stuff... but not sure it would cause this
16:20:50 so maybe we can ask the greenwave folks about this and potentially tweak the nagios threshold?
16:21:07 I noticed a few warnings about the greenwave queue
16:21:10 nirik: could be that as well if openqa sends more results
16:21:56 sure, seems reasonable
16:22:02 And I fixed the rabbitmq queue for the-new-hotness
16:22:27 Found some bug after moving Anitya 1.0.0 to production
16:25:04 anyhow, nothing else nagios wise...
16:25:19 let's move to the learning topic of the day
16:25:25 #topic Learning topic
16:25:27 #info ansible setup [nirik/mobrien] on 2021-01-28
16:25:31 #info IPA [nirik] on 2021-02-04
16:25:41 mobrien: nirik: the floor is yours
16:25:47 I'll start
16:25:56 This is a relatively large topic, so if I miss anything or there are any questions, please speak up as I go. I will be talking on this topic again, so all feedback is welcome
16:26:13 Ansible is an automation tool which we use for app deployment and config as well as infrastructure as code
16:26:13 The repo is available here https://pagure.io/fedora-infra/ansible
16:26:16 * nirik is happy to answer questions/expand on anything.
16:26:34 With a few minor exceptions all our infra is controlled by ansible and the hosts can be seen in the inventory dir.
16:26:56 This directory is where most of the host setup lives; it contains the inventory files of all the hosts as well as group_vars and host_vars
16:27:14 The inventory uses an ini file format. The group_vars directly relate to the groups specified in the inventory files, and the same goes for host_vars
16:27:22 For example, if a host called test.fedoraproject.org exists in the test group in the inventory, the vars specified in `host_vars/test.fedoraproject.org` and in `group_vars/test` will automatically apply to that host.
16:27:55 So if you have some variable confusion this is a good place to check
16:28:01 do you use ansible tower or AWX
16:28:02 ?
16:28:05 The precedence for these can be seen here https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#understanding-variable-precedence
16:28:15 bodanel, not at the moment
16:28:24 possibly in the future
16:28:51 Secret variables are held in a separate ansible repo which only members of the sysadmin-main group have access to.
16:29:05 This repo is hosted on the batcave and is explicitly referenced in the required playbooks.
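The inventory layout described above (an ini inventory with `group_vars` and `host_vars` next to it) can be sketched as a scratch directory. This is a minimal illustration only: it reuses the `test.fedoraproject.org` host and `test` group from the example in the meeting, but the inventory file name `hosts` and the variables `example_port` and `example_env` are made up here and are not taken from the Fedora repo.

```shell
#!/bin/sh
set -eu

# Scratch directory so nothing touches a real checkout.
demo=$(mktemp -d)
mkdir -p "$demo/inventory/group_vars" "$demo/inventory/host_vars"

# INI-format inventory: one group ("test") containing one host.
# The file name "hosts" is an assumption for this sketch.
cat > "$demo/inventory/hosts" <<'EOF'
[test]
test.fedoraproject.org
EOF

# Variables applied to every host in the "test" group (hypothetical names).
cat > "$demo/inventory/group_vars/test" <<'EOF'
---
example_port: 8080
example_env: staging
EOF

# Variables applied to this one host only. When the same variable is set in
# both places, inventory host_vars win over inventory group_vars.
cat > "$demo/inventory/host_vars/test.fedoraproject.org" <<'EOF'
---
example_port: 9090
EOF

# Show the resulting layout.
(cd "$demo" && find inventory -type f | sort)
```

If Ansible happens to be installed, `ansible-inventory -i "$demo/inventory/hosts" --host test.fedoraproject.org` should print the merged variables for that host, which is a quick way to sort out the kind of variable confusion mentioned above.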
16:29:21 All the playbooks are in the playbooks dir; there are a few admin type playbooks in the base of this directory with most of the rest in subdirectories depending on type.
16:29:35 For example there is an openshift-apps directory, no prizes for guessing what's housed there.
16:30:04 All the roles are in the roles dir, which is where the bulk of the work actually gets done. Roles are often used by multiple different playbooks, so they must be written in as generic a way as possible, with care taken over correct tagging.
16:30:42 tags can be specified at run time to only run certain tasks, hence their importance
16:30:57 what's the process of making a change: branch and pull request ?
16:31:01 There are a number of templates in a number of places in the repo which contain the bulk of our server configuration, usually in the related role. It is best to do a git grep if you are looking for something specific in these.
16:31:19 bodanel, o make PR's just create a fork to make changes and create a PR against the main branch.
16:31:25 s/o/to
16:31:57 bodanel: Here is the link to the repository https://pagure.io/fedora-infra/ansible
16:32:02 thks
16:32:34 Do you guys ever use ad-hoc ansible commands?
16:32:37 Ansible is the source of truth for our infra so a lot of questions about the infra can be answered here.
16:32:55 yep. all the time. :)
16:33:32 ComputerKid[m], for something one-off, yes, but generally we like to keep everything in playbooks for idempotence
16:33:52 That makes sence
16:33:58 *sense
16:34:17 if it is to gether info ansible adhoc is the go to
16:34:28 I often use ad-hoc things when gathering information or doing things to large groups (like our builders)
16:34:30 s/gether/gather
16:34:54 That is most of the config covered, any questions on that or will I move on to how to run them?
16:35:10 or nirik if I left anything important out?
16:35:34 Do you ever deal with needing to run actions on different platforms with playbooks? And if so, how?
16:35:43 I imagine it's all fedora...
16:35:58 ComputerKid[m], we have a lot of el8 hosts
16:36:14 So in the playbooks you'll see `when` statements to cover this
16:36:21 I'll find an example
16:36:41 :D
16:36:59 yeah, fedora, el7, el8, lots of different arches
16:37:07 https://pagure.io/fedora-infra/ansible/blob/main/f/roles/basessh/tasks/main.yml#_46
16:38:02 This is where the gather_facts in ansible really comes to the fore
16:38:37 there are a number of different setups you can cater for that way
16:38:55 So does that act as an if statement on `make sure python3-libselinux is installed`?
16:39:20 Yep, `when` is `if` in ansible language
16:39:45 That's cool. Thanks for explaining
16:40:14 no problem, any other questions don't hesitate to ask
16:40:37 Who uses these playbooks?
16:40:48 is everything written in house or do you also use roles from galaxy ?
16:41:02 All in house as far as I know
16:41:19 nice
16:41:34 1+ nice
16:41:42 Most people involved in the infra will run a playbook at some stage, or at least contribute one and have someone else run it for them
16:41:48 cool
16:42:28 How do you become part of infra?
16:42:31 So on to how they are run ...
16:42:43 The playbooks must be run from the batcave as there are some hardlinks such as the private repo mentioned earlier.
16:43:00 The batcave is like a bastion server in our infra
16:43:04 batcave i suppose is a hardened server
16:43:14 bodanel, exactly
16:43:28 The master.yml playbook says it has all playbooks and gets run with tags. Do you have scripts that call master.yml with given tags for certain tasks?
16:43:31 it also has the access to everything else via ssh that ansible needs.
16:43:57 I suppose Infra SSHes into it?
16:44:04 austinpowered: we renamed it recently actually, it's 'main.yml' now. ;)
16:44:32 austinpowered, it's more that it is run with the desired tag specified, based on what the runner would like to do
16:44:37 there's a cron job that runs nightly that runs that with --check --diff mode... so we can see what it would have changed if it changed things.
16:44:38 I guess I need to do another pull. ;)
16:44:43 any issues when moving from master to main- heard some horror stories ?
16:45:16 bodanel, I wasn't involved in it personally but it seemed well handled
16:46:15 thks
16:46:15 To access the batcave you would need some permissions in fas, I can never remember which ones but nirik will be able to answer that
16:46:19 not too much hassle
16:46:38 you have to be in a sysadmin-something group or fi-apprentice
16:47:05 The playbooks can be run using the rbac-playbook command (Role Based Access Control https://bitbucket.org/tflink/ansible_utils/src/master/)
16:47:22 This command assumes that you are in the playbooks directory of the ansible repo so for example to run the sundries playbook which is located in `playbooks/groups/sundries.yml` you would run:
16:47:22 sudo -i rbac-playbook groups/sundries.yml
16:47:39 Permissions to run specific playbooks are controlled by what fas group a user is in. They are normally sysadmin-*, e.g. sysadmin-mbs to run the mbs playbook. Although not always that obvious :)
16:48:04 rbac-playbook accepts flags such as (-t) to specify to run only tasks with the tag specified or (-l) to limit to certain hosts.
16:48:04 It does not however allow for extra vars to be passed at run time for security reasons.
16:48:26 wgat is the process to get the group permissions ?
16:48:35 what
16:49:06 If you require permissions to run a playbook raise a ticket on the fedora-infra tracker https://pagure.io/fedora-infrastructure/issues and specify a reason you need the permission
16:49:34 that goes for permissions in general in the infra
16:49:51 ah ok
16:50:08 with fi-apprentice group can you run ad-hoc setup module to look at certain facts ?
16:50:27 nope...
16:50:43 fi-apprentice has read-only access. they can't run playbooks or anything.
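The rbac-playbook invocations described above can be lined up side by side. This is a dry-run sketch rather than something to paste onto the batcave: `show` is a hypothetical helper that only prints each command, the tag name `config` and the host `sundries01.example.org` are invented for illustration, and only the `-t` and `-l` flags mentioned in the meeting are used.

```shell
#!/bin/sh
set -eu

# Hypothetical dry-run helper: prints the command instead of executing it,
# because rbac-playbook is only available on the batcave.
show() { printf '+ %s\n' "$*"; }

# Full run of the sundries playbook; the path is relative to the
# playbooks directory of the ansible repo.
show sudo -i rbac-playbook groups/sundries.yml

# -t: run only the tasks carrying the given tag ("config" is a made-up tag).
show sudo -i rbac-playbook -t config groups/sundries.yml

# -l: limit the run to certain hosts (hostname is a made-up example).
show sudo -i rbac-playbook -l sundries01.example.org groups/sundries.yml
```

Extra vars are deliberately absent, since rbac-playbook does not accept them at run time.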
16:51:19 running setup is not considered read-only
16:51:21 I got it
16:52:07 i was thinking since it only runs the setup module it gathers facts
16:52:15 and just displays them
16:52:27 but I understand the logic behind denying that
16:52:54 That's about all I have. Any questions?
16:53:02 Or additions nirik?
16:53:44 not from me. If ansible is considered the source of truth it is a good place to start getting dirty
16:53:53 hands dirty
16:53:56 yeah, but it means you can ssh to the host and gather those things... so it's hard to allow that and prevent anything else. ;)
16:53:59 We could switch to "Learning topic discussion" topic
16:54:37 Zlopez[m], haha, sorry my bad
16:54:53 should have done that at the start
16:54:55 Thanks guys, this has been educational
16:55:30 I know there is an ansible command to list tags. But how do you run that against this repo?
16:55:39 no prob ComputerKid, feel free to reach out on #fedora-admin if you have any questions in the future
16:56:05 austinpowered: you should just be able to check out the repo and run it against that...
16:56:35 mobrien: We are in the correct topic for the learning topic, but for the discussion we have another, so it's easier to read in the log
16:56:46 austinpowered, generally it's a good idea to look at some of the tasks that you wish to run to ensure the tag does what you think it should
16:56:47 I have the repo. I'll look at using the command against a directory
16:57:03 #topic Learning topic discussion
16:57:21 out of time - thanks again
16:58:05 I guess we ran over so we don't really have time for open floor. does anyone have anything quick they wish to discuss?
16:58:11 #topic Open Floor
16:58:34 I have, but I'll ask in the main channel instead
16:58:44 thanks darknao
16:58:52 #endmeeting