18:59:21 #startmeeting Fedora Infrastructure Ops Daily Standup Meeting
18:59:21 Meeting started Wed Dec 4 18:59:21 2019 UTC.
18:59:21 This meeting is logged and archived in a public location.
18:59:21 The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:59:21 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:59:21 The meeting name has been set to 'fedora_infrastructure_ops_daily_standup_meeting'
18:59:31 #chair nirik relrod smooge
18:59:31 Current chairs: nirik relrod smooge
18:59:45 #topic Initial meeting setup and running
18:59:47 you're 30sec early! :)
19:00:07 ?endmeeting?
19:00:50 #info tickets list is https://pagure.io/fedora-infrastructure/issues
19:00:51 So, let's do all 'needs review' in reverse order...
19:00:54 https://pagure.io/fedora-infrastructure/issues?status=Open&order=asc&order_key=date_created&priority=1
19:01:03 .ticket 8439
19:01:04 nirik: Issue #8439: Outage: Upgrade of Copr servers - 2019-12-05 06:00 UTC (Thursday) - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8439
19:01:15 * cverna around :)
19:01:20 that's a copr outage tomorrow... let's just mod it to be 'waiting on external'
19:01:43 OK will do so
19:01:54 .ticket 8441
19:01:56 nirik: Issue #8441: access to statusfpo - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8441
19:02:15 so for this I think we are using a token.
19:02:16 pagure.issue.assigned.added -- smooge assigned ticket fedora-infrastructure#8439 to praiskup https://pagure.io/fedora-infrastructure/issue/8439
19:02:17 pagure.issue.edit -- smooge edited the priority fields of ticket fedora-infrastructure#8439 https://pagure.io/fedora-infrastructure/issue/8439
19:02:17 * nirik checks
19:02:56 so I could just send them that... but...
19:02:58 yeah, ~/.aws/credentials
19:03:11 I think there is a way to get awscli to auth using saml2...
19:03:22 not great for automation, but might be good in this use?
19:03:35 ooh
19:03:39 could we build automatically from the github commit?
19:05:12 I'm not sure... perhaps? but it would have to have the token to push to aws?
19:05:18 * smooge is still trying to find the darn repo
19:05:25 it's on github...
19:05:44 https://github.com/fedora-infra/statusfpo/
19:07:19 yeah.. I cleaned up my local drive
19:07:29 so, don't want to spend too much time here... how about I just send (securely) credentials for now and add them to the noc github group to push, and we redesign this at some point
19:07:29 we could use github2fedmsg to react to the commit and trigger a playbook or something like that
19:07:30 'cleaned'
19:07:43 nirik, sounds good
19:07:51 +1
19:07:54 put the idea on the ticket queue for future development
19:07:55 cverna: hum, yeah, that might work... or loopabull?
19:08:12 but yeah, lots of things we could do, just not sure it's worth the time.
19:08:16 ha yes loopabull could work
19:08:36 if it's easy with loopabull, I'd be for it...
19:08:45 then we don't need to send credentials around
19:09:13 does anyone have cycles to try it? or should we punt?
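
For reference, the manual deploy flow being discussed here (awscli credentials in ~/.aws/credentials, pushing the statusfpo site to AWS) might look roughly like the sketch below. The profile name, bucket, output directory, and CloudFront distribution ID are all hypothetical placeholders, not the real production values:

    # minimal sketch; profile/bucket names are assumptions, not production values.
    # credentials live under a dedicated, least-privilege profile in ~/.aws/credentials
    git clone https://github.com/fedora-infra/statusfpo/ && cd statusfpo
    # sync the built site directory (name assumed) up to the S3 bucket
    aws s3 sync ./site/ s3://statusfpo-example-bucket/ --delete --profile statusfpo
    # if the site sits behind CloudFront, drop the cached copy (distribution ID assumed)
    aws cloudfront create-invalidation --distribution-id EXAMPLE123 --paths '/*'
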
19:09:15 I'll add a ticket to the Community Fire Fighting team, that might be something they can look at
19:09:37 pagure.issue.assigned.added -- smooge assigned ticket fedora-infrastructure#8441 to kevin https://pagure.io/fedora-infrastructure/issue/8441
19:09:38 pagure.issue.edit -- smooge edited the priority fields of ticket fedora-infrastructure#8441 https://pagure.io/fedora-infrastructure/issue/8441
19:09:38 there's also "GitHub Actions" which can do stuff like this pretty easily, and you can store the token encrypted in GitHub. We'd just want to make sure that token is way limited.
19:09:40 cverna: ok, cool. +1
19:09:50 oops missed you did that
19:10:23 cverna: point back to us, happy to describe the current setup or how we would like it to work.
19:10:45 * nirik is wondering if something more blog-like would be better for status, but perhaps that's just me
19:11:04 .ticket 8442
19:11:05 nirik: Issue #8442: aarch64 vms for OSBS cluster - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8442
19:11:16 so, let's see...
19:11:39 smooge: which buildvmhost-aarch64's are ready? 10 to... ?
19:12:29 10-17
19:13:17 so, the one with the stg ones on it... we will need to change vlans on the interfaces again, right?
19:13:22 I was going to give 2 nodes on 15, 2 on 16 and 2 on 17
19:13:32 oh shoot
19:13:44 no, let's do all on 2 of them
19:13:46 or just have them be on the .129 anyway
19:13:57 one for prod, one for stg.
19:14:08 those machines can easily run 3 vms...
19:14:37 so, we still have no idea when we will have 18+ connected, right?
19:14:46 pagure.issue.tag.added -- mizdebsk tagged ticket fedora-infrastructure#8439: outage https://pagure.io/fedora-infrastructure/issue/8439
19:15:01 so my idea was master01/stg-master01 on one, master02/stg-master02 on another and worker01/stg-worker01 on the 3rd. That way they would reboot safely
19:15:08 nirik, probably not til next year
19:15:18 I am assuming February
19:15:32 these only have 1 master. ;)
19:15:57 wow. really? that's annoying. no budget?
19:16:08 so my idea was master01/stg-master01 on one, worker01/stg-worker01 on another and worker02/stg-worker02 on the 3rd. That way they would reboot safely
19:16:32 nirik, end of year purchasing/fulfillment and other crises
19:16:42 well, these are not like a regular openshift... it should be safe to reboot them as long as they are not actively building
19:16:45 cverna: ^ right?
19:16:55 yes
19:16:56 anything we put in in December usually only arrives in February
19:17:11 ok in that case.. I'll do 3 on 16 and 3 on 17
19:17:14 so, I would vastly prefer to split stg and prod so we can reboot the stg one with all the rest of the staging buildsys
19:17:21 it is really just using the build capability of openshift
19:17:28 and also they are vastly overpowered for just 3 vm's.
19:17:34 cverna, what networks does it need to be on?
19:17:47 prod I assume 10.5.125.
19:18:00 nirik, put the staging builder on it also?
19:18:35 they need to be able to communicate with the x86_64 cluster
19:18:43 I have added some details in the ticket
19:18:50 so wait...
19:19:03 we already have the 2 masters (from when we tried this before)
19:19:08 they are both on 129
19:19:19 which might have been due to moonshot doom
19:19:24 it was
19:19:36 trying to get those on different vlans caused other nodes to go off
19:19:50 yes, these masters could be reused if needed (for builders?)
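
Since the reboot-safety point above hinges on a node not being mid-build, a pre-reboot check against the cluster might look like this minimal sketch; it assumes oc is already logged in to the aarch64 cluster, and the node name is hypothetical:

    # any builds still in flight anywhere on the cluster?
    oc get builds --all-namespaces | grep -i -e running -e pending
    # optionally keep new work off the node before rebooting it (node name hypothetical)
    oc adm cordon osbs-aarch64-node01
    oc adm drain osbs-aarch64-node01 --ignore-daemonsets
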
19:19:50 I think prod can be on 129 (that's what the other arm builders are on)
19:19:54 the firewall though I believe is set up
19:20:22 I guess we could leave the stg one on 129 to avoid dealing with moving vlans
19:21:01 I wonder... could we put them all on one node?
19:21:38 we could for the time being, until we get more of these boxes enabled
19:22:03 yeah, and we see how many we have left after all the people take them for things. ;)
19:22:07 so, let's use 17
19:22:18 stick em all on there.
19:22:58 so for firewall:
19:23:02 "Both clusters x86_64 and aarch64 are talking to each others using REST API on the port 8443, this port needs to be open"
19:23:10 is that the masters?
19:23:13 or ?
19:23:20 masters and nodes
19:23:26 "open" isn't something we can do
19:23:49 what does open mean
19:23:52 we can open specific source -> dest on specific ports.
19:24:35 hum, what net are the x86_64 ones on?
19:25:03 so, they are on 125.
19:25:26 nirik, I can get 17 br1 put on the .125
19:25:29 so we either a) get RHIT to adjust firewalls from 129 -> 125 nets, or b) just change vlans to put the new ones on 125.
19:25:30 so my understanding is that the x86_64 master needs to be able to communicate with the aarch64 master and nodes
19:25:35 that can be done by Friday I expect
19:25:52 and the aarch64 master and nodes need to communicate with the x86_64 master
19:25:54 br0 put on the .125
19:26:02 stg ones are .128
19:26:13 so put br1 on 128?
19:26:33 doesn't this need to talk to storage to write its stuff?
19:26:37 there is also port 22 for ansible to be able to deploy the cluster from osbs-control01
19:26:58 smooge: no, it goes to the oci registry
19:26:59 I think it uploads to the koji hub?
19:27:04 or registry... ok
19:27:26 it only uploads logs to the koji-hub + metadata about the build
19:27:53 but that goes through the x86_64 master
19:28:16 so I think 8443 is allowed between the 128 and 125, so we should be ok with it just on the 125
19:28:28 we are coming up to the half-hour on this meeting
19:28:38 yeah, so move br0 to 125 and br1 to 128 and we put them all on there for now. Also, perhaps we change buildvmhost-aarch64-17 to buildvmhost-aarch64-osbs-01? I guess that doesn't matter too much
19:28:55 I will put in the work order and make the dns changes
19:29:22 I think the rename makes a lot of sense because we will know where to look for broken stuff more easily
19:29:25 if the vm's are on 125, no rhit firewall between x86_64 and aarch64. ;)
19:29:36 and likewise the stg one
19:29:45 yep
19:29:47 +1 here for renaming, too
19:29:56 so, smooge: you want to do this ticket? can you update it with all this?
19:30:01 I will take the ticket
19:30:06 I did the previous osbs stuff
19:30:10 and quickly...
19:30:12 so I have the best take on it
19:30:12 .ticket 8444
19:30:13 nirik: Issue #8444: Decomission ci-cc-rdu01 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8444
19:30:23 that's waiting on external (next Monday)
19:30:43 that's all the 'needs review' tickets. ;) so let's stop here.
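
The "specific source -> dest on specific ports" shape mentioned above translates into host-level rules roughly like the sketch below; the source addresses are placeholders for the real master/node and control-host IPs, which would come out of the ansible inventory rather than being hardcoded like this:

    # let the x86_64 cluster net (assumed 10.5.125.0/24) reach the aarch64 masters'/nodes' API port
    iptables -A INPUT -p tcp -s 10.5.125.0/24 --dport 8443 -j ACCEPT
    # let osbs-control01 (hypothetical address) in over ssh for the deployment playbooks
    iptables -A INPUT -p tcp -s 10.5.125.30/32 --dport 22 -j ACCEPT
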
19:32:03 thanks all :)
19:32:24 pagure.issue.assigned.added -- smooge assigned ticket fedora-infrastructure#8442 to smooge https://pagure.io/fedora-infrastructure/issue/8442
19:32:25 pagure.issue.edit -- smooge edited the priority fields of ticket fedora-infrastructure#8442 https://pagure.io/fedora-infrastructure/issue/8442
19:32:26 pagure.issue.comment.added -- smooge commented on ticket fedora-infrastructure#8442: "aarch64 vms for OSBS cluster" https://pagure.io/fedora-infrastructure/issue/8442#comment-614866
19:32:31 #endmeeting