18:59:21 <smooge> #startmeeting Fedora Infrastructure Ops Daily Standup Meeting
18:59:21 <zodbot> Meeting started Wed Dec  4 18:59:21 2019 UTC.
18:59:21 <zodbot> This meeting is logged and archived in a public location.
18:59:21 <zodbot> The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:59:21 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:59:21 <zodbot> The meeting name has been set to 'fedora_infrastructure_ops_daily_standup_meeting'
18:59:31 <smooge> #chair nirik relrod smooge
18:59:31 <zodbot> Current chairs: nirik relrod smooge
18:59:45 <smooge> #topic Initial meeting setup and running
18:59:47 <nirik> you're 30sec early! :)
19:00:07 <smooge> ?endmeeting?
19:00:50 <smooge> #info tickets list is https://pagure.io/fedora-infrastructure/issues
19:00:51 <nirik> So, let's do all 'needs review' in reverse order...
19:00:54 <nirik> https://pagure.io/fedora-infrastructure/issues?status=Open&order=asc&order_key=date_created&priority=1
19:01:03 <nirik> .ticket 8439
19:01:04 <zodbot> nirik: Issue #8439: Outage: Upgrade of Copr servers - 2019-12-05 06:00 UTC (Thursday) - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8439
19:01:15 * cverna around :)
19:01:20 <nirik> that's a copr outage tomorrow... let's just mod it to be 'waiting on external'
19:01:43 <smooge> OK will do so
19:01:54 <nirik> .ticket 8441
19:01:56 <zodbot> nirik: Issue #8441: access to statusfpo - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8441
19:02:15 <nirik> so for this I think we are using a token.
19:02:16 <fm-admin> pagure.issue.assigned.added -- smooge assigned ticket fedora-infrastructure#8439 to praiskup https://pagure.io/fedora-infrastructure/issue/8439
19:02:17 <fm-admin> pagure.issue.edit -- smooge edited the priority fields of ticket fedora-infrastructure#8439 https://pagure.io/fedora-infrastructure/issue/8439
19:02:17 * nirik checks
19:02:56 <nirik> so I could just send them that... but...
19:02:58 <relrod> yeah, ~/.aws/credentials
19:03:11 <nirik> I think there is a way to get awscli to auth using saml2...
19:03:22 <nirik> not great for automation, but might be good in this use?
19:03:35 <relrod> ooh
19:03:39 <cverna> could we build automatically from the github commit ?
19:05:12 <nirik> I'm not sure... perhaps? but it would have to have the token to push to aws?
19:05:18 * smooge is still trying to find the darn repo
19:05:25 <nirik> it's on github...
19:05:44 <nirik> https://github.com/fedora-infra/statusfpo/
19:07:19 <smooge> yeah.. I cleaned up my local drive
19:07:29 <nirik> so, don't want to spend too much time here... how about I just send (securely) credentials for now and add them to noc github group to push and we redesign this at some point
19:07:29 <cverna> we could use github2fedmsg to react on the commit and trigger a playbook or something like that
19:07:30 <smooge> 'cleaned'
19:07:43 <smooge> nirik, sounds good
19:07:51 <cverna> +1
19:07:54 <smooge> put the idea on the ticket queue for future development
19:07:55 <nirik> cverna: hum, yeah, that might work... or loopabull ?
19:08:12 <nirik> but yeah, lots of things we could do, just not sure it's worth the time.
19:08:16 <cverna> ha yes loopabull could work
19:08:36 <nirik> if it's easy with loopabull, I'd be for it...
19:08:45 <nirik> then we don't need to send credentials around
19:09:13 <nirik> does anyone have cycles to try it? or should we punt?
19:09:15 <cverna> I'll add a ticket for the Community Fire Fighting team, that might be something they can look at
19:09:37 <fm-admin> pagure.issue.assigned.added -- smooge assigned ticket fedora-infrastructure#8441 to kevin https://pagure.io/fedora-infrastructure/issue/8441
19:09:38 <fm-admin> pagure.issue.edit -- smooge edited the priority fields of ticket fedora-infrastructure#8441 https://pagure.io/fedora-infrastructure/issue/8441
19:09:38 <relrod> there's also "GitHub Actions" which can do stuff like this pretty easily and you can store the token encrypted in GitHub. We'd just want to make sure that token is way limited.
19:09:40 <nirik> cverna: ok, cool. +1
19:09:50 <smooge> oops missed you did that
19:10:23 <nirik> cverna: point back to us, happy to describe current setup or how we would like it to work.
19:10:45 * nirik is wondering if something more blog-like would be better for status, but perhaps that's just me
19:11:04 <nirik> .ticket 8442
19:11:05 <zodbot> nirik: Issue #8442: aarch64 vms for OSBS cluster - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8442
19:11:16 <nirik> so, let's see...
19:11:39 <nirik> smooge: which buildvmhost-aarch64's are ready? 10 to... ?
19:12:29 <smooge> 10-17
19:13:17 <nirik> so, the one with the stg ones on it... we will need to change vlans on the interfaces again, right?
19:13:22 <smooge> I was going to give 2 nodes on 15, 2 on 16 and 2 on 17
19:13:32 <smooge> oh shoot
19:13:44 <nirik> no, let's do all on 2 of them
19:13:46 <smooge> or just have them be on the .129 anyway
19:13:57 <nirik> one for prod one for stg.
19:14:08 <nirik> those machines can easily run 3 vms...
19:14:37 <nirik> so, we still have no idea when we will have 18+ connected right?
19:14:46 <fm-admin> pagure.issue.tag.added -- mizdebsk tagged ticket fedora-infrastructure#8439: outage https://pagure.io/fedora-infrastructure/issue/8439
19:15:01 <smooge> so my idea was master01/stg-master01 on one, master02/stg-master02 on another and worker01/stg-worker01 on the 3rd. That way they would reboot safely
19:15:08 <smooge> nirik, probably not til next year
19:15:18 <smooge> I am assuming February
19:15:32 <nirik> these only have 1 master. ;)
19:15:57 <nirik> wow. really? that's annoying. no budget?
19:16:08 <smooge> so my idea was master01/stg-master01 on one, worker01/stg-worker01 on another and worker02/stg-worker02 on the 3rd. That way they would reboot safely
19:16:32 <smooge> nirik, end of year purchasing/fulfillment and other crises
19:16:42 <nirik> well, these are not like a regular openshift... it should be safe to reboot them as long as they are not actively building
19:16:45 <nirik> cverna: ^ right?
19:16:55 <cverna> yes
19:16:56 <smooge> anything we order in December usually only comes in in February
19:17:11 <smooge> ok in that case.. I'd put 3 on 16 and 3 on 17
19:17:14 <nirik> so, I would vastly prefer to split stg and prod so we can reboot the stg one with all the rest of the staging buildsys
19:17:21 <cverna> it is really just using the build capability of openshift
19:17:28 <nirik> and also they are vastly overpowered for just 3 vm's.
19:17:34 <smooge> cverna, what networks does it need to be on
19:17:47 <nirik> prod I assume 10.5.125.
19:18:00 <smooge> nirik, put the staging builder on it also?
19:18:35 <cverna> they need to be able to communicate with the x86_64 cluster
19:18:43 <cverna> I have added some details in the ticket
19:18:50 <nirik> so wait...
19:19:03 <nirik> we have already the 2 masters (from when we tried this before)
19:19:08 <nirik> they are both on 129
19:19:19 <nirik> which might have been due to moonshot doom
19:19:24 <smooge> it was
19:19:36 <smooge> trying to get those on different vlans caused other nodes to go off
19:19:50 <cverna> yes these masters could be reused if needed (for builders ?)
19:19:50 <nirik> I think prod can be on 129 (that's what the other arm builders are on)
19:19:54 <smooge> the firewall though I believe is set up
19:20:22 <nirik> I guess we could leave the stg one on 129 to avoid dealing with moving vlans
19:21:01 <nirik> I wonder... could we put them all on one node?
19:21:38 <smooge> we could for the time being until we get more of these boxes enabled
19:22:03 <nirik> yeah, and we see how many we have left after all the people take them for things. ;)
19:22:07 <nirik> so, let's use 17
19:22:18 <nirik> stick em all on there.
19:22:58 <nirik> so for firewall:
19:23:02 <nirik> "Both clusters x86_64 and aarch64 are talking to each others using REST API on the port 8443, this port needs to be open"
19:23:10 <nirik> is that the masters?
19:23:13 <nirik> or ?
19:23:20 <cverna> masters and nodes
19:23:26 <nirik> "open" isn't something we can do
19:23:49 <smooge> what does open mean
19:23:52 <nirik> we can open specific source -> dest on specific ports.
19:24:35 <nirik> hum, what net are the x86_64 ones on
19:25:03 <nirik> so, they are on 125.
19:25:26 <smooge> nirik, I can get 17 br1 put on the .125
19:25:29 <nirik> so we either a) get RHIT to adjust firewalls from 129 -> 125 nets, or b) just change vlans to put the new ones on 125.
19:25:30 <cverna> so my understanding is that the x86_64 master needs to be able to communicate with the aarch64 master and nodes
19:25:35 <smooge> that can be done by Friday I expect
19:25:52 <cverna> and the aarch64 master and nodes need to communicate with the x86_64 master
19:25:54 <smooge> br0 put on the .125
19:26:02 <nirik> stg ones are .128
19:26:13 <nirik> so put br1 on 128?
19:26:33 <smooge> doesn't this need to talk to storage to write its stuff?
19:26:37 <cverna> there is also the port 22 for ansible to be able to deploy the cluster from osbs-control01
19:26:58 <cverna> smooge: no it goes to the oci registry
19:26:59 <nirik> I think it uploads to the koji hub?
19:27:04 <nirik> or registry... ok
19:27:26 <cverna> it only uploads logs to the koji-hub + meta data about the build
19:27:53 <cverna> but that goes through the x86_64 master
19:28:16 <smooge> so I think 8443 is allowed between the 128 and 125 so we should be ok with it just on the 125
19:28:28 <smooge> we are coming up to the half-hour on this meeting
19:28:38 <nirik> yeah, so move br0 to 125 and br1 to 128 and we put them all on there for now. Also, perhaps we change buildvmhost-aarch64-17 to buildvmhost-aarch64-osbs-01 ? I guess that doesn't matter too much
19:28:55 <smooge> I will put in the work order and make the dns changes
19:29:22 <smooge> I think the rename makes a lot of sense because we will know where to look for broken stuff easier
19:29:25 <nirik> if the vm's are on 125, no rhit firewall between x86_64 and aarch64. ;)
19:29:36 <nirik> and likewise the stg one
19:29:45 <smooge> yep
19:29:47 <relrod> +1 here for renaming, too
19:29:56 <nirik> so, smooge: you want to do this ticket? can you update it with all this?
19:30:01 <smooge> I will take the ticket
19:30:06 <smooge> I did the previous osbs stuff
19:30:10 <nirik> and quickly...
19:30:12 <smooge> so I have the best take on it
19:30:12 <nirik> .ticket 8444
19:30:13 <zodbot> nirik: Issue #8444: Decomission ci-cc-rdu01 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8444
19:30:23 <nirik> that's waiting on external (next monday)
19:30:43 <nirik> that's all the needs reviews. ;) so let's stop here.
19:32:03 <cverna> thanks all :)
19:32:24 <fm-admin> pagure.issue.assigned.added -- smooge assigned ticket fedora-infrastructure#8442 to smooge https://pagure.io/fedora-infrastructure/issue/8442
19:32:25 <fm-admin> pagure.issue.edit -- smooge edited the priority fields of ticket fedora-infrastructure#8442 https://pagure.io/fedora-infrastructure/issue/8442
19:32:26 <fm-admin> pagure.issue.comment.added -- smooge commented on ticket fedora-infrastructure#8442: "aarch64 vms for OSBS cluster" https://pagure.io/fedora-infrastructure/issue/8442#comment-614866
19:32:31 <smooge> #endmeeting