16:30:37 <lucab> #startmeeting fedora_coreos_meeting
16:30:37 <zodbot> Meeting started Wed Mar  3 16:30:37 2021 UTC.
16:30:37 <zodbot> This meeting is logged and archived in a public location.
16:30:37 <zodbot> The chair is lucab. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:37 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:37 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:30:59 <lucab> #chair jlebon
16:30:59 <zodbot> Current chairs: jlebon lucab
16:31:00 <bgilbert> .hello2
16:31:01 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:31:23 <travier> .hello siosm
16:31:24 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:31:51 <jbrooks> .hello jasonbrooks
16:31:52 <zodbot> jbrooks: jasonbrooks 'Jason Brooks' <jbrooks@redhat.com>
16:32:07 <lucab> #chair bgilbert travier jbrooks
16:32:07 <zodbot> Current chairs: bgilbert jbrooks jlebon lucab travier
16:32:27 <lucab> #topic roll call
16:32:30 <walters> .hello2
16:32:31 <zodbot> walters: walters 'Colin Walters' <walters@redhat.com>
16:32:56 <dustymabe> .hello2
16:32:58 <PanGoat> .hello jaimelm
16:33:00 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:33:03 <zodbot> PanGoat: jaimelm 'Jaime Magiera' <jaimelm@umich.edu>
16:33:13 <lucab> #chair walters dustymabe PanGoat
16:33:13 <zodbot> Current chairs: PanGoat bgilbert dustymabe jbrooks jlebon lucab travier walters
16:33:39 <lucab> plenty of folks today, nice! I think we can start already
16:33:54 <lucab> #topic Action items from last meeting
16:34:09 <lucab> - bgilbert to investigate FCCT check for too-small rootfs
16:34:13 <bgilbert> #action bgilbert to investigate FCCT check for too-small rootfs
16:34:14 <bgilbert> :-(
16:34:37 <lucab> - jlebon and travier will take an action to submit a proposal for one of the two rpm-ostree ones (TBD)
16:35:07 <travier> This is done. We submitted https://pagure.io/mentored-projects/issue/99
16:35:09 <lucab> context was: Outreachy 2021 proposals
16:35:19 <lucab> #link https://pagure.io/mentored-projects/issue/99
16:35:35 <dustymabe> Thanks to everyone who contributed there!
16:36:06 <lucab> ok great, let's jump to the topics
16:36:28 <lucab> travier: I see https://github.com/coreos/fedora-coreos-tracker/issues/738, is it for today or was it covered already?
16:36:51 <travier> Not covered but maybe we should start with the nm-cloud-setup one?
16:36:57 <lucab> (I was offline for the last couple of meetings)
16:37:00 <lucab> travier: ack
16:37:41 <lucab> #topic nm-cloud-setup integration
16:37:54 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/320
16:38:09 <lucab> this one I had it pending for some time
16:38:36 <lucab> it was initially a spike from my side to see whether we had all pieces in place to properly integrate this
16:39:00 <lucab> the answer is yes and the summary is in the last comment https://github.com/coreos/fedora-coreos-tracker/issues/320#issuecomment-788074991
16:39:28 <bgilbert> lucab: that comment doesn't completely spell out what nm-cloud-setup will do for us.  could you summarize?
16:39:41 <lucab> I sent a PR at the time https://github.com/coreos/fedora-coreos-config/pull/760 which got stale in the meanwhile but I can refresh
16:40:11 <lucab> bgilbert: upstream gently covered all the features in https://networkmanager.pages.freedesktop.org/NetworkManager/NetworkManager/nm-cloud-setup.html
16:40:52 <travier> This one was also linked in https://github.com/openshift/os/pull/508
16:41:05 <lucab> bgilbert: my tldr is, once the basic DHCP network is up on a cloud node, it can take care of more advanced stuff or other details that can change dynamically
16:41:38 <dustymabe> thanks for digging in to this lucab
16:41:57 <lucab> travier: I don't think nm-cloud-setup takes care of that, but let me dig because there is a dedicated ticket on NM for that
16:42:24 <bgilbert> okay... so it needs initial DHCP networking and it only claims to support AWS/GCP/Azure.
16:42:39 <bgilbert> which makes it sound like it's not solving a problem that we really have?
16:42:59 <bgilbert> i.e. we'd still need to go the Afterburn route if we wanted to support DO, and we'd need to get code into NM to support Packet
16:43:48 <lucab> bgilbert: correct, it won't solve the "non-dhcp network in initramfs" problem
16:43:49 <jlebon> does it have to fix a problem though?  isn't this essentially just improving platform enablement?
16:44:50 <travier> but is it really improving on what we have and will it conflict / create issues for existing setups?
16:45:10 <travier> Not taking a stance here, really asking as I don't know the details
16:45:18 <PanGoat> Do you have written anywhere a policy/practice on accepting/implementing such changes?
16:45:36 <PanGoat> If it does A,B,C yes, if not, no.
16:45:58 <PanGoat> In other words, is it written anywhere it has to fix a problem?
16:46:05 <lucab> travier: indeed, https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/407
16:48:14 <lucab> from what I got, the idea is that the NM team offers this to take care of dynamic quirks on cloud platforms
16:48:18 <dustymabe> my general feel is that nm-cloud-setup will help us take care of some fringe features of various clouds
16:48:51 <walters> One concern I have about both is the need to actually test this stuff across multiple clouds reliably which we aren't doing well at today, particularly in the upstream component git repos
16:49:07 <lucab> then we can decide whether it's worth shipping / configuring / enabling on FCOS
16:49:09 <travier> walters: +1
16:49:27 <dustymabe> could be a good candidate for the `next` stream?
16:49:32 <dustymabe> ship/enable there first
16:49:40 <travier> Not against shipping/enabling. But we need to test it
16:49:50 <walters> I hope at some point to use OpenShift Prow CI for some of this because we maintain credentials and resource management for a lot of clouds there - but NM is on GL which makes this harder
16:50:33 <jlebon> if we're agreed that the functionality itself is beneficial to have, then we can start with just shipping it but disabled, so that it's easier to test in clouds
16:50:42 <PanGoat> ^
16:50:56 <lorbus> wrt https://github.com/openshift/os/pull/508 - we ship the gcp-routes script for bootstrap machines in okd-machine-os currently, maybe we could try to add nm-cloud-setup to FCOS, i.e try getting it to work in OKD first
16:51:24 <lorbus> then we can try to delete that script first from okd-machine-os, then from RHCOS
16:51:27 <walters> yeah, that would be a useful path
16:51:31 <PanGoat> nice
16:51:33 <lucab> what do we define as tested? that it does what it says in the docs? it doesn't assure that it doesn't break some other assumption up in the stack
16:51:41 <travier> lorbus: +1
16:52:11 <bgilbert> to be clear, I'd love to have this sort of functionality.  I'm just cautious of introducing a compatibility constraint when we don't know that it will address the things we need it for
16:52:19 <lucab> there are three concerns in the existing PR: 1) shipping the binary 2) making the service aware of the platform 3) auto-enabling it
16:52:19 <travier> lucab: from reading the issue, it appears that we would be the first consumers so the pain is fresh
16:52:31 <travier> paint*
16:52:37 <PanGoat> or pain
16:52:39 <PanGoat> depending
16:53:00 <dustymabe> :)
16:53:01 <lucab> travier: correct, and it's a chicken-egg problem as upstream does not have CI for that
16:53:01 <walters> lucab: excellent point, we can do 1 and 2 and make it easy to do 3 via a custom build or even Ignition perhaps that could be useful
16:53:13 <PanGoat> +1 ignition
16:54:15 <lucab> yes, we can decide to stop at any of the 1 - 2 - 3 steps
16:54:47 <travier> Agree for 1 & 2 which should help us decide if we want to move forward.
16:54:50 <lorbus> so first step would be to add and disable it, right?
16:54:50 <lorbus> we can then easily enable and fiddle with in OKD-machine-os
16:54:57 <dustymabe> 1 and 2 - 👍
16:55:10 <travier> bgilbert: concerns with doing 1 & 2?
16:55:18 <lucab> and leaving 3 to just an Ignition fragment provided by the user should be feasible
16:55:46 <lorbus> 👍️
16:56:15 <bgilbert> travier: just the usual ones re compat constraints.  but we have established precedent for doing this, so sgtm
16:56:46 <travier> I guess we could mask it to mark it as not ready / in progress?
16:56:52 <lucab> backtracking a bit, the only problem we directly hit that it may fix is https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/407
16:57:50 <lucab> travier: it shouldn't be enabled by default by the package IIRC
16:58:54 <lucab> bgilbert: is compat concern just for having the binary/units around, or only if we auto-enable it?
16:59:18 <bgilbert> the concern is that someone will enable it themselves and then complain if we want to remove it again
16:59:45 <bgilbert> it's a pretty pro-forma objection at this point; as I said, there's precedent for this approach.  just wanted it on the table.
16:59:49 <dustymabe> i.e. if we ship it we need to continue shipping it
16:59:52 <bgilbert> right
17:00:57 <lucab> I don't have an answer to that
17:01:09 <bgilbert> yup, and I'm not blocking on it
17:02:24 <lucab> ok, so it sounds like I should drop the auto-enabling part for the PR and add test for the platform we already have
17:02:35 <lucab> i.e. aws and gcp
17:03:13 <lorbus> +1
17:03:20 <bgilbert> wfm
17:03:23 <dustymabe> +1
17:03:26 <PanGoat> +1
17:03:27 <lucab> (with the test directly enabling the unit via Ignition)
17:03:33 <PanGoat> ^^
17:03:36 <jlebon> totally lost connection for a while there
17:03:38 <jlebon> what's the proposal?
17:03:45 <PanGoat> no proposal yet
17:03:58 <PanGoat> clarifying
17:03:59 <dustymabe> #action jlebon fix all bugs and write all new features
17:04:04 <dustymabe> :)
17:04:06 <dustymabe> #undo
17:04:06 <zodbot> Removing item from minutes: ACTION by dustymabe at 17:03:59 : jlebon fix all bugs and write all new features
17:04:06 <PanGoat> ha
17:04:10 <jlebon> heh
17:04:13 <travier> :D
17:04:18 <jlebon> dustymabe: i'll remember this for when you come back :)
17:04:25 <lorbus> lmao
17:04:25 <bgilbert> dustymabe: that's usually implicit, no?  ;-)
17:04:27 <dustymabe> 🤣
17:04:35 <PanGoat> So... proposal time?
17:04:41 <PanGoat> sounds like there is clarity
17:04:45 <lucab> #proposal ship nm-cloud-setup package, forward the platform env flags, do not automatically enable the unit, add kola tests for AWS and GCP (unit enabled via Ignition)
17:04:57 <lorbus> yay
17:04:59 <travier> agree
17:05:12 <PanGoat> ack
17:05:26 <walters> :shipit:
17:05:27 <dustymabe> 👍
17:05:28 <jlebon> ack
17:06:12 <bgilbert> ack
17:06:21 <lucab> #action lucab to refresh the existing nm-cloud-setup PR, dropping the auto-enable part
17:06:44 <lucab> #action lucab to track the nm-cloud-setup kola testing in a ticket and followup on that
17:06:48 <lucab> ack, thanks all
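A minimal sketch of the "unit enabled via Ignition" step from the proposal above, assuming Ignition spec 3.2.0 and the `NM_CLOUD_SETUP_EC2` / `NM_CLOUD_SETUP_GCP` environment variables described in the upstream nm-cloud-setup docs linked earlier. The helper `nm_cloud_setup_ignition` and the drop-in file name are hypothetical; this is not the kola test from the PR, just an illustration of what a user-provided Ignition fragment could look like.

```python
import json

# Assumed mapping from platform to the nm-cloud-setup environment variable
# that turns on that provider (per upstream nm-cloud-setup docs).
PROVIDER_ENV = {
    "aws": "NM_CLOUD_SETUP_EC2",
    "gcp": "NM_CLOUD_SETUP_GCP",
}


def nm_cloud_setup_ignition(provider: str) -> dict:
    """Hypothetical helper: build an Ignition config (spec 3.2.0 assumed)
    that enables nm-cloud-setup for a single provider."""
    env_var = PROVIDER_ENV[provider]
    # systemd drop-in that sets the provider flag for the service
    dropin = f"[Service]\nEnvironment={env_var}=yes\n"
    return {
        "ignition": {"version": "3.2.0"},
        "systemd": {
            "units": [
                {
                    "name": "nm-cloud-setup.service",
                    "enabled": True,
                    "dropins": [
                        {"name": "10-enable-provider.conf", "contents": dropin}
                    ],
                },
                # the timer re-runs the service to pick up dynamic changes
                {"name": "nm-cloud-setup.timer", "enabled": True},
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(nm_cloud_setup_ignition("aws"), indent=2))
```

Keeping the enablement in a user-supplied fragment matches the agreed approach: the package ships disabled, and only the tests (or a user) opt in.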
17:07:44 <lucab> jlebon: is https://github.com/coreos/fedora-coreos-tracker/issues/17 for today
17:07:45 <lucab> ?
17:08:10 <lucab> moving either to that or to travier's tiers
17:08:25 <jlebon> lucab: meh, i kinda lost context on this honestly, so we can drop it for now
17:08:34 <lucab> ack
17:09:01 <lucab> #topic Create support tiers for platforms supported by Fedora CoreOS
17:09:10 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/738
17:09:25 <lucab> travier: here you go
17:09:32 <travier> So description is in the issue but the general idea is that support/complexity increases with the number of platforms
17:09:59 <travier> This suggests categorizing our platforms into tiers like Rust does to convey support status
17:10:49 <lucab> #link https://doc.rust-lang.org/nightly/rustc/platform-support.html
17:11:13 <travier> The suggested tier descriptions are in the ticket
17:11:27 <travier> along with current platforms repartition
17:11:59 <travier> This is merely a docs / public facing support promise
17:12:10 <travier> well linked with CI status
17:12:37 <dustymabe> suggestion.. in our public facing communications let's try not to use the words "support" and "guarantee"
17:12:45 <bgilbert> dustymabe: +1
17:12:46 <dustymabe> I do like the word "confidence"
17:12:57 <dustymabe> which i see you've used in there
17:13:49 <PanGoat> Hmmmm... this could hinder adoption. For example, if someone is looking to deploy OKD and sees that FCOS on vSphere is "Tier 2" and not directly tested, they may not feel confidence themselves in going the FCOS/OKD route.
17:14:07 <jlebon> i definitely like the idea!
17:14:25 <jlebon> PanGoat: they'd be making better informed decisions
17:14:29 <dustymabe> PanGoat: better for them to have the information up front and make the decision
17:14:42 <PanGoat> fair enough
17:14:59 <dustymabe> it might also be an opportunity for someone to contribute so that we could test on vSphere for every CI run
17:15:03 <PanGoat> ^^
17:15:23 <PanGoat> right, that was my next thought. Leveraging the thing we talked about a few meetings ago, which I haven't worked on yet.
17:15:36 <lucab> also OKD has a different set of tested platforms, and we test different things that may not be relevant to OKD
17:15:39 <PanGoat> I did put in an issue, but haven't built out the outreach part yet.
17:15:52 <travier> It merely formalizes the status quo which is hidden in the CI config
17:16:06 <dustymabe> travier: for tier 3 - what does that mean "We could then decide that Tier 3 platform artifacts are only built once" ?
17:16:45 <travier> The idea is to build them only at the end after testing on other platforms to save time
17:16:55 <travier> for the main CI /release jobs
17:17:09 <dustymabe> ahh ok. that makes sense
17:17:17 <PanGoat> ahhh, "It worked over there, build it once to include here"
17:17:22 <dustymabe> i thought you were saying they are only built once every 3 months or something
17:17:23 <lucab> I think jlebon also wanted to stop building some artifacts on non-release jobs
17:18:16 <jlebon> yeah, in that model tier 1 support would be the barometer to use
17:18:19 <lucab> https://github.com/coreos/fedora-coreos-tracker/issues/719#issuecomment-763894721
17:18:21 <dustymabe> Honestly we could probably drop some of the Tier2 ones you have listed in to tier 3
17:18:37 <jlebon> i.e. rawhide only builds for tier 1 platforms
17:19:20 <PanGoat> and define a threshold of going from 2->1. For example, I can test on vSphere all day, but I'm one data point.
17:19:22 <lucab> there is even a lower tier of platforms we know about, have some bits in place in Ignition/Afterburn/etc, but we don't even build
17:19:50 <dustymabe> it seems like this issue has a lot of components
17:19:59 <dustymabe> 1) is it a good idea to break things into tiers?
17:20:09 <dustymabe> 2) how do we present that information to the community
17:20:26 <dustymabe> 3) how do we define the tiers and more strategically test/use resources
17:20:50 <lucab> (e.g. azurestack, ibmcloud-classic, cloudstack, and some more)
17:20:53 <PanGoat> nice breakdown. accurate.
17:20:55 <dustymabe> so we could start with #1 and then dive into the other ones ?
17:21:46 <jlebon> SGTM
17:22:09 <travier> +1
17:22:13 <dustymabe> any opposed to the idea of having tiers?
17:22:18 <PanGoat> ack
17:22:24 <lucab> is anybody against the tiers idea, or maybe have a different approach in mind?
17:23:04 * dustymabe notes the "tiers" idea could be useful when we introduce other architectures too
17:23:25 <lorbus> I'd really like for all OKD supported platforms to be in Tier 1..
17:23:26 <dustymabe> +1 for tiers, I like the idea
17:23:40 <travier> lorbus: what are those platforms?
17:23:43 <PanGoat> lorbus, me too :)
17:24:10 <lorbus> AWS, GCP, OpenStack, BareMetal, VMware, Azure, IBM
17:24:12 <PanGoat> like... vsphere
17:24:33 <travier> Tier 2 is not "bad", it's just an honest assessment of our current test coverage
17:24:34 <lorbus> QEMU for ovirt, too
17:25:10 <lucab> for reference, Debian has qualification criteria for architectures, but I like tiers more as they offer a more precise nuance https://ftp-master.debian.org/archive-criteria.html
17:25:26 <lorbus> I have no strong stance against this if the goal is to move those platforms up to Tier 1 eventually :)
17:25:38 <PanGoat> +1 lorbus
17:26:05 * dustymabe notes Fedora has Primary and Secondary architectures and also different classes for deliverables
17:26:41 <travier> And there is no process to move from one tier to the other other than actually having test coverage
17:26:59 <lucab> I think OKD may actually benefit from a similar thing. That is, is OKD on IBM actually CI tested?
17:27:10 <PanGoat> I envision the testing outreach that I mentioned previously actively gathering testers for the purpose of moving to Tier I
17:27:14 <travier> so platforms moving to another tier is entirely dependent on how much support we can get in CI
17:27:21 <lorbus> OKD on IBM doesn't exist at all, yet :)
17:27:51 <lorbus> i.e. there are no container builds for it, nor for any platform other than x86
17:28:02 <lorbus> but that'll change soon :)
17:28:04 <PanGoat> "XXX is currently Tier II tested. Please help us by..."
17:28:34 <travier> So maybe those are not support tiers but test tiers
17:28:38 <PanGoat> ^^
17:28:43 <PanGoat> I was just going to write that
17:28:47 <travier> :)
17:28:57 <PanGoat> It would be much better framing
17:29:15 <lorbus> yeh that makes sense. Support is something that is usually paid for, too..
17:29:38 <PanGoat> I think most of us agreed "supported" is not the best wording.
17:29:48 <PanGoat> right
17:30:10 * PanGoat puts out a hat on the sidewalk
17:30:11 <lucab> travier: "confidence tiers" is what dustymabe suggested, I think
17:30:14 <travier> +1 for renaming to test tiers
17:30:33 <travier> confidence tiers is good too indeed
17:30:55 * dustymabe has to step away - sounds like we're going in the right direction :)
17:31:01 <dustymabe> 👋
17:31:07 <PanGoat> Using "testing" would help dovetail into the ultimate goal: getting to Tier I
17:31:19 <lucab> we are also almost out of time
17:31:25 <travier> Let's move this to next week? We don't need to decide right now
17:31:37 <PanGoat> Yes. Great discussion.
17:32:05 <lucab> yep, I think we generally agreed that we like the tiers idea, just need to iron out the details
17:32:30 <lorbus> +1
17:32:47 <lucab> travier: are you going to note a summary in the ticket or should I?
17:32:55 <travier> lucab: doing right now
17:33:15 <lucab> ok
17:33:48 <lucab> #action travier to note that we generally like the tiers idea, and that we'll figure out the details in a followup
17:33:56 <lucab> that's all for today
17:34:10 <lucab> #topic Open Floor
17:34:39 * PanGoat breakdances
17:34:41 <lucab> (although we are already over time, so unless there is anything important I'm going to close in a bit)
17:34:47 <jbrooks> The coreos twitter handle poll completed: https://twitter.com/coreos/status/1364006780796170242
17:35:04 <travier> +1
17:35:07 <jbrooks> 265 votes, 86% pro 14% con
17:35:08 <bgilbert> jbrooks: +1
17:35:19 <PanGoat> awesome
17:35:22 <bgilbert> thanks for handling that
17:35:24 <jbrooks> It ran for 7 days
17:35:27 <jlebon> jbrooks++
17:35:36 <lorbus> just wanted to say it was awesome to see Steve and travier in yesterday's OKD WG meeting! thank you for showing up :)
17:35:37 <travier> So I guess we can share FCOS news there without shame :)
17:35:45 <lorbus> ^ awesome news!
17:35:49 <travier> lorbus: thanks :)
17:36:04 <jbrooks> Maybe we can make it an item to discuss next time?
17:36:12 <lucab> jbrooks: let's note that in ticket and proceed with the logistics?
17:36:23 <jbrooks> lucab, cool
17:36:32 <jbrooks> I'll find the ticket and update
17:36:34 <lucab> #link https://twitter.com/coreos/status/1364006780796170242
17:36:46 <lucab> jbrooks: thanks, I don't have it at hand
17:37:13 <PanGoat> jbrooks: you had to get the credentials, right? Are those stored safely where others can get them without jumping through the same hoops you did? That's something not to be lost.
17:37:17 <lucab> #action jbrooks to note the result of the twitter poll, and followup for logistic steps
17:37:33 <lucab> I'm closing here, let's jump back to our channel
17:37:42 <travier> thanks lucab!
17:37:51 <PanGoat> thank you lucab
17:37:52 <lucab> #endmeeting