16:30:37 #startmeeting fedora_coreos_meeting
16:30:37 Meeting started Wed Mar 3 16:30:37 2021 UTC.
16:30:37 This meeting is logged and archived in a public location.
16:30:37 The chair is lucab. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:37 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:37 The meeting name has been set to 'fedora_coreos_meeting'
16:30:59 #chair jlebon
16:30:59 Current chairs: jlebon lucab
16:31:00 .hello2
16:31:01 bgilbert: bgilbert 'Benjamin Gilbert'
16:31:23 .hello siosm
16:31:24 travier: siosm 'Timothée Ravier'
16:31:51 .hello jasonbrooks
16:31:52 jbrooks: jasonbrooks 'Jason Brooks'
16:32:07 #chair bgilbert travier jbrooks
16:32:07 Current chairs: bgilbert jbrooks jlebon lucab travier
16:32:27 #topic roll call
16:32:30 .hello2
16:32:31 walters: walters 'Colin Walters'
16:32:56 .hello2
16:32:58 .hello jaimelm
16:33:00 dustymabe: dustymabe 'Dusty Mabe'
16:33:03 PanGoat: jaimelm 'Jaime Magiera'
16:33:13 #chair walters dustymabe PanGoat
16:33:13 Current chairs: PanGoat bgilbert dustymabe jbrooks jlebon lucab travier walters
16:33:39 plenty of folks today, nice! I think we can start already
16:33:54 #topic Action items from last meeting
16:34:09 - bgilbert to investigate FCCT check for too-small rootfs
16:34:13 #action bgilbert to investigate FCCT check for too-small rootfs
16:34:14 :-(
16:34:37 - jlebon and travier will take an action to submit a proposal for one of the two rpm-ostree ones (TBD)
16:35:07 This is done. We submitted https://pagure.io/mentored-projects/issue/99
16:35:09 context was: Outreachy 2021 proposals
16:35:19 #link https://pagure.io/mentored-projects/issue/99
16:35:35 Thanks to everyone who contributed there!
16:36:06 ok great, let's jump to the topics
16:36:28 travier: I see https://github.com/coreos/fedora-coreos-tracker/issues/738, is it for today or was it covered already?
16:36:51 Not covered but maybe we should start with the nm-cloud-setup one?
16:36:57 (I was offline for the last couple of meetings)
16:37:00 travier: ack
16:37:41 #topic nm-cloud-setup integration
16:37:54 #link https://github.com/coreos/fedora-coreos-tracker/issues/320
16:38:09 this one I had it pending for some time
16:38:36 it was initially a spike from my side to see whether we had all pieces in place to properly integrate this
16:39:00 the answer is yes and the summary is in the last comment https://github.com/coreos/fedora-coreos-tracker/issues/320#issuecomment-788074991
16:39:28 lucab: that comment doesn't completely spell out what nm-cloud-setup will do for us. could you summarize?
16:39:41 I sent a PR at the time https://github.com/coreos/fedora-coreos-config/pull/760 which got stale in the meanwhile but I can refresh
16:40:11 bgilbert: upstream gently covered all the features in https://networkmanager.pages.freedesktop.org/NetworkManager/NetworkManager/nm-cloud-setup.html
16:40:52 This one was also linked in https://github.com/openshift/os/pull/508
16:41:05 bgilbert: my tldr is, once the basic DHCP network is up on a cloud node, it can take care of more advanced stuff or other details that can change dynamically
16:41:38 thanks for digging in to this lucab
16:41:57 travier: I don't think nm-cloud-setup takes care of that, but let me dig because there is a dedicated ticket on NM for that
16:42:24 okay... so it needs initial DHCP networking and it only claims to support AWS/GCP/Azure.
16:42:39 which makes it sound like it's not solving a problem that we really have?
16:42:59 i.e. we'd still need to go the Afterburn route if we wanted to support DO, and we'd need to get code into NM to support Packet
16:43:48 bgilbert: correct, it won't solve the "non-dhcp network in initramfs" problem
16:43:49 does it have to fix a problem though? isn't this essentially just improving platform enablement?
16:44:50 but is it really improving on what we have and will it conflict / create issues for existing setups?
16:45:10 Not taking a stance here, really asking as I don't know the details
16:45:18 Do you have written anywhere a policy/practice on accepting/implementing such changes?
16:45:36 If it does A,B,C yes, if not, no.
16:45:58 In other words, is it written anywhere it has to fix a problem?
16:46:05 travier: indeed, https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/407
16:48:14 from what I got, the idea is that the NM team offers this to take care of dynamic quirks on cloud platforms
16:48:18 my general feel is that nm-cloud-setup will help us take care of some fringe features of various clouds
16:48:51 One concern I have about both is the need to actually test this stuff across multiple clouds reliably which we aren't doing well at today, particularly in the upstream component git repos
16:49:07 then we can decide whether it's worth shipping / configuring / enabling on FCOS
16:49:09 walters: +1
16:49:27 could be a good candidate for the `next` stream?
16:49:32 ship/enable there first
16:49:40 Not against shipping/enabling. But we need to test it
16:49:50 I hope at some point to use OpenShift Prow CI for some of this because we maintain credentials and resource management for a lot of clouds there - but NM is on GL which makes this harder
16:50:33 if we're agreed that the functionality itself is beneficial to have, then we can start with just shipping it but disabled, so that it's easier to test in clouds
16:50:42 ^
16:50:56 wrt https://github.com/openshift/os/pull/508 - we ship the gcp-routes script for bootstrap machines in okd-machine-os currently, maybe we could try to add nm-cloud-setup to FCOS, i.e try getting it to work in OKD first
16:51:24 then we can try to delete that script first from okd-machine-os, then from RHCOS
16:51:27 yeah, that would be a useful path
16:51:31 nice
16:51:33 what do we define as tested? that it does what it says in the docs? it doesn't assure that it doesn't break some other assumption up in the stack
16:51:41 lorbus: +1
16:52:11 to be clear, I'd love to have this sort of functionality. I'm just cautious of introducing a compatibility constraint when we don't know that it will address the things we need it for
16:52:19 there are three concerns in the existing PR: 1) shipping the binary 2) making the service aware of the platform 3) auto-enabling it
16:52:19 lucab: from reading the issue, it appears that we would be the first consumers so the pain is fresh
16:52:31 paint*
16:52:37 or pain
16:52:39 depending
16:53:00 :)
16:53:01 travier: correct, and it's a chicken-egg problem as upstream does not have CI for that
16:53:01 lucab: excellent point, we can do 1 and 2 and make it easy to do 3 via a custom build or even Ignition perhaps that could be useful
16:53:13 +1 ignition
16:54:15 yes, we can decide to stop at any of the 1 - 2 - 3 steps
16:54:47 Agree for 1 & 2 which should help us decide if we want to move forward.
16:54:50 so first step would be to add and disable it, right?
16:54:50 we can then easily enable and fiddle with in OKD-machine-os
16:54:57 1 and 2 - 👍
16:55:10 bgilbert: concerns with doing 1 & 2?
16:55:18 and leaving 3 to just an Ignition fragment provided by the user should be feasible
16:55:46 👍️
16:56:15 travier: just the usual ones re compat constraints. but we have established precedent for doing this, so sgtm
16:56:46 I guess we could mask it to mark it as not ready / in progress?
16:56:52 backtracking a bit, the only problem we directly hit that it may fix is https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/407
16:57:50 travier: it shouldn't be enabled by default by the package IIRC
16:58:54 bgilbert: is compat concern just for having the binary/units around, or only if we auto-enable it?
16:59:18 the concern is that someone will enable it themselves and then complain if we want to remove it again
16:59:45 it's a pretty pro-forma objection at this point; as I said, there's precedent for this approach. just wanted it on the table.
16:59:49 i.e. if we ship it we need to continue shipping it
16:59:52 right
17:00:57 I don't have an answer to that
17:01:09 yup, and I'm not blocking on it
17:02:24 ok, so it sounds like I should drop the auto-enabling part for the PR and add test for the platform we already have
17:02:35 i.e. aws and gcp
17:03:13 +1
17:03:20 wfm
17:03:23 +1
17:03:26 +1
17:03:27 (with the test directly enabling the unit via Ignition)
17:03:33 ^^
17:03:36 totally lost connection for a while there
17:03:38 what's the proposal?
17:03:45 no proposal yet
17:03:58 clarifying
17:03:59 #action jlebon fix all bugs and write all new features
17:04:04 :)
17:04:06 #undo
17:04:06 Removing item from minutes: ACTION by dustymabe at 17:03:59 : jlebon fix all bugs and write all new features
17:04:06 ha
17:04:10 heh
17:04:13 :D
17:04:18 dustymabe: i'll remember this for when you come back :)
17:04:25 lmao
17:04:25 dustymabe: that's usually implicit, no? ;-)
17:04:27 🤣
17:04:35 So... proposal time?
17:04:41 sounds like there is clarity
17:04:45 #proposal ship nm-cloud-setup package, forward the platform env flags, do not automatically enable the unit, add kola tests for AWS and GCP (unit enabled via Ignition)
17:04:57 yay
17:04:59 agree
17:05:12 ack
17:05:26 :shipit:
17:05:27 👍
17:05:28 ack
17:06:12 ack
17:06:21 #action lucab to refresh the existing nm-cloud-setup PR, dropping the auto-enable part
17:06:44 #action lucab to track the nm-cloud-setup kola testing in a ticket and followup on that
17:06:48 ack, thanks all
17:07:44 jlebon: is https://github.com/coreos/fedora-coreos-tracker/issues/17 for today
17:07:45 ?
17:08:10 moving either to that or to travier's tiers
17:08:25 lucab: meh, i kinda lost context on this honestly, so we can drop it for now
17:08:34 ack
17:09:01 #topic Create support tiers for platforms supported by Fedora CoreOS
17:09:10 #link https://github.com/coreos/fedora-coreos-tracker/issues/738
17:09:25 travier: here you go
17:09:32 So description is in the issue but the general idea is that support/complexity increases with the number of platforms
17:09:59 This suggest categorizing our platforms into tiers like Rust does to convey support status
17:10:49 #link https://doc.rust-lang.org/nightly/rustc/platform-support.html
17:11:13 The suggested tiers description are in the ticket
17:11:27 along with current platforms repartition
17:11:59 This is merely a docs / public facing support promise
17:12:10 well linked with CI status
17:12:37 suggestion.. in our public facing communications let's try not to use the words "support" and "guarantee"
17:12:45 dustymabe: +1
17:12:46 I do like the word "confidence"
17:12:57 which i see you've used in there
17:13:49 Hmmmm... this could hinder adoption. For example, if someone is looking to deploy OKD and sees that FCOS on vSphere is "Tier 2" and not directly tested, they may not feel confidence themselves in going the FCOS/OKD route.
17:14:07 i definitely like the idea!
17:14:25 PanGoat: they'd be making better informed decisions
17:14:29 PanGoat: better for them to have the information up front and make the decision
17:14:42 fair enough
17:14:59 it might also be an opportunity for someone to contribute so that we could test on vshpere for every CI run
17:15:03 ^^
17:15:23 right, that was my next thought. Leveraging the thing we talked about a few meetings ago, which I haven't worked on yet.
17:15:36 also OKD has different set of tested platforms, and we tests different things that may not be relevant to OKD
17:15:39 I did put in an issue, but haven't built out the outreach part yet.
17:15:52 It merely formalizes the status quo which is hidden in the CI config
17:16:06 travier: for tier 3 - what does that mean "We could then decide that Tier 3 platform artifacts are only built once" ?
17:16:45 The idea is to build them only at the end after testing on other platforms to save time
17:16:55 for the main CI /release jobs
17:17:09 ahh ok. that makes sense
17:17:17 ahhh, "It worked over there, build it once to include here"
17:17:22 i thought you were saying they are only built once every 3 months or something
17:17:23 I think jlebon also wanted to stop building some artifacts on non-release jobs
17:18:16 yeah, in that model tier 1 support would be the barometer to use
17:18:19 https://github.com/coreos/fedora-coreos-tracker/issues/719#issuecomment-763894721
17:18:21 Honestly we could probably drop some of the Tier2 ones you have listed in to tier 3
17:18:37 i.e. rawhide only builds for tier 1 platforms
17:19:20 and define a threshold of going from 2->1. For example, I can test on vSphere all day, but I'm one data point.
17:19:22 there is even a lower tier of platforms we know about, have some bits in place in Ignition/Afterburn/etc, but we don't even build
17:19:50 it seems like this issue has a lot of components
17:19:59 1) is it a good idea to break things into tiers?
17:20:09 2) how do we present that information to the community
17:20:26 3) how do we define the tiers and more strategically test/use resources
17:20:50 (e.g. azurestack, ibmcloud-classic, cloudstack, and some more)
17:20:53 nice breakdown. accurate.
17:20:55 so we could start with #1 and then dive into the other ones ?
17:21:46 SGTM
17:22:09 +1
17:22:13 any opposed to the idea of having tiers?
17:22:18 ack
17:22:24 is anybody against the tiers idea, or maybe have a different approach in mind?
17:23:04 * dustymabe notes the "tiers" idea could be useful when we introduce other architectures too
17:23:25 I'd really like for all OKD supported platforms to be in Tier 1..
17:23:26 +1 for tiers, I like the idea
17:23:40 lorbus: what are those platforms?
17:23:43 lorbus, me too :)
17:24:10 AWS, GCP, OpenStack, BareMetal, VMware, Azure, IBM
17:24:12 like... vsphere
17:24:33 Tier 2 is not "bad", it's just an honest assessment of our current test coverage
17:24:34 QMEU for ovirt, too
17:25:10 for reference, Debian has qualification criterias for architectures, but I like tiers more has they offer a more precise nuance https://ftp-master.debian.org/archive-criteria.html
17:25:26 I have no strong stance against this if the goal is to move those platforms up to Tier 1 eventually :)
17:25:38 +1 lorbus
17:26:05 * dustymabe notes Fedora has Primary and Secondary architectures and also different classes for deliverables
17:26:41 And there is no process to move from one tier to the other other than actually having test coverage
17:26:59 I think OKD may actually benefit from a similar thing. That is, is OKD on IBM actually CI tested?
17:27:10 I envision the testing outreach that I mentiond previously actively gathering testers for the purpose of moving to Tier I
17:27:14 so platforms moving to another tier is entirely dependent on how much support we can get in CI
17:27:21 OKD on IBM doesn't exist at all, yet :)
17:27:51 i.e. there are no container builds for it, nor for any platform other than x86
17:28:02 but that'll change soon :)
17:28:04 "XXX is currently Tier II tested. Please help us by..."
17:28:34 So maybe those are not support tiers but test tiers
17:28:38 ^^
17:28:43 I was just going to write that
17:28:47 :)
17:28:57 It would be much better framing
17:29:15 yeh that makes sense. Support is something that is usually paid for, too..
17:29:38 I think most of us agreed "supported" is not the best wording.
17:29:48 right
17:30:10 * PanGoat puts out a hat on the sidewalk
17:30:11 travier: "confidence tiers" is what dustymabe suggested, I think
17:30:14 +1 for renaming to test tiers
17:30:33 confidence tiers is good too indeed
17:30:55 * dustymabe has to step away - sounds like we're going in the right direction :)
17:31:01 👋
17:31:07 Using "testing" would help dovetail into the ultimate goal: getting to Tier I
17:31:19 we are also almost out of time
17:31:25 Let's move this to next week? We don't need to decide right now
17:31:37 Yes. Great discussion.
17:32:05 yep, I think we generally agreed that we like the tiers idea, just need to iron out the details
17:32:30 +1
17:32:47 travier: are you going to note a summary in the ticket or should I?
17:32:55 lucab: doing right now
17:33:15 ok
17:33:48 #action travier to note that we generally like the tiers idea, and that we'll figure out the details in a followup
17:33:56 that's all for today
17:34:10 #topic Open Floor
17:34:39 * PanGoat breakdances
17:34:41 (although we are already over time, so unless there is anything important I'm going to close in a bit)
17:34:47 The coreos twitter handle poll competed: https://twitter.com/coreos/status/1364006780796170242
17:35:04 +1
17:35:07 265 votes, 86% pro 14% con
17:35:08 jbrooks: +1
17:35:19 awesome
17:35:22 thanks for handling that
17:35:24 It ran for 7 days
17:35:27 jbrooks++
17:35:36 just wanted to say it was awesome to see Steve and travier in yesterday's OKD WG meeting! thank you for showing up :)
17:35:37 So I guess we can share FCOS news there without shame :)
17:35:45 ^ awesome news!
17:35:49 lorbus: thanks :)
17:36:04 Maybe we can make it an item to discuss next time?
17:36:12 jbrooks: let's note that in ticket and proceed with the logistics?
17:36:23 lucab, cool
17:36:32 I'll find the ticket and update
17:36:34 #link https://twitter.com/coreos/status/1364006780796170242
17:36:46 jbrooks: thanks, I don't have it at hand
17:37:13 jbrooks: you had to get the credentials, right? Are those stored safely where others can get it them without jumping through the same hoops you did? That's something not to be lost.
17:37:17 #action jbrooks to note the result of the twitter poll, and followup for logistic steps
17:37:33 I'm closing here, let's jump back to our channel
17:37:42 thanks lucab!
17:37:51 thank you lucab
17:37:52 #endmeeting