16:30:22 <jlebon> #startmeeting fedora_coreos_meeting
16:30:22 <zodbot> Meeting started Wed Aug 25 16:30:22 2021 UTC.
16:30:22 <zodbot> This meeting is logged and archived in a public location.
16:30:22 <zodbot> The chair is jlebon. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:22 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:22 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:30:39 <lucab> .hi
16:30:40 <zodbot> lucab: lucab 'Luca BRUNO' <lucab@redhat.com>
16:30:45 <dustymabe> .hi
16:30:46 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:30:52 <skunkerk> .hello sohank2602
16:30:53 <zodbot> skunkerk: sohank2602 'Sohan Kunkerkar' <skunkerk@redhat.com>
16:31:10 <jlebon> #chair lucab dustymabe skunkerk
16:31:10 <zodbot> Current chairs: dustymabe jlebon lucab skunkerk
16:31:28 <vandercool> .hi
16:31:29 <zodbot> vandercool: Sorry, but user 'vandercool' does not exist
16:31:36 <vandercool> oof
16:31:43 <jeffnowicki> .hi
16:31:44 <zodbot> jeffnowicki: Sorry, but user 'jeffnowicki' does not exist
16:31:59 <copperi> .hi
16:32:00 <zodbot> copperi: copperi 'Jan Kuparinen' <copper_fin@hotmail.com>
16:32:27 <jlebon> #chair vandercool jeffnowicki copperi
16:32:27 <zodbot> Current chairs: copperi dustymabe jeffnowicki jlebon lucab skunkerk vandercool
16:33:28 <jlebon> let's wait one more minute :)
16:33:35 <travier> .hello siosm
16:33:36 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:34:06 <jbrooks> .hello jasonbrooks
16:34:07 <zodbot> jbrooks: jasonbrooks 'Jason Brooks' <jbrooks@redhat.com>
16:34:17 <jlebon> #chair travier jbrooks
16:34:17 <zodbot> Current chairs: copperi dustymabe jbrooks jeffnowicki jlebon lucab skunkerk travier vandercool
16:34:23 <bgilbert> .hi
16:34:24 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:34:26 <jlebon> #chair bgilbert
16:34:26 <zodbot> Current chairs: bgilbert copperi dustymabe jbrooks jeffnowicki jlebon lucab skunkerk travier vandercool
16:34:45 <jlebon> alrighty, we can start I think
16:34:48 <jlebon> #topic Action items from last meeting
16:34:53 <jlebon> * dustymabe will open a ticket and look for volunteers to investigate "Third-party Software Mechanism"
16:34:56 <jlebon> * dustymabe will open a ticket and look for volunteers to investigate "Remove authselect-compat package"
16:34:59 <jlebon> * bgilbert to add empty partitions, update Butane, re-enable tests
16:35:16 <jlebon> #info dustymabe opened https://github.com/coreos/fedora-coreos-tracker/issues/932
16:35:22 <jlebon> #info dustymabe opened https://github.com/coreos/fedora-coreos-tracker/issues/933
16:35:37 <bgilbert> haven't had a chance to work on it yet, but it's tracked in a ticket, so I won't re-action
16:35:57 <jlebon> bgilbert: +1
16:36:04 <jlebon> are those specific steps detailed there?
16:36:23 <jlebon> ahh yup
16:36:37 <jlebon> ok cool let's move on then
16:36:41 <jaimelm> .hello2
16:36:42 <zodbot> jaimelm: jaimelm 'Jaime Magiera' <jaimelm@umich.edu>
16:36:59 <jlebon> copperi, jeffnowicki, vandercool: were you here to discuss https://github.com/coreos/fedora-coreos-tracker/issues/931 ?
16:37:10 <vandercool> yes
16:37:30 <jeffnowicki> yes... good discussion/debate with this ticket opened https://github.com/openshift/openshift-docs/issues/35793
16:37:31 <jlebon> wasn't planning to bring it up in this meeting, but we can do so now if you'd like and then you're free to stick around or /part
16:37:55 <jlebon> #topic IBM Cloud image variant should be 120GB
16:37:56 <jeffnowicki> looking for a doc update to resolve
16:37:59 <jlebon> #link https://github.com/coreos/fedora-coreos-tracker/issues/931
16:38:24 <dustymabe> vandercool: jeffnowicki: thanks for coming to the community meeting to discuss this!
16:38:36 <jlebon> ok cool, so is the final outcome here that nothing actually needs to change in the OS, and it's just a docs update?
16:38:58 <jeffnowicki> the debate now will be how 'deep' to go with the doc update... scope to ibm cloud only?
16:39:07 <dustymabe> jlebon: i think that's the way we were leaning in the ticket, assuming that change does land in the docs
16:39:20 <jeffnowicki> +1
16:39:55 <jlebon> jeffnowicki: for most clouds, we ship a much smaller disk size and rely on auto-growing
16:40:07 <jlebon> seems like IBM cloud is special in that respect
16:40:12 <dustymabe> jeffnowicki: yeah the "how deep to go" in the docs question really needs the openshift team input
16:41:14 <jeffnowicki> hmm... so when did openshift bump min requirement to 120gb... and why?
16:41:35 <travier> jeffnowicki: this is best asked to the OCP team :)
16:41:51 <jeffnowicki> understood
16:42:37 <jlebon> would it make sense to keep discussing this in the docs issue for now until we have more info? we can always discuss this again in the community mtg if warranted
16:42:58 <dustymabe> jeffnowicki: am I clear in understanding that ibmcloud doesn't have the ability to dynamically specify the size of the disk when we create an instance?
16:43:07 <jeffnowicki> would be good (IMO) if 'default' image size spec would meet min requirements... size to min supported value... vs. provision and then resize
16:44:04 <jlebon> dustymabe: woah, that'd be surprising
16:45:14 <bgilbert> jeffnowicki: FCOS is not scoped to any particular application.  we try to keep images as small as practical to reduce the minimum disk space requirement for the distro
16:45:27 <davdunc> .hello2
16:45:28 <zodbot> davdunc: davdunc 'David Duncan' <davdunc@amazon.com>
16:46:03 <jlebon> #chair davdunc
16:46:03 <zodbot> Current chairs: bgilbert copperi davdunc dustymabe jbrooks jeffnowicki jlebon lucab skunkerk travier vandercool
16:46:20 <davdunc> Definitely don't want to have oversized disk images for the ec2 side of the house.
16:46:32 <jeffnowicki> i'm good with further discussion in docs issue... appreciate any feedback from this group and perhaps we get that properly inserted into issue
16:46:45 <jlebon> jeffnowicki: ack thanks!
16:47:00 <travier> jeffnowicki: We could really use an answer to @dustymabe's question
16:47:16 <travier> if you can ask the right folks, that would be great!
16:47:27 <jeffnowicki> ack will do
16:47:45 <jlebon> cool, ok let's move on then
16:47:53 <dustymabe> jeffnowicki: if you're having trouble finding the right people to ask questions feel free to grab me and I'll see if I can help
16:48:07 <jlebon> #topic tracker: Fedora 35 changes considerations
16:48:11 <jlebon> #link https://github.com/coreos/fedora-coreos-tracker/issues/856
16:48:44 <jlebon> we did the first half of this last week, now we're at the "Self-Contained Changes"
16:49:17 <jlebon> I count 9 with X TRIAGE
16:49:25 <jlebon> let's just go through these quickly
16:49:35 <jaimelm> 2.2 seems unapplicable.
16:49:41 <jaimelm> non*
16:49:51 <jlebon> "Enhanced Inscript as default Indic IM"
16:50:08 <travier> 2.5 SKIP?
16:50:09 <jlebon> yeah, can skip that. we don't ship locales
16:50:26 <jaimelm> hehe
16:50:26 <jlebon> "GHC 8.10 and Stackage lts-18" SKIP, agreed
16:50:29 <travier> 2.8 SKIP?
16:50:49 <travier> 2.13 SKIP (workstation only)
16:50:51 <jlebon> "libmemcached-awesome" skip
16:51:12 <jlebon> 2.8 I think it's worth sanity-checking that pkglayering still works fine since libvirt is a common layering target
16:51:19 <travier> 2.15 SKIP (we do not use Anaconda)
16:51:54 <jaimelm> same with 2.20
16:51:57 <dustymabe> though travier - might be worth investigating the anaconda change for SB
16:52:15 <travier> dustymabe: true. Will do
16:52:31 <jlebon> "2.9 x TRIAGE Optimal LUKS Encryption Sector Size" --> should sanity-check Ignition LUKS support works fine with the updated cryptsetup
16:52:41 <jlebon> though our CI will do that for us :)
16:53:03 <dustymabe> if it's already landed in rawhide, then it's already doing that for us
16:53:07 <jlebon> but might be worth checking that we're allocating the right header size
16:53:48 <jlebon> "2.20 x TRIAGE Sphinx 4" SKIP   we don't ship Python
16:54:02 <jlebon> 2.21 PipeWire SKIP
16:54:43 <jlebon> so it seems to me like it's just 2.8 and 2.9
16:54:57 <jlebon> agreed/disagreed?
16:55:24 <jaimelm> ack
16:55:28 <dustymabe> ack
16:55:37 <dustymabe> should we #info the decision on the others (for the record)?
16:55:45 <jaimelm> ^^
16:55:59 <davdunc> dustymabe: that sounds right.
16:56:09 <jlebon> dustymabe: there's quite a few. hmm, ok how about something like:
16:56:36 <jlebon> #info currently of all the self-contained changes, only "Libvirt Modular Daemons" and "Optimal LUKS Encryption Sector Size" deserve closer investigation
16:57:03 <jaimelm> Including the numbers has value I think
16:57:12 <dustymabe> that works.. hopefully the list doesn't get new additions
16:57:17 <jlebon> #action jlebon will open a ticket and look for volunteers to investigate "Optimal LUKS Encryption Sector Size"
16:57:18 <jaimelm> (of the non-applicable)
16:57:19 <dustymabe> jaimelm: unfortunately the numbers do change
16:57:28 <jlebon> #action jlebon will open a ticket and look for volunteers to investigate "Libvirt Modular Daemons"
16:57:31 <dustymabe> i've been using the "name"
16:57:37 <jaimelm> true
16:58:17 <jlebon> dustymabe: we'll keep the issue TRIAGE and SKIP as the source of truth -- will update the ticket after the meeting
16:58:30 <jlebon> then any new changes which need TRIAGE will be noticed
16:58:54 <dustymabe> +1
16:58:55 <jlebon> ok cool, are we ready to move on?
16:59:10 <dustymabe> yes
16:59:11 <jlebon> #topic Support cloud-specific grub fragments
16:59:14 <jlebon> #link https://github.com/coreos/fedora-coreos-tracker/issues/110
16:59:30 <jlebon> bgilbert: do you want to introduce this and where we are?
16:59:42 <bgilbert> sure
17:00:08 <bgilbert> we're actively moving forward on disabling serial console by default
17:00:40 <bgilbert> but some cloud platforms use serial console exclusively, so we need to be able to have per-platform defaults.
17:01:13 <bgilbert> some of us have been discussing the best way to achieve that
17:01:22 <bgilbert> basically there seem to be two approaches:
17:01:56 <bgilbert> 1. teach grub to understand what platform it's on, and apply conditionals (or load a grub config file fragment specific to the platform)
17:02:38 <bgilbert> 2. template the grub.cfg and kernel arguments per platform
17:03:08 <bgilbert> for 1, teaching grub is non-trivial because of the primitives we have available.  also it complicates matters on non-x86 because of other bootloaders.
17:03:30 <bgilbert> for 2, we'd now have platform images that differ more than just in the platform ID
17:04:09 <bgilbert> for 2 we'd also need to teach coreos-installer to apply the same templating when installing with --platform to override the platform ID
17:05:16 <bgilbert> 2 is not quite as clean as 1 in the abstract sense, but seems easier to implement, especially since it doesn't implicate support for multiple bootloaders
17:05:27 <jlebon> IOW, 1 is dynamic, while 2 is static/hardcoded
17:05:30 <bgilbert> right
17:05:38 <bgilbert> so we've been leaning toward 2
17:05:59 <travier> 2 is also easier to revert / debug / workaround in case of issues as far as I understand
17:06:10 <bgilbert> yeah, I'd say so
17:06:17 <jlebon> that doesn't preclude us moving to 1 later on. it should mostly be an implementation detail
17:06:24 <bgilbert> the idea is to have per-platform kargs and per-platform grub commands (for switching grub to the serial console)
17:06:36 <dustymabe> i prefer 1, but it's not as easy to implement as 2 and I'm not the one implementing so I'm cool with 2
17:06:37 <walters> jlebon: hmm seems harder to avoid duplicate kargs in that move though right?
17:06:51 <bgilbert> the latter doesn't affect non-grub but is also harmless for non-grub
17:07:11 <bgilbert> walters: this only affects new installs in any event
17:07:16 <walters> anyways agreed with above, me 2
17:07:22 <lucab> bgilbert: I don't remember what CL was doing in this area. Was there any useful lesson to learn from that?
17:08:14 <jlebon> walters: assuming we don't start updating grub configs (which we currently don't), I think it would be ok
17:08:16 <bgilbert> lucab: CL had a per-platform OEM partition image, so the grub.cfg could just unconditionally source a per-platform config fragment without actually knowing the platform ID
17:08:25 <walters> jlebon: ok, true
17:08:30 <bgilbert> that fragment could then set a kargs variable that was unconditionally included in the kargs
17:08:55 <bgilbert> so CL didn't have a couple of the constraints we have, including zipl
17:09:41 <lucab> ack thanks, so not much to learn or draw inspiration from
17:10:02 <bgilbert> it was the starting point for approach 1, but isn't really a good fit, yeah
17:10:43 <jlebon> is anyone strongly opposed to 2?
17:11:39 <jlebon> sounds like a no :)
17:11:48 <jlebon> bgilbert: do you want to make a proposal?
17:13:10 <bgilbert> #proposed we will pursue a static approach to cloud-specific bootloader configs, where grub commands and kargs are templated at image build time and by coreos-installer install --platform at install time.
17:13:30 <bgilbert> does that cover it?
17:13:32 <lucab> +1
17:13:34 <jaimelm> ack
17:13:48 <jlebon> ack
17:14:09 <dustymabe> ack
17:14:16 <skunkerk> +1
17:14:21 <davdunc> +1
17:14:23 <travier> +1
17:14:28 <jlebon> SHIPIT
17:14:42 <dustymabe> i think the exact defaults on each platform might need some discussion (or at least added reasoning to the definition so it can be challenged later)
17:15:08 <jaimelm> issues for each where it can be hashed out?
17:15:24 <bgilbert> I don't think we need anything that heavyweight
17:15:43 <bgilbert> I went through cloud provider docs to pull out what each platform needs
17:15:50 <jaimelm> ahhh, cool
17:15:56 <bgilbert> I'm sure there are mistakes, but we can always just fix them
17:16:11 <jlebon> yeah, i think we should just match what we currently have in terms of console kargs, and anything going forward which needs changes can be discussed
17:16:24 <bgilbert> jlebon: we can't actually just match current
17:16:37 <bgilbert> because current is both consoles on all platforms, and we're trying to change that
17:16:53 <bgilbert> this change will need a coreos-status post anyway, and we can get some feedback in the testing channel
17:17:08 <jlebon> bgilbert: right, we did discuss that already in another ticket, so it's fine
17:17:12 <bgilbert> +1
17:17:48 <jlebon> so IOW, worth clarifying there's two moves happening here: moving to templating, and implementing the console= ticket
17:18:02 <bgilbert> yup
17:18:03 <jlebon> want to #agree the proposal above?
17:18:24 <bgilbert> #agreed we will pursue a static approach to cloud-specific bootloader configs, where grub commands and kargs are templated at image build time and by coreos-installer install --platform at install time.
17:18:46 <jlebon> cool. alrighty, let's move on then
17:18:56 <jlebon> #topic Consider disabling emergency shell timeout and reboot if an error is hit in the initramfs on first boot
17:18:59 <jlebon> #link https://github.com/coreos/fedora-coreos-tracker/issues/928
17:19:06 <jlebon> i can introduce this one
17:20:33 <jlebon> someone internally hit an issue where they booted RHCOS, the Ignition disks stage failed because TPM wasn't enabled, then we went to the "press Enter for the emergency shell or we'll reboot in 5 minutes" prompt, then we rebooted, and then RHCOS, thinking it's still first boot, failed with some other error, completely masking the original error
17:21:06 <jlebon> if someone isn't watching the console (and if it's VGA and not serial, it's basically lost forever), they might not see the actual error
17:21:30 <jlebon> so the proposal is to not reboot and just always wait at the emergency shell, forever
17:22:26 <jlebon> because first boot is special and except in some special cases (kargs) our firstboot code pretty much doesn't account for rerunning on top of a partially firstbooted machine
17:23:10 <bgilbert> the reboot is a holdover from CL where it was part of the automatic rollback infrastructure.  in FCOS we have not pursued automatic rollback yet; there are a lot of moving parts we'd need to build.
17:24:01 <bgilbert> given that we're not close to needing it, I'd support dropping the autoreboot.  however, I've heard that there are users depending on it.
17:24:11 <walters> agreed on just removing the autoreboot
17:24:34 <jaimelm> +1
17:24:40 <jlebon> i wonder if we should even just automatically enter the shell
17:25:01 <bgilbert> jlebon: that's what would happen if that code goes away, yes
17:25:12 <bgilbert> jlebon: "press enter" is just so we know to stop the timer
17:25:20 <jlebon> bgilbert: gotcha
17:25:46 <jlebon> re. automatic rollback, I think it's compatible with this because we're only doing it on first boot
17:27:02 <jaimelm> time check
17:27:03 <jlebon> if there are people relying on this, i think we should solve their issues a better way
17:27:08 <bgilbert> jlebon: +1
17:27:12 <travier> +1
17:27:15 <jlebon> ok cool, let me make a quick proposal
17:27:17 <jaimelm> jlebon: yes
17:27:25 <bgilbert> I can ping a couple people to try to track down those users
17:27:37 <miabbott> +1 to removing autoreboot on first boot.  note the proper deprecation period for OCP (therefore RHCOS) is three releases, so if we know people are relying on this we'd have to give them a runway. (or at least figure out if they could adapt quicker)
17:27:59 <jlebon> #proposed we will disable the automatic reboot timeout upon hitting emergency.target in the initramfs on first boot
17:28:15 <dustymabe> ack
17:28:22 <jaimelm> ack
17:28:25 <dustymabe> would be nice to also implement https://github.com/coreos/fedora-coreos-tracker/issues/796 while we are in there
17:28:35 <walters> (I am coming up blank on thinking how someone could be relying on this)
17:28:48 <jlebon> dustymabe: you're right, that's closely related
17:28:57 <dustymabe> walters: I can think of one
17:29:07 <jaimelm> do tell
17:29:10 <jlebon> walters: i could imagine before our networking ordering was rock solid
17:29:24 <jlebon> "rock solid" might be going too far... but better than it was
17:29:24 <walters> hmm but Ignition retries
17:29:29 <bgilbert> dustymabe: seems better not to attach additional requirements though
17:29:45 <travier> +1
17:30:09 <dustymabe> bgilbert: not required, just in the same ballpark
17:30:11 <jaimelm> keep it separate but keep it in mind.
17:30:13 <bgilbert> jlebon: only on first boot?
17:30:14 <bgilbert> dustymabe: +1
17:31:01 <jaimelm> time
17:31:08 <jlebon> bgilbert: just being conservative since that's where we care most about it
17:31:25 <bgilbert> jlebon: by the same reasoning, though, I think it's preferable to delete the code
17:31:34 <jlebon> though implementation-wise, agreed it'd be easier to just do it all the time
17:31:52 <bgilbert> okay, cool, as long as we're aligned on what we're implying
17:31:54 <jlebon> subsequent boots should be similar every time so I don't think the same reasoning applies
17:31:55 <travier> miabbott: I don't think this warrants a 3 releases deprecation period. Might depend on how we classify that in https://docs.openshift.com/container-platform/4.8/rest_api/understanding-api-support-tiers.html
17:31:56 <bgilbert> +1 to proposed
17:32:46 <jlebon> #agreed we will disable the automatic reboot timeout upon hitting emergency.target in the initramfs on first boot
17:32:57 <jlebon> looks like we're over time :|
17:33:00 <jlebon> let's do a quick open floor
17:33:03 <jlebon> #topic open floor
17:33:27 <jlebon> anything anyone wants to bring up?
17:33:29 <miabbott> travier: i don't really want to use 3 releases either, just noting the "official" ways.  (we already got burned once when we removed `dhclient`)
17:34:30 <jlebon> ok closing in 30s
17:34:59 <jlebon> #endmeeting