16:30:22 #startmeeting fedora_coreos_meeting
16:30:22 Meeting started Wed Aug 25 16:30:22 2021 UTC.
16:30:22 This meeting is logged and archived in a public location.
16:30:22 The chair is jlebon. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:22 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:22 The meeting name has been set to 'fedora_coreos_meeting'
16:30:39 .hi
16:30:40 lucab: lucab 'Luca BRUNO'
16:30:45 .hi
16:30:46 dustymabe: dustymabe 'Dusty Mabe'
16:30:52 .hello sohank2602
16:30:53 skunkerk: sohank2602 'Sohan Kunkerkar'
16:31:10 #chair lucab dustymabe skunkerk
16:31:10 Current chairs: dustymabe jlebon lucab skunkerk
16:31:28 .hi
16:31:29 vandercool: Sorry, but user 'vandercool' does not exist
16:31:36 oof
16:31:43 .hi
16:31:44 jeffnowicki: Sorry, but user 'jeffnowicki' does not exist
16:31:59 .hi
16:32:00 copperi: copperi 'Jan Kuparinen'
16:32:27 #chair vandercool jeffnowicki copperi
16:32:27 Current chairs: copperi dustymabe jeffnowicki jlebon lucab skunkerk vandercool
16:33:28 let's wait one more minute :)
16:33:35 .hello siosm
16:33:36 travier: siosm 'Timothée Ravier'
16:34:06 .hello jasonbrooks
16:34:07 jbrooks: jasonbrooks 'Jason Brooks'
16:34:17 #chair travier jbrooks
16:34:17 Current chairs: copperi dustymabe jbrooks jeffnowicki jlebon lucab skunkerk travier vandercool
16:34:23 .hi
16:34:24 bgilbert: bgilbert 'Benjamin Gilbert'
16:34:26 #chair bgilbert
16:34:26 Current chairs: bgilbert copperi dustymabe jbrooks jeffnowicki jlebon lucab skunkerk travier vandercool
16:34:45 alrighty, we can start I think
16:34:48 #topic Action items from last meeting
16:34:53 * dustymabe will open a ticket and look for volunteers to investigate "Third-party Software Mechanism"
16:34:56 * dustymabe will open a ticket and look for volunteers to investigate "Remove authselect-compat package"
16:34:59 * bgilbert to add empty partitions, update Butane, re-enable tests
16:35:16 #info dustymabe opened https://github.com/coreos/fedora-coreos-tracker/issues/932
16:35:22 #info dustymabe opened https://github.com/coreos/fedora-coreos-tracker/issues/933
16:35:37 haven't had a chance to work on it yet, but it's tracked in a ticket, so I won't re-action
16:35:57 bgilbert: +1
16:36:04 are those specific steps detailed there?
16:36:23 ahh yup
16:36:37 ok cool let's move on then
16:36:41 .hello2
16:36:42 jaimelm: jaimelm 'Jaime Magiera'
16:36:59 copperi, jeffnowicki, vandercool: were you here to discuss https://github.com/coreos/fedora-coreos-tracker/issues/931 ?
16:37:10 yes
16:37:30 yes... good discussion/debate with this ticket opened https://github.com/openshift/openshift-docs/issues/35793
16:37:31 wasn't planning to bring it up in this meeting, but we can do so now if you'd like and then you're free to stick around or /part
16:37:55 #topic IBM Cloud image variant should be 120GB
16:37:56 looking for a doc update to resolve
16:37:59 #link https://github.com/coreos/fedora-coreos-tracker/issues/931
16:38:24 vandercool: jeffnowicki: thanks for coming to the community meeting to discuss this!
16:38:36 ok cool, so is the final outcome here that nothing actually needs to change in the OS, and it's just a docs update?
16:38:58 the debate now will be how 'deep' to go with the doc update... scope to ibm cloud only?
16:39:07 jlebon: i think that's the way we were leaning in the ticket, assuming that change does land in the docs
16:39:20 +1
16:39:55 jeffnowicki: for most clouds, we ship a much smaller disk size and rely on auto-growing
16:40:07 seems like IBM cloud is special in that respect
16:40:12 jeffnowicki: yeah the "how deep to go" in the docs question really needs the openshift team input
16:41:14 hmm... so when did openshift bump min requirement to 120gb... and why?
16:41:35 jeffnowicki: this is best asked to the OCP team :)
16:41:51 understood
16:42:37 would it make sense to keep discussing this in the docs issue for now until we have more info? we can always discuss this again in the community mtg if warranted
16:42:58 jeffnowicki: am I clear in understanding that ibmcloud doesn't have the ability to dynamically specify the size of the disk when we create an instance?
16:43:07 would be good (IMO) if 'default' image size spec would meet min requirements... size to min supported value... vs. provision and then resize
16:44:04 dustymabe: woah, that'd be surprising
16:45:14 jeffnowicki: FCOS is not scoped to any particular application. we try to keep images as small as practical to reduce the minimum disk space requirement for the distro
16:45:27 .hello2
16:45:28 davdunc: davdunc 'David Duncan'
16:46:03 #chair davdunc
16:46:03 Current chairs: bgilbert copperi davdunc dustymabe jbrooks jeffnowicki jlebon lucab skunkerk travier vandercool
16:46:20 Definitely don't want to have oversized disk images for the ec2 side of the house.
16:46:32 i'm good with further discussion in docs issue... appreciate any feedback from this group and perhaps we get that properly inserted into issue
16:46:45 jeffnowicki: ack thanks!
16:47:00 jeffnowicki: We could really use an answer to @dustymabe's question
16:47:16 if you can ask the right folks, that would be great!
16:47:27 ack will do
16:47:45 cool, ok let's move on then
16:47:53 jeffnowicki: if you're having trouble finding the right people to ask questions feel free to grab me and I'll see if I can help
16:48:07 #topic tracker: Fedora 35 changes considerations
16:48:11 #link https://github.com/coreos/fedora-coreos-tracker/issues/856
16:48:44 we did the first half of this last week, now we're at the "Self-Contained Changes"
16:49:17 I count 9 with X TRIAGE
16:49:25 let's just go through these quickly
16:49:35 2.2 seems unapplicable.
16:49:41 non*
16:49:51 "Enhanced Inscript as default Indic IM"
16:50:08 2.5 SKIP?
16:50:09 yeah, can skip that. we don't ship locales
16:50:26 hehe
16:50:26 "GHC 8.10 and Stackage lts-18" SKIP, agreed
16:50:29 2.8 SKIP?
16:50:49 2.13 SKIP (workstation only)
16:50:51 "libmemcached-awesome" skip
16:51:12 2.8 I think it's worth sanity-checking that pkglayering still works fine since libvirt is a common layering target
16:51:19 2.15 SKIP (we do not use Anaconda)
16:51:54 same with 2.20
16:51:57 though travier - might be worth investigating the anaconda change for SB
16:52:15 dustymabe: true. Will do
16:52:31 "2.9 x TRIAGE Optimal LUKS Encryption Sector Size" --> should sanity-check Ignition LUKS support works fine with the updated cryptsetup
16:52:41 though our CI will do that for us :)
16:53:03 if it's already landed in rawhide, then it's already doing that for us
16:53:07 but might be worth checking that we're allocating the right header size
16:53:48 "2.20 x TRIAGE Sphinx 4" SKIP we don't ship Python
16:54:02 2.21 PipeWire SKIP
16:54:43 so it seems to me like it's just 2.8 and 2.9
16:54:57 agreed/disagreed?
16:55:24 ack
16:55:28 ack
16:55:37 should we #info the decision on the others (for the record)?
16:55:45 ^^
16:55:59 dustymabe: that sounds right.
16:56:09 dustymabe: there's quite a few. hmm, ok how about something like:
16:56:36 #info currently of all the self-contained changes, only "Libvirt Modular Daemons" and "Optimal LUKS Encryption Sector Size" deserve closer investigation
16:57:03 Including the numbers has value I think
16:57:12 that works.. hopefully the list doesn't get new additions
16:57:17 #action jlebon will open a ticket and look for volunteers to investigate "Optimal LUKS Encryption Sector Size"
16:57:18 (of the non-applicable)
16:57:19 jaimelm: unfortunately the numbers do change
16:57:28 #action jlebon will open a ticket and look for volunteers to investigate "Libvirt Modular Daemons"
16:57:31 i've been using the "name"
16:57:37 true
16:58:17 dustymabe: we'll keep the issue TRIAGE and SKIP as the source of truth -- will update the ticket after the meeting
16:58:30 then any new changes which need TRIAGE will be noticed
16:58:54 +1
16:58:55 ok cool, are we ready to move on?
16:59:10 yes
16:59:11 #topic Support cloud-specific grub fragments
16:59:14 #link https://github.com/coreos/fedora-coreos-tracker/issues/110
16:59:30 bgilbert: do you want to introduce this and where we are?
16:59:42 sure
17:00:08 we're actively moving forward on disabling serial console by default
17:00:40 but some cloud platforms use serial console exclusively, so we need to be able to have per-platform defaults.
17:01:13 some of us have been discussing the best way to achieve that
17:01:22 basically there seem to be two approaches:
17:01:56 1. teach grub to understand what platform it's on, and apply conditionals (or load a grub config file fragment specific to the platform)
17:02:38 2. template the grub.cfg and kernel arguments per platform
17:03:08 for 1, teaching grub is non-trivial because of the primitives we have available. also it complicates matters on non-x86 because of other bootloaders.
17:03:30 for 2, we'd now have platform images that differ more than just in the platform ID
17:04:09 for 2 we'd also need to teach coreos-installer to apply the same templating when installing with --platform to override the platform ID
17:05:16 2 is not quite as clean as 1 in the abstract sense, but seems easier to implement, especially since it doesn't implicate support for multiple bootloaders
17:05:27 IOW, 1 is dynamic, while 2 is static/hardcoded
17:05:30 right
17:05:38 so we've been leaning toward 2
17:05:59 2 is also easier to revert / debug / workaround in case of issues as far as I understand
17:06:10 yeah, I'd say so
17:06:17 that doesn't preclude us moving to 1 later on. it should mostly be an implementation detail
17:06:24 the idea is to have per-platform kargs and per-platform grub commands (for switching grub to the serial console)
17:06:36 i prefer 1, but it's not as easy to implement as 2 and I'm not the one implementing so I'm cool with 2
17:06:37 jlebon: hmm seems harder to avoid duplicate kargs in that move though right?
17:06:51 the latter doesn't affect non-grub but is also harmless for non-grub
17:07:11 walters: this only affects new installs in any event
17:07:16 anyways agreed with above, me 2
17:07:22 bgilbert: I don't remember what CL was doing in this area. Was there any useful lesson to learn from that?
17:08:14 walters: assuming we don't start updating grub configs (which we currently don't), I think it would be ok
17:08:16 lucab: CL had a per-platform OEM partition image, so the grub.cfg could just unconditionally source a per-platform config fragment without actually knowing the platform ID
17:08:25 jlebon: ok, true
17:08:30 that fragment could then set a kargs variable that was unconditionally included in the kargs
17:08:55 so CL didn't have a couple of the constraints we have, including zipl
17:09:41 ack thanks, so not much to learn or draw inspiration from
17:10:02 it was the starting point for approach 1, but isn't really a good fit, yeah
17:10:43 is anyone strongly opposed to 2?
17:11:39 sounds like a no :)
17:11:48 bgilbert: do you want to make a proposal?
17:13:10 #proposed we will pursue a static approach to cloud-specific bootloader configs, where grub commands and kargs are templated at image build time and by coreos-installer install --platform at install time.
17:13:30 does that cover it?
17:13:32 +1
17:13:34 ack
17:13:48 ack
17:14:09 ack
17:14:16 +1
17:14:21 +1
17:14:23 +1
17:14:28 SHIPIT
17:14:42 i think the exact defaults on each platform might need some discussion (or at least added reasoning to the definition so it can be challenged later)
17:15:08 issues for each where it can be hashed out?
17:15:24 I don't think we need anything that heavyweight
17:15:43 I went through cloud provider docs to pull out what each platform needs
17:15:50 ahhh, cool
17:15:56 I'm sure there are mistakes, but we can always just fix them
17:16:11 yeah, i think we should just match what we currently have in terms of console kargs, and anything forward which need changes can be discussed
17:16:24 jlebon: we can't actually just match current
17:16:37 because current is both consoles on all platforms, and we're trying to change that
17:16:53 this change will need a coreos-status post anyway, and we can get some feedback in the testing channel
17:17:08 bgilbert: right, we did discuss that already in another ticket, so it's fine
17:17:12 +1
17:17:48 so IOW, worth clarifying there's two moves happening here: moving to templating, and implementing the console= ticket
17:18:02 yup
17:18:03 want to #agree the proposal above?
17:18:24 #agreed we will pursue a static approach to cloud-specific bootloader configs, where grub commands and kargs are templated at image build time and by coreos-installer install --platform at install time.
17:18:46 cool. alrighty, let's move on then
17:18:56 #topic Consider disabling emergency shell timeout and reboot if an error is hit in the initramfs on first boot
17:18:59 #link https://github.com/coreos/fedora-coreos-tracker/issues/928
17:19:06 i can introduce this one
17:20:33 someone internally hit an issue where they booted RHCOS, the Ignition disks stage failed because TPM wasn't enabled, then we went to the "press Enter for the emergency shell or we'll reboot in 5 minutes" prompt, then we rebooted, and then RHCOS, thinking it's still first boot, failed with some other error, completely masking the original error
17:21:06 if someone isn't watching the console (and if it's VGA and not serial, it's basically lost forever), they might not see the actual error
17:21:30 so the proposal is to not reboot and just always wait at the emergency shell, forever
17:22:26 because first boot is special and except in some special cases (kargs) our firstboot code pretty much doesn't account for rerunning on top of a partially firstbooted machine
17:23:10 the reboot is a holdover from CL where it was part of the automatic rollback infrastructure. in FCOS we have not pursued automatic rollback yet; there are a lot of moving parts we'd need to build.
17:24:01 given that we're not close to needing it, I'd support dropping the autoreboot. however, I've heard that there are users depending on it.
17:24:11 agreed on just removing the autoreboot
17:24:34 +1
17:24:40 i wonder if we should even just automatically enter the shell
17:25:01 jlebon: that's what would happen if that code goes away, yes
17:25:12 jlebon: "press enter" is just so we know to stop the timer
17:25:20 bgilbert: gotcha
17:25:46 re. automatic rollback, I think it's compatible with this because we're only doing it on first boot
17:27:02 time check
17:27:03 if there are people relying on this, i think we should solve their issues a better way
17:27:08 jlebon: +1
17:27:12 +1
17:27:15 ok cool, let me make a quick proposal
17:27:17 jlebon: yes
17:27:25 I can ping a couple people to try to track down those users
17:27:37 +1 to removing autoreboot on first boot. note the proper deprecation period for OCP (therefore RHCOS) is three releases, so if we know people are relying on this we'd have to give them a runway. (or at least figure out if they could adapt quicker)
17:27:59 #proposed we will disable the automatic reboot timeout upon hitting emergency.target in the initramfs on first boot
17:28:15 ack
17:28:22 ack
17:28:25 would be nice to also implement https://github.com/coreos/fedora-coreos-tracker/issues/796 while we are in there
17:28:35 (I am coming up blank on thinking how someone could be relying on this)
17:28:48 dustymabe: you're right, that's closely related
17:28:57 walters: I can think of one
17:29:07 do tell
17:29:10 walters: i could imagine before our networking ordering was rock solid
17:29:24 "rock solid" might be going too far... but better than it was
17:29:24 hmm but Ignition retries
17:29:29 dustymabe: seems better not to attach additional requirements though
17:29:45 +1
17:30:09 bgilbert: not required, just in the same ballpark
17:30:11 keep it separate but keep it in mind.
17:30:13 jlebon: only on first boot?
17:30:14 dustymabe: +1
17:31:01 time
17:31:08 bgilbert: just being conservative since that's where we care most about it
17:31:25 jlebon: by the same reasoning, though, I think it's preferable to delete the code
17:31:34 though implementation-wise, agreed it'd be easier to just do it all the time
17:31:52 okay, cool, as long as we're aligned on what we're implying
17:31:54 subsequent boots should be similar every time so I don't think the same reasoning applies
17:31:55 miabbott: I don't think this warrants a 3 releases deprecation period. Might depend on how we classify that in https://docs.openshift.com/container-platform/4.8/rest_api/understanding-api-support-tiers.html
17:31:56 +1 to proposed
17:32:46 #agreed we will disable the automatic reboot timeout upon hitting emergency.target in the initramfs on first boot
17:32:57 looks like we're over time :|
17:33:00 let's do a quick open floor
17:33:03 #topic open floor
17:33:27 anything anyone wants to bring up?
17:33:29 travier: i don't really want to use 3 releases either, just noting the "official" ways. (we already got burned once when we removed `dhclient`)
17:34:30 ok closing in 30s
17:34:59 #endmeeting
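
For readers following up on the #agreed item about cloud-specific bootloader configs, here is a rough sketch of what "static" templating could look like: a lookup table keyed by platform ID, rendered into a config fragment once at build or install time rather than decided by grub at boot. It is not the actual coreos-assembler or coreos-installer implementation. The platform IDs (example-serial, example-vga), the template text, the platform_kargs variable, and every function and class name are hypothetical; only the console= kernel arguments and the grub serial/terminal_input/terminal_output commands are real syntax. The real per-platform defaults were not stated in the meeting and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PlatformBootConfig:
    """Per-platform bootloader settings (hypothetical structure)."""
    kargs: Tuple[str, ...] = ()
    grub_commands: Tuple[str, ...] = ()


# Placeholder table: these entries are illustrative assumptions,
# not the defaults chosen for any real platform.
PLATFORM_DEFAULTS: Dict[str, PlatformBootConfig] = {
    "example-serial": PlatformBootConfig(
        kargs=("console=ttyS0,115200n8",),
        grub_commands=(
            "serial --speed=115200",
            "terminal_input serial console",
            "terminal_output serial console",
        ),
    ),
    "example-vga": PlatformBootConfig(kargs=("console=tty0",)),
}

# Hypothetical template; a real grub.cfg would contain much more.
TEMPLATE = """\
# platform: {platform}
{grub_commands}
set platform_kargs="{kargs}"
"""


def render_bootloader_config(platform: str, template: str = TEMPLATE) -> str:
    """Render the template for one platform.

    Unknown platforms get an empty config, so no console settings
    are injected for them.
    """
    cfg = PLATFORM_DEFAULTS.get(platform, PlatformBootConfig())
    return template.format(
        platform=platform,
        grub_commands="\n".join(cfg.grub_commands),
        kargs=" ".join(cfg.kargs),
    )


if __name__ == "__main__":
    print(render_bootloader_config("example-serial"))
```

Because the output depends only on the platform ID, the same function could in principle be called both when building an image and when an installer overrides the platform, which is the property the proposal relies on. The trade-off raised in the discussion still applies: images then differ by more than the platform ID alone.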