16:32:15 #startmeeting fedora_coreos_meeting
16:32:15 Meeting started Wed Aug 24 16:32:15 2022 UTC.
16:32:15 This meeting is logged and archived in a public location.
16:32:15 The chair is dustymabe. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:32:15 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:32:15 The meeting name has been set to 'fedora_coreos_meeting'
16:32:19 #topic roll call
16:32:21 .hi
16:32:22 dustymabe: dustymabe 'Dusty Mabe'
16:32:32 .hi
16:32:33 jmarrero: jmarrero 'Joseph Marrero'
16:32:49 .hello jasonbrooks
16:32:49 jbrooks: jasonbrooks 'Jason Brooks'
16:33:29 .hi
16:33:30 lorbus: lorbus 'Christian Glombek'
16:33:40 #chair jmarrero jbrooks lorbus
16:33:40 Current chairs: dustymabe jbrooks jmarrero lorbus
16:34:21 .hi
16:34:22 ravanelli: ravanelli 'Renata Ravanelli'
16:34:46 .hello2
16:34:47 jlebon: jlebon 'None'
16:34:53 #chari jlebon ravanelli
16:35:08 * dustymabe is hoping to have bgilbert and walters around to discuss a few things
16:35:48 .hi
16:35:49 aaradhak: aaradhak 'Aashish Radhakrishnan'
16:36:09 #chair aaradhak
16:36:09 Current chairs: aaradhak dustymabe jbrooks jmarrero lorbus
16:36:13 .hi
16:36:14 bgilbert: bgilbert 'Benjamin Gilbert'
16:36:20 #chair bgilbert travier
16:36:20 Current chairs: aaradhak bgilbert dustymabe jbrooks jmarrero lorbus travier
16:36:30 ok let's get started
16:36:36 #topic Action items from last meeting
16:36:49 #info no action items from last meeting!
16:37:00 #topic tracker: Fedora 37 changes considerations
16:37:16 .hi siosm
16:37:16 #link https://github.com/coreos/fedora-coreos-tracker/issues/1222
16:37:17 travier: Sorry, but user 'travier' does not exist
16:37:24 .hello siosm
16:37:25 travier: siosm 'Timothée Ravier'
16:38:23 ok a few updates on this topic.. the macaddresspolicy change fell out (thomas got busy and went on vacation) but we'll get it into F38. https://fedoraproject.org/wiki/Changes/MAC_Address_Policy_none
16:39:12 #info the hostname change got into f38/f37: https://github.com/coreos/fedora-coreos-tracker/issues/902#issuecomment-1225839825
16:39:19 #info the macaddresspolicy change fell out (thomas got busy and went on vacation) but we'll get it into F38. https://fedoraproject.org/wiki/Changes/MAC_Address_Policy_none
16:39:35 we have a few new self contained changes to review:
16:39:52 subtopic 225. Haskell GHC 8.10.7 & Stackage LTS 18.28
16:40:06 i'll have to leave at the halfway point for another meeting
16:40:31 nothing for us to do here. we don't ship those
16:40:40 subtopic 226. Mumble 1.4
16:40:47 nothing for us to do here. we don't ship mumble
16:40:48 dustymabe: nice work on the hostname change! that was a long road :)
16:40:59 jlebon: thanks :)
16:41:08 subtopic 227. Emacs 28
16:41:14 nothing for us to do here. we don't ship emacs
16:41:26 ok that's all the new items
16:41:35 anything else change related that we should discuss?
16:41:38 👍👍
16:41:44 looks good
16:41:54 there is the "Preset All Systemd Units on First Boot" - should we discuss status on that one?
16:42:31 it's in, I still need to verify that it works for us, but we're not dependent on it. we're still carrying the workaround in Ignition for now
16:42:47 i don't think there's anything new to discuss
16:43:13 there is also 118. BIOS boot.iso with GRUB2
16:43:26 which doesn't affect us directly
16:43:46 but we should consider trying to revisit these topics so we don't just drop them forever
16:44:13 once they fall off the current view of the world it's hard to remember to go back to them
16:45:10 let's file an issue for it?
16:45:11 👍
16:45:11 ok i'll move on to the next ticket
16:45:19 we hav https://github.com/coreos/fedora-coreos-tracker/issues/1231
16:45:34 ack, nice
16:45:51 yeah we have an issue. we just need to remember to followup
16:45:59 #topic Document /boot requirements and constrains when installing/upgrading kernels
16:46:03 #link https://github.com/coreos/fedora-coreos-tracker/issues/1247
16:46:28 ok I tagged this one with the meeting label
16:47:06 basically we have a few efforts underway here to help change our /boot contstraints
16:47:13 AFAIK there is
16:47:23 1. change compression algorithm (underway)
16:47:47 2. change rpm-ostree behavior to opportunistically cleanup rollback deployment if needed (in discussion)
16:47:55 3. change the size of /boot/ partition
16:48:01 (in discussion)
16:48:15 anything else I'm missing that hasn't already been disqualified?
16:49:04 (also, do those 3 look correctly characterized?)
16:49:46 seems right
16:49:58 yeah
16:50:19 ok so let's continue the discussion here
16:51:06 the reason I'm bringing this up is because we are looking to add ppc64le and the /boot contents there are larger than the other platforms (see https://github.com/coreos/fedora-coreos-tracker/issues/987#issuecomment-1221438641)
16:51:30 so it would be nice to have at least one of these mitigations in place before we ship that arch
16:51:44 so let's skip discussion on 1. since it's already in progress
16:52:49 for 3. are we seriously considering changing the /boot/ partition size and what all would that take (I imagine we'd want to do it across the board to retain symetry among most of our arches like we've had before
16:53:22 i touched on this in https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1177907272
16:53:41 i think we should, but we also need to be very careful about it given https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1190704602
16:54:02 I'm not sure we can
16:54:30 historically, at different times, we've documented two approaches for setting the rootfs size
16:54:53 (when creating an additional data partition on the same disk)
16:55:03 the older one was to set the starting offset of the data partition
16:55:21 and the current one is to use "resize: true" (now that we have that) and set the size of the rootfs directly
16:56:14 I'd say we should unify the docs, prepare the change to a bigger /boot, announce it and wait 6 months with reminders at 3 months?
16:56:18 if we had only done the older one, we could resize the bootfs up and the rootfs down and not risk clobbering anything
16:56:44 but with the newer one, we're stuck. we've carefully provided advice on how to write Ignition configs so the data partition won't be clobbered, and the advice was bad
16:57:09 travier: the old docs are gone, but old deployed Ignition configs may not be
16:57:58 bgilbert: and there is no way we can detect a "reprovision" and safely error?
16:58:01 yes, I understand that it requires our users to interact with their systems or change their configs
16:58:04 bgilbert: hmm, are you saying we can't require users to change their configs?
16:58:25 announcements are all well and good, but clobbering user data on systems configured according to our advice is Very Bad, and we can't guarantee that everyone will see an announcement
16:58:34 dustymabe: boot changes in general may also break ppc64le, since it uses petiboot. I don't size change changes will break it, but it is good to check with command for GRUB2 will change
16:59:04 if we can automatically detect & error out, that would address my concern, but in general I don't think we can
16:59:05 ravanelli: but power has not been shipped yet?
16:59:09 for focs
16:59:20 no, I think the idea is too
16:59:24 the user isn't required to indicate their intention to reprovision nondestructively, and reprovisioning destructively is a valid operation
16:59:37 (plus Ignition doesn't even know the new partition is there, so it'd have to happen somehow in coreos-installer)
17:00:29 there is indeed a lot of potential for tricky issues
17:01:42 Can we have an opt-in flag to resize /boot ?
17:01:44 it might help if we enumerate (offline) the cases
17:01:57 and indicate in which cases you end up with data loss
17:01:59 so that we don't touch the default but let folks opt-in for it?
17:02:23 travier: yes, if we're willing to incur a transposefs run for everyone who sets it
17:02:24 travier: i mean, people can already resize boot today
17:02:30 i see the concern, but I'm also concerned about being handcuffed forever
17:02:46 sure you can resize today but it's very manual
17:03:04 moving down gigs of data on first boot isn't a great UX
17:03:06 travier: as in a manually crafted butane config?
17:03:17 (still automated, though)?
17:03:18 Butane could certainly have a flag that desugars to resizing boot and recreating root at firstboot
17:03:39 https://github.com/coreos/fedora-coreos-docs/issues/410
17:03:48 (or we could ship two images, ugh)
17:03:48 correct
17:03:52 https://github.com/coreos/fedora-coreos-tracker/issues/1196#issuecomment-1132428498
17:04:00 travier: right
17:04:11 bgilbert: I'd definitely prefer not :)
17:04:14 me too
17:04:42 we could push this problem down the road
17:04:52 honestly a combination of 1 & 2 would probably suffice
17:05:00 shipping two image would have the advantage that we could say that the previous on is deprecated and announce a 1 year switch
17:05:00 perhaps we could also look at other potential space savings
17:05:03 for example
17:05:10 thus folks would have to look at it
17:05:13 could not just ignore it
17:05:17 travier: I don't really think that solves the problem
17:05:27 (I don't think we're special in this regard. e.g. Fedora has to continue working with old small /boot partitions forever)
17:05:57 * dustymabe goes back to old me and whispers in his ear to set /boot to at least 512M
17:06:19 we are slightly special in that we bake in a bunch of statically compiled (larger) files in our initramfs
17:06:37 (well okay, Anaconda-based Fedora can make new installations larger)
17:07:48 travier: if we update the coreos-installer defaults, it doesn't solve the problem, yeah
17:08:41 we would not be able to. it would be another "platform"
17:08:44 qemu2
17:08:46 * dustymabe wonders if we could take any cue based on the Ignition config version
17:08:52 but I agree that it's ugly
17:08:58 probaly not
17:09:12 yeah, I don't think so
17:09:15 would require a lot of assumptions
17:09:27 bgilbert: but unlike Anaconda-based Fedora, we put a lot more emphasis on reprovisionability
17:09:31 yup
17:09:50 this has me thinking about Colin's split-initramfs approach again
17:10:00 which I argued against pretty strongly on complexity grounds
17:10:09 yeah, I was with you
17:10:19 but it does have the advantage that we're handling the consequences of our decisions ourselves rather than pushing them onto the user
17:11:30 i'll call that...
17:11:40 4. split-initramfs/rootfs binaries
17:11:52 seriously though.. can we revisit 2. ?
17:11:58 do we have a sense of when to stop? i.e., if zstd give us some space back, and the rpm-ostree changes give us some flexibility, etc.,
17:12:04 when do we call it good enough?
17:12:14 dustymabe: yes, let's
17:12:55 so IIUC 2. basically says "if we need extra space to finalize deployment, then we clean up the rollback files first"
17:12:59 the rpm-ostree change alone would fix this, but we don't want to rely on it too much
17:13:29 * dustymabe brb - please continue discussion
17:13:45 because you'd lose your rollback
17:14:45 https://github.com/ostreedev/ostree/issues/2670#issuecomment-1179341883
17:15:33 yeah
17:15:47 bgilbert: i'm still not over the fact that we can't change new images for this
17:15:54 * dustymabe back
17:15:55 i feel like we should have that freedom
17:16:21 it's unlikely to be the last of its kind
17:16:53 honestly I think if you lose your rollback you're in the same position as if you can't upgrade because you don't have enough space
17:17:16 jlebon: image-based auto-upgrading OSes don't have infinite degrees of freedom, sadly :-(
17:17:27 keep in mind here that the rollback you are losing is the one that you already haven't been running for two weeks
17:17:43 jlebon: we've known that
17:17:48 dustymabe: the reason you can't upgrade may have nothing to do with the ENOSPC check
17:18:35 but wouldn't we only be cleaning it up if we progressed enough in the upgrade to get to the final stage ?
17:19:05 i.e. most "upgrade" problems would have been cleared by that point
17:19:56 * dustymabe has a loose approximation about how rpm-ostree works, so clearly people who know better can tell me where I'm wrong
17:20:00 there's still things it does afterwards that could fail, but i think that's true, yes
17:20:51 ok we're running short on time..
17:20:54 anyway, don't want to belabor this. we can chat more in the ticket!
17:20:58 any conclusions we want to draw at this point?
17:21:13 or paths forward (i.e. more investigation here or there?)
17:21:38 +1 on we should have that freedom. It feels like we should not be limited forever. We need some sort of system for these "breaking changes" if we don't have one already. We could detect if there is enough space to upgrade and change the partition size and if no space to resize, then error out while upgrading and leave it alone.
17:21:43 I think more investigation makes sense before drawing the line.
17:22:16 jmarrero: we don't have a mechanism for restructuring partitions at runtime
17:22:19 FCOS doesn't use LVM
17:22:28 (and can't really, Ignition doesn't support it)
17:22:44 dustymabe: there's still the ppc question
17:23:00 in principle we can have different partition sizes for different platforms, though I think we previously decided not to do that
17:23:07 right. IOW we're going to still need other solutions other than "resize /boot" unless we want to force people to reprovision existing systems
17:23:33 bgilbert: correct. I'd prefer to keep them in line if we can
17:23:34 if we're committing long-term to deal with it on existing platforms, then i'd say it's probably not worth diverging
17:24:00 "how much fix" do we need to be comfortable shipping ppc?
17:24:04 jlebon: i.e. might as well ship ppc64le since we have to deal with it anyway?
17:24:17 bgilbert: I assume the compression fix would be sufficient
17:24:22 okay
17:24:30 but would need to do some final testing
17:24:34 dustymabe: right yeah. and RHCOS is already shipping ppc64le with that layout
17:24:48 jlebon: the 384M layout?
17:25:11 dustymabe: yup
17:25:20 interesting..
17:25:32 ok let me try to summarize
17:27:17 #proposed we discussed the different options for solving this general problem here today. Right now we don't see a clear path forward for changing the /boot/ partition size without risking data loss while re-provisioning systems. We're going to investigate the other options and also brainstorm on how we can increase the /boot partition in the future. For now we'll try to get at least the
17:27:18 compression mitigation in place and move forward with shipping ppc64le.
17:28:08 I'll add some more context in the ticket too
17:28:12 ack/nack?
17:28:34 ack
17:28:49 ack
17:28:58 ack
17:29:13 disagree about not diverging
17:29:22 we're hitting this on ppc now
17:29:35 so clearly the size requirements are not the same for each platforms
17:29:49 but ack for the proposed
17:30:01 well not the ppc part
17:30:13 I think we should fix it now as we have the option
17:30:17 I guess we were focused on FCOS here (which makes sense)
17:30:34 sure, but the size issues are the same
17:30:38 but it's possible the compression solution might not work there (does moving to zstd work their?)
17:30:44 travier: that'd mean we'd need to conditionalize the ppc boot size for FCOS/RHCOS, right?
17:30:47 travier: ^^ that's the difference
17:30:58 dustymabe: no zstd in RHEL 8
17:31:18 could we use more aggressive xz there?
17:31:41 (have to go sorry)
17:31:43 I guess we could require 1 to be in place for FCOS and 2. would solve the problem for RHCOS
17:31:45 dustymabe: we don't use xz at all right now. yes, but it slows down boot by seconds.
17:32:04 bgilbert: yeah, could be OK for one platform
17:32:11 sorry, one arch
17:32:32 ok I'll mark this as agreed and we'll continue to discuss options for RHCOS in the appropriate places for that
17:32:38 #agreed we discussed the different options for solving this general problem here today. Right now we don't see a clear path forward for changing the /boot/ partition size without risking data loss while re-provisioning systems. We're going to investigate the other options and also brainstorm on how we can increase the /boot partition in the future. For now we'll try to get at least the
17:32:40 compression mitigation in place and move forward with shipping ppc64le.
17:33:07 yeah, one arch in RHCOS for the lifetime of RHEL 8 could be okay
17:33:16 anyone able to hang around for the other two meeting topics? was hoping to get to them since jlebon is going to be AFK for some weeks?
17:33:35 I can
17:33:47 I can also
17:33:49 #topic NetworkManager: consider defaulting to EUI-64 for IPv6 SLAAC (at least on OpenStack)
17:33:58 #link https://github.com/coreos/fedora-coreos-tracker/issues/907
17:34:06 jlebon: you tagged this one I think
17:34:20 yup
17:34:53 i've talked to an SME and put the TL;DR in https://github.com/coreos/fedora-coreos-tracker/issues/907#issuecomment-1210894052
17:35:54 so I think that leans us towards doing the change, but for OpenStack only since the platform expects it
17:36:13 WFM
17:36:46 * dustymabe wonders if we need to consider "upgrading" systems to be different than "newly deployed" ones here
17:37:57 i was thinking it'd be for newly deployed systems only
17:38:19 via runtime conditinals on firstboot
17:38:24 which means we probably need a barrier that writes out a config describing the current behavior
17:38:57 ahh "runtime conditionals" meaning we dynamically apply the config?
17:39:02 on first boot
17:39:16 yeah, in e.g. `coreos-teardown-initramfs` where we have other config propagation bits
17:39:42 actually, better done before ignition-files
17:39:50 so it can be overridden if one really wants
17:39:58 maybe let's leave the implementation to a followup discussion
17:40:02 #proposed we will set ipv6.addr-gen-mode=eui64 as the default on our OpenStack platform since the platform expects this to be the case. We will attempt to leave currently deployed systems alone so that we don't change an existing system's IP address.
17:40:19 ack
17:40:41 wfm
17:40:46 any opposed?
17:40:48 ack from me :)
17:41:17 #agreed we will set ipv6.addr-gen-mode=eui64 as the default on our OpenStack platform since the platform expects this to be the case. We will attempt to leave currently deployed systems alone so that we don't change an existing system's IP address.
17:41:27 #topic Pinning coreos-assembler in FCOS releases
17:41:31 #link https://github.com/coreos/fedora-coreos-tracker/issues/1068
17:41:35 jlebon: you again :)
17:41:48 we can push this one if you'd like, or can discuss it today
17:42:40 the TL;DR is: this is unblocked now. let's fix it! i added a strawman at the end
17:42:49 cool to discuss here or keep it there
17:43:27 I think there are some details here that might be tricky to get right
17:43:45 agree
17:44:25 maybe since we are over time let's push it and maybe work out some of the details offline and bring back a better proposal to the meeting
17:44:33 in short, though. I think I'm in favor of pinning
17:44:36 to make things more reliable
17:45:09 sure, SGTM. i think if we keep it simple, it'll be likelier to get implemented :)
17:45:21 always true :)
17:45:24 #topic open floor
17:45:32 anyone have topics for open floor (sorry about the late meeting)
17:46:01 #info f37 test week for fcos is tentatively sept 19-23
17:46:22 which.. /me checks - happens to be when he has scheduled vacation
17:46:36 sigh.. might need to be the week before that :)
17:46:46 #undo
17:46:46 Removing item from minutes: INFO by dustymabe at 17:46:01 : f37 test week for fcos is tentatively sept 19-23
17:46:54 i'll circle back with SumantroMukherje on that
17:47:00 any other topics for open floor?
17:48:17 #endmeeting