16:30:05 #startmeeting fedora_coreos_meeting
16:30:05 Meeting started Wed Jul 13 16:30:05 2022 UTC.
16:30:05 This meeting is logged and archived in a public location.
16:30:05 The chair is jlebon. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:30:05 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:05 The meeting name has been set to 'fedora_coreos_meeting'
16:30:11 #topic roll call
16:30:30 .hi
16:30:31 dustymabe: dustymabe 'Dusty Mabe'
16:30:37 .hi
16:30:38 lucab: lucab 'Luca BRUNO'
16:31:40 #chair dustymabe lucab
16:31:40 Current chairs: dustymabe jlebon lucab
16:31:42 .hi
16:31:43 ravanelli: ravanelli 'Renata Ravanelli'
16:31:44 .hello jasonbrooks
16:31:46 jbrooks: jasonbrooks 'Jason Brooks'
16:31:47 .hi
16:31:49 saqali: saqali 'Saqib Ali'
16:32:00 #chair ravanelli jbrooks saqali
16:32:00 Current chairs: dustymabe jbrooks jlebon lucab ravanelli saqali
16:32:41 .hi siosm
16:32:42 travier: Sorry, but user 'travier' does not exist
16:32:48 .hello siosm
16:32:49 travier: siosm 'Timothée Ravier'
16:32:58 .hello miabbott
16:33:01 miabbott: miabbott 'Micah Abbott'
16:33:36 #chair travier miabbott
16:33:36 Current chairs: dustymabe jbrooks jlebon lucab miabbott ravanelli saqali travier
16:33:46 let's wait another 30s
16:34:10 .hi
16:34:11 bgilbert: bgilbert 'Benjamin Gilbert'
16:34:21 #chair bgilbert
16:34:21 Current chairs: bgilbert dustymabe jbrooks jlebon lucab miabbott ravanelli saqali travier
16:34:25 ok, let's get started!
16:34:30 #topic Action items from last meeting
16:34:35 * jlebon to open investigation tickets for the IMA/FIDO changes
16:34:51 #info jlebon filed https://github.com/coreos/fedora-coreos-tracker/issues/1252
16:35:26 i suspect we'll bring this up in a future meeting when there's some discussion that happened
16:35:32 let's move on
16:35:49 #topic New Package Request: zstd
16:35:51 #link https://github.com/coreos/fedora-coreos-tracker/issues/1251
16:35:59 bgilbert: want to introduce this one?
16:36:06 sure
16:37:00 in https://github.com/coreos/fedora-coreos-tracker/issues/1247 we've been discussing the size of /boot and ways to fit more kernel/initrd pairs into it
16:37:05 .hi
16:37:06 jmarrero: jmarrero 'Joseph Marrero'
16:37:15 #chair jmarrero
16:37:15 Current chairs: bgilbert dustymabe jbrooks jlebon jmarrero lucab miabbott ravanelli saqali travier
16:37:47 #link https://github.com/coreos/fedora-coreos-tracker/issues/1247
16:37:50 we're currently compressing the initrd with gzip. if we switch to zstd, we can *both* reduce the initrd size by megabytes and reduce the boot time by hundreds of milliseconds
16:38:21 bgilbert: thanks for actually testing boot time too!
16:38:34 the exact initrd size reduction seems to vary depending on who tests it and when - jlebon got different numbers than I did, and I got different numbers last night than when I tested it
16:38:42 not sure what's going on there
16:39:06 #link https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1179490347
16:39:13 how much variance are you seeing?
16:40:02 last night I saw 104 MiB -> 74 MiB, the table there ^ has 104 -> 88, and you found 77 -> 68
16:40:27 a few questions
16:40:34 oh wow, that's quite a bit
16:40:46 worth digging into that I think
16:40:53 1. does this have any implications for people using PXE today?
16:41:11 but anyway, in order for dracut to use zstd, we need to ship the binary. we're already shipping the library because RPM needs it.
16:41:18 2. what do other Fedora variants do? should we consider trying to get others to act similarly?
16:41:23 (which means the first CVE I listed in the bug isn't relevant, since it's in the lib)
16:42:15 1. shouldn't. on x86_64, the bootloader treats the initrd as opaque.
16:43:13 that means the PXE firmware is not involved in any decompression and just maps the blob into memory as-is, right?
16:43:18 at least Silverblue uses gzip. i wouldn't be surprised if the others did as well
16:43:21 2. not sure. I haven't measured _compression_ time, which might be relevant for host-side initrd generation. and it might make tools unhappy?
16:43:30 lucab: right
16:43:49 i.e. any initrd inspection tools that don't support zstd
16:44:13 note that this is with -19, which is not the dracut default if you ask for --zstd; dracut defaults to -15
16:45:01 one other thing: before we can switch our initrd, coreos-installer must be updated to support zstd or `pxe customize` will break. I have a working draft PR, which will need to make it into a release
16:45:05 the PR isn't trivial unfortunately
16:45:37 (trivial PRs are boring to review!)
16:45:42 lucab: :-)
16:45:57 send me your trivial PRs, send the non-trivial ones to lucab
16:46:01 hah
16:46:04 :-P
16:46:22 note also that the el8 kernel doesn't support zstd initrds, so this would be an FCOS-only thing
16:46:30 el9 does
16:46:31 FCOS and el9, yeah
16:47:20 dustymabe: are you suggesting trying to drive this as a Fedora change?
16:47:30 I can not find a change for Fedora in general but this would be a good change to push to all variants
16:47:36 ah, I was just looking for that, bummer
16:47:57 jlebon: i'm not suggesting we block on that, but I do like to diverge from the other variants as little as possible
16:48:06 dustymabe: +1
16:48:09 i.e. we can push this change but followup and try to get other variants to consider it
16:48:17 given where we are in the change process, it'd be an F38 thing I think
16:48:23 bgilbert: correct
16:48:33 so we could include the FCOS experience as a PoC
16:48:35 eventual consistency
16:48:41 :)
16:49:02 #proposed We will add the zstd package to FCOS, and switch our initrd to use it.
16:49:06 zstd is already used a lot in Fedora elsewhere now so that should not be controversial (famou last words)
16:49:11 famous*
16:49:17 +1
16:49:23 for the proposal
16:49:31 +1
16:49:33 +1
16:49:39 ack
16:50:05 * dustymabe kind of wishes we could switch our image artifacts to be compressed with zstd too
16:50:34 #agreed We will add the zstd package to FCOS, and switch our initrd to use it.
16:50:39 cool, thanks all
16:50:47 there's a thread on devel about using zstd to compress modules + mentions initrd in passing - https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/YYSL4WQ6A337UNAR3AYBWKAROJ42SBOO/#KFIAXNVCODGO2ODOZ53OM5MB3PWUQ3TY
16:50:51 bgilbert: does libzstd bump its soname often?
16:51:15 lucab: oh, hmm
16:51:24 (pretty old thread, though)
16:51:25 looks like it's on 1 right now, but I haven't checked the policy
16:51:46 (just asking, for coreos-installer)
16:51:57 right
16:52:01 dustymabe: my understanding is xz still has better compression ratio
16:52:17 ok cool, ready to move on?
16:52:28 it's also 1 in RHEL 8
16:52:28 +1
16:52:40 lucab: we don't ship separate coreos-installer binaries, unlike Ignition and Butane
16:52:55 #topic extend grub boot prompt timeout on platforms with full console access
16:52:58 #link https://github.com/coreos/fedora-coreos-tracker/issues/1236
16:52:59 except for mirror.openshift.com, which arguably only has to run on RHEL (though we'll have two relevant RHEL majors)
16:53:05 looks like this one has had the meeting label for a while
16:53:10 dustymabe: want to take that one?
16:53:21 ahh yes
16:53:48 bgilbert: that's a good situation then I think wrt dynamic loading
16:53:58 lucab: +1
16:54:00 I think here I'm arguing to extend the grub timeout on platforms where console access is reasonable to something more than 1s
16:54:37 on platforms where we can't have console access (for whatever reason) we leave it alone or possibly make it 0 there
16:55:30 in the ticket bgilbert makes some good arguments for leaving it as is
16:55:47 I'm OK with either outcome, just wanted to start the discussion
16:56:51 concretely which platforms would this affect then?
16:58:02 any platform where console access is possible
16:58:19 well - I should say 2 way console access (read/write)
16:58:38 previously we could discount AWS here, but they recently added read/write serial console support (which has been amazing)
16:58:53 oh sweet, didn't know that
16:59:06 We'd have to look at the others to see what the limitations of each are
16:59:22 * dustymabe doesn't have that knowledge readily available
16:59:29 yeah, i'm still wary of adding e.g. 4 seconds of boot time on every boot to multiple platforms
16:59:40 have we had users report this as a legitimate UX issue?
16:59:49 correct, I'm just advocating for "more than 1s"
16:59:59 doesn't have to jump to 5
17:00:06 even 2 is twice as much time as we have today
17:00:11 haha
17:01:03 to summarize my arguments against: boot happens frequently (at least every OS release) and debugging happens approximately never.
17:01:12 the countdown can be stopped by pressing literally any key, so the user doesn't really need to know what they're doing
17:02:03 in cases where the boot menu is going by too fast, even 5 seconds won't help much: 20-minute BIOS POSTs and clouds that are slow to start a remote console
17:02:24 we can also add documentation about this in the troubleshooting section
17:02:25 and we have debugging docs that can explain how to interrupt the boot
17:02:28 yup
17:02:34 the "any key" part is important information - but still the user might not necessarily know that (I didn't until recently)
17:02:47 yeah docs help here
17:03:36 another hypothetical situation: keyboard doesn't initialize fast enough
17:04:27 I've definitely hit situations like that on my Rpi (or at least I think that was the problem)
17:04:28 dustymabe: hmm, haven't seen that case
17:05:09 either way. I think I've talked enough. Will let others discuss
17:05:11 to repeat my earlier question, has any user (other than dustymabe) reported this as an issue in the past?
17:05:40 possibly relevant: I just tested timeout=0 in qemu and haven't been able to get to a menu at all
17:06:10 at least one thinks it's useful :)
17:06:28 this was a slightly different problem: https://github.com/coreos/fedora-coreos-config/pull/281
17:06:42 dustymabe: are you mostly concerned with first-boot debugging, or any boot in general?
17:06:42 i.e. the Live ISO - not the FCOS disk image
17:07:05 lucab: any boot - any time I need to catch grub (for whatever reason)
17:07:19 maybe my system hangs on boot after upgrade and I need to select the rollback kernel entry
17:07:23 i think that live ISO PR predates iso kargs
17:08:30 if you've ever had a system with a long reboot/POST cycle, you know how frustrating it is to miss the GRUB prompt
17:08:35 jlebon: yeah
17:09:54 also, just holding down a key the whole time isn't great.. what if your previous menus (BIOS/UEFI) have special keys too? you have to make sure you're using one that doesn't affect them
17:10:00 "boot takes too long so let's make it longer" is an argument :-D
17:10:13 :)
17:10:47 I think I'm not being very convincing - and that is OK :) - we probably shouldn't waste too much time here, though
17:12:07 yeah, tricky... i'm not sure where to go from here. should we keep discussing this in the ticket?
17:12:45 nah. I think we can close it out
17:12:46 i think this would be more convincing if multiple people felt frustrated by the short timeout
17:12:49 IMO we should close it out. we can always revisit if a need arises, but I concur that there doesn't seem to be much demand for this right now
17:12:52 yup
17:13:11 I won't be much opposed to a 1s -> 2s change, but at the same time I'd be doubtful about any real advantage of doing that
17:13:28 ok, let's move on. thanks for bringing this up dustymabe!
17:13:36 jlebon: can you #info or something?
17:14:34 #info there seems to be insufficient demand for a longer timeout. we will not pursue this for now but may reconsider in the future pending more convincing feedback.
17:14:42 does that work?
17:14:59 if it's any `console`ation I'm sure we'll be talking about this again
17:15:19 WFM
17:15:28 maybe s/demand/desire/
17:15:37 #undo
17:15:37 Removing item from minutes: INFO by jlebon at 17:14:34 : there seems to be insufficient demand for a longer timeout. we will not pursue this for now but may reconsider in the future pending more convincing feedback.
17:15:42 #info there seems to be insufficient desire for a longer timeout. we will not pursue this for now but may reconsider in the future pending more convincing feedback.
17:15:58 ok cool
17:16:02 #topic Change default to be equivalent of quiet
17:16:05 #link https://github.com/coreos/fedora-coreos-tracker/issues/1244
17:16:15 walters: want to take this one?
17:16:30 ok
17:16:46 +1
17:16:48 #chair walters
17:16:48 Current chairs: bgilbert dustymabe jbrooks jlebon jmarrero lucab miabbott ravanelli saqali travier walters
17:16:54 So...a lot of history here but basically our console is very verbose compared to the Anaconda default which injects `quiet`
17:17:18 this verbosity in turn triggers surprising problems: https://lwn.net/Articles/800946/
17:17:30 it can cause soft or even hard kernel lockups
17:17:42 we're moving to drop use of the serial console
17:18:03 but...even for people that do want the serial console, I don't think defaulting to debug level is useful
17:18:13 now as I say, I think really we should change the Fedora global kernel default
17:18:14 we default to info, not debug
17:18:18 (we absolutely had users hitting those hard lockups due to console, on both RHCOS and FCOS)
17:18:32 but basically we can lead the way on this to start
17:20:24 otoh, I think I'd be OK trying to push to change the kernel to start, it's possible that this just hasn't come up
17:20:43 the existing log level does present a usability problem for console logins
17:21:02 makes sense to me! this and zstd, we should make sure we follow-up on the Fedora change if we want to lead
17:21:36 if anaconda defaults quiet anyway, this would just be lowering the default further down in the stack
17:22:29 so hopefully shouldn't be controversial...
17:22:36 famous last words
17:22:47 I'm assuming everyone agrees that we *do* want console output of some form, i.e. we're not going to inject `quiet`
17:23:38 ahh right. quiet affects more than just printk level
17:23:57 quiet also affects the boot, which is where we might want more info
17:23:58 what does `quiet` translate to, internally in the kernel?
17:24:37 * jlebon runs quiet in a `cosa run -c`
17:24:50 it also silences systemd
17:24:57 and affects both the initrd and real root
17:25:03 lucab: https://github.com/torvalds/linux/blob/master/init/main.c#L242
17:25:16 whereas here we're talking about just the real root, and just printk
17:25:50 right - so change the kernel compile flag, as colin mentioned earlier isn't necessarily what we want either
17:26:12 right?
17:26:26 yeah good point; changing the kernel default does mean we'd want to analyze the diff before the initramfs
17:26:46 hmm actually, i think it'd suppress most if not all messages actually
17:27:37 could we just deliver this as suggested by lucab as a sysctl dropin in the real root and maybe look at doing this fedora wide (maybe in some fedora-release package, or maybe the kernel)?
17:28:03 obviously anaconda based installs already have `quiet` so it won't really affect them
17:28:20 agree sysctl is cleaner *except* we then can't (afaik) easily do the logic for "only go quieter, not more verbose"
17:28:23 for editions that use `quiet`, wouldn't that stop people from removing `quiet` and getting verbosity?
17:28:27 right
17:29:21 yes, there would be two layers then
17:29:38 not for us, but for others
17:29:58 in that case let's just ship the sysctl.d dropin for us only (we don't have anaconda/quiet)?
17:30:10 or is there still a problem with that?
17:30:14 we can have different sysctl in initramfs and rootfs by the way I think, for us
17:30:33 lucab: right (I assume they get applied at different times)?
17:30:59 dustymabe: we'd still break the ability to use `quiet` for us which I think would be surprising
17:31:15 would we?
17:31:17 walters: how so? `quiet` would just be redundant
17:31:23 ^^
17:31:24 walters: the API would be that users could override the file via Ignition
17:31:32 we'd break the ability to remove `quiet`, but we already don't have it
17:31:36 if we use the sysctl, we'd override what they provide on the kernel cmdline
17:31:54 i.e. if they provided debug?
17:32:15 well yes, actually if they provide `debug` *or* `quiet` on the kernel cmdline
17:32:36 but the sysctl would be equivalent to `quiet`?
17:32:44 re kernel log level
17:32:59 yeah, but he's arguing we want to support going the other way too
17:33:17 the admin specifying `quiet` would apply to the initramfs too
17:33:49 i'd just really like to not have a whole nother systemd service needed for this :(
17:34:01 walters: I think you're arguing that the sysctl would somehow interfere with the actual `quiet` karg somehow?
17:34:11 I agree it would interfere with `debug`
17:34:13 bgilbert: right; wouldn't it?
17:34:19 I'm arguing it wouldn't
17:34:30 because it sets the log level that `quiet` already sets
17:34:40 but only for the real root
17:34:56 okay, so `quiet` sets the log level in the initrd, then we get to the real root and the sysctl is redundant
17:34:58 real root is a subset of the whole boot :)
17:35:11 no, `quiet` is parsed by the kernel
17:35:15 and it's parsed by systemd
17:35:20 yes
17:35:30 the sysctl is a strict subset of `quiet`
17:35:35 `quiet` is for desktop systems usually where we want to look pretty
17:35:55 thus if the user manually specified `quiet`, it would do more things than the sysctl, but the sysctl wouldn't change anything back
17:36:07 the action the sysctl does is a subset of what quiet does
17:36:32 hmm ok I think you may be right; I may be getting mixed up on debug/info/warn
17:36:56 (worst case, a systemd generator which spits the sysctl fragment if the cmdline has no other quiet/debug/etc)
17:37:10 but circling back re. debug, how much do we care about that? are we ok with users also having to override the sysctl?
17:37:11 this all said, the sysctl would break the use of `debug` (intending to apply that to the real root too) right?
17:37:16 right
17:37:43 right, quiet == OK, debug == still masked, need to delete sysctl dropin
17:38:12 i can easily imagine a kernel support engineer getting confused at a nonfunctional documented kernel API
17:38:22 I don't see a trivial systemd service with ConditionKernelCommandLine as a huge problem; it doesn't need to sequentially order with anything
17:38:34 for setting the sysctl
17:39:12 yeah; and I think hopefully we can argue to include this in e.g. dracut (or perhaps systemd wants to handle this)
17:40:08 though lucab's generator has merit
17:40:26 avoids Ignition configs having to disable a magic service name
17:40:54 Well, we can use `ConditionKernelCommandLine=!quiet` etc as micah suggested
17:40:55 my reading of quiet/debug/etc semantic is "kernel loglevel up to the point where userland components can further tweak it"
17:40:57 does sysctl support /run/ ?
17:41:10 (or a service that's an injected dependency of systemd-sysctl)
17:41:31 dustymabe: yes, https://www.freedesktop.org/software/systemd/man/sysctl.d.html
17:41:58 walters: I mean if someone wants to disable the service without modifying kargs
17:41:59 +1
17:42:22 in walters' PR, the systemd unit runs from the initrd, so it can't be overridden via Ignition, but users are still free to add a regular sysctl dropin via Ignition
17:43:06 hmm, initrd seems like an odd choice
17:43:19 for something only affecting the real root
17:43:45 agreed, generally speaking I'd be happy to push things out of initrd
17:44:08 yeah I agree having it in the real root seems simplest
17:44:28 I was just thinking "I want this at the end of the initrd"
17:44:39 but end of the initrd = "real root"
17:45:06 I'm currently thinking a prereq service for systemd-sysctl makes the most sense
17:45:11 then the real root will awkwardly switch from info to silent a quarter of the way through
17:45:22 hmm
17:45:33 we probably don't need to work through implementation details here though
17:45:34 though right now the initrd also awkwardly switches, so meh
17:45:35 we're already 15m over
17:45:39 yeah, agreed
17:45:56 :)
17:46:08 or.. we could do nothing :)
17:46:15 also true
17:46:30 or actually
17:46:36 we could do the whole boot on info, and then switch after
17:46:49 +1
17:46:51 do we want to draw any info/proposed here or just leave it upstream for now?
17:47:16 #info there's general agreement this would be good, but implementation details are still to be fleshed out
17:47:19 there :)
17:47:34 ok, will end this in 60s unless someone wants to add something
17:47:59 (i can do a 60s open floor if folks would like, but we're really over)
17:48:33 #endmeeting
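
Editor's note: the last topic floated a generator that emits the sysctl fragment only when the kernel cmdline carries no `quiet`/`debug`-style argument, so that an admin's explicit choice is never masked. The Python below is only a rough sketch of that decision logic, not what the meeting agreed on or what FCOS ships. The function names are hypothetical, the value 4 is an assumption (it matches the kernel's usual default for the `quiet` console level), treating `loglevel=` as an admin override is also an assumption, and splitting on whitespace ignores the quoting rules the kernel applies to its command line.

```python
from typing import Optional

# Kernel arguments treated as "the admin already chose a log level".
# Including loglevel= is an assumption; the meeting only named quiet/debug.
LOGLEVEL_FLAGS = {"quiet", "debug"}
LOGLEVEL_PREFIXES = ("loglevel=",)

# Assumption: 4 is the console log level that `quiet` normally selects.
QUIET_CONSOLE_LOGLEVEL = 4


def admin_set_loglevel(cmdline: str) -> bool:
    """Return True if the cmdline carries an explicit log-level argument."""
    for arg in cmdline.split():
        if arg in LOGLEVEL_FLAGS or arg.startswith(LOGLEVEL_PREFIXES):
            return True
    return False


def sysctl_fragment(cmdline: str) -> Optional[str]:
    """Return sysctl.d fragment text, or None to leave the admin's choice alone."""
    if admin_set_loglevel(cmdline):
        return None
    return f"kernel.printk = {QUIET_CONSOLE_LOGLEVEL}\n"
```

A real generator would read `/proc/cmdline` and write the returned text to a drop-in under `/run/sysctl.d/`, which the sysctl.d documentation linked in the log lists as a supported location. With this shape, `debug` on the cmdline suppresses the fragment entirely, which addresses the "debug == still masked" concern raised above.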