16:30:39 #startmeeting fedora_coreos_meeting
16:30:39 Meeting started Wed Sep 15 16:30:39 2021 UTC.
16:30:39 This meeting is logged and archived in a public location.
16:30:39 The chair is lucab. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:39 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:39 The meeting name has been set to 'fedora_coreos_meeting'
16:30:48 #topic roll call
16:31:12 .hi sohank2602
16:31:13 skunkerk: Sorry, but user 'skunkerk' does not exist
16:31:44 .hi sohank2602
16:31:44 skunkerk: Sorry, but user 'skunkerk' does not exist
16:31:49 .hello2
16:31:50 jlebon: jlebon 'None'
16:31:57 .hello2
16:31:58 walters: walters 'Colin Walters'
16:32:05 .hi
16:32:06 bgilbert: bgilbert 'Benjamin Gilbert'
16:32:09 .hi
16:32:10 ravanelli: ravanelli 'Renata Andrade Matos Ravanelli'
16:32:12 .hi
16:32:16 dustymabe: dustymabe 'Dusty Mabe'
16:32:25 skunkerk: you want ".hello sohank2602"
16:32:33 .hello miabbott
16:32:34 miabbott: miabbott 'Micah Abbott'
16:32:51 #chair miabbott bgilbert dustymabe ravanelli walters jlebon skunkerk
16:32:51 Current chairs: bgilbert dustymabe jlebon lucab miabbott ravanelli skunkerk walters
16:32:52 .hello sohank2602
16:32:53 skunkerk: sohank2602 'Sohan Kunkerkar'
16:34:13 ok I'll start
16:34:15 #topic Action items from last meeting
16:34:47 #link https://meetbot.fedoraproject.org/teams/fedora_coreos_meeting/fedora_coreos_meeting.2021-09-08-16.30.html
16:35:12 .hi siosm
16:35:13 travier: Sorry, but user 'travier' does not exist
16:35:14 I don't see anything left pending there
16:35:22 .hello siosm
16:35:23 travier: siosm 'Timothée Ravier'
16:35:29 #chair travier
16:35:29 Current chairs: bgilbert dustymabe jlebon lucab miabbott ravanelli skunkerk travier walters
16:35:38 let's go to the current tickets
16:36:09 I'll start with the RFE one
16:36:18 #topic Feature Request: Make possible to pin target release
16:36:28 #link https://github.com/coreos/fedora-coreos-tracker/issues/947
16:37:16 I missed this earlier as it got mixed into the pile of notifications when I was offline for a few days
16:37:33 not sure if the reporter is around
16:37:48 dustymabe: did you want to ask something related to that?
16:38:06 not specifically - mostly interested in your thoughts lucab
16:38:23 from my side, the request is legit but the config proposal is not very kosher (because it gets versioned/upgraded/rolled back with deployments)
16:38:34 typically if people don't want updates you'd think they'd just turn zincati off
16:38:42 it looks like a cluster coordination issue
16:38:55 but in this case it's understandable that they'd want to just target a specific release as a max upgrade
16:39:38 i.e. ensuring a whole fleet goes through updates in sync, with pausing points
16:39:59 I'll follow up with the reporter to ask what the underlying setup is
16:40:18 lucab: would it make sense for the max version to come from the coordination protocol, instead of a local config?
16:40:32 if there is a pushing coordinator (like Ansible), then the PR that travier linked would cover this
16:40:49 I was under the impression that it would be a good use case for the marker file strategy
16:41:14 if there is no pushing coordinator, then yes it will need to be pulled by Zincati as bgilbert says
16:42:03 (possibly requiring a custom backend combining graph & lock management)
16:42:13 ehh, to me a local config could make sense - but what lucab said about the files in /etc/ getting managed by rpm-ostree is a concern
16:42:39 dustymabe: it should simply be written in /var/lib/zincati/foo
16:42:42 bgilbert, travier: I think you are both right, it really depends on the environment setup
16:42:54 travier: in that case it would be fine then?
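An aside for readers of the log: the "just turn zincati off" option mentioned above is already possible today with a configuration dropin, per the documented Zincati `updates.enabled` setting. A minimal sketch follows; it writes under a `demo/` directory purely so it can be tried without root, whereas on a real node the dropin goes in `/etc/zincati/config.d`:

```shell
#!/bin/sh
# Sketch: disable Zincati auto-updates on a single node via a config dropin.
# CONF_DIR is a demo path for illustration; the real location on Fedora
# CoreOS is /etc/zincati/config.d.
CONF_DIR="demo/etc/zincati/config.d"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/90-disable-auto-updates.toml" <<'EOF'
[updates]
enabled = false
EOF
echo "wrote $CONF_DIR/90-disable-auto-updates.toml"
```

As the discussion notes, this is a per-node, all-or-nothing switch; the feature request is about something finer-grained (pinning a maximum target release).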
16:42:56 I was thinking requiring a pull is an interesting middle ground, where it's possible to do this for cluster coordination while making it harder to abuse
16:43:22 if the version comes from a coordinator I think a good best practice is to put it in `/run` so it's not persistent on the node
16:43:23 i think max version wouldn't even be enough. i expect most of the time you'd want to be able to make sure the whole fleet was upgrading to the same version
16:43:26 #link https://github.com/coreos/zincati/pull/540
16:43:29 walters: +1
16:43:42 walters: subject to race conditions I suppose
16:43:57 yeah, for sure, it just helps a bit
16:44:06 .hello2
16:44:07 jaimelm: jaimelm 'Jaime Magiera'
16:44:09 yes, it could go into data/state, but that's not my primary approach
16:44:15 #chair jaimelm
16:44:15 Current chairs: bgilbert dustymabe jaimelm jlebon lucab miabbott ravanelli skunkerk travier walters
16:44:33 I think you're all raising very good points
16:44:41 terst
16:44:46 (sorry)
16:45:27 anybody want to recap?
16:45:28 my gut feeling is that this would be better placed outside of local-node state, though possibly requiring some Zincati changes to make it work better
16:46:01 dustymabe: I'll recap directly in the ticket, I need to follow up with the reporter anyway
16:46:06 ehh - but wouldn't that require you to run your own update server?
16:46:38 dustymabe: maybe, or possibly just the lock-manager
16:46:40 presumably we'd want to add a dropin to c-l-h-m also
16:46:44 lock-manager is what I was thinking
16:47:08 hmm - not sure I quite understand that bit - does the lock manager dictate a version?
16:47:12 there could be support added to zincati to update not directly to the latest release but to a given one, while crossing all the barrier releases in the middle
16:47:29 dustymabe: not at the moment
16:47:38 dustymabe: no, but it can allow the update or not
16:48:02 lucab: right, but to me that's just the same as him manually disabling zincati on every node
16:48:17 he can already do that. I think he's asking for something a little more powerful
16:48:24 dustymabe: IIRC right now it doesn't know about current/target version, but the protocol can be augmented for that
16:49:02 dustymabe: it'd be a single field to change vs imperatively manipulating every node
16:49:36 jlebon: does this assume you've already got a lock manager set up?
16:49:45 yes, there is clearly a feature missing, but to find out which one is the relevant one I'd need to know more about the environment
16:50:09 dustymabe: right, yeah
16:50:27 k
16:50:35 i think all my questions have been answered
16:50:47 well, if you need fleet-wide steering you need to have a coordinator somewhere anyway
16:51:03 though, a lock manager would be a bit overkill for a single node (i.e. if you knew there was a bug coming that hadn't been fixed)
16:51:05 (either for push or pull, which is a relevant question here)
16:51:11 though I guess in that case you'd just disable zincati
16:51:34 dustymabe: I agree, but my understanding is that the problem here is keeping multiple nodes in sync
16:51:53 lucab: yeah I honestly think there are two problems
16:51:57 1 - keeping nodes in sync
16:51:58 anyway, let's move this back to the ticket so that we can get the reporter's feedback
16:52:07 2 - preventing upgrades past a certain max version
16:52:17 i think they can be considered separately
16:53:06 yeah, let's move on and circle back
16:53:07 #action lucab to follow up with the reporter in order to get clarity on the environment and on the actual problem encountered
16:53:10 +1
16:53:12 dustymabe: yes, likely
16:53:46 next one, F35 maybe?
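For context on the lock-manager option discussed above: Zincati already ships a pull-based `fleet_lock` update strategy, where each node asks a FleetLock-protocol server for a reboot slot before finalizing an update. A minimal node-side sketch follows; the `base_url` is a placeholder, and the dropin is written under `demo/` for illustration instead of the real `/etc/zincati/config.d`:

```shell
#!/bin/sh
# Sketch: point Zincati at a FleetLock lock-manager via the documented
# fleet_lock strategy. base_url is a made-up example; CONF_DIR is a demo
# path (real location: /etc/zincati/config.d).
CONF_DIR="demo/etc/zincati/config.d"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/55-updates-strategy.toml" <<'EOF'
[updates]
strategy = "fleet_lock"

[updates.fleet_lock]
base_url = "http://fleet-lock.example.com:8080/"
EOF
echo "wrote $CONF_DIR/55-updates-strategy.toml"
```

The server side (the lock manager itself) is what would need extending to also gate on current/target versions, which is the protocol augmentation mentioned in the discussion.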
16:54:04 #topic tracker: Fedora 35 changes considerations
16:54:24 #link https://github.com/coreos/fedora-coreos-tracker/issues/856
16:54:57 anyone with any updates to their tickets?
16:55:24 No update yet, sorry
16:55:38 I have an update for the LUKS one
16:55:44 ravanelli++
16:55:53 also a half-update on the libvirt one
16:56:05 I got that tested for fcos, it is working.
16:56:11 And for RHCOS, cryptsetup is still at the old version, so it's not working. It will probably land in 4.10? Do we have a tracker for it in rhcos as well?
16:56:36 ravanelli: nah, we don't have to worry about RHCOS for this
16:56:47 ravanelli++ thanks for testing!
16:56:56 #info OpenSSL 3.0 rebuilds started (moved to target F36 though)
16:56:57 btw, should I update only one place with the tests, or all related bz/issues?
16:57:17 ravanelli: what exactly did you test (i guess you can just make a comment in the ticket)?
16:57:38 for libvirt, i've noticed *a lot* of packages are getting pulled in now when layering: https://github.com/coreos/fedora-coreos-tracker/issues/936#issuecomment-915593960
16:57:58 dustymabe: https://pagure.io/fesco/issue/2638 I validated the 4k disk for luks
16:58:27 ravanelli: you can post your findings in https://github.com/coreos/fedora-coreos-tracker/issues/935
16:58:36 everyone knows hypervisors need libx11
16:58:41 jlebon: thanks!
16:58:47 :)
16:59:13 walters: hehe
16:59:41 have to dig into this some more, but i suspect some deps were loosened a bit too much during the modular transition
17:00:42 jlebon: did you check whether the baseline set is smaller in F34 when installing the monolithic one?
17:01:06 lucab: that's what i'll do next
17:01:39 (dumb question perhaps)
17:01:47 ok
17:01:57 let's move to the next topic then
17:02:32 +1
17:02:35 I'll pick the console one
17:02:57 #topic console defaults for x86_64 qemu platform
17:03:03 +1
17:03:08 #link https://github.com/coreos/fedora-coreos-tracker/issues/954
17:04:02 i nominate bgilbert :)
17:04:19 cool cool
17:04:45 partially related, we are seeing console-related issues on vmware at https://github.com/coreos/fedora-coreos-tracker/issues/943
17:04:49 so jlebon, dustymabe, and I discussed this more OOB
17:05:11 I'm not sure if different setups on qemu could be affected too
17:05:37 we couldn't find a clear answer? there are multiple factors in each direction.
17:05:56 serial console is easier to use, and on QEMU it doesn't have downsides like hardware that gets confused when you write to its serial port
17:06:27 it's not as user-visible as the VGA console, though there are ways e.g. in virt-manager to get to it
17:06:55 ideally the kernel would notice whether there's a VGA card and do the right thing, but no such luck
17:07:54 we noticed that with the current console kargs, if you start qemu with '-serial none' and try to `rd.break`, the system just boots all the way to a login prompt
17:08:12 but notably, the FCOS docs currently emphasize serial console
17:08:13 so.
17:08:40 it appears that we may need to support per-arch kargs for qemu anyway, because of ppc64le
17:09:44 so our consensus was to continue defaulting to 'console=tty0 console=ttyS0,115200' on qemu for now, but
17:10:04 (I think the docs emphasize serial console mostly because it was a source of trouble for many new installs)
17:10:16 try to improve the UX by seeing if we can get the initramfs emergency shell to say something useful on secondary consoles.
17:10:39 that'd improve the primary problematic case, Ignition failures.
17:10:59 +1 - i guess we need to open a new RFE for that
17:11:03 dustymabe: +1
17:11:18 thoughts?
17:11:58 +1 to everything you said
17:12:24 if we drop all `console=` kargs on qemu, is the kernel able to autodetect VGA vs serial if there is only one of either present?
17:12:32 lucab: nope :-(
17:13:07 we talked about various ways to inject state from the outside, but basically there's nothing great
17:13:22 guestfish script; coreos-installer qemu kargs embed; SMBIOS field
17:13:33 all have UX and/or implementation difficulties
17:14:15 if we wanted to get some code into GRUB, we could maybe add a command to check for a video card, and use that to set up the right kargs
17:14:55 i think that's worth exploring, but short/mid-term we stick with the status quo
17:15:27 because of the ppc64le thing, we'd have the technical means to change the default later if we wanted
17:15:51 part of the problem is that we don't really understand how people are launching the QEMU image today
17:15:54 OpenStack has its own image
17:15:59 yeah, i think we talked about how to make things better but all of them required a decent amount of work - versus just leaving things the way they are, and not requiring any extra work (other than the RFE to make things output on the VGA console more clear to the user)
17:16:07 random libvirt/GNOME Boxes deploys? something else?
17:17:15 if we can get dracut to actually launch an emergency shell on secondary consoles, that solves the biggest pain point for debuggability
17:17:26 I would assume that cosa run is the biggest QEMU image user?
17:17:32 I was just reading through https://www.kernel.org/doc/html/v5.0/admin-guide/serial-console.html and the text there seems to hint at serial auto-detection
17:17:59 travier: probably? the current draft PR (which removes console kargs) teaches kola to re-add them with libguestfs
17:18:10 lucab: see https://github.com/coreos/fedora-coreos-tracker/issues/954#issuecomment-914613360
17:18:25 `cosa run` definitely isn't a prod use case
17:18:58 if we think users aren't using the qemu image for anything other than development then I'd argue leaving serial console as the default would be better anyway
17:21:18 timecheck - should we #proposed ?
17:21:52 if the libguestfs tweak in cosa does not significantly increase the boot time then any option is fine
17:22:03 travier: it takes maybe 2-3 seconds
17:22:13 hum, not great :/
17:24:47 #proposed for now, we will continue to add `console=tty0 console=ttyS0,115200` to the QEMU image. we will investigate having the initramfs e-shell print some info to secondary consoles, and possibly start additional shells on them.
17:25:45 +1
17:25:58 +1
17:26:00 +1
17:26:05 subjective opinion, but I think the qemu image is the one most used by interactive newcomers, who are the most likely to have to debug Ignition troubles and have a non-serial console
17:26:40 +1
17:26:53 lucab: yeah, I made that argument, at some length
17:28:06 I'm hopeful that we can improve the behavior on secondary consoles so that the primary console as a concept is much less relevant
17:28:44 (I'm personally for having no console= arguments at all, but I understand if consensus ends up somewhere else)
17:29:11 #agreed for now, we will continue to add `console=tty0 console=ttyS0,115200` to the QEMU image. we will investigate having the initramfs e-shell print some info to secondary consoles, and possibly start additional shells on them.
17:29:41 ok
17:30:01 we are already at time, and there is still a "kubernetes runtime" ticket
17:30:28 do we need to react on that now or can we leave it to the next meeting?
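A side note on the kargs in the #agreed item above: per the kernel's serial-console documentation (linked earlier in the discussion), boot messages go to every `console=` device, but the last `console=` entry becomes the primary console backing `/dev/console`; listing `console=ttyS0,115200` last is what keeps serial as the primary. A small sketch of that selection rule, using a made-up cmdline string:

```shell
#!/bin/sh
# Sketch: the last console= entry on the kernel command line is the one the
# kernel uses as the primary console (/dev/console). The cmdline below is a
# fabricated example, not read from a real system.
cmdline='root=UUID=example console=tty0 console=ttyS0,115200'
primary=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | grep '^console=' | tail -n 1)
echo "primary: $primary"   # prints "primary: console=ttyS0,115200"
```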
17:30:53 #topic Kubernetes v1.22+ container runtime on Fedora CoreOS
17:30:57 we can leave it - i mostly just wanted to do a status check on it
17:30:59 I'm a stickler for time.
17:30:59 #link https://github.com/coreos/fedora-coreos-tracker/issues/767
17:31:05 let's push it
17:31:24 #info deferred to next meeting due to time constraints
17:31:31 #topic open floor
17:32:24 nothing from me today
17:33:05 same
17:33:10 if nothing else, I'll close it here
17:33:25 who was going to give the tour of the build CI to the uninitiated? jlebon?
17:33:37 I think that was last meeting
17:33:51 oh? don't recall, but happy to do that :)
17:34:06 sorry, currently in another meeting, but we can definitely chat about that
17:34:07 I'm interested. Hit me up when you get a chance.
17:34:14 cool
17:34:49 ok, closing now
17:34:54 #endmeeting