16:30:39 <lucab> #startmeeting fedora_coreos_meeting
16:30:39 <zodbot> Meeting started Wed Sep 15 16:30:39 2021 UTC.
16:30:39 <zodbot> This meeting is logged and archived in a public location.
16:30:39 <zodbot> The chair is lucab. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:39 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:39 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:30:48 <lucab> #topic roll call
16:31:12 <skunkerk> .hi sohank2602
16:31:13 <zodbot> skunkerk: Sorry, but user 'skunkerk' does not exist
16:31:44 <skunkerk> .hi sohank2602
16:31:44 <zodbot> skunkerk: Sorry, but user 'skunkerk' does not exist
16:31:49 <jlebon> .hello2
16:31:50 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
16:31:57 <walters> .hello2
16:31:58 <zodbot> walters: walters 'Colin Walters' <walters@redhat.com>
16:32:05 <bgilbert> .hi
16:32:06 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:32:09 <ravanelli> .hi
16:32:10 <zodbot> ravanelli: ravanelli 'Renata Andrade Matos Ravanelii' <renata.ravanelli@gmail.com>
16:32:12 <dustymabe> .hi
16:32:16 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:32:25 <bgilbert> skunkerk: you want ".hello sohank2602"
16:32:33 <miabbott> .hello miabbott
16:32:34 <zodbot> miabbott: miabbott 'Micah Abbott' <miabbott@redhat.com>
16:32:51 <lucab> #chair miabbott bgilbert dustymabe ravanelli walters jlebon skunkerk
16:32:51 <zodbot> Current chairs: bgilbert dustymabe jlebon lucab miabbott ravanelli skunkerk walters
16:32:52 <skunkerk> .hello sohank2602
16:32:53 <zodbot> skunkerk: sohank2602 'Sohan Kunkerkar' <skunkerk@redhat.com>
16:34:13 <lucab> ok I'll start
16:34:15 <lucab> #topic Action items from last meeting
16:34:47 <lucab> #link https://meetbot.fedoraproject.org/teams/fedora_coreos_meeting/fedora_coreos_meeting.2021-09-08-16.30.html
16:35:12 <travier> .hi siosm
16:35:13 <zodbot> travier: Sorry, but user 'travier' does not exist
16:35:14 <lucab> I don't see anything left pending there
16:35:22 <travier> .hello siosm
16:35:23 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:35:29 <lucab> #chair travier
16:35:29 <zodbot> Current chairs: bgilbert dustymabe jlebon lucab miabbott ravanelli skunkerk travier walters
16:35:38 <lucab> let's go to the current tickets
16:36:09 <lucab> I'll start with the RFE one
16:36:18 <lucab> #topic Feature Request: Make possible to pin target release
16:36:28 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/947
16:37:16 <lucab> I missed this earlier as it got mixed into the pile of notifications when I was offline for a few days
16:37:33 <lucab> not sure if the reporter is around
16:37:48 <lucab> dustymabe: did you want to ask something related to that?
16:38:06 <dustymabe> not specifically - mostly interested in your thoughts lucab
16:38:23 <lucab> from my side, the request is legit but the config proposal is not very kosher (because it gets versioned/upgraded/rolled back with deployments)
16:38:34 <dustymabe> typically if people don't want updates you'd think they'd just turn zincati off
16:38:42 <lucab> it looks like a cluster coordination issue
16:38:55 <dustymabe> but in this case it's understandable that they'd want to just target a specific release as a max upgrade
16:39:38 <lucab> i.e. ensuring a whole fleet goes through updates in sync, with pausing points
16:39:59 <lucab> I'll follow up by asking the reporter what the underlying setup is
16:40:18 <bgilbert> lucab: would it make sense for the max version to come from the coordination protocol, instead of a local config?
16:40:32 <lucab> if there is a pushing coordinator (like Ansible), then the PR that travier linked would cover this
16:40:49 <travier> I was under the impression that it would be a good use case for the marker file strategy
16:41:14 <lucab> if there is no pushing coordinator, then yes it will need to be pulled by Zincati as bgilbert says
16:42:03 <lucab> (possibly requiring a custom backend combining graph & lock management)
16:42:13 <dustymabe> ehh, to me a local config could make sense - but what lucab said about the files in /etc/ getting managed by rpm-ostree is a concern
16:42:39 <travier> dustymabe: it should simply be written in /var/lib/zincati/foo
16:42:42 <lucab> bgilbert, travier: I think you are both right, it really depends on the environment setup
16:42:54 <dustymabe> travier: in that case it would be fine then?
16:42:56 <bgilbert> I was thinking requiring a pull is an interesting middle ground, where it's possible to do this for cluster coordination while making it harder to abuse
16:43:22 <walters> if the version comes from a coordinator I think a good best practice is to put it in `/run` so it's not persistent on the node
16:43:23 <jlebon> i think max version wouldn't be enough even. i expect most of the time you'd want to be able to make sure the whole fleet was upgrading to the same version
16:43:26 <travier> #link https://github.com/coreos/zincati/pull/540
16:43:29 <bgilbert> walters: +1
16:43:42 <bgilbert> walters: subject to race conditions I suppose
16:43:57 <walters> yeah, for sure, it just helps a bit
16:44:06 <jaimelm> .hello2
16:44:07 <zodbot> jaimelm: jaimelm 'Jaime Magiera' <jaimelm@umich.edu>
16:44:09 <lucab> yes it could go into data/state, but that's not my primary approach
16:44:15 <lucab> #chair jaimelm
16:44:15 <zodbot> Current chairs: bgilbert dustymabe jaimelm jlebon lucab miabbott ravanelli skunkerk travier walters
16:44:33 <lucab> I think you are all raising very good points
16:44:41 <travier> terst
16:44:46 <travier> (sorry)
16:45:27 <dustymabe> anybody want to recap?
16:45:28 <lucab> my gut feeling is that this would be better placed outside of local-node state, though possibly requiring some Zincati changes to make it work better
16:46:01 <lucab> dustymabe: I'll recap directly in the ticket, I need to follow up with the reporter anyway
16:46:06 <dustymabe> ehh - but wouldn't that require you to run your own update server?
16:46:38 <lucab> dustymabe: maybe, or possibly just the lock-manager
16:46:40 <bgilbert> presumably we'd want to add a dropin to c-l-h-m also
16:46:44 <bgilbert> lock-manager is what I was thinking
16:47:08 <dustymabe> hmm - not sure I quite understand that bit - does the lock manager dictate a version?
16:47:12 <travier> There could be support added to zincati to not update directly to the latest but to a given release while crossing all the barrier releases in the middle
16:47:29 <bgilbert> dustymabe: not at the moment
16:47:38 <lucab> dustymabe: no, but it can allow or not the update
16:48:02 <dustymabe> lucab: right, but to me that's just the same as him manually disabling zincati on every node
16:48:17 <dustymabe> he can already do that. I think he's asking for something a little more powerful
16:48:24 <lucab> dustymabe: IIRC right now it doesn't know about current/target version, but the protocol can be augmented for that
16:49:02 <jlebon> dustymabe: it'd be a single field to change vs imperatively manipulating every node
16:49:36 <dustymabe> jlebon: does this assume you've already got a lock manager setup?
16:49:45 <lucab> yes, there is clearly a feature missing, but to find out which one is relevant I'd need to know more about the environment
16:50:09 <jlebon> dustymabe: right, yeah
16:50:27 <dustymabe> k
16:50:35 <dustymabe> i think all my questions have been answered
16:50:47 <lucab> well, if you need fleet-wide steering you need to have a coordinator somewhere anyway
16:51:03 <dustymabe> though, lock manager would be a bit overkill for a single node (i.e. you knew there was a bug coming that hadn't been fixed)
16:51:05 <lucab> (either for push or pull, which is a relevant question here)
16:51:11 <dustymabe> though I guess in that case you just disable zincati
16:51:34 <lucab> dustymabe: I agree, but my understanding is that the problem here is keeping multiple nodes in sync
16:51:53 <dustymabe> lucab: yeah I honestly think there are two problems
16:51:57 <dustymabe> 1 - keeping nodes in sync
16:51:58 <lucab> anyway, let's move this back to the ticket so that we can get reporter's feedback
16:52:07 <dustymabe> 2 - preventing upgrade past certain max version
16:52:17 <dustymabe> i think they can be considered separately
16:53:06 <jlebon> yeah, let's move on and circle back
16:53:07 <lucab> #action lucab to follow up with the reporter in order to get clarity on the environment and on the actual problem encountered
16:53:10 <dustymabe> +1
16:53:12 <lucab> dustymabe: yes, likely
16:53:46 <lucab> next one, F35 maybe?
16:54:04 <lucab> #topic tracker: Fedora 35 changes considerations
16:54:24 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/856
16:54:57 <dustymabe> anyone with any updates to their tickets?
16:55:24 <travier> No update yet, sorry
16:55:38 <ravanelli> I have for the Luks one
16:55:44 <dustymabe> ravanelli++
16:55:53 <jlebon> also half-update on the libvirt one
16:56:05 <ravanelli> I got that tested for fcos, it is working.
16:56:11 <ravanelli> And for RHCOS, cryptsetup is still at the old version, so it is not working. We'll probably get it in 4.10? Do we have a track for it in RHCOS as well?
16:56:36 <jlebon> ravanelli: nah, we don't have to worry about RHCOS for this
16:56:47 <jlebon> ravanelli++ thanks for testing!
16:56:56 <lucab> #info OpenSSL 3.0 rebuilds started (moved to target F36 though)
16:56:57 <ravanelli> btw, should I update only one place with the tests, or all bz/issues related?
16:57:17 <dustymabe> ravanelli: what exactly did you test (i guess you can just make a comment in the ticket)?
16:57:38 <jlebon> for libvirt, i've noticed *a lot* of packages are getting pulled in now when layering: https://github.com/coreos/fedora-coreos-tracker/issues/936#issuecomment-915593960
16:57:58 <ravanelli> dustymabe: https://pagure.io/fesco/issue/2638 I validated the 4k disk for luks
16:58:27 <jlebon> ravanelli: you can post your findings in https://github.com/coreos/fedora-coreos-tracker/issues/935
16:58:36 <walters> everyone knows hypervisors need libx11
16:58:41 <ravanelli> jlebon: thanks!
16:58:47 <travier> :)
16:59:13 <jlebon> walters: hehe
16:59:41 <jlebon> have to dig into this some more, but i suspect some deps were loosened a bit too much during the modular transition
17:00:42 <lucab> jlebon: did you check whether the baseline set is smaller in F34 installing the monolithic one?
17:01:06 <jlebon> lucab: that's what i'll do next
17:01:39 <lucab> (dumb question perhaps)
17:01:47 <lucab> ok
17:01:57 <lucab> let's move to the next topic then
17:02:32 <jlebon> +1
17:02:35 <lucab> I'll pick the console one
17:02:57 <lucab> #topic console defaults for x86_64 qemu platform
17:03:03 <dustymabe> +1
17:03:08 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/954
17:04:02 <jlebon> i nominate bgilbert :)
17:04:19 <bgilbert> cool cool
17:04:45 <lucab> partially related, we are seeing console-related issues on vmware at https://github.com/coreos/fedora-coreos-tracker/issues/943
17:04:49 <bgilbert> so jlebon, dustymabe, and I discussed this more OOB
17:05:11 <lucab> I'm not sure if different setups on qemu could be affected too
17:05:37 <bgilbert> we couldn't find a clear answer?  there are multiple factors in each direction.
17:05:56 <bgilbert> serial console is easier to use, and on QEMU doesn't have the downsides like hardware that gets confused when you write to its serial port
17:06:27 <bgilbert> it's not as user-visible as VGA console, though there are ways e.g. in virt-manager to get to it
17:06:55 <bgilbert> ideally the kernel would notice whether there's a VGA card and do the right thing, but no such luck
17:07:54 <bgilbert> we noticed that with the current console kargs, if you start qemu with '-serial none' and try to `rd.break`, the system just boots all the way to a login prompt
17:08:12 <bgilbert> but notably, the FCOS docs currently emphasize serial console
17:08:13 <bgilbert> so.
17:08:40 <bgilbert> it appears that we may need to support per-arch kargs for qemu anyway, because of ppc64le
17:09:44 <bgilbert> so our consensus was to continue defaulting to 'console=tty0 console=ttyS0,115200' on qemu for now, but
17:10:04 <lucab> (I think the docs emphasize serial consoles mostly because it was a source of trouble for many new installs)
17:10:16 <bgilbert> try to improve the UX, by seeing if we can get the initramfs e-shell to say something useful on secondary consoles.
17:10:39 <bgilbert> that'd improve the primary problematic case, Ignition failures.
17:10:59 <dustymabe> +1 - i guess we need to open a new RFE for that
17:11:03 <bgilbert> dustymabe: +1
17:11:18 <bgilbert> thoughts?
17:11:58 <dustymabe> +1 to everything you said
17:12:24 <lucab> if we drop all `console=` kargs on qemu, is the kernel able to autodetect VGA-vs-serial if there is only one of either present?
17:12:32 <bgilbert> lucab: nope :-(
17:13:07 <bgilbert> we talked about various ways to inject state from the outside, but basically there's nothing great
17:13:22 <bgilbert> guestfish script; coreos-installer qemu kargs embed; SMBIOS field
17:13:33 <bgilbert> all have UX and/or implementation difficulties
17:14:15 <bgilbert> if we wanted to get some code into GRUB, we could maybe add a command to check for a video card, and use that to set up the right kargs
17:14:55 <jlebon> i think that's worth exploring, but short/mid-term we stick with the status quo
17:15:27 <bgilbert> because of the ppc64le thing, we'd have the technical means to change the default later if we wanted
17:15:51 <bgilbert> part of the problem is that we don't really understand how people are launching the QEMU image today
17:15:54 <bgilbert> OpenStack has its own image
17:15:59 <dustymabe> yeah, i think we talked about how to make things better but all of them required a decent amount of work - versus just leaving things the way they are, and not requiring any extra work (other than the RFE to make things output on the VGA console more clear to the user)
17:16:07 <bgilbert> random libvirt/GNOME Boxes deploys?  something else?
17:17:15 <bgilbert> if we can get dracut to actually launch an e-shell on secondary consoles, that solves the biggest pain point for debuggability
17:17:26 <travier> I would assume that cosa run is the biggest QEMU image user?
17:17:32 <lucab> I was just reading through https://www.kernel.org/doc/html/v5.0/admin-guide/serial-console.html and the text there seems to hint at serial auto-detection
17:17:59 <bgilbert> travier: probably?  the current draft PR (which removes console kargs) teaches kola to re-add them with libguestfs
17:18:10 <jlebon> lucab: see https://github.com/coreos/fedora-coreos-tracker/issues/954#issuecomment-914613360
17:18:25 <dustymabe> `cosa run` definitely isn't a prod use case
17:18:58 <dustymabe> if we think users aren't using the qemu image for anything other than development then I'd argue leaving serial console as default would be better anyway
17:21:18 <dustymabe> timecheck - should we #proposed ?
17:21:52 <travier> if the libguestfs tweak in cosa does not significantly increase the boot time then any option is fine
17:22:03 <bgilbert> travier: it takes maybe 2-3 seconds
17:22:13 <travier> hum, not great :/
17:24:47 <bgilbert> #proposed for now, we will continue to add `console=tty0 console=ttyS0,115200` to the QEMU image.  we will investigate having the initramfs e-shell print some info to secondary consoles, and possibly start additional shells on them.
17:25:45 <dustymabe> +1
17:25:58 <jaimelm> +1
17:26:00 <miabbott> +1
17:26:05 <lucab> subjective opinion, but I think the qemu image is the one most used by interactive newcomers, which are most likely to have to debug Ignition troubles and have a non-serial console
17:26:40 <jlebon> +1
17:26:53 <bgilbert> lucab: yeah, I made that argument, at some length
17:28:06 <bgilbert> I'm hopeful that we can improve the behavior on secondary consoles so that the primary console as a concept is much less relevant
17:28:44 <lucab> (I'm personally for having no console= arguments at all, but I understand if consensus ends up somewhere else)
17:29:11 <bgilbert> #agreed for now, we will continue to add `console=tty0 console=ttyS0,115200` to the QEMU image.  we will investigate having the initramfs e-shell print some info to secondary consoles, and possibly start additional shells on them.
17:29:41 <lucab> ok
17:30:01 <lucab> we are already at time, there would be still a "kubernetes runtime" ticket
17:30:28 <lucab> do we need to react on that now or can we leave to the next meeting?
17:30:53 <lucab> #topic Kubernetes v1.22+ container runtime on Fedora CoreOS
17:30:57 <dustymabe> we can leave it - i mostly just wanted to do a status check on it
17:30:59 <jaimelm> I'm a stickler for time.
17:30:59 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/767
17:31:05 <dustymabe> let's push it
17:31:24 <lucab> #info deferred to next meeting due to time constraints
17:31:31 <lucab> #topic open floor
17:32:24 <dustymabe> nothing from me today
17:33:05 <lucab> same
17:33:10 <lucab> if nothing else, I'll close it here
17:33:25 <jaimelm> who was going to give the tour of the build CI to the uninitiated? jlebon?
17:33:37 <jaimelm> I think that was last meeting
17:33:51 <jlebon> oh? don't recall, but happy to do that :)
17:34:06 <jlebon> sorry, currently in another meeting, but we can definitely chat about that
17:34:07 <jaimelm> I'm interested. Hit me up when you get a chance.
17:34:14 <jaimelm> cool
17:34:49 <lucab> ok, closing now
17:34:54 <lucab> #endmeeting