16:30:32 <jlebon> #startmeeting fedora_coreos_meeting
16:30:32 <zodbot> Meeting started Wed Nov  4 16:30:32 2020 UTC.
16:30:32 <zodbot> This meeting is logged and archived in a public location.
16:30:32 <zodbot> The chair is jlebon. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:32 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:32 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:30:38 <jlebon> #topic roll call
16:30:53 <slowrie> .hello2
16:30:54 <zodbot> slowrie: slowrie 'Stephen Lowrie' <slowrie@redhat.com>
16:31:18 <jlebon> #chair slowrie
16:31:18 <zodbot> Current chairs: jlebon slowrie
16:32:37 <dustymabe> .hello2
16:32:38 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:32:43 <nasirhm> .hello2
16:32:43 <jlebon> #chair dustymabe
16:32:43 <zodbot> Current chairs: dustymabe jlebon slowrie
16:32:43 <zodbot> nasirhm: nasirhm 'Nasir Hussain' <nasirhussainm14@gmail.com>
16:32:50 <jlebon> #chair nasirhm
16:32:50 <zodbot> Current chairs: dustymabe jlebon nasirhm slowrie
16:33:19 <lucab> .hello2
16:33:20 <zodbot> lucab: lucab 'Luca Bruno' <lucab@redhat.com>
16:33:27 <jlebon> #chair lucab
16:33:27 <zodbot> Current chairs: dustymabe jlebon lucab nasirhm slowrie
16:33:49 <jlebon> let's wait 1 or 2 more minutes
16:35:21 <jlebon> alrighty, let's begin then :)
16:35:31 <jlebon> #topic Action items from last meeting
16:35:40 <jlebon> jlebon to add short term proposal for unblocking user to the ticket https://github.com/coreos/fedora-coreos-tracker/issues/653 dustymabe to open ticket with design discussion about authselect support+fcct sugar in the future
16:35:50 <jlebon> sigh, that was meant to be two lines
16:35:56 <jlebon> jlebon to add short term proposal for unblocking user to the ticket https://github.com/coreos/fedora-coreos-tracker/issues/653
16:36:01 <jlebon> dustymabe to open ticket with design discussion about authselect support+fcct sugar in the future
16:36:16 <jlebon> #info jlebon added short-term proposal here: https://github.com/coreos/fedora-coreos-tracker/issues/653#issuecomment-718225822
16:36:24 <jlebon> #info dustymabe opened ticket here: https://github.com/coreos/fedora-coreos-tracker/issues/657
16:36:51 <jlebon> dustymabe: anything you wanted to discuss on that topic before we move on?
16:37:12 <darkmuggle> 👋
16:37:19 <jlebon> #chair darkmuggle
16:37:19 <zodbot> Current chairs: darkmuggle dustymabe jlebon lucab nasirhm slowrie
16:37:30 <dustymabe> jlebon: the ticket I opened ?
16:37:39 <walters> .hello2
16:37:40 <zodbot> walters: walters 'Colin Walters' <walters@redhat.com>
16:37:53 <sumantro> .hello sumantrom
16:37:54 <zodbot> sumantro: sumantrom 'Sumantro Mukherjee' <sumukher@redhat.com>
16:37:56 <jlebon> dustymabe: yeah, re. sssd.  i guess the next step is actually prioritizing this work.
16:38:00 <dustymabe> It would be nice if we could get a volunteer to execute the "short term solution" you added to the ticket.
16:38:24 <jlebon> right yeah.  ok let's circle back to that maybe in open floor and take care of the meeting items first
16:38:32 <jlebon> #chair walters sumantro
16:38:32 <zodbot> Current chairs: darkmuggle dustymabe jlebon lucab nasirhm slowrie sumantro walters
16:38:51 <jlebon> #topic  next: default hostname now is `fedora`, used to be `localhost`
16:38:54 <jlebon> https://github.com/coreos/fedora-coreos-tracker/issues/649
16:39:03 <jlebon> #link https://github.com/coreos/fedora-coreos-tracker/issues/649
16:39:25 <jlebon> lucab or darkmuggle: can one of you summarize where we are here and what the path forward is?
16:39:58 <dustymabe> I thought we had it figured out last week with https://github.com/coreos/fedora-coreos-tracker/issues/649#issuecomment-718148008
16:40:10 <dustymabe> but the added discussion to the ticket has me questioning that
16:42:13 <darkmuggle> yeah
16:42:24 <darkmuggle> So bear with me....
16:42:38 * dustymabe grabs popcorn
16:42:53 <darkmuggle> switch for a whiskey instead....
16:43:13 * dustymabe pours whiskey on his popcorn
16:43:51 <darkmuggle> The issue originally started in the OpenShift MCO repo where a user reported that FCOS/OKD installs were getting a hostname of `fedora`.
16:44:41 <darkmuggle> The default hostname of `fedora` is being hit on FCOS in GCP land since the DHCP provided hostnames are too long. NM refuses to set a long hostname.
16:45:18 <darkmuggle> The user was requesting that the MCO add a check on `fedora` to the existing checks that MCO does on `localhost.*`
16:45:36 <darkmuggle> We were advised that checking any magical strings (including `localhost`) is bad.
16:46:31 <darkmuggle> Having worked on RHCOS's hostname miseries and curses in the context of GCP and OpenShift, I am pushing that we fix FCOS instead of adding more hacks.
16:47:01 <dustymabe> where "fix FCOS" can take different paths
16:47:11 <darkmuggle> Correct.
16:47:15 <jlebon> is the hostname that afterburn gets from the metadata service the same as the one from DHCP, except that afterburn knows how to truncate?
16:47:31 <slowrie> I don't think afterburn truncates
16:47:32 <darkmuggle> Afterburn does not know how to truncate it.
16:47:37 <dustymabe> jlebon: afterburn can be taught how to truncate
16:47:47 <dustymabe> but ideally it wouldn't need to
16:47:56 <slowrie> Truncation happens in a systemd unit that's laid down by the MCO afaik
16:47:56 <darkmuggle> And yes, the meta-data provided name is the same as DHCP
16:48:01 <jlebon> ahh ok, so the afterburn path is just different from NM in that it doesn't care that it's too long?
16:48:06 <darkmuggle> Right
16:48:07 <dustymabe> https://github.com/coreos/afterburn/issues/509
16:48:43 <jlebon> is the hostname setting code in NM actually part of NM, or shipped as dispatch scripts by the MCO?
16:49:00 <darkmuggle> More nuance :)
16:50:36 <darkmuggle> NM  does not do any truncation. The MCO ships a system unit to prevent startup until a "valid" hostname is found and then disables the NM changing the hostname on GCP.
16:50:38 <darkmuggle> Why?
16:50:52 <darkmuggle> Because NM gets a bit cantankerous when anything changes the hostname other than itself
16:51:15 <darkmuggle> And NM will reset the name leading to the kubelet erroring out.
16:51:44 <darkmuggle> And then on recent OCP versions, we introduced OVS
16:51:47 <jlebon> so the bit in NM that sets the hostname from dhcp does live in the NM codebase itself then?
16:51:48 <walters> (random aside, Debian's installer ISO seems to *require* the user input a hostname)
16:51:58 <walters> and also defaults to "debian"
16:52:09 <darkmuggle> Yes.
16:52:26 <darkmuggle> More context on MCO
16:53:01 <jlebon> ok, how about this: we ship a dispatcher script in fedora which truncates long names, then work with NM to have that functionality upstreamed
16:53:11 <darkmuggle> The sole reason the MCO has this logic is because on OCP we don't change the boot media. And the only place to reasonably fix this problem on upgrading is the MCO.
16:53:54 <darkmuggle> I'm cool with whatever path is chosen.
16:53:57 <jlebon> darkmuggle: i'm confused how upgrades come into the picture. isn't /etc/hostname written to on first boot?
16:54:42 <walters> I'm ok with moving the script into fcos with the idea it lives in NM eventually
16:55:02 <dustymabe> walters: we'd have to talk with them to make sure they'd accept that
16:55:10 <dustymabe> lucab: did you start that conversation?
16:55:11 <walters> +1
16:55:13 <darkmuggle> The MCO code exists because of RHCOS, not FCOS.
16:55:24 <lucab> dustymabe: I did not
16:55:42 <darkmuggle> the dispatcher script won't work, tbh
16:55:47 <dustymabe> I guess no one signed up to reach out to the NM team to do that
16:55:51 <dustymabe> any volunteers?
16:56:08 <darkmuggle> I just remembered that the dispatcher sets the hostname outside of NM so NM changes it back
16:56:13 <darkmuggle> it has to be in the NM core itself.
16:56:25 <walters> The MCO code also is there for OKD (FCOS) right?
16:56:26 <lucab> dustymabe:  I'll put an action item for myself
16:56:38 <darkmuggle> Right.
16:56:52 <walters> actually aren't we also scoping in disabling NM setting the hostname on GCP?
16:57:03 <lucab> I'd like to check plain NM-on-FCOS-on-GCP behavior first though, on long fqdn
16:57:03 <darkmuggle> Right
16:57:18 <jlebon> ok, so to summarize: path forward is to ask NM whether they're willing to truncate hostname in-tree, and if so we can work on carrying short-term script until that lands
16:57:23 <walters> IOW this is basically "move MCO code as is to FCOS (and inherit from RHCOS)"
16:57:39 <jlebon> let me make that a proposal
16:57:59 <dustymabe> jlebon: has much changed since https://github.com/coreos/fedora-coreos-tracker/issues/649#issuecomment-718148008 ?
16:58:23 <slowrie> I'm a bit confused about moving the MCO code to FCOS; do we want that functionality outside of OKD/k8s?
16:58:32 <dustymabe> only change I see is we've proposed using a NM dispatcher (might not work) versus afterburn as the short term hack
16:58:59 <walters> slowrie: it's a good topic; I would say more generally that anything that requires a routable hostname wants this (and that's more than just k8s)
16:59:00 <dustymabe> slowrie: yeah I'm a bit confused too, i'd have to look at the code
16:59:19 <walters> it's "make the hostname Just Work on GCP"
16:59:34 <jlebon> dustymabe: it's a bit more concrete than the previous proposal i think
17:00:00 <dustymabe> is this a bug in GCPs platform? we have a call with them every two weeks we could try to bring it up?
17:00:03 <jlebon> and we can ask NM whether there's a sane way to carry this in FCOS, and if not we use afterburn
17:00:12 <dustymabe> jlebon: +1
17:00:15 <slowrie> walters: from what I've heard though it sounds like everything has its own set of differing requirements on hostname (kernel vs kubelet) and I don't know if we want to / can maintain a script that works in all cases
17:00:32 <darkmuggle> ^ this
17:01:02 <darkmuggle> kernel hostname length can be longer than what is allowed for TLS certificates (63 chars)
17:01:16 <lucab> dustymabe: I think that GCP DHCP behavior is not technically wrong, so arguably it does not need to be fixed
17:01:18 <walters> hmm; i think this is just truncating it on GCP, I am not aware of any distinct requirements in kernel vs kubelet
17:01:32 <darkmuggle> Kubelet is 63 characters
17:01:53 <walters> now for openshift I think we'd still carry the "wait until we have a non-localhost name" in the MCO, we're not scoping moving that into FCOS - that's kind of kubelet/openshift specific
17:02:11 <jlebon> does the kubelet pick up the hostname from /etc/hostname or from gethostname?
17:02:11 <darkmuggle> +1 to that
17:02:15 <slowrie> I don't know the exact set of differing requirements; the main one I've heard is the character length difference (63 from kubelet b/c of TLS, 64 from kernel)
17:02:36 <darkmuggle> Not sure where it gets the hostname from
17:02:43 <walters> it has to be gethostname
17:02:51 <walters> otherwise it would fail in DHCP cases
17:04:01 <darkmuggle> Kernel is 65
17:04:46 <dustymabe> sounds like it would be nice to be able to have NM truncate the name and also maybe make the truncation threshold a configuration option
17:04:56 <jlebon> ok, let me make a proposal
17:05:40 <jlebon> #proposed we will ask NM whether they're willing to truncate the DHCP hostname in-tree, and if so we can work on carrying short-term script (or resort to afterburn logic for this) until that lands
17:05:44 * dustymabe notes there are some outstanding questions to be followed up on in https://bugzilla.redhat.com/show_bug.cgi?id=1892235
17:06:16 <dustymabe> jlebon: ack
17:06:29 <jlebon> if NM *doesn't* think truncating the hostname is sane behaviour, then we should listen to their reasons
17:07:38 <jlebon> anyone else?
17:08:29 <jlebon> a strong argument for this we can present to NM is that systemd-networkd also does this: https://github.com/systemd/systemd/pull/7616
17:09:10 <dustymabe> +1 / -1 anyone?
17:09:12 <jlebon> lucab, darkmuggle, walters: ack/nack?
17:09:20 <dustymabe> the proposal is pretty close to what we had last week anyway
17:09:24 <darkmuggle> +1
17:09:44 <walters> makes sense to me; I am a bit confused by the 63-65 length thread but I'm not sure that should block (we'd adopt 63 right?)
17:09:56 <lucab> yes, just that we didn't follow up to NM
17:10:12 <jlebon> #agreed we will ask NM whether they're willing to truncate the DHCP hostname in-tree, and if so we can work on carrying short-term script (or resort to afterburn logic for this) until that lands
17:10:27 <jlebon> ok, let's move on
17:10:30 <dustymabe> walters: it might be different for different users, which is why I was thinking NM may want a config option for the threshold
17:10:33 <dustymabe> jlebon: +1
17:10:40 <dustymabe> one sec
17:10:40 <walters> now the reason we got here is the `fedora` hostname; presumably we'd add that special case in this script for now?
17:10:50 <jlebon> dustymabe: does https://github.com/coreos/fedora-coreos-tracker/issues/609 still have the meeting label on purpose?
17:11:13 <dustymabe> #action lucab to follow up with NM team about hostname length and run this issue to ground
17:11:36 <dustymabe> jlebon: mostly just to discuss progress of moving to f33
17:11:43 <dustymabe> I don't have many updates there so we can skip it
17:11:49 <jlebon> ack, SGTM
17:11:57 <jlebon> #topic  Fedora Test day for our `next` stream (Fedora 33)
17:12:00 <jlebon> #link https://github.com/coreos/fedora-coreos-tracker/issues/659
17:12:13 <darkmuggle> walters: the tl;dr is that we can truncate a hostname longer than 65 chars and NM won't complain. But if the hostname is between 63 and 64 then NM will reset it to the longer name and give an ugly message about the hostname being changed outside NM.
17:12:35 <walters> darkmuggle: even if we set the flag to tell NM not to change the hostname, as we are right now?
17:12:36 <dustymabe> #info there is a test day coming up on Friday November 6th. https://github.com/coreos/fedora-coreos-tracker/issues/659
17:12:56 <dustymabe> Please help us review the existing test cases, add new ones, and also participate in the test day
17:12:59 <jlebon> darkmuggle, walters: let's carry on the NM convo elsewhere please :)
17:13:31 <dustymabe> thanks to sumantro who has been amazing in helping us get it organized (this time and last time)
17:13:52 <jlebon> we can probably add some platforms to the template since last time
17:14:01 <dustymabe> we even have a fedora badge we'll be handing out for participation
17:14:13 <dustymabe> jlebon: I think sumantro added vultr ibmcloud and exoscale
17:14:16 <jlebon> sumantro++  that's really cool
17:14:16 <zodbot> jlebon: Karma for sumantro changed to 2 (for the current release cycle):  https://badges.fedoraproject.org/tags/cookie/any
17:14:35 <sumantro> dustymabe and all , let me know if there is something I can do more
17:14:38 <dustymabe> https://badges.fedoraproject.org/badge/fedora-33-coreos-test-day
17:14:47 <jlebon> hmm i don't see those in https://testdays.fedorainfracloud.org/events/98
17:15:35 <sumantro> dustymabe, I am missing vultr, but exoscale and ibm cloud are added
17:15:42 <dustymabe> nice
17:15:51 <sumantro> dustymabe, thanks for designing the badge
17:15:59 <dustymabe> sumantro: I just awarded the badge to you :)
17:16:05 <jlebon> ahh heh, vultr is the only one i was Ctrl+F'ing for :)
17:16:55 <jlebon> ok so basically, we need to make sure that existing test case procedures are still good, and then add any new relevant ones?
17:16:58 <sumantro> dustymabe, I am sumantrom on FAS :D
17:17:30 <sumantro> I think you awarded someone else , wrongly .. BUT thanks :D
17:17:40 <dustymabe> ha.. boo
17:17:41 <sumantro> jlebon, yes!
17:18:30 <jlebon> ahh heh, and i see dusty already did an #info for that :)
17:18:39 <jlebon> anything else to discuss on this topic?
17:18:46 <dustymabe> that's it for this topic
17:18:52 <jlebon> +1
17:19:09 <jlebon> anything else before open floor?
17:20:16 <jlebon> #topic Open Floor
17:20:31 <dustymabe> so we did a round of releases, but we're going to do another set because of a problem?
17:20:33 <dustymabe> is that right?
17:20:44 <jlebon> heh yeah was going to bring that up :)
17:21:04 <jlebon> correct. skunkerk_ is going to respin the testing and next once the patch is in
17:21:08 <jlebon> stable is unaffected
17:21:54 <jlebon> for the record, this is re. https://github.com/coreos/fedora-coreos-tracker/issues/660
17:22:53 <jlebon> anything else anyone wants to bring up?
17:23:20 <dustymabe> just wanted to call out that we're close on our openstack testing
17:23:34 <dustymabe> https://github.com/coreos/coreos-assembler/pull/1833 gets the mantle code base working
17:23:49 <dustymabe> the only test failing right now on openstack is coreos.ignition.ssh.key
17:24:05 <jlebon> nice!
17:24:06 <dustymabe> see https://github.com/coreos/coreos-assembler/issues/1835 for details on why
17:25:27 <jlebon> ok, closing meeting in 1 minute
17:26:29 <jlebon> #endmeeting
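
The hostname truncation discussed under the `fedora`/`localhost` topic could look roughly like the sketch below. It is only an illustration, not the MCO, afterburn, or NetworkManager code: the function name `truncate_hostname` and the fallback strategy (keep the name if it fits, otherwise use the first DNS label, otherwise hard-cut) are assumptions, and the 63-character default is simply the kubelet/TLS figure quoted in the meeting, kept configurable as suggested there.

```python
# Hypothetical sketch: none of these names come from the MCO, afterburn, or NM.
MAX_HOSTNAME_LEN = 63  # assumption: the kubelet/TLS limit quoted in the meeting


def truncate_hostname(fqdn: str, limit: int = MAX_HOSTNAME_LEN) -> str:
    """Return a hostname that is at most `limit` characters long."""
    # Normalize: drop surrounding whitespace and a trailing root dot.
    name = fqdn.strip().rstrip(".")
    if len(name) <= limit:
        # Short enough already, so keep the full name as provided.
        return name
    # Too long: fall back to the first DNS label (text before the first dot).
    short = name.split(".", 1)[0]
    # Hard-cut if even the first label is too long, and avoid ending on a hyphen.
    return short[:limit].rstrip("-")


if __name__ == "__main__":
    # Made-up example of an overlong DHCP-provided name.
    long_name = "worker-" + "a" * 40 + ".c.some-long-project-name.internal"
    print(truncate_hostname(long_name))
```

Making `limit` a parameter mirrors the point raised in the meeting that different consumers (kernel vs. kubelet) may want different thresholds.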