16:30:32 #startmeeting fedora_coreos_meeting
16:30:32 Meeting started Wed Nov 4 16:30:32 2020 UTC.
16:30:32 This meeting is logged and archived in a public location.
16:30:32 The chair is jlebon. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:32 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:32 The meeting name has been set to 'fedora_coreos_meeting'
16:30:38 #topic roll call
16:30:53 .hello2
16:30:54 slowrie: slowrie 'Stephen Lowrie'
16:31:18 #chair slowrie
16:31:18 Current chairs: jlebon slowrie
16:32:37 .hello2
16:32:38 dustymabe: dustymabe 'Dusty Mabe'
16:32:43 .hello2
16:32:43 #chair dustymabe
16:32:43 Current chairs: dustymabe jlebon slowrie
16:32:43 nasirhm: nasirhm 'Nasir Hussain'
16:32:50 #chair nasirhm
16:32:50 Current chairs: dustymabe jlebon nasirhm slowrie
16:33:19 .hello2
16:33:20 lucab: lucab 'Luca Bruno'
16:33:27 #chair lucab
16:33:27 Current chairs: dustymabe jlebon lucab nasirhm slowrie
16:33:49 let's wait 1 or 2 more minutes
16:35:21 alrighty, let's begin then :)
16:35:31 #topic Action items from last meeting
16:35:40 jlebon to add short term proposal for unblocking user to the ticket https://github.com/coreos/fedora-coreos-tracker/issues/653 dustymabe to open ticket with design discussion about authselect support+fcct sugar in the future
16:35:50 sigh, that was meant to be two lines
16:35:56 jlebon to add short term proposal for unblocking user to the ticket https://github.com/coreos/fedora-coreos-tracker/issues/653
16:36:01 dustymabe to open ticket with design discussion about authselect support+fcct sugar in the future
16:36:16 #info jlebon added short-term proposal here: https://github.com/coreos/fedora-coreos-tracker/issues/653#issuecomment-718225822
16:36:24 #info dustymabe opened ticket here: https://github.com/coreos/fedora-coreos-tracker/issues/657
16:36:51 dustymabe: anything you wanted to discuss on that topic before we move on?
16:37:12 👋
16:37:19 #chair darkmuggle
16:37:19 Current chairs: darkmuggle dustymabe jlebon lucab nasirhm slowrie
16:37:30 jlebon: the ticket I opened?
16:37:39 .hello2
16:37:40 walters: walters 'Colin Walters'
16:37:53 .hello sumantrom
16:37:54 sumantro: sumantrom 'Sumantro Mukherjee'
16:37:56 dustymabe: yeah, re. sssd. i guess the next step is actually prioritizing this work.
16:38:00 It would be nice if we could get a volunteer to execute the "short term solution" you added to the ticket.
16:38:24 right yeah. ok let's circle back to that maybe in open floor and take care of the meeting items first
16:38:32 #chair walters sumantro
16:38:32 Current chairs: darkmuggle dustymabe jlebon lucab nasirhm slowrie sumantro walters
16:38:51 #topic next: default hostname now is `fedora`, used to be `localhost`
16:38:54 https://github.com/coreos/fedora-coreos-tracker/issues/649
16:39:03 #link https://github.com/coreos/fedora-coreos-tracker/issues/649
16:39:25 lucab or darkmuggle: can one of you summarize where we are here and what the path forward is?
16:39:58 I thought we had it figured out last week with https://github.com/coreos/fedora-coreos-tracker/issues/649#issuecomment-718148008
16:40:10 but the added discussion to the ticket has me questioning that
16:42:13 yeah
16:42:24 So bear with me....
16:42:38 * dustymabe grabs popcorn
16:42:53 switch for a whiskey instead....
16:43:13 * dustymabe pours whiskey on his popcorn
16:43:51 The issue originally started in the OpenShift MCO repo where a user reported that FCOS/OKD installs were getting a hostname of `fedora`.
16:44:41 The default hostname of `fedora` is being hit on FCOS in GCP land since the DHCP-provided hostnames are too long. NM refuses to set a long hostname.
16:45:18 The user was requesting that the MCO add a check on `fedora` to the existing checks that MCO does on `localhost.*`
16:45:36 We were advised that checking any magical strings (including `localhost`) is bad.
16:46:31 Having worked on RHCOS's hostname miseries and curses in the context of GCP and OpenShift, I am pushing that we fix FCOS instead of adding more hacks.
16:47:01 where "fix FCOS" can take different paths
16:47:11 Correct.
16:47:15 is the hostname that afterburn gets from the metadata service the same as the one from DHCP, except that afterburn knows how to truncate?
16:47:31 I don't think afterburn truncates
16:47:32 Afterburn does not know how to truncate it.
16:47:37 jlebon: afterburn can be taught how to truncate
16:47:47 but ideally it wouldn't need to
16:47:56 Truncation happens in a systemd unit that's laid down by the MCO afaik
16:47:56 And yes, the meta-data provided name is the same as DHCP
16:48:01 ahh ok, so the afterburn path is just different from NM in that it doesn't care that it's too long?
16:48:06 Right
16:48:07 https://github.com/coreos/afterburn/issues/509
16:48:43 is the hostname setting code in NM actually part of NM, or shipped as dispatch scripts by the MCO?
16:49:00 More nuance :)
16:50:36 NM does not do any truncation. The MCO ships a system unit to prevent startup until a "valid" hostname is found and then disables NM changing the hostname on GCP.
16:50:38 Why?
16:50:52 Because NM gets a bit cantankerous when anything changes the hostname other than itself
16:51:15 And NM will reset the name leading to the kubelet erroring out.
16:51:44 And then on recent OCP versions, we introduced OVS
16:51:47 so the bit in NM that sets the hostname from dhcp does live in the NM codebase itself then?
16:51:48 (random aside, Debian's installer ISO seems to *require* the user input a hostname)
16:51:58 and also defaults to "debian"
16:52:09 Yes.
16:52:26 More context on MCO
16:53:01 ok, how about this: we ship a dispatcher script in fedora which truncates long names, then work with NM to have that functionality upstreamed
16:53:11 The sole reason the MCO has this logic is because on OCP we don't change the boot media. And the only place to reasonably fix this problem on upgrading is the MCO.
16:53:54 I'm cool with whatever path is chosen.
16:53:57 darkmuggle: i'm confused how upgrades come into the picture. isn't /etc/hostname written to on first boot?
16:54:42 I'm ok with moving the script into fcos with the idea it lives in NM eventually
16:55:02 walters: we'd have to talk with them to make sure they'd accept that
16:55:10 lucab: did you start that conversation?
16:55:11 +1
16:55:13 The MCO code exists because of RHCOS, not FCOS.
16:55:24 dustymabe: I did not
16:55:42 the dispatcher script won't work, tbh
16:55:47 I guess no one signed up to reach out to the NM team to do that
16:55:51 any volunteers?
16:56:08 I just remembered that the dispatcher sets the hostname outside of NM so NM changes it back
16:56:13 it has to be in the NM core itself.
16:56:25 The MCO code also is there for OKD (FCOS) right?
16:56:26 dustymabe: I'll put an action item for myself
16:56:38 Right.
16:56:52 actually aren't we also scoping in disabling NM setting the hostname on GCP?
16:57:03 I'd like to check plain NM-on-FCOS-on-GCP behavior first though, on long fqdn
16:57:03 Right
16:57:18 ok, so to summarize: path forward is to ask NM whether they're willing to truncate hostname in-tree, and if so we can work on carrying short-term script until that lands
16:57:23 IOW this is basically "move MCO code as is to FCOS (and inherit from RHCOS)"
16:57:39 let me make that a proposal
16:57:59 jlebon: has much changed since https://github.com/coreos/fedora-coreos-tracker/issues/649#issuecomment-718148008 ?
16:58:23 I'm a bit confused about moving the MCO code to FCOS; do we want that functionality outside of OKD/k8s?
16:58:32 only change I see is we've proposed using a NM dispatcher (might not work) versus afterburn as the short term hack
16:58:59 slowrie: it's a good topic; I would say more generally that anything that requires a routable hostname wants this (and that's more than just k8s)
16:59:00 slowrie: yeah I'm a bit confused too, i'd have to look at the code
16:59:19 it's "make the hostname Just Work on GCP"
16:59:34 dustymabe: it's a bit more concrete than the previous proposal i think
17:00:00 is this a bug in GCP's platform? we have a call with them every two weeks we could try to bring it up?
17:00:03 and we can ask NM whether there's a sane way to carry this in FCOS, and if not we use afterburn
17:00:12 jlebon: +1
17:00:15 walters: from what I've heard though it sounds like everything has its own set of differing requirements on hostname (kernel vs kubelet) and I don't know if we want to / can maintain a script that works in all cases
17:00:32 ^ this
17:01:02 kernel hostname length can be longer than what is allowed for TLS certificates (63 chars)
17:01:16 dustymabe: I think that GCP DHCP behavior is not technically wrong, so arguably it does not need to be fixed
17:01:18 hmm; i think this is just truncating it on GCP, I am not aware of any distinct requirements in kernel vs kubelet
17:01:32 Kubelet is 63 characters
17:01:53 now for openshift I think we'd still carry the "wait until we have a non-localhost name" in the MCO, we're not scoping moving that into FCOS - that's kind of kubelet/openshift specific
17:02:11 does the kubelet pick up the hostname from /etc/hostname or from gethostname?
17:02:11 +1 to that
17:02:15 I don't know the exact set of differing requirements; the main one I've heard is the character length difference (63 from kubelet b/c of TLS, 64 from kernel)
17:02:36 Not sure on where it gets the hostname from
17:02:43 it has to be gethostname
17:02:51 otherwise it would fail in DHCP cases
17:04:01 Kernel is 65
17:04:46 sounds like it would be nice to be able to have NM truncate the name and also maybe make the truncation threshold a configuration option
17:04:56 ok, let me make a proposal
17:05:40 #proposed we will ask NM whether they're willing to truncate the DHCP hostname in-tree, and if so we can work on carrying short-term script (or resort to afterburn logic for this) until that lands
17:05:44 * dustymabe notes there are some outstanding questions to be followed up on in https://bugzilla.redhat.com/show_bug.cgi?id=1892235
17:06:16 jlebon: ack
17:06:29 if NM *doesn't* think truncating the hostname is sane behaviour, then we should listen to their reasons
17:07:38 anyone else?
17:08:29 a strong argument for this we can present to NM is that systemd-networkd also does this: https://github.com/systemd/systemd/pull/7616
17:09:10 +1 / -1 anyone?
17:09:12 lucab, darkmuggle, walters: ack/nack?
17:09:20 the proposal is pretty close to what we had last week anyway
17:09:24 +1
17:09:44 makes sense to me; I am a bit confused by the 63-65 length thread but I'm not sure that should block (we'd adopt 63 right?)
17:09:56 yes, just that we didn't follow up to NM
17:10:12 #agreed we will ask NM whether they're willing to truncate the DHCP hostname in-tree, and if so we can work on carrying short-term script (or resort to afterburn logic for this) until that lands
17:10:27 ok, let's move on
17:10:30 walters: it might be different for different users, which is why I was thinking NM may want a config option for the threshold
17:10:33 jlebon: +1
17:10:40 one sec
17:10:40 now the reason we got here is the `fedora` hostname; presumably we'd add that special case in this script for now?
17:10:50 dustymabe: does https://github.com/coreos/fedora-coreos-tracker/issues/609 still have the meeting label on purpose?
17:11:13 #action lucab to follow up with NM team about hostname length and run this issue to ground
17:11:36 jlebon: mostly just to discuss progress of moving to f33
17:11:43 I don't have many updates there so we can skip it
17:11:49 ack, SGTM
17:11:57 #topic Fedora Test day for our `next` stream (Fedora 33)
17:12:00 #link https://github.com/coreos/fedora-coreos-tracker/issues/659
17:12:13 walters: the tl;dr is that we can truncate a hostname longer than 65 chars and NM won't complain. But if the hostname is between 63 and 64 then NM will reset it to the longer name and give an ugly message about the hostname being changed outside NM.
17:12:35 darkmuggle: even if we set the flag to tell NM not to change the hostname, as we are right now?
17:12:36 #info there is a test day coming up on Friday November 6th. https://github.com/coreos/fedora-coreos-tracker/issues/659
17:12:56 Please help us review the existing test cases, add new ones, and also participate in the test day
17:12:59 darkmuggle, walters: let's carry on the NM convo elsewhere please :)
17:13:31 thanks to sumantro who has been amazing in helping us get it organized (this time and last time)
17:13:52 we can probably add some platforms to the template since last time
17:14:01 we even have a fedora badge we'll be handing out for participation
17:14:13 jlebon: I think sumantro added vultr ibmcloud and exoscale
17:14:16 sumantro++ that's really cool
17:14:16 jlebon: Karma for sumantro changed to 2 (for the current release cycle): https://badges.fedoraproject.org/tags/cookie/any
17:14:35 dustymabe and all, let me know if there is something more I can do
17:14:38 https://badges.fedoraproject.org/badge/fedora-33-coreos-test-day
17:14:47 hmm i don't see those in https://testdays.fedorainfracloud.org/events/98
17:15:35 dustymabe, I am missing vultr, but exoscale and ibm cloud are added
17:15:42 nice
17:15:51 dustymabe, thanks for designing the badge
17:15:59 sumantro: I just awarded the badge to you :)
17:16:05 ahh heh, vultr is the only one i was Ctrl+F'ing for :)
17:16:55 ok so basically, we need to make sure that existing test case procedures are still good, and then add any new relevant ones?
17:16:58 dustymabe, I am sumantrom on FAS :D
17:17:30 I think you awarded someone else, wrongly .. BUT thanks :D
17:17:40 ha.. boo
17:17:41 jlebon, yes!
17:18:30 ahh heh, and i see dusty already did an #info for that :)
17:18:39 anything else to discuss on this topic?
17:18:46 that's it for this topic
17:18:52 +1
17:19:09 anything else before open floor?
17:20:16 #topic Open Floor
17:20:31 so we did a round of releases, but we're going to do another set because of a problem?
17:20:33 is that right?
17:20:44 heh yeah was going to bring that up :)
17:21:04 correct. skunkerk_ is going to respin the testing and next once the patch is in
17:21:08 stable is unaffected
17:21:54 for the record, this is re. https://github.com/coreos/fedora-coreos-tracker/issues/660
17:22:53 anything else anyone wants to bring up?
17:23:20 just wanted to call out that we're close on our openstack testing
17:23:34 https://github.com/coreos/coreos-assembler/pull/1833 gets the mantle code base working
17:23:49 the only test failing right now on openstack is coreos.ignition.ssh.key
17:24:05 nice!
17:24:06 see https://github.com/coreos/coreos-assembler/issues/1835 for details on why
17:25:27 ok, closing meeting in 1 minute
17:26:29 #endmeeting
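
Note on the hostname truncation discussed under the `fedora` hostname topic: the sketch below illustrates one possible way to shorten an over-long DHCP-provided FQDN to the 63-character limit mentioned for the kubelet. It is only an illustration under stated assumptions. The function name `truncate_hostname`, the default of 63, and the strategy of falling back to the first DNS label are all hypothetical; this is not the logic used by NetworkManager, the MCO, afterburn, or systemd-networkd.

```python
def truncate_hostname(fqdn: str, max_len: int = 63) -> str:
    """Return a hostname no longer than max_len characters.

    Hypothetical helper: the fallback strategy here is an assumption
    for illustration, not the behavior of any tool named in the log.
    """
    if max_len < 1:
        raise ValueError("max_len must be positive")

    # Ignore a trailing root dot, if present.
    name = fqdn.rstrip(".")

    # Names that already fit are returned unchanged.
    if len(name) <= max_len:
        return name

    # Assumption: prefer the short host label (text before the first dot).
    short = name.split(".", 1)[0]

    # Last resort: hard-cut the label, then drop a trailing '-' left by the cut.
    return short[:max_len].rstrip("-")


if __name__ == "__main__":
    # Hypothetical long name of the kind described for GCP instances.
    long_name = "node-1." + "x" * 80 + ".internal"
    print(truncate_hostname(long_name))
```

A configurable `max_len` mirrors the suggestion in the meeting that the truncation threshold might need to differ between users.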