16:29:32 #startmeeting fedora_coreos_meeting
16:29:32 Meeting started Wed Apr 19 16:29:32 2023 UTC.
16:29:32 This meeting is logged and archived in a public location.
16:29:32 The chair is dustymabe. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:29:32 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:29:32 The meeting name has been set to 'fedora_coreos_meeting'
16:29:38 #topic roll call
16:29:42 .hi
16:29:43 dustymabe: dustymabe 'Dusty Mabe'
16:30:30 .hi
16:30:31 fifofonix: fifofonix 'Fifo Phonics'
16:31:03 .hi
16:31:04 ravanelli: ravanelli 'Renata Ravanelli'
16:31:22 .hello davdunc
16:31:22 davdunc[m: davdunc 'David Duncan'
16:31:36 .hello marmijo
16:31:37 marmijo[m]: marmijo 'Michael Armijo'
16:32:37 .hello2
16:32:38 jlebon: jlebon 'None'
16:33:41 #chair fifofonix ravanelli davdunc[m marmijo[m] jlebon
16:33:41 Current chairs: davdunc[m dustymabe fifofonix jlebon marmijo[m] ravanelli
16:34:42 thanks everyone for coming!
16:34:48 let's get started
16:35:37 #topic Action items from last meeting
16:35:42 * dustymabe to open a new issue related to the "regular dbx updates" feature
16:35:44 * jlebon to open a new issue related to the "regular bootloader updates" feature
16:35:50 #info dustymabe opened issue for regular dbx updates: https://github.com/coreos/fedora-coreos-tracker/issues/1478
16:35:52 #info jlebon opened issue for regular bootloader updates: https://github.com/coreos/fedora-coreos-tracker/issues/1468
16:36:39 ok moving on to meeting topics
16:36:48 #topic Rollback from F38 to F37 followed by another F38 upgrade can lead to loss of SSH access
16:37:17 #link https://github.com/coreos/fedora-coreos-tracker/issues/1473
16:37:28 so this one is... interesting
16:37:44 on the 37->38 upgrade there is a migration script that runs that updates permissions on SSH keys
16:37:53 the host keys live in /etc/
16:38:07 the migration script drops down a stamp file in /var/ to tell it not to run the migration again
16:38:29 upon rollback the keys get rolled back (in /etc/)
16:38:54 but the stamp file in `/var/` lives on
16:39:11 thus upon a later upgrade 37->38*
16:39:16 the migration doesn't happen
16:39:40 .hi
16:39:41 jdoss: jdoss 'Joe Doss'
16:39:51 .hi
16:39:52 bgilbert: bgilbert 'Benjamin Gilbert'
16:39:56 #chair jdoss bgilbert
16:39:56 Current chairs: bgilbert davdunc[m dustymabe fifofonix jdoss jlebon marmijo[m] ravanelli
16:40:23 jdoss: bgilbert we're just discussing https://github.com/coreos/fedora-coreos-tracker/issues/1473 and if there's anything we can do about it
16:41:45 I agree with bgilbert on the no support rollbacks, but losing SSH access is a major bummer.
16:41:45 my current sense is: if we want to put in the effort, then we can, and SSH is indeed special. but in general we can't commit to this rollback ever ever working
16:42:06 +1
16:43:03 bgilbert: would another way to say be that rollbacks are best effort?
16:43:10 sure
16:43:53 i think this is something we all knew deep down, but I don't know if we have any language anywhere about that to make it clear
16:44:07 my experience so far has been that rolling back and re-upgrading has been pretty reliable
16:44:22 but there are just too many factors involved to make any sort of guarantee on it
16:44:22 fifofonix: ditto
16:44:43 so maybe we can find a spot in our docs where it would be appropriate to update language
16:44:44 sure, it'll work except to the extent it doesn't
16:44:49 .hello siosm
16:44:50 travier: siosm 'Timothée Ravier'
16:44:56 +1 to docs
16:45:16 indeed
16:45:28 ok let's focus on IF we want to do something for this case and if so, what?
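A toy Python model of the failure described above may make the sequence easier to follow. It is a simplification under stated assumptions, not the real FCOS code: each new deployment is assumed to copy /etc from the booted deployment (OSTree actually does a three-way merge), /var is shared by all deployments, the migration is gated only on a stamp file in /var, and the deployment names are hypothetical. It shows why the re-upgrade skips the migration, and why running the idempotent migration on every boot (ignoring the stamp) avoids the problem.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Per-deployment /etc: deployment name -> "are the SSH key permissions migrated?"
    etc: dict = field(default_factory=lambda: {"f37": False})
    # /var is shared by every deployment and survives rollback (assumption of this model).
    var_stamp: bool = False
    booted: str = "f37"

    def deploy(self, name: str) -> None:
        """Stage a new deployment whose /etc is copied from the booted one."""
        self.etc[name] = self.etc[self.booted]

    def boot(self, name: str, honor_stamp: bool = True) -> None:
        """Boot a deployment; F38 deployments run the one-shot migration."""
        self.booted = name
        if not name.startswith("f38"):
            return
        if honor_stamp and self.var_stamp:
            return  # stamp file exists in /var, so the migration is skipped
        self.etc[name] = True  # migration fixes key permissions in this /etc
        self.var_stamp = True  # ...and records that it ran, in shared /var

    def ssh_works(self) -> bool:
        # F38 needs migrated permissions; F37 works either way.
        return self.etc[self.booted] or not self.booted.startswith("f38")


def scenario(honor_stamp: bool) -> bool:
    """Upgrade to F38, roll back to F37, upgrade to F38 again."""
    host = Host()
    host.deploy("f38.0")
    host.boot("f38.0", honor_stamp)  # first upgrade: migration runs
    host.boot("f37")                 # rollback: F37's /etc again, stamp remains
    host.deploy("f38.1")             # new /etc copied from the unmigrated F37 one
    host.boot("f38.1", honor_stamp)  # re-upgrade
    return host.ssh_works()
```

In this model `scenario(honor_stamp=True)` returns `False` (SSH broken after the re-upgrade) while `scenario(honor_stamp=False)` returns `True`, which is only safe because the migration itself is idempotent.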
16:46:00 obviously we can document the workaround (which is to remove the stamp file)
16:46:14 but that does require the user to be able to get back in to their machine
16:47:18 They can remove it while on F37
16:47:21 jlebon's suggestion in https://github.com/coreos/fedora-coreos-tracker/issues/1473#issuecomment-1513512301 is to essentially ship a helper unit that removes the stamp file on every boot (guaranteeing the migration code runs every boot)
16:47:23 but yeah that's not great
16:47:36 travier: the user doesn't know it's a problem until they try to re-upgrade to f38*
16:47:54 we can ship a drop-in that removes the drop file check for now and then remove it in a later release
16:47:57 so they then have to select the older entry in grub (or use a password to log in on the console)
16:48:00 after the 38 barrier
16:48:36 travier: ahh, so that's a different version of what jlebon suggested.. yours uses a dropin instead of a separate unit
16:48:51 Since ostree can do rollbacks, and that is a highlight of FCOS, doing best effort when moving between major Fedora releases seems reasonable.
16:49:11 jdoss: it also happens to be the time you'd need rollbacks the most
16:49:20 yep totally.
16:49:39 travier: nice, that's a tinier patch even
16:51:16 so.. it's looking like we'll need to spin a new `testing` anyway for the proxy issue: https://github.com/coreos/fedora-coreos-tracker/issues/1477
16:51:21 I can not find how ConditionPathExists= logic is handled in systemd
16:51:31 if it's a AND or a OR
16:51:36 #info we paused the F38 rollout because of an issue related to updates behind a proxy: https://github.com/coreos/fedora-coreos-tracker/issues/1477
16:51:41 patch sounds good (notes to side that fedora docs more broadly than fcos are silent on this nuance - implying rollback supported - silverblue / iot)
16:51:44 ConditionPathExists=!foo
16:51:44 ConditionPathExists=foo
16:52:03 travier: I think if you just add ConditionPathExists= in a dropin it will cancel previous ConditionPathExists=
16:52:15 hum, not sure, you can have several
16:52:21 right
16:52:30 a single empty entry will cancel all previous
16:52:43 and then you get to start from scratch defining them again (in the dropin)
16:52:57 fifofonix: silverblue at least assumes you're more hands on and not accessing via ssh so it's easier to deal with possible rollback fallout
16:53:26 jlebon: I think that might not be obvious to the users :)
16:53:49 yeah, docs there too wouldn't hurt. but i think it's a less critical issue there in the first place
16:54:09 ok so do we want to try to ship a dropin? at least it will help when we have our 38->`stable` transition
16:54:22 and then we'd remove it in a barrier?
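On the AND-vs-OR question above: per the systemd.unit(5) man page, multiple Condition*= settings are ANDed; conditions whose value starts with `|` are "triggering" conditions, of which at least one must hold; and assigning the empty string resets the entire list of conditions set earlier. The Python sketch below models those rules for ConditionPathExists= only. The stamp path is a hypothetical placeholder, not the path used by the actual FCOS migration unit.

```python
import os
from typing import Callable, Iterable, List, Tuple

Assignment = Tuple[str, str]


def effective_conditions(assignments: Iterable[Assignment]) -> List[Assignment]:
    """Fold Condition*= settings from a unit file followed by its drop-ins,
    in the order they are read. An empty value resets every earlier condition."""
    conditions: List[Assignment] = []
    for key, value in assignments:
        if not key.startswith("Condition"):
            continue
        if value == "":
            conditions.clear()
        else:
            conditions.append((key, value))
    return conditions


def path_condition_holds(value: str, exists: Callable[[str], bool]) -> bool:
    """Evaluate one ConditionPathExists= argument, honoring '!' negation."""
    negate = value.startswith("!")
    path = value[1:] if negate else value
    return exists(path) != negate


def unit_would_run(conditions: List[Assignment],
                   exists: Callable[[str], bool] = os.path.exists) -> bool:
    """Regular conditions are ANDed; '|'-prefixed (triggering) ones are
    ORed with each other. Only ConditionPathExists= is modeled here."""
    regular, triggering = [], []
    for key, value in conditions:
        if key != "ConditionPathExists":
            raise ValueError(f"not modeled: {key}")
        if value.startswith("|"):
            # systemd expects '|' before '!', e.g. ConditionPathExists=|!/path
            triggering.append(path_condition_holds(value[1:], exists))
        else:
            regular.append(path_condition_holds(value, exists))
    return all(regular) and (not triggering or any(triggering))


# Hypothetical stamp path; the real FCOS migration unit may use another one.
STAMP = "/var/lib/example-ssh-migration.stamp"
base_unit = [("ConditionPathExists", "!" + STAMP)]
dropin = [("ConditionPathExists", "")]  # empty assignment: reset all conditions
```

With the stamp present, `effective_conditions(base_unit)` keeps the negated condition and the unit is skipped; `effective_conditions(base_unit + dropin)` is empty, so the unit runs on every boot. On a real system the drop-in would be a `.conf` file in the unit's `.d/` directory containing a `[Unit]` section with an empty `ConditionPathExists=` line.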
16:54:26 given that it's a trivial fix, i think we should
16:54:40 +1
16:54:46 +1, shipping a smal droppin should be easy to test
16:54:49 small*
16:55:36 #proposed we will ship a systemd dropin to remove the stamp file ConditionPathExists= on the migration unit so the idempotent migration code will run on every boot until we remove it after a barrier release
16:55:49 +1
16:55:55 +1
16:56:16 ack
16:56:18 +1
16:56:22 +1
16:56:28 +1
16:56:35 #agreed we will ship a systemd dropin to remove the stamp file ConditionPathExists= on the migration unit so the idempotent migration code will run on every boot until we remove it after a barrier release
16:56:58 I'm loving all the votes/input today 😍
16:57:17 🍪
16:57:41 #topic Upgrade LUKS key derivation function on (major?) updates
16:57:45 #link https://github.com/coreos/fedora-coreos-tracker/issues/1474
16:58:36 travier: want to intro this one?
16:59:16 travier: thanks for opening this. When I read mjg59's post I double checked all of my LUKS setups and thankfully they were new-ish installs.
17:00:17 honestly I think this class of problem (along with the bootloader updates issue that we've recently discussed) is why I reprovision my systems ~ once a year
17:00:34 I agree with the post's call to have the distros handle this for users.
17:00:57 if not every 30 days. :)
17:01:27 I can take the intro
17:01:50 bgilbert: +1
17:01:56 davdunc[m: I like you
17:02:24 LUKS disk encryption volumes are encrypted with a key which is unrelated to the password you type in at the console
17:02:32 jdoss: :)
17:02:33 (or that Tang handles, etc.)
17:03:16 that main key is essentially encrypted with your password (or Tang's key). there can be multiple "key slots", which each encrypt the main key, so that e.g. multiple passwords can be used to unlock the volume
17:03:32 but your password probably isn't very random
17:03:50 mine is hunter2
17:04:10 so a "password-based key derivation function" (PBKDF) is used to convert your password to the key that's used to decrypt the main volume key
17:04:26 password unlocks> key slot unlocks> volume key
17:04:49 the job of the PBKDF is to make brute-forcing harder
17:05:33 if you want to brute-force jdoss's password "hunter2", you should have to do a lot of work to generate the key that actually decrypts the key slot
17:05:48 historically that meant "use a lot of CPU time", but GPUs exist and are good at parallelizing things
17:05:56 so now it means "use a lot of CPU and a lot of memory"
17:06:28 LUKS supports multiple PBKDFs, but volumes that were created a while ago use an older one that isn't as good at requiring a lot of memory
17:06:34 * dustymabe notes this is a beautifully crafted intro to this problem by bgilbert
17:06:41 dustymabe: <3
17:06:45 thanks bgilbert!
17:07:23 it's possible to upgrade to a new PBKDF, but there are some factors:
17:08:05 1. each key slot needs to be updated, and you have to update all key slots (not necessarily at the same time) to fully improve your security
17:08:23 2. you need the slot's password to update it (because you're re-encrypting with that password)
17:08:38 3. GRUB doesn't support the newest PBKDF (doesn't matter for us, since we don't encrypt /boot)
17:09:13 4. rewriting key slots feels scary: if you get it wrong somehow, you've lost access to your disk
17:10:12 it'd be nice to handle this transition automatically. ideally we'd just piggyback on existing tooling (e.g. upstream dracut glue would just handle this) but mjg's point is that distros aren't really doing this right now
17:10:25 hmm, one question here is what the default PBKDF was at the time FCOS stable started supporting LUKS
17:10:46 bgilbert: or did you already check that it changed since?
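The key-slot hierarchy described above (password unlocks key slot, key slot unlocks volume key) can be sketched in Python. This is a toy for illustration only: it uses PBKDF2 from `hashlib` (CPU-hard only, unlike argon2), XOR instead of LUKS's real key-slot encryption (a block cipher plus anti-forensic splitting), and a plain SHA-256 digest in place of LUKS's stored key digest. Function names are hypothetical. It also shows why upgrading a slot's PBKDF parameters requires that slot's password.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeySlot:
    salt: bytes
    iterations: int
    wrapped_key: bytes  # volume key XOR password-derived key (toy wrapping)


def derive_kek(password: str, salt: bytes, iterations: int, length: int) -> bytes:
    """Password -> key-encryption key. PBKDF2 costs only CPU time;
    memory-hard KDFs such as argon2id also make each guess cost memory."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def add_slot(volume_key: bytes, password: str, iterations: int) -> KeySlot:
    """Create a key slot: a fresh salt, and the volume key wrapped under the password."""
    salt = os.urandom(16)
    kek = derive_kek(password, salt, iterations, len(volume_key))
    return KeySlot(salt, iterations, xor(volume_key, kek))


def unlock(slot: KeySlot, password: str, key_digest: bytes) -> Optional[bytes]:
    """Password unlocks the slot; the slot yields the volume key.
    The stored digest tells a correct password apart from a wrong one."""
    kek = derive_kek(password, slot.salt, slot.iterations, len(slot.wrapped_key))
    candidate = xor(slot.wrapped_key, kek)
    if hmac.compare_digest(hashlib.sha256(candidate).digest(), key_digest):
        return candidate
    return None


def rewrap(slot: KeySlot, password: str, key_digest: bytes, iterations: int) -> KeySlot:
    """Upgrade a slot's KDF parameters. This needs the slot's password."""
    volume_key = unlock(slot, password, key_digest)
    if volume_key is None:
        raise ValueError("wrong password: cannot re-wrap this slot")
    return add_slot(volume_key, password, iterations)
```

Two slots created from the same volume key with different passwords can each recover it independently, and `rewrap` has to be repeated for every slot to fully retire the old parameters, matching factors 1 and 2 in the list above.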
17:10:50 we do have some of our own initrd glue though, so in principle we could pursue this ourselves, or work on getting some tooling upstream
17:11:03 jlebon: I have not
17:11:24 Would an easy stopgap be have a systemd unit check for old LUKS and barf out a message to the login stuff we do pointing to documentation on how they can do it manually?
17:11:36 but my current dev machine I believe is post-FCOS-LUKS and uses an older PBKDF
17:11:39 on my FSB, which is provisioned last year, it's argon2id, which per the blog is fine. wonder when that change happened.
17:11:39 I think I'm against pursuing this ourselves.. if anything we should work as part of a subgroup within Fedora to try to solve the problem for Fedora as a whole
17:12:11 jdoss: that UX isn't great
17:12:26 fair
17:12:33 dustymabe: definitely, though it'll likely require some special integration for FCOS/FSB/IOT
17:12:45 oh, I should also note the threat model here: someone walks off with a copy of your encrypted data and then spends a bunch of CPU time to crack it
17:12:55 well, GPU time
17:12:56 One of my suggestion was similar to jdoss in that we add something to CLHM & write docs for now until we have something better
17:13:13 argon2i isn't broken AFAIK, it's just less good
17:13:34 jlebon: true.. this feels like a system-wide fedora change that we should be a part of (maybe not necessarily owning it, but at least giving input and making sure our use cases are handled)
17:14:04 We don't have a good story for encryption on FCOS right now given that only Tang setups are "secure"
17:14:22 this is an example where *CoreOS can help drive changes into all of Fedora and make it better
17:14:48 tpm ones are ok regarding theft
17:14:53 travier: you can't have an encrypted disk outside of tang?
17:15:40 non, that's not what I'm saying. The only encrypted setups in FCOS right now that resists to theft are tpm & tang
17:15:48 hmm
17:16:18 FCOS _can_ read a password from the keyboard at boot, but all of our documented/encouraged use cases are noninteractive
17:16:27 yeah ^^
17:16:29 which means they use random keys
17:16:32 i was thinking about that use case
17:16:41 ok, this one works as well but I don't really think a lot of folks are doing it
17:16:42 kind of doesn't work well for automatic updates, though
17:16:51 but agree that this one also works
17:16:54 +1
17:16:54 the whole point of brute-forcing is that passwords aren't evenly distributed in the input space
17:17:08 so aaaaactually this may be largely a non-issue for us
17:17:19 bgilbert: would you be able to write up your nice intro in the GH issue?
17:17:29 i found it very helpful in understanding the problem
17:17:42 okay
17:17:57 i think the open question here is: what do we want to do about it?
17:18:12 I'm leaning "leave the bug open"
17:18:14 there's the docs+CLHM helper suggestion (which could be just an intermediate thing)
17:18:33 there's also the Fedora System Wide change and work with other teams option
17:18:35 what is CLHM again?
17:18:40 console-login-helper-messages
17:18:43 ty
17:19:11 there's also the option for us to implement something on our own (but I'm not a big fan of this option)
17:19:15 I think our use cases are less likely to be affected than the general population
17:19:23 I'm not really concerned about this issue for FCOS but I opened it so that we track this
17:19:31 travier: +1
17:19:36 travier: +1
17:19:40 4th option: do nothing
17:20:06 we can do nothing now, but we'll have to do something "at some point"
17:20:30 again: will we?
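The point above, that brute-forcing exploits the uneven distribution of human-chosen passwords, can be made concrete with a little arithmetic. This sketch assumes an attacker who exhausts a candidate set of equally likely inputs; the 10**6-entry password list and the 2**20 KDF slowdown factor are illustrative numbers, not measurements of any real setup.

```python
import math


def work_bits(candidates: int, kdf_cost_factor: float = 1.0) -> float:
    """log2 of (number of guesses x cost per guess) needed to exhaust a
    candidate set, where kdf_cost_factor is how much more each guess costs
    than a bare, unstretched guess."""
    return math.log2(candidates) + math.log2(kdf_cost_factor)


# A random 128-bit key (as with Tang or TPM): 128 bits of work even with no KDF.
random_key = work_bits(2**128)

# A password drawn from a hypothetical list of 10**6 common passwords,
# protected by a KDF that makes each guess 2**20 times more expensive.
weak_password = work_bits(10**6, 2**20)

print(f"random key: {random_key:.1f} bits, weak password: {weak_password:.1f} bits")
```

This prints roughly 128 bits for the random key versus about 40 bits for the stretched weak password: the KDF adds a fixed number of bits, which matters a great deal for a small password space and very little when the input is already 128 random bits.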
17:21:02 if you're using passwords you're already off the beaten path
17:21:14 warning users that they are using a weak pbkdf is the minimum from my perspective
17:21:24 +1
17:21:33 my point is that in almost all cases, _they're not really passwords_
17:21:35 bgilbert: you could be using LUKS on a non-root device
17:21:42 jlebon: and hand-unlocking it?
17:21:58 luks on non-root is not theft resistant
17:22:05 yeah
17:22:07 unless root is on luks too
17:22:12 travier: THIS ^^
17:22:25 travier: can you expand?
17:22:31 do you mean if using a keyfile?
17:22:42 with keyfiles yes
17:22:48 so this problem only presents itself IF the user is using a password?
17:23:20 dustymabe: the PBKDF only matters if it's increasing the effective entropy of the input, yeah
17:23:20 but nothing stops you from deleting the keyfile post ignition
17:23:37 dustymabe: if the input is already 128 bits of random data, you still have to search the whole input space
17:23:41 bgilbert: and with tang or tpm you don't have that issue?
17:24:16 dustymabe: right, those both use random keys
17:24:21 ok
17:24:54 anyway, overall not super concerned either for FCOS. just wanted to mention this since ignition does support it.
17:24:56 I think my proposal is we implement nothing for now but try to seek out within Fedora/RHEL people who are looking at this problem (they have to exist, right?)
17:25:22 +1
17:25:34 +1
17:25:39 #action bgilbert to write up a comment in the bug
17:26:01 might be interesting to know what the default was when we started supporting LUKS root
17:26:32 but overall +1
17:27:12 # proposed we think the use case of using a password to unlock encrypted disks (which is where this issue has the most effect) for Fedora CoreOS isn't common. For now we'll do nothing but will reach out to the Fedora community to see if anyone is thinking about or working on this problem already.
17:27:22 +1
17:27:23 +1
17:27:46 aside: anyone want to volunteer to do the reaching out? could be a fedora devel list email
17:28:23 #agreed we think the use case of using a password to unlock encrypted disks (which is where this issue has the most effect) for Fedora CoreOS isn't common. For now we'll do nothing but will reach out to the Fedora community to see if anyone is thinking about or working on this problem already.
17:28:31 I didn't see anyone voting against :)
17:29:11 and.. I think we are out of time :(
17:29:18 #topic open floor
17:29:38 I got some new gear for my datacenter and I am going to try installing FCOS on some CM4s soon
17:29:45 https://usercontent.irccloud-cdn.com/file/LQBQZpvI/PXL_20230419_165217208.jpg
17:29:45 nice
17:30:02 I got 2x Turing Pi 2 boards.
17:30:20 dustymabe: I will be crying to you when I get stuck :)
17:30:49 ha
17:31:06 hopefully it all works flawlessly
17:31:28 * dustymabe closes out the meeting in a few minutes if no new discussion
17:31:29 One could hope. I think the RPi4 docs should work.
17:31:36 just to mention i have pinned my gpu processing on f37 nodes for now.
17:31:47 fifofonix: for the CIFS issue?
17:31:50 can't get f38 working due to missing kernel-headers.
17:32:06 known issue not having a good means for kernel headers.
17:32:14 (we've discussed before)
17:32:48 hoping that soon we'll get some kernel versions that have koji matching headers.
17:32:49 ahh +1
17:34:00 #endmeeting