<@gurssing:matrix.org>
16:30:19
!startmeeting fedora_coreos_meeting
<@gurssing:matrix.org>
16:30:26
!topic roll call
<@jbtrystram:matrix.org>
16:30:34
!hi
<@gurssing:matrix.org>
16:30:43
!hi gursewak
<@dustymabe:matrix.org>
16:31:21
!hi
<@ravanelli:matrix.org>
16:31:48
!hi ravanelli
<@gurssing:matrix.org>
16:32:26
!topic Action items from last meeting
<@siosm:matrix.org>
16:32:29
!hi
<@gurssing:matrix.org>
16:32:48
I don't see the last meeting's minutes in https://discussion.fedoraproject.org/tag/coreos-wg
<@ravanelli:matrix.org>
16:33:15
umm is the bot working?
<@aaradhak:matrix.org>
16:33:21
!hi aaradhak
<@gurssing:matrix.org>
16:33:36
That's a manual step.
<@ravanelli:matrix.org>
16:34:16
I don't mean the last minutes, but all other actions
<@jbtrystram:matrix.org>
16:34:20
Yeah the bot seems out
<@ravanelli:matrix.org>
16:34:55
Let me send a message in the infra channel, or see if something is there already
<@dustymabe:matrix.org>
16:35:21
gursewak: I think maybe Jonathan Lebon didn't send them out?
<@ravanelli:matrix.org>
16:35:52
ah, jbtrystram already sent a message in the infra channel, thanks ;)
<@gurssing:matrix.org>
16:36:16
Going through the minutes, I don't see any new action items there
<@gurssing:matrix.org>
16:37:39
Moving on.
<@gurssing:matrix.org>
16:37:40
!topic Latest next VMWare OVA Fails To Boot
<@dustymabe:matrix.org>
16:40:22
I tagged this with the meeting label. We've characterized this a bit with Hristo Marinov and fifofonix. It looks like it's somehow related to the OVA we create. We need someone to dig in deeper. It needs to be someone who has access to VMWare and can run `cosa` to develop/iterate on a fix.
<@dustymabe:matrix.org>
16:41:15
for testing purposes it seems like VMWare fusion on Mac or Windows is good enough, but I think once we've developed a fix we need to make sure someone runs it on ESXi too
<@siosm:matrix.org>
16:41:27
So this is a fresh FCOS 41 failing to boot when a fresh F41 (package mode) image works?
<@dustymabe:matrix.org>
16:41:29
them both being able to first reproduce the issue and then verify the images with the "fix" solve the problem
<@meetbot:fedora.im>
16:41:51
Meeting started at 2024-10-02 16:30:19 UTC
<@meetbot:fedora.im>
16:42:02
The Meeting name is 'fedora_coreos_meeting'
<@dustymabe:matrix.org>
16:42:03
travier: kind of
<@zodbot:fedora.im>
16:42:11
Jean-Baptiste Trystram (jbtrystram) - he / him / his
<@dustymabe:matrix.org>
16:42:38
travier: let me explain the info we have so far
<@siosm:matrix.org>
16:42:47
!topic Latest next VMWare OVA Fails To Boot
<@hricky:fedora.im>
16:42:57
!hi
<@dustymabe:matrix.org>
16:43:06
travier: I've got someone to test F41 server (i.e. Anaconda from ISO) and they said it worked
<@dustymabe:matrix.org>
16:43:23
also got someone to test installing with our ISO (bare metal workflow) - it works
<@ravanelli:matrix.org>
16:43:29
If it is just a matter of testing VMWare fusion on Mac, maybe I can give it a try.
<@jlebon:fedora.im>
16:43:40
!hi
<@dustymabe:matrix.org>
16:43:48
got someone to test by first using `testing` F40 OVA and then rebasing to `next` F41 OVA - it works
<@zodbot:fedora.im>
16:43:54
Gursewak Singh (gursewak)
<@zodbot:fedora.im>
16:43:54
None (jlebon)
<@zodbot:fedora.im>
16:43:55
Hristo Marinov (hricky) - he / him / his
<@zodbot:fedora.im>
16:43:56
Renata Ravanelli (ravanelli)
<@jlebon:fedora.im>
16:44:20
is anyone else having matrix issues?
<@dustymabe:matrix.org>
16:44:25
F41 `next` OVA doesn't work
<@dustymabe:matrix.org>
16:44:51
though hmm.. I think fifofonix did mention that after the rebase from `testing` to `next` a `bootupctl update` caused the system to stop working
<@hricky:fedora.im>
16:44:53
Me.
<@siosm:matrix.org>
16:45:04
Jonathan Lebon: the bot is in slow mode
<@ravanelli:matrix.org>
16:45:14
They just restarted the channel bot, Jonathan Lebon; other than that, it is normal here
<@zodbot:fedora.im>
16:46:07
Timothée Ravier (siosm) - he / him / his
<@dustymabe:matrix.org>
16:46:13
so at times during this I've thought it was the bootloader (GRUB was upgraded to 2.12, which was a large update) and at times I've thought it was something to do with the OVA
<@zodbot:fedora.im>
16:46:14
Sorry, I can only look up one username at a time
<@siosm:matrix.org>
16:46:52
Hristo Marinov said in the last comment that booting next directly was failing. This is really weird.
<@siosm:matrix.org>
16:48:00
we used to install the bootloader from cosa to the VM but we no longer do AFAIK?
<@dustymabe:matrix.org>
16:48:33
travier: correct. It should be the version exactly from the payload
<@siosm:matrix.org>
16:49:55
Ah, so it likely is something with the bootloader or shim
<@dustymabe:matrix.org>
16:50:37
Yes. but I think we need more of a commitment than that. i.e. we can get people already to try specific things, but what we really need is for someone with access to try a bunch of different things and also test potential fixes
<@siosm:matrix.org>
16:50:43
really weird that this would not affect classic Fedora. Maybe it's a difference in the GRUB configs? But that would still boot and fail, not error out directly
<@dustymabe:matrix.org>
16:51:00
travier: I thought so, but installing via bare metal workflow works if reports are correct
<@dustymabe:matrix.org>
16:51:11
should be the same as the OVA??
<@jlebon:fedora.im>
16:51:30
hmm, wonder if it's related to the versioning info in the OVA + e.g. some changes in the kernel
<@jlebon:fedora.im>
16:51:51
e.g. if we actually needed to bump the min version
<@siosm:matrix.org>
16:51:55
yeah, installing bare metal from ISO should be 100% like the OVA minus the command line kargs change?
<@ravanelli:matrix.org>
16:52:02
I can help with the rest as well.
<@dustymabe:matrix.org>
16:52:15
Thanks Renata Ravanelli!
<@dustymabe:matrix.org>
16:52:36
ideally we'd have some VMWare infra just accessible to us in Fedora too
<@hricky:fedora.im>
16:52:49
I can also help with testing.
<@dustymabe:matrix.org>
16:53:37
!action Renata Ravanelli to help us dig down and diagnose the issue further to find the root cause
<@jlebon:fedora.im>
16:54:14
https://github.com/coreos/fedora-coreos-config/blob/2d3404c98ae3daeb8914894ed99cfaa7f8e9938f/image-base.yaml#L24
<@jlebon:fedora.im>
16:54:14
e.g. right now we have: `vmware-hw-version: 17`
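One way to confirm which hardware version a built OVA actually declares is to read it out of the OVF descriptor. The following is a minimal sketch, assuming the OVA is a plain tar archive and that the descriptor records the version as `vmx-NN` inside a `VirtualSystemType` element; the function name `ova_hw_versions` is hypothetical and not part of `cosa`.

```python
import re
import tarfile


def ova_hw_versions(ova_file):
    """Return the vmx-NN hardware versions declared in an OVA's OVF descriptor(s)."""
    versions = []
    # An OVA is a tar archive; the .ovf member is the XML descriptor.
    with tarfile.open(fileobj=ova_file, mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.name.endswith(".ovf"):
                continue
            xml = tar.extractfile(member).read().decode("utf-8", errors="replace")
            # Assumption: the version appears as e.g.
            # <vssd:VirtualSystemType>vmx-17</vssd:VirtualSystemType>
            found = re.findall(r"VirtualSystemType>\s*vmx-(\d+)", xml)
            versions += [int(v) for v in found]
    return versions
```

Opening the OVA with `open(path, "rb")` and passing the file object in should return `[17]` if the artifact matches the `vmware-hw-version: 17` setting quoted above.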
<@ravanelli:matrix.org>
16:54:31
Yeah, it would be really good to even have a CI test running with VMWare
<@gurssing:matrix.org>
16:56:00
Moving on
<@gurssing:matrix.org>
16:56:14
!topic Roadmap to Fedora Bootable Containers
<@jlebon:fedora.im>
16:56:41
gursewak: i untagged that one earlier today, but i think the meeting ticket was already generated
<@jlebon:fedora.im>
16:56:51
i forgot to untag it immediately after the meeting last week
<@gurssing:matrix.org>
16:57:06
Got it.
<@gurssing:matrix.org>
16:57:17
That's all the issues tagged.
<@siosm:matrix.org>
16:57:34
For this issue, we talked about it in the bootc meeting earlier this week
<@siosm:matrix.org>
16:58:17
I suggested doing a tier-y with just enough ignition support to be able to re-use our kola + cosa workflow with the bootc images to leverage the infra and testing integration that we have right now but for bootc
<@siosm:matrix.org>
16:58:57
(we would not add ignition to bootc images, this would be just for testing)
<@dustymabe:matrix.org>
16:58:59
so now tier-x and tier-y ?
<@siosm:matrix.org>
16:59:39
Yes. It's a means to get gating for packages in bodhi on "some" bootc testing as soon as possible
<@siosm:matrix.org>
17:00:30
I think it's where we'll get the most value. The manifest duplication is not great but it's not the most painful part. If we can avoid breaking package updates landing in Fedora then we get much more
<@jlebon:fedora.im>
17:00:52
travier: my concern with doing that is that we would very likely be primary maintainers for it. ideally this is owned by a wider group.
<@jlebon:fedora.im>
17:00:52
there's a wide gap right now with how CI is done in the rest of Fedora.
<@dustymabe:matrix.org>
17:01:15
but we already have FCOS tests running on packages in Fedora.. where we could just make the results of those tests gate already?
<@siosm:matrix.org>
17:01:50
Agree that we would likely be the ones maintaining it. Hopefully, this would significantly lighten the load on the FCOS side and compensate for that.
<@dustymabe:matrix.org>
17:01:53
For those interested - please follow https://matrix.to/#/#jenkins-coreos:fedoraproject.org
<@jlebon:fedora.im>
17:03:34
roughly the steps I see are:
<@jlebon:fedora.im>
17:03:34
1. add fedora bootc CI setup for running in bodhi updates (using e.g. testing farm or tmt, which i think is what is used downstream as well in rhel-bootc and centos-bootc)
<@jlebon:fedora.im>
17:03:34
2. upstream tests that make sense from FCOS to the fedora bootc level. this might require reworking things to work in the new framework
<@jlebon:fedora.im>
17:03:34
3. change proposal to turn on gating. maintain CI together
<@siosm:matrix.org>
17:03:52
Gating on FCOS tests would also help but does not bring us closer to sharing tests.
<@siosm:matrix.org>
17:04:59
From my experience with Zuul (I'm not sure this fully applies to tmt/testing farm), 1 would be a lot of work
<@jbtrystram:matrix.org>
17:05:26
step 2 is a lot of work as well from what I understand (moving from ignition configs to SSH)
<@jlebon:fedora.im>
17:05:38
travier: yeah, that's exploration to do there for sure. the QE folks on the bootc team have a lot of experience with it
<@siosm:matrix.org>
17:05:41
Zuul is centered around booting a pre-built VM image or a container and running Ansible playbooks or scripts in it
<@dustymabe:matrix.org>
17:07:29
we are literally like 1 step away from success with this 👆️ though
<@dustymabe:matrix.org>
17:07:46
I realize that has nothing to do with bootc, but man we got so close
<@siosm:matrix.org>
17:07:58
agree
<@jlebon:fedora.im>
17:07:59
there's a lot of tradeoffs, but basically i think for this to work as intended, we need a CI platform designed and brought up by the wider group
<@siosm:matrix.org>
17:08:04
how flaky is it right now?
<@dustymabe:matrix.org>
17:08:19
travier: look at the history in the channel
<@dustymabe:matrix.org>
17:08:41
it's not bad right now
<@jlebon:fedora.im>
17:08:56
dustymabe: appearances can be deceiving :) there's a wide gap to cross before we can scale it up. probably work in bodhi, and work in the community
<@dustymabe:matrix.org>
17:08:58
but of course would increase pressure if some tests were flaking for whatever reason
<@jlebon:fedora.im>
17:09:39
unless we're ok with keeping the scope where it is (i.e. a small subset of packages)
<@dustymabe:matrix.org>
17:09:40
I guess I'm missing the remaining pieces
<@siosm:matrix.org>
17:09:46
we can probably pursue both in parallel.
<@dustymabe:matrix.org>
17:10:01
it's a good start anyway
<@jlebon:fedora.im>
17:10:01
the way it's implemented right now is really awful
<@siosm:matrix.org>
17:10:53
what do you mean?
<@jlebon:fedora.im>
17:11:00
(i say this as the person that implemented it :) )
<@jlebon:fedora.im>
17:12:07
travier: we have this regex that combines all the packages to watch for the messages we want: https://github.com/coreos/coreos-ci/blob/c088dcc406170d2eee426505e33bcd7088bbb2cd/jobs/bodhi-trigger.Jenkinsfile#L62-L65
<@jlebon:fedora.im>
17:12:50
this works because we don't currently watch that many packages yet, but it doesn't scale
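To illustrate the pattern being described, here is a rough Python sketch of a single combined regex built from a watch list. This is not the Jenkinsfile's actual code: the package names, the helper names, and the assumption that updates are matched by build NVR (name-version-release) are all hypothetical.

```python
import re

# Hypothetical watch list; the real job tracks its own set of packages.
WATCHED_PACKAGES = ["kernel", "ignition", "rpm-ostree", "grub2"]


def build_watch_regex(packages):
    """Combine package names into one anchored alternation over build NVRs."""
    names = "|".join(re.escape(p) for p in sorted(packages, key=len, reverse=True))
    # A build NVR looks like name-version-release; version and release
    # contain no dashes, so anchor on the last two dash-separated fields.
    return re.compile(rf"^({names})-[^-]+-[^-]+$")


def watched_builds(nvrs, pattern):
    """Return the builds in an update whose package name is on the watch list."""
    return [nvr for nvr in nvrs if pattern.match(nvr)]
```

Every package added lengthens the alternation and needs a change to the job itself, which is one way to read the "doesn't scale" concern.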
<@jlebon:fedora.im>
17:13:13
there's also the fact that gating and critical path packages are concepts that are linked in ways we may not want for this
<@dustymabe:matrix.org>
17:13:48
re: gating/critical path - we'd have to figure that out with `bootc` too?
<@jlebon:fedora.im>
17:14:22
we're hitting all this because we're doing it on the side. with bootc, i think the goal would be to integrate into the existing Fedora CI effort
<@jlebon:fedora.im>
17:14:46
or at least not bring up another type of CI
<@dustymabe:matrix.org>
17:15:42
I think that's fine, but I don't think it's going to be here soon (maybe Fedora 43 or 44) ?
<@jlebon:fedora.im>
17:16:10
what i'd say is: let's talk with the bootc QE folks and see where their minds are at
<@jlebon:fedora.im>
17:16:10
i guess this is straying from the topic a bit :)
<@siosm:matrix.org>
17:17:33
not really
<@siosm:matrix.org>
17:18:04
if we rebase to the tier-x manifest then we also depend on their CI setup
<@jlebon:fedora.im>
17:19:27
travier: not sure i follow the connection
<@dustymabe:matrix.org>
17:19:56
> rebase to the tier-x manifest
<@dustymabe:matrix.org>
17:19:56
are you talking about https://github.com/coreos/fedora-coreos-config/pull/3177 ?
<@siosm:matrix.org>
17:20:17
yes
<@dustymabe:matrix.org>
17:20:58
ehh. I don't think so. We're still relying on our CI. We're just inheriting the package list (mind that it's not locked NVRs or depsolved)
<@jlebon:fedora.im>
17:21:41
once we have fedora bootc tier-x with gating CI, then we can actually work on derivation for real, and the tier-x manifest stuff in that PR would be gone
<@dustymabe:matrix.org>
17:21:51
but I think we do want the upstream definition to be more scrutinized - hence: https://gitlab.com/fedora/bootc/tracker/-/issues/40
<@jlebon:fedora.im>
17:22:44
that PR is a way to share now, with the goal that we eventually share at the image or "lockfile" level
<@jlebon:fedora.im>
17:23:11
(or some other level that gives us guarantees that CI on tier-x remains meaningful even as we add our things)
<@dustymabe:matrix.org>
17:23:44
I think if we know the goal is to share in the future it's a good first step, but we do have that open request out to them about branches that will make it easier to deal with the submodules
<@jlebon:fedora.im>
17:24:15
yeah, that's a big one
<@dustymabe:matrix.org>
17:24:23
https://gitlab.com/fedora/bootc/tracker/-/issues/39
<@siosm:matrix.org>
17:25:41
what I mean is that we need some form of CI that validates changes in bootc for downstream consumers. That can be upstreaming some of the FCOS tests in bootc or directly running FCOS tests on bootc PRs
<@siosm:matrix.org>
17:26:15
if we have neither, then we rely on bootc CI only for changes in bootc, which we'll only really test when we bump the submodule "downstream" in fcos
<@siosm:matrix.org>
17:26:21
which means it is not ideal
<@dustymabe:matrix.org>
17:26:59
agree. We want changes to upstream to be tested
<@jlebon:fedora.im>
17:27:24
right, gotcha. yes, i see this as a prerequisite to derivation.
<@jlebon:fedora.im>
17:27:24
basically, we (and probably e.g. IoT) have a lot of tests that aren't actually FCOS-specific and we would want to upstream those
<@dustymabe:matrix.org>
17:28:13
shall we close off this topic?
<@gurssing:matrix.org>
17:28:47
Almost time anyways.
<@gurssing:matrix.org>
17:28:49
!topic Open Floor
<@gurssing:matrix.org>
17:29:40
Thanks everyone for coming.
<@gurssing:matrix.org>
17:29:54
!endmeeting