<@sgallagh:fedora.im>
16:01:00
!startmeeting ELN SIG (2024-09-20)
<@meetbot:fedora.im>
16:01:01
Meeting started at 2024-09-20 16:01:00 UTC
<@meetbot:fedora.im>
16:01:01
The Meeting name is 'ELN SIG (2024-09-20)'
<@sgallagh:fedora.im>
16:01:03
!meetingname eln
<@meetbot:fedora.im>
16:01:03
The Meeting Name is now eln
<@sgallagh:fedora.im>
16:01:09
!topic Init Process
<@sgallagh:fedora.im>
16:01:10
!hi
<@zodbot:fedora.im>
16:01:11
Stephen Gallagher (sgallagh) - he / him / his
<@tdawson:fedora.im>
16:01:16
!hi
<@yselkowitz:fedora.im>
16:01:17
!hi
<@zodbot:fedora.im>
16:01:17
Troy Dawson (tdawson)
<@zodbot:fedora.im>
16:01:18
Yaakov Selkowitz (yselkowitz)
<@adamwill:fedora.im>
16:02:38
!hi
<@zodbot:fedora.im>
16:02:38
Adam Williamson (adamwill) - he / him / his
<@sgallagh:fedora.im>
16:03:33
adamw: Thanks for making it. I know the timing is poor for you.
<@sgallagh:fedora.im>
16:05:00
OK, I think the Meta folks are out of town today, so this is probably everyone we're going to get.
<@sgallagh:fedora.im>
16:05:07
!topic Agenda
<@sgallagh:fedora.im>
16:05:24
!info Agenda Item: Logos and backgrounds (10 minute cap)
<@sgallagh:fedora.im>
16:05:42
!info Agenda Item: Rethinking build/errata batches (remainder)
<@sgallagh:fedora.im>
16:07:11
!topic Logos and backgrounds
<@sgallagh:fedora.im>
16:07:20
Troy Dawson: Take it away, please
<@tdawson:fedora.im>
16:07:36
<@tdawson:fedora.im>
16:07:53
Right now ELN has no backgrounds.
<@tdawson:fedora.im>
16:08:25
As it says in the ticket, we have three options - Grab the generic ones, use Fedora's, or create our own.
<@sgallagh:fedora.im>
16:08:40
And we haven't broken adamw's needles lately, so we need to change that :)
<@adamwill:fedora.im>
16:09:18
you're helping
<@tdawson:fedora.im>
16:09:24
In my mind, I'd really like to try for the last option, create our own.
<@tdawson:fedora.im>
16:09:48
That way it would be obvious when someone is running ELN.
<@tdawson:fedora.im>
16:10:26
I'm willing to create them, if people are ok with that.
<@sgallagh:fedora.im>
16:10:27
I'm not sure if anyone is running ELN as a desktop. (And I'm a little anxious to learn that they are)
<@sgallagh:fedora.im>
16:10:40
But I don't see a problem with letting you give it a go.
<@tdawson:fedora.im>
16:10:40
That is a very valid point.
<@yselkowitz:fedora.im>
16:11:30
the forthcoming switch to wayland-based anaconda will probably do that too
<@tdawson:fedora.im>
16:12:10
I used to do this (update logos and backgrounds) in the past, and haven't for several years, so for me this would be more of a fun project, if people are ok with it.
<@adamwill:fedora.im>
16:12:15
the good news just keeps coming.
<@sgallagh:fedora.im>
16:13:07
Anyone opposed to letting Troy hack something together and present it at a future meeting?
<@sgallagh:fedora.im>
16:13:29
If it doesn't work out, we can always fall back to one of the other two options
<@tdawson:fedora.im>
16:14:30
I'll take the silence as a "go for it" :)
<@sgallagh:fedora.im>
16:14:44
Yes, let's do that. Let us know when you have a prototype in the ticket, please
<@tdawson:fedora.im>
16:14:52
Will do.
<@tdawson:fedora.im>
16:14:54
Thank you
<@sgallagh:fedora.im>
16:15:28
OK, let's move on to the meaty topic of the week
<@sgallagh:fedora.im>
16:15:42
!topic Rethinking build/errata batches
<@sgallagh:fedora.im>
16:15:53
<@sgallagh:fedora.im>
16:17:03
So there's really two different pieces of work here (and I'll probably split this ticket)
<@sgallagh:fedora.im>
16:17:29
Actually, let me rephrase things in terms of questions we need to answer first:
<@sgallagh:fedora.im>
16:18:45
1) Do we want testing/gating of batches of packages in ELN?
<@sgallagh:fedora.im>
16:18:45
2) Do we want testing/gating of composes prior to syncing them to mirrors?
<@sgallagh:fedora.im>
16:18:45
3) How do we gate packages without breaking subsequent EBS batches?
<@sgallagh:fedora.im>
16:19:16
I don't want to rehash everything already in the ticket; it's quite comprehensive thanks to adamw 's summary skills.
<@adamwill:fedora.im>
16:19:46
"summary"
<@yselkowitz:fedora.im>
16:20:45
#2 is closest to c10s, right?
<@sgallagh:fedora.im>
16:21:20
I'll redirect that question to Troy
<@tdawson:fedora.im>
16:21:43
Yep, it currently is ... although we don't have all of c10s tests working yet.
<@tdawson:fedora.im>
16:22:00
But it is what we are doing for cs in general, c9s and c10s
<@sgallagh:fedora.im>
16:22:46
I think we generally agree that "tests = good", but the main issue is gating.
<@adamwill:fedora.im>
16:22:47
btw, i did have one idea that's not in the ticket: can we just 'inherit' the rawhide update groupings?
<@adamwill:fedora.im>
16:23:18
have the batcher look at how updates are grouped in rawhide bodhi and follow that, leaving out any packages that aren't in ELN
<@tdawson:fedora.im>
16:23:36
I think having compose testing is the easiest of the ideas. Everything else has all sorts of exceptions and counting and that sort of thing.
<@sgallagh:fedora.im>
16:23:38
adamw: That turns out to be much harder than it seems.
<@adamwill:fedora.im>
16:23:42
d'oh.
<@sgallagh:fedora.im>
16:24:13
At least in part because since our batches sometimes take a long time, occasionally we get multiple releases of the same package queued up.
<@sgallagh:fedora.im>
16:24:20
We don't rebuild it twice, we just ignore the older one.
<@sgallagh:fedora.im>
16:24:37
So which Bodhi batch would that one need to match? No one knows.
<@adamwill:fedora.im>
16:24:42
Troy Dawson: to be clear, we already have compose testing, and as of a couple weeks ago we have update testing. but we don't have any gating, and we have identified some issues around the update testing that are all to do with (not) grouping updates.
<@sgallagh:fedora.im>
16:25:21
And also gating inherently delays the built packages from getting into the buildroot, which can have domino effects on the next batch
<@adamwill:fedora.im>
16:25:58
Troy Dawson: and of course the testing we have is nothing at all like the centos stream testing, but that's a sidebar.
<@sgallagh:fedora.im>
16:26:17
We've gotten away with things this long because it's fairly rare for ELN rebuilds to fail tests that Rawhide builds pass.
<@yselkowitz:fedora.im>
16:26:39
if only that were true of builds as well...
<@tdawson:fedora.im>
16:27:53
adamw: Oh, I wasn't meaning the tests were the same, just that cs does compose testing.
<@sgallagh:fedora.im>
16:29:46
If nothing else, I think we have established in prior discussions that we probably want the ELN batches to be submitted for testing as a single group of packages, rather than as they are now, with each package getting an auto-created Bodhi update individually.
<@sgallagh:fedora.im>
16:30:25
I think we need the test results, but I'm pretty convinced we do NOT want to gate on them
<@sgallagh:fedora.im>
16:31:10
Because I'm not sure there's a reasonable action we can take when they fail as a group.
<@adamwill:fedora.im>
16:31:17
Stephen Gallagher: do you have any new thoughts on how to handle huge batches?
<@sgallagh:fedora.im>
16:32:14
In a perfect world, I'd call that "Bodhi's problem", but that's probably not fair.
<@sgallagh:fedora.im>
16:32:36
Our batches can get enormous when we're doing mass-rebuilds, for example.
<@sgallagh:fedora.im>
16:33:16
adamw: I do find it mildly amusing that Bodhi handles getting 2500+ individual erratum creation requests better than a single erratum with 2500+ packages.
<@adamwill:fedora.im>
16:34:25
i've never traced out exactly why it spins for a long time when you create a new update with a ton of packages, but one thing i can think of is if it goes through the gating check, it has to do a *ton* of greenwave queries
<@adamwill:fedora.im>
16:34:34
those *should* be fast in infra, but eh
<@adamwill:fedora.im>
16:34:38
anyway, bit of a sidebar again
<@sgallagh:fedora.im>
16:35:55
Did anyone get a chance to review and ponder my crazy idea in the ticket?
<@sgallagh:fedora.im>
16:36:26
(Where we basically mass-tag the entire batch into `eln-build` as a buildroot override in parallel with generating a Bodhi update to get things properly to `eln`)
<@adamwill:fedora.im>
16:37:25
what does that help with?
<@sgallagh:fedora.im>
16:38:00
It helps with ensuring that EBS batches don't miss out on all of the previous batch's new packages in its buildroot while things are getting tested
<@sgallagh:fedora.im>
16:38:21
And prevents us from having to delay the start of the next batch (potentially indefinitely)
<@adamwill:fedora.im>
16:38:24
that's only an issue if we do gating, though, right?
<@adamwill:fedora.im>
16:38:38
as long as there's no gating, if the update makes it to bodhi it's basically guaranteed to go stable no more than three minutes later
<@adamwill:fedora.im>
16:38:48
(if that's broken it's also broken for rawhide, so somebody's gonna notice)
<@sgallagh:fedora.im>
16:39:01
Well, otherwise we need to block the start of the next batch on the successful move to stable of the previous batch's erratum
<@sgallagh:fedora.im>
16:39:23
adamw: "three minutes" seems optimistic for a giant batch
<@sgallagh:fedora.im>
16:39:34
I wouldn't trust that, so we'd need to add logic to wait for it.
<@adamwill:fedora.im>
16:39:41
well, the process of pushing it stable gets started within three minutes.
<@adamwill:fedora.im>
16:39:47
i don't know how long it takes, tbh.
<@sgallagh:fedora.im>
16:39:50
Which we probably should have had already for the individual packages, but didn't
<@adamwill:fedora.im>
16:39:59
yeah, i was gonna say, is this actually any different?
<@adamwill:fedora.im>
16:40:15
i'm not sure it's any *better* if we create 2500 individual updates suddenly
<@sgallagh:fedora.im>
16:40:15
Right, but that's an identified flaw that has bitten us before
<@sgallagh:fedora.im>
16:40:22
Yaakov can attest to that
<@sgallagh:fedora.im>
16:41:01
adamw: It's potentially a single message for us to watch for while waiting, rather than 2500 individual ones (and $DEITY preserve us if one of those messages goes missing...)
<@yselkowitz:fedora.im>
16:42:15
if it's three minutes to stable then tagging into override is probably overkill?
<@adamwill:fedora.im>
16:42:38
the other thing i'm thinking about is, say we get in the situation the override tagging 'helps' with
<@adamwill:fedora.im>
16:42:59
are we happy with the situation we're in at that point? there's some kinda backlog of things actually going stable, but we're merrily tagging whatever shows up into the buildroot as it shows up?
<@adamwill:fedora.im>
16:43:14
do we like the increasing delta there?
<@sgallagh:fedora.im>
16:45:32
Fair objections. Like I said, it was a crazy idea, but I figured it was a place to start the discussion
<@sgallagh:fedora.im>
16:46:27
If we really are talking "just a few minutes" (even as many as fifteen), then it is probably okay. We just need some reasonable way to block starting the next batch.
<@sgallagh:fedora.im>
16:47:00
As I said, I don't want to be sitting and listening for every package in the batch to get the "tagged into `eln`" message.
<@adamwill:fedora.im>
16:47:01
you could run up some stats using koji list-history , i guess
<@adamwill:fedora.im>
16:47:56
i'm not actually sure if there's an update-level message which you can rely on being emitted only once all the packages from the update are *available* in the buildroot repo 🤔
<@sgallagh:fedora.im>
16:48:14
It's definitely something we COULD do, but if any one of them fails to send that message for any reason, we're stuck (modulo a timeout, of course)
<@adamwill:fedora.im>
16:48:19
let's see. for that to happen, the packages in the update all need to be tagged appropriately. but *then* the buildroot repo needs to be regenerated
<@adamwill:fedora.im>
16:48:48
which iirc basically happens on a short timer or loop, koji is constantly regenerating them. but it's a 'it can take up to 15 minutes' kinda deal i think
<@sgallagh:fedora.im>
16:49:02
adamw: We don't need that from the update; just a confirmation that the batch is sent to stable. At that point, we can reasonably trust that as long as any one package NVR from it is in the buildroot, it's there.
<@sgallagh:fedora.im>
16:49:34
adamw: We actually already have code to handle that part of it, at least; the first step of a new batch is "wait for the next time this buildroot is regenerated"
<@adamwill:fedora.im>
16:49:35
ah, so you're planning to just run a wait-repo at some point?
<@adamwill:fedora.im>
16:49:41
ah, ok, cool.
<@adamwill:fedora.im>
16:50:24
then yeah, batching updates should help in this case.
<@nirik:matrix.scrye.com>
16:51:00
as a side note the recent koji upstream release reworks how buildroots are regenerated a lot... (I haven't looked at all the implications yet).
<@sgallagh:fedora.im>
16:51:36
I was just thinking to myself "You know what this conversation needs? More unknowns".
<@sgallagh:fedora.im>
16:51:54
But thanks for the heads-up.
<@adamwill:fedora.im>
16:52:13
let's redo it all in konflux!
<@sgallagh:fedora.im>
16:53:15
Alright, so batching would probably address some of these problems, but we still have the question of whether we could do full mass-rebuild in a Bodhi update.
<@sgallagh:fedora.im>
16:53:31
And what to do if a Bodhi update batch fails for any reason.
<@sgallagh:fedora.im>
16:53:45
Like, Bodhi gives us back a 500 error
<@sgallagh:fedora.im>
16:54:18
In that situation, we might have no way to know what state we're in
<@sgallagh:fedora.im>
16:55:02
nirik: Is there a way we could direct a side-tag to get signed and sent directly to `eln` without Bodhi?
<@sgallagh:fedora.im>
16:55:33
In the past, we've just tagged the side-tag to `eln-updates-candidate` and let Bodhi do its thing individually.
<@adamwill:fedora.im>
16:55:36
Stephen Gallagher: this seems like something that's also possible with single package updates
<@adamwill:fedora.im>
16:55:43
bodhi can just be down when you try and create it, for instance
<@conan_kudo:matrix.org>
16:55:44
!hi
<@zodbot:fedora.im>
16:55:47
Neal Gompa (ngompa) - he / him / his
<@nirik:matrix.scrye.com>
16:55:57
possibly yes...
<@sgallagh:fedora.im>
16:56:22
adamw: In practice, that's never happened; I assume Bodhi has some loop that keeps checking for `eln-updates-candidate` and catches up
<@sgallagh:fedora.im>
16:56:58
We don't directly tell Bodhi to create an update today; it just "happens" some small amount of time after we tag to `-candidate`
<@adamwill:fedora.im>
16:58:30
that's the automatic update creation. okay.
<@nirik:matrix.scrye.com>
16:58:33
https://bodhi.fedoraproject.org/releases/ELN
<@sgallagh:fedora.im>
16:58:38
nirik: Basically, I'd like to be able to do that at least for mass-rebuilds (following some package limit heuristic) so we don't clobber Bodhi.
<@nirik:matrix.scrye.com>
16:59:26
yeah, we do that for rawhide mass rebuilds too... fN-rebuild tag for builds and then tag them all in...
<@adamwill:fedora.im>
16:59:28
i didn't realize that's in use, i thought the ebs thing was already manually creating the updates. hmm
<@nirik:matrix.scrye.com>
17:00:04
doing it in a sidetag is slightly different tho as it will be updating the buildroot all the time.
<@nirik:matrix.scrye.com>
17:00:13
(the sidetags buildroot to be clear)
<@sgallagh:fedora.im>
17:00:40
nirik: What do you do to "tag them all in" from `fN-rebuild`?
<@sgallagh:fedora.im>
17:00:56
Do you just send them to `fN-updates-signing-pending` or something?
<@nirik:matrix.scrye.com>
17:01:23
there's a releng 'mass-tag' script... it checks if there's a newer build already there in the tag, if there is it skips that package, if not it tags.
<@nirik:matrix.scrye.com>
17:01:34
we usually set up signing for that tag when we set up the tag.
<@nirik:matrix.scrye.com>
17:01:42
so they are all signed already.
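The skip rule nirik describes for the releng mass-tag script can be sketched roughly as below. This is not the script itself, only an illustration of the rule: `select_builds_to_tag` and the `Build` type are hypothetical names, and versions are modelled as plain integer tuples, whereas real NVR comparison follows RPM's epoch/version/release ordering. Skipping a build that is *equal* to the one already tagged is also an assumption; the discussion only mentions skipping when a newer build is present.

```python
from typing import Dict, List, Tuple

# A build is modelled as (package name, version tuple). Plain tuples stand in
# for real RPM version comparison; this is an assumption for illustration.
Version = Tuple[int, ...]
Build = Tuple[str, Version]


def select_builds_to_tag(
    candidates: List[Build],
    already_tagged: Dict[str, Version],
) -> List[Build]:
    """Return the candidate builds that should be tagged into the destination.

    A candidate is skipped when the destination tag already holds a build of
    the same package that is newer than (or, by assumption, equal to) it.
    """
    selected: List[Build] = []
    for name, version in candidates:
        current = already_tagged.get(name)
        if current is not None and current >= version:
            # Destination already has this build or a newer one: skip it.
            continue
        selected.append((name, version))
    return selected


if __name__ == "__main__":
    tagged = {"bash": (5, 2, 32), "glibc": (2, 40)}
    rebuilds = [("bash", (5, 2, 26)), ("glibc", (2, 40, 1)), ("zlib", (1, 3))]
    for name, version in select_builds_to_tag(rebuilds, tagged):
        print(name, ".".join(map(str, version)))
```

In this toy run, `bash` is skipped because the destination already has a newer build, while `glibc` and `zlib` are selected for tagging.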
<@sgallagh:fedora.im>
17:02:42
nirik: How do you set up signing for a tag?
<@sgallagh:fedora.im>
17:02:54
Is that something the on-demand side tag we create could do?
<@nirik:matrix.scrye.com>
17:04:23
We configure it in ansible and push deployment and restart autosigning... however, sidetags get a -signing-pending tag created when they are created, and that's what bodhi uses to tell autosign to sign them. However, it only allows (currently) bodhi to tag things into those.
<@nirik:matrix.scrye.com>
17:04:42
we could perhaps add 'trusted taggers' for the eln side tags
<@sgallagh:fedora.im>
17:05:21
Meaning that only the EBS user could tag things into them?
<@nirik:matrix.scrye.com>
17:06:09
or whatever user(s) we set in the config.
<@nirik:matrix.scrye.com>
17:07:04
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/robosignatory/templates/robosignatory.toml.j2#_352
<@sgallagh:fedora.im>
17:08:51
I'm not sure how to read that; is it saying that our side-tags are already signing?
<@sgallagh:fedora.im>
17:09:17
Or is it saying that if we submitted the side-tag as an update, Bodhi would tag it for signing?
<@sgallagh:fedora.im>
17:09:24
Or something else?
<@nirik:matrix.scrye.com>
17:10:39
It's part of the flow... if you make a side tag now, build some stuff in it, submit it to bodhi, bodhi then tags the builds into the signing-pending tag, robosignatory then signs them and moves them to the testing tag and bodhi then knows they are signed and moves on...
<@sgallagh:fedora.im>
17:11:26
Got it
<@sgallagh:fedora.im>
17:11:51
So at least if, today, we started submitting updates via Bodhi instead of just dumping them into `-candidate`, they'd get properly signed.
<@sgallagh:fedora.im>
17:12:11
But that doesn't really tell me what we would do if we wanted to skip Bodhi.
<@sgallagh:fedora.im>
17:12:21
(Such as for a mass-rebuild)
<@nirik:matrix.scrye.com>
17:12:28
well, there's two flows here... sidetags vs just normal builds.
<@nirik:matrix.scrye.com>
17:13:53
so I'd suggest for mass rebuilds:
<@nirik:matrix.scrye.com>
17:15:04
we just make a known tag like 'eln-rebuild' or whatever. We teach robosignatory to sign builds in it. Then you can just use the mass-tag script to move it into the main tag... then you need to untag anything left in there for the next use?
<@nirik:matrix.scrye.com>
17:16:12
I'm not sure what your threshold is for mass rebuild vs just doing single builds?
<@sgallagh:fedora.im>
17:18:22
So what I'm thinking is a workflow something like this, then:
<@sgallagh:fedora.im>
17:18:22
1. Is the batch <= 300 packages? Submit it as a Bodhi update. Wait for Bodhi to submit for stable.
<@sgallagh:fedora.im>
17:18:22
2. Is the batch > 300 packages OR did we wait > 20 minutes for Bodhi to submit to stable? Then tag everything into `eln-rebuild`, wait for signing, tag everything to `eln`, untag everything from `eln-rebuild`.
<@sgallagh:fedora.im>
17:19:08
That would be to set an upper limit on how long we'd wait for Bodhi, blocking the start of the next batch.
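The two-step workflow proposed here could be sketched along these lines. It is only an illustration under stated assumptions, not EBS code: `BatchActions` and `publish_batch` are hypothetical names, and every hook is a placeholder callable rather than a real Bodhi or Koji API. The 300-package and 20-minute thresholds and the `eln-rebuild` / `eln` tag names come from the discussion.

```python
from dataclasses import dataclass
from typing import Callable, List

# Thresholds taken from the proposal in the discussion.
MAX_BODHI_BATCH = 300              # packages
MAX_BODHI_WAIT_SECONDS = 20 * 60   # 20 minutes


@dataclass
class BatchActions:
    """Hypothetical hooks; none of these are real EBS, Bodhi, or Koji calls."""
    submit_bodhi_update: Callable[[List[str]], None]
    # Returns True if Bodhi submitted the update for stable within the timeout.
    wait_for_stable: Callable[[float], bool]
    tag_builds: Callable[[str, List[str]], None]
    wait_for_signing: Callable[[List[str]], None]
    untag_builds: Callable[[str, List[str]], None]


def publish_batch(builds: List[str], actions: BatchActions) -> str:
    """Publish one batch and report which path was taken."""
    # Step 1: small batches go through Bodhi so they get update testing.
    if len(builds) <= MAX_BODHI_BATCH:
        actions.submit_bodhi_update(builds)
        if actions.wait_for_stable(MAX_BODHI_WAIT_SECONDS):
            return "bodhi"

    # Step 2: the batch is too large, or Bodhi took too long. Bypass it.
    actions.tag_builds("eln-rebuild", builds)
    actions.wait_for_signing(builds)
    actions.tag_builds("eln", builds)
    actions.untag_builds("eln-rebuild", builds)
    return "direct"
```

Note that the fallback path runs even after a Bodhi update was submitted and timed out, so whether Bodhi tolerates builds that are already in the destination tag matters for this design.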
<@sgallagh:fedora.im>
17:19:52
If Bodhi just took a long time to catch up, would it tolerate having packages already in the destination tag?
<@sgallagh:fedora.im>
17:20:45
Or would we need to explicitly cancel the Bodhi update?
<@nirik:matrix.scrye.com>
17:21:16
I expect it would just try and tag in and see they were already there and drive on.
<@nirik:matrix.scrye.com>
17:21:20
but I don't know for sure.
<@sgallagh:fedora.im>
17:22:43
adamw: What do you think? We can get feedback from test results on any of the Bodhi updates with this, I should think.
<@adamwill:fedora.im>
17:23:04
yeah, in the end nothing much changes for me in practice; openqa will test whatever updates show up
<@sgallagh:fedora.im>
17:23:17
And I can look into the compose gating as a separate project, but we're way over time for this meeting to discuss it here today.
<@adamwill:fedora.im>
17:23:20
the main concern i have in this area is that we don't do anything and a huge flood of updates shows up for some reason and overwhelms openqa
<@adamwill:fedora.im>
17:23:31
the compose gating is one i'd probably get more involved in
<@sgallagh:fedora.im>
17:24:22
Yeah, whatever we do for that, I'd like to make it reusable for non-ELN Fedora as well.
<@adamwill:fedora.im>
17:24:28
i didn't get around to looking at splitting eln update tests into their own section yet, sorry
<@adamwill:fedora.im>
17:24:42
Stephen Gallagher: sure, in theory that'd be nice. i doubt we'd use it, though
<@nirik:matrix.scrye.com>
17:24:46
note that some of these same things may come up soon with secondary riscv
<@nirik:matrix.scrye.com>
17:25:05
(not testing per se, but keeping up with rawhide, etc)
<@sgallagh:fedora.im>
17:26:16
OK, I'm going to wrap this meeting up since we're running late.
<@sgallagh:fedora.im>
17:26:40
nirik: Would it be possible for you to set up an `eln-rebuild` tag with signing like you described above that I could use to test a couple things before I start implementation in EBS?
<@nirik:matrix.scrye.com>
17:26:54
sure, please file a ticket. :)
<@sgallagh:fedora.im>
17:26:58
Will do.
<@sgallagh:fedora.im>
17:27:43
!info Lots of discussion on how to best enable Bodhi update testing for multiple packages while not overwhelming things when we do automated mass-rebuilds. See the full logs for more details.
<@sgallagh:fedora.im>
17:28:14
I'm going to skip Open Floor this week.
<@sgallagh:fedora.im>
17:28:19
Thanks for coming, folks.
<@sgallagh:fedora.im>
17:28:23
!endmeeting