16:07:11 <sgallagh> #startmeeting ELN (2023-10-20)
16:07:11 <zodbot_> Meeting started Fri Oct 20 16:07:11 2023 UTC.
16:07:11 <zodbot_> This meeting is logged and archived in a public location.
16:07:11 <zodbot_> The chair is sgallagh. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:07:11 <zodbot_> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:07:11 <zodbot_> The meeting name has been set to 'eln_(2023-10-20)'
16:07:11 <zodbot> The chair is sgallagh. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:07:19 <sgallagh> #meetingname eln
16:07:19 <zodbot_> The meeting name has been set to 'eln'
16:07:24 <sgallagh> #topic Init Process
16:07:33 <sgallagh> .hi
16:07:33 <zodbot_> sgallagh: [hellomynameis sgallagh]
16:07:34 <zodbot> sgallagh: sgallagh 'Stephen Gallagher' <sgallagh@redhat.com>
16:07:35 <yselkowitz> .hi
16:07:36 <zodbot_> yselkowitz: [hellomynameis yselkowitz]
16:07:37 <zodbot> yselkowitz: yselkowitz 'Yaakov Selkowitz' <yselkowi@redhat.com>
16:08:23 <tdawson> Howdy
16:09:50 <sgallagh> I don't have anything specific on the agenda for today
16:09:56 <sgallagh> #topic Agenda Topics
16:10:59 <sgallagh> Anyone?
16:11:32 <yselkowitz> not sure if it's a meeting topic, but I had a thought about EBS
16:12:05 <sgallagh> #topic ELNBuildSync
16:12:08 <sgallagh> It is now!
16:12:25 <sgallagh> Oh, for posterity:
16:12:27 <sgallagh> #link https://sgallagh.wordpress.com/2023/10/13/sausage-factory-fedora-eln-rebuild-strategy/
16:12:58 <yselkowitz> thanks for that btw, nice to have something to point to
16:13:01 <yselkowitz> wrt EBS's issues building sets of packages which require a particular build order...
16:13:38 <sgallagh> I assume this is specifically related to OCAML?
16:13:40 <yselkowitz> I was wondering if it 1) is possible and 2) would help if EBS were to have certain packages which it should process first in a batch
16:13:47 <yselkowitz> ocaml, llvm, etc.
16:13:55 <yselkowitz> so iow
16:14:26 <yselkowitz> if ocaml or llvm were to be included in a batch, they should be processed first
16:15:08 <yselkowitz> iiuc this would increase the likelihood of EBS successfully handling the rest of the packages that are typically built with those
16:15:28 <yselkowitz> because it keeps retrying as long as the failures don't repeat
16:15:44 <sgallagh> Not a bad idea, for known problem cases. Though it gets tricky if there's more than one layer like that.
16:15:44 <yselkowitz> so, taking ocaml as an example
16:15:54 <sgallagh> (e.g. if we have to build llvm, then clang, then everything else)
16:16:17 <yselkowitz> wrt llvm, I think the version requirements take care of that
16:16:40 <sgallagh> Meaning they'll fail and get reordered in subsequent RebuildAttempts?
16:17:01 <yselkowitz> so if you build llvm first, then the rest of that ecosystem, yes some will fail until clang has been built, but EBS will retry and they should pass the second time?
16:17:12 <sgallagh> OK, I'll buy that.
16:17:38 <yselkowitz> no guarantees that will solve all our problems with these, but I think it would improve our chances
16:18:03 <tdawson> Thinking of ocaml ... isn't there a 1st, 2nd, 3rd package that needs to be done for a successful build?
16:18:04 <yselkowitz> wrt ocaml, "seeding" ocaml first will break anything that has dependencies between it and ocaml
16:18:20 <yselkowitz> I have an order that I've been using
16:18:47 <tdawson> I'm just wondering if we need to go at least one step further.  A 1st and 2nd build.   So llvm, then clang type thing.
16:18:58 <tdawson> Or do you think that would make things too complicated.
16:19:07 <sgallagh> That's rapidly getting complicated.
16:19:21 <yselkowitz> so for ocaml, I do:
16:19:22 <yselkowitz> GROUP 1: ocaml
16:19:22 <yselkowitz> GROUP 2: graphviz ocaml-ocamlbuild ocaml-csexp ocaml-pp ocaml-labltk
16:19:22 <yselkowitz> GROUP 3: ocaml-dune ocaml-findlib
16:19:22 <yselkowitz> GROUP 4: brltty hivex libnbd ocaml-augeas ocaml-cppo ocaml-curses ocaml-fileutils ocaml-libvirt ocaml-re
16:19:23 <yselkowitz> GROUP 5: ocaml-calendar ocaml-gettext supermin
16:19:25 <yselkowitz> GROUP 6: libguestfs virt-top
16:19:29 <yselkowitz> GROUP 7: virt-v2v
16:19:30 <sgallagh> With a single "first pass", I think I have a hack that would work.
16:19:47 <yselkowitz> but just getting ocaml in first assures that nothing can be mistakenly built against the previous version
16:20:34 <yselkowitz> anything with multiple ocaml dep layers will fail, but those that don't will build on the first pass, and subsequent passes should get more and more built until they're all done
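The convergence yselkowitz describes can be sketched as a small simulation. This is only an illustration, not EBS code: the group lists are copied from the log above, while `simulate_passes` and the rule that a package builds only once every package in an earlier group has been built are simplifying assumptions made for the example.

```python
# Build groups as listed by yselkowitz in the log above.
GROUPS = [
    ["ocaml"],
    ["graphviz", "ocaml-ocamlbuild", "ocaml-csexp", "ocaml-pp", "ocaml-labltk"],
    ["ocaml-dune", "ocaml-findlib"],
    ["brltty", "hivex", "libnbd", "ocaml-augeas", "ocaml-cppo",
     "ocaml-curses", "ocaml-fileutils", "ocaml-libvirt", "ocaml-re"],
    ["ocaml-calendar", "ocaml-gettext", "supermin"],
    ["libguestfs", "virt-top"],
    ["virt-v2v"],
]


def simulate_passes(groups):
    """Return the packages that succeed on each retry pass.

    Assumption (hypothetical model): a package succeeds only if every
    package from an earlier group was built in a previous pass.
    """
    level = {pkg: i for i, group in enumerate(groups) for pkg in group}
    built = set()
    pending = set(level)
    passes = []
    while pending:
        succeeded = sorted(
            pkg for pkg in pending
            if all(level[other] >= level[pkg] or other in built
                   for other in level)
        )
        if not succeeded:
            # The same set failed again, so retrying would not help.
            break
        built.update(succeeded)
        pending.difference_update(succeeded)
        passes.append(succeeded)
    return passes


if __name__ == "__main__":
    for number, names in enumerate(simulate_passes(GROUPS), start=1):
        print(f"pass {number}: {' '.join(names)}")
```

Under this strict model each pass completes exactly one group, so the stack needs as many passes as there are groups. Real packages may depend on fewer layers and finish sooner.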
16:20:37 <tdawson> Ah, if that will work, then that is good.
16:20:45 <yselkowitz> I *think* it will work
16:20:57 <yselkowitz> it certainly shouldn't make things *worse*
16:21:07 <tdawson> It's worth a shot.
16:21:48 <sgallagh> Regarding the "hack", what I can do is check a batch for specific packages and if they exist, mark all others as "failing" the first pass so they get retried on pass two.
16:22:24 <sgallagh> (Mark them as failed rather than firing them off, I mean)
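A minimal sketch of the first-pass "hack" sgallagh describes, assuming a batch is simply a list of package names. The names `PRIORITY_PACKAGES` and `plan_first_pass` are hypothetical and do not come from ELNBuildSync itself.

```python
# Hypothetical list of packages that must be built before anything else.
PRIORITY_PACKAGES = frozenset({"ocaml", "llvm"})


def plan_first_pass(batch, priority=PRIORITY_PACKAGES):
    """Split a batch into (build_now, marked_failed).

    If the batch contains any priority package, only those are fired
    off; everything else is marked as failed without being built, so
    the normal retry logic picks it up on the second pass.
    """
    seeds = [pkg for pkg in batch if pkg in priority]
    if not seeds:
        # Nothing special in this batch: build everything as usual.
        return list(batch), []
    deferred = [pkg for pkg in batch if pkg not in priority]
    return seeds, deferred


if __name__ == "__main__":
    build_now, marked_failed = plan_first_pass(["ocaml", "ocaml-dune", "bash"])
    print("build now:", build_now)
    print("retry on pass two:", marked_failed)
```

Reusing the existing "failed, retry later" path keeps the change small, at the cost of spending the deferred packages' first attempt without actually building them.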
16:22:47 <yselkowitz> I can't speak for the implementation, just the concept
16:22:52 <tdawson> That might work.
16:23:05 <sgallagh> Right, I think the concept makes sense and would be an improvement.
16:23:33 <sgallagh> I'm mentioning the implementation for two reasons: 1) to get feedback on it and 2) so I remember later what I was thinking :)
16:27:07 <sgallagh> OK, I'm not hearing any wild disagreement here.
16:27:25 <sgallagh> yselkowitz: Mind opening a ticket and I'll get to that probably on Monday?
16:27:50 <tdawson> Both the proposal and the possible implementation sound ok to me.
16:28:26 <yselkowitz> ok
16:28:28 <sgallagh> There's one tiny edge-case I can see with this hack, but it's fairly unlikely.
16:28:44 <yselkowitz> ??
16:29:19 <sgallagh> Actually, I just thought of another. Let me write them down...
16:30:25 <sgallagh> 1) If the first pass fails to build e.g. ocaml, the second pass will still happen.
16:30:46 <tdawson> ouch
16:30:54 <sgallagh> 2) If the first pass succeeds, but all of the second pass fails, they won't get the one auto-retry we have in place for dealing with test flakes
16:31:03 <sgallagh> 2) is probably ignorable.
16:31:11 <sgallagh> The first one concerns me
16:31:30 <sgallagh> So I may need to do this differently.
16:33:11 <sgallagh> I don't know what a good answer here would be. If ocaml or llvm fails (validly; let's set aside flakes for the moment), what should we do about the further packages?
16:33:24 <sgallagh> Not every package in the batch is necessarily related to ocaml or llvm.
16:34:48 <jforbes> Sounds like a dep map is the answer, but realistically, cancel anything that is related to the failed
16:35:03 <sgallagh> But we don't actually have that information.
16:35:23 <yselkowitz> the rest of llvm will just fail
16:35:38 <sgallagh> Especially in a world that includes automatic BuildRequires
16:35:53 <yselkowitz> because they are version locked
16:36:27 <sgallagh> yselkowitz I'm more concerned about things like OCAML (or python minor version bumps).
16:36:45 <sgallagh> Where the builds would succeed, but have incorrect Requires
16:36:57 <yselkowitz> shouldn't be worse than it is now
16:37:11 <yselkowitz> python is tagged in though
16:37:21 <sgallagh> Right, so python is probably okay.
16:37:28 <yselkowitz> yes, if ocaml fails, then ocaml-* might be built against the old version
16:37:31 <sgallagh> OCAML we currently just outright refuse to attempt
16:38:02 <sgallagh> (Both the main package and the ocaml-* ones)
16:38:28 <yselkowitz> attempt?  or tag?
16:38:38 <sgallagh> Attempt, I think
16:39:15 * yselkowitz thinks they're just not tagged
16:39:42 <sgallagh> No, you're right.
16:40:16 <sgallagh> It's just skipping the tag.
16:40:47 <sgallagh> OK, so I guess this would still be an improvement.
16:41:36 <sgallagh> Do we have any examples of this sort of problem outside OCAML?
16:42:08 <yselkowitz> llvm and ocaml are the ones that come to mind
16:42:33 <yselkowitz> because they are not tagged (for good reason) and build order matters
16:43:14 <sgallagh> LLVM is only an issue around major version bumps (which is twice a year). Is OCAML more frequent?
16:43:29 <yselkowitz> llvm is micro-version locked
16:43:36 <sgallagh> What do you mean?
16:44:39 <yselkowitz> e.g. clang requires the exact same version of llvm
16:44:59 <yselkowitz> and then other components require clang(-filesystem)
16:45:31 <sgallagh> You sure? It looks like it's only bound to the soname
16:45:38 <yselkowitz> so doing llvm first means that clang should build on the next pass, leaving the others to build on the subsequent pass
16:45:55 <yselkowitz> `BuildRequires:	%{llvm_pkg_name}-devel = %{version}`
16:47:27 <sgallagh> Are you (<expletive deleted>) kidding me?
16:47:54 <yselkowitz> no kidding, I've built both stacks manually enough times to know
16:48:44 <sgallagh> I really have to question why they are separate packages, if they're that interrelated
16:49:51 <tdawson> So ... big question here.  and maybe I need to read your blog better.  Are we not tagging in the Fedora version into the side-tag before a build? ... or is that what you are meaning by "tagging in" .... and ... now the tagging part of this conversation just made sense.
16:50:00 <yselkowitz> not for llvm and ocaml
16:50:11 <yselkowitz> they're not binary compatible between fedora and eln
16:50:11 <tdawson> Sorry ... meant to delete that but hit enter instead.
16:50:16 <sgallagh> Right, certain packages with known issues are excluded.
16:50:31 <sgallagh> Because we know the Fedora and ELN versions are incompatible
16:50:56 <sgallagh> In the case of LLVM and OCAML, the ELN runtime cannot run anything built with the Fedora version of the compiler
16:51:04 <yselkowitz> mind you, ocaml and binary compatibility don't go in the same paragraph, if you sneeze at it, it breaks :-)
16:51:14 <tdawson> *laughs*
16:51:38 * yselkowitz speaks from many years of experience
16:55:21 <sgallagh> OK, I think you're still right that we need to have a list of special packages that need to be built before we attempt any others, but it still doesn't answer the question of what to do with the rest of the batch.
16:55:57 <sgallagh> Especially if the `ocaml` or `llvm` package ends up failing
16:57:09 <sgallagh> I know it's not *safe* to just fire it off, but we really only have two choices:
16:57:22 <yselkowitz> the rest of the batch should wait the first time
16:57:44 <sgallagh> 1) Fire off the batch and let ocaml-* break
16:57:44 <sgallagh> 2) Abort the batch and let that mean that some unrelated packages may not get attempted.
16:58:07 <sgallagh> I mean *after* we've tried the ocaml build on its own *and it fails*
16:58:16 <yselkowitz> #1
16:58:43 <yselkowitz> because we don't control what else is in the batch
16:58:49 <sgallagh> Yeah, I think that's the least-bad option, but I was kind of hoping someone had a great idea for a "third way" :)
16:58:59 <tdawson> Technically, #1 is what we are currently doing.  So ... in theory, it isn't any worse than right now.
16:59:26 <tdawson> I want to say #2, because it is safer, but what if it happens before a long weekend and/or holiday and nobody looks at it for a while.
16:59:47 <sgallagh> Well, #2 only seems safer; I don't think it actually is.
16:59:48 <yselkowitz> unless EBS gains some interactiveness to prompt for and respond to intervention
17:00:14 <sgallagh> Because what if the remainder of the batch includes a soname bump for a major library?
17:00:39 <yselkowitz> #1 is the only safe option
17:00:51 <yselkowitz> it's no worse than the current situation
17:01:07 <sgallagh> I guess in the specific case of `ocaml-*` I could maybe skip just those packages if the `ocaml` build has failed.
17:01:10 <yselkowitz> and has the potential at least to improve things
17:01:11 <tdawson> correct ... it's not worse than currently, and at the same time, potentially better than currently.
17:01:32 <sgallagh> But that's a one-off, not a general solution
17:01:43 <yselkowitz> what's a one-off?
17:02:02 <yselkowitz> and we're out of time
17:02:06 <sgallagh> Adding a hack so if `ocaml` fails, we auto-skip `ocaml-*`
17:02:17 <sgallagh> But let the others run
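The one-off sgallagh describes could look roughly like the sketch below. It assumes the deferred packages and the set of failed seed builds are already known; `SKIP_IF_FAILED` and `filter_second_pass` are hypothetical names, and the glob matching uses Python's standard `fnmatch` module.

```python
from fnmatch import fnmatch

# Hypothetical rule table: if the key fails, skip packages matching these globs.
SKIP_IF_FAILED = {"ocaml": ["ocaml-*"]}


def filter_second_pass(deferred, failed_seeds, skip_rules=SKIP_IF_FAILED):
    """Return (to_run, skipped) for the second pass.

    Packages matching a skip pattern of a failed seed are skipped;
    everything else in the batch still runs.
    """
    patterns = [pat for seed in failed_seeds
                for pat in skip_rules.get(seed, [])]
    to_run, skipped = [], []
    for pkg in deferred:
        if any(fnmatch(pkg, pat) for pat in patterns):
            skipped.append(pkg)
        else:
            to_run.append(pkg)
    return to_run, skipped


if __name__ == "__main__":
    to_run, skipped = filter_second_pass(
        ["ocaml-dune", "hivex", "bash"], failed_seeds={"ocaml"})
    print("run:", to_run)
    print("skip:", skipped)
```

Name matching only catches packages that follow the `ocaml-*` naming pattern. A package such as `hivex`, which appears in the ocaml build groups listed earlier, would still run.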
17:02:46 <sgallagh> The channel is free after our meeting, so we can run long if we want to keep going on this.
17:03:01 <yselkowitz> ok
17:04:38 <jforbes> That helps, but still doesn't skip any others which might have a BuildRequires on ocaml.
17:05:34 <yselkowitz> we could make more lists for "skip these if this doesn't build" but no idea how complicated that is
17:05:37 <jforbes> While auto BuildRequires does create a problem, we have the advantage that fedora just built those packages for rawhide, so we could get proper buildreqs from the rawhide srpms
17:05:43 <sgallagh> Sure... but at this point we're mitigating, not solving.
17:06:23 <sgallagh> jforbes: Sure... if they aren't conditionalizing BuildRequires too
17:06:42 <jforbes> sgallagh: if they are conditionalizing, they aren't auto BuildRequires
17:06:52 <yselkowitz> not necessarily true
17:07:21 <jforbes> given that the SRPM download is heavy, it would make sense to only use that when the package is using auto buildrequires
17:08:13 <yselkowitz> not sure what auto buildreqs have to do with this?
17:08:26 <jforbes> Oh, I see what you mean, if they conditionalize something else in the build, it might bring in a new BuildRequires that isn't in rawhide
17:08:36 <sgallagh> Yes
17:08:55 <jforbes> yselkowitz: I was thinking if you build a depmap for BuildRequires, you can know truly what to cancel when a package fails.
17:08:57 <sgallagh> auto-BR is one way that happens, but not the only one
17:09:25 <sgallagh> jforbes: If that was possible, we'd be using it to do ordering right from the get-go. Unfortunately, it's not reliable
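For illustration, jforbes's depmap idea amounts to a reverse-dependency walk like the sketch below. The `build_requires` mapping is made-up example data and `dependents_to_cancel` is a hypothetical helper. As sgallagh points out, obtaining a reliable map is the hard part, so this shows only the cancellation step under the assumption that such a map exists.

```python
from collections import deque


def dependents_to_cancel(build_requires, failed):
    """Return every package that transitively BuildRequires a failed one.

    build_requires maps a package name to the set of packages it
    needs at build time (assumed to be known and accurate).
    """
    # Invert the map: for each package, which packages build against it?
    reverse = {}
    for pkg, reqs in build_requires.items():
        for req in reqs:
            reverse.setdefault(req, set()).add(pkg)

    cancelled = set()
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in cancelled and dependent not in failed:
                cancelled.add(dependent)
                queue.append(dependent)
    return cancelled


if __name__ == "__main__":
    example = {
        "ocaml-dune": {"ocaml"},
        "ocaml-re": {"ocaml-dune"},
        "bash": {"glibc"},
    }
    print(sorted(dependents_to_cancel(example, {"ocaml"})))
```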
17:09:27 <jforbes> sgallagh: sure, but for non auto-BR, we can use the lighter weight spec glance
17:09:40 <yselkowitz> just because package A depends on B, and both land in the same batch, does not necessarily mean they need to be built in order
17:09:52 <yselkowitz> and besides, most packages are cross-tagged
17:10:15 <sgallagh> And in some cases, they are circularly dependent and trying to order them will get it wrong
17:10:19 <yselkowitz> here we're talking about very specific package sets which are not tagged and are always build-order sensitive
17:10:45 <sgallagh> Point: saying "are not tagged" is confusing.
17:10:57 <yselkowitz> not cross-tagged
17:11:03 <sgallagh> We probably need to say "the Rawhide version cannot be in the buildroot"
17:11:46 <sgallagh> Since that's the effect of not tagging it into the batch side-tag
17:11:55 <jforbes> Was just thinking about how we solved this with conary and the rPath build system a while ago, but we had the advantage of putting bootstrap build info into the recipes, so if you had a circular dep, the build order was added so that the buildsystem knew how to order
17:12:19 <sgallagh> Yeah, that was one of the requirements we had on Modularity too.
17:12:19 * yselkowitz feels like this is going off the rails
17:12:28 <sgallagh> Almost as if this is a hard problem!
17:13:07 <sgallagh> jforbes: If you haven't had a chance to read it, I did a write-up of our approach the other day; I linked it earlier in the meeting.
17:14:02 <sgallagh> OK, this clearly needs some more thought before I jump to any kind of implementation. yselkowitz, please file that ticket after the meeting.
17:14:19 <sgallagh> We're well over time and should probably close out the meeting
17:16:05 <yselkowitz> thanks sgallagh
17:16:11 <sgallagh> #endmeeting