16:00:07 <sgallagh> #startmeeting ELN (2023-11-03)
16:00:07 <zodbot> Meeting started Fri Nov  3 16:00:07 2023 UTC.
16:00:07 <zodbot> This meeting is logged and archived in a public location.
16:00:07 <zodbot> The chair is sgallagh. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:00:07 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:00:07 <zodbot> The meeting name has been set to 'eln_(2023-11-03)'
16:00:07 <sgallagh> #meetingname eln
16:00:07 <zodbot> The meeting name has been set to 'eln'
16:00:07 <sgallagh> #topic init process
16:00:07 <sgallagh> .hi
16:00:08 <zodbot> sgallagh: sgallagh 'Stephen Gallagher' <sgallagh@redhat.com>
16:01:09 <sgallagh> Who do we have here today?
16:01:55 <neil> .hi
16:01:56 <zodbot> neil: neil 'Neil Hanlon' <neil@shrug.pw>
16:02:30 <dcavalca> .hi
16:02:31 <zodbot> dcavalca: dcavalca 'Davide Cavalca' <davide@cavalca.name>
16:04:56 <sgallagh> OK, small crowd today I guess
16:05:07 <sgallagh> I guess we will get started.
16:05:19 <sgallagh> #topic Agenda
16:05:37 <tdawson> Sorry for being late ... I'm here.
16:05:44 <sgallagh> #info Agenda Item: Sysprof, frame-pointers and divergence between Fedora Linux and Fedora ELN
16:05:57 <sgallagh> #info Agenda Item: Content Resolver Workloads: what do they mean?
16:07:21 <sgallagh> We had a request to discuss https://pagure.io/fesco/issue/3068 as well, but I've added new updates there that people likely haven't had a chance to read yet. If there are things to discuss there, we can do it in Open Floor.
16:08:04 <sgallagh> Similar answer for https://github.com/fedora-eln/eln/issues/170 which we ended up discussing a bit in the Matrix channel
16:08:12 <sgallagh> Does anyone have other topics for today?
16:09:59 <sgallagh> #topic Sysprof, frame-pointers and divergence between Fedora Linux and Fedora ELN
16:10:36 <sgallagh> #link https://github.com/minimization/content-resolver-input/pull/1007
16:12:00 <sgallagh> tl;dr There's a proposal to add sysprof to ELN (and RHEL 10 by extension), but there is a question as to whether this is worthwhile since ELN doesn't support frame pointers.
16:12:32 <Son_Goku> I think having sysprof is great, for what it's worth
16:12:57 <sgallagh> For a little background: Fedora ELN diverges from Fedora Linux in this situation because RHEL management has made (and, I believe, publicized) the decision not to enable frame pointers in RHEL 10
16:13:11 <Son_Goku> actually I don't think that was the case
16:13:20 <sgallagh> Which part?
16:13:39 <dcavalca> some (maybe helpful) context on frame pointers from the Fedora side: Daan ran the benchmarks again, and for x86_64 and aarch64 the impact is around or less than 1%; ppc64le already enables frame pointers by default (well, not exactly, but the outcome is the same) and has for a long time; we're currently looking into enabling backchain for s390x
16:13:39 <dcavalca> (the arch-equivalent of frame pointers) and working with the upstream s390x folks on the necessary kernel enablement work on that, which should be landing soon
16:13:56 <Son_Goku> sgallagh: the part of RHEL management deciding something
16:14:20 <Son_Goku> the way you explained it to me was that status quo was being kept for now in ELN for a later decision
16:14:21 <sgallagh> There's a recorded decision internally. It's decided.
16:14:37 <Son_Goku> is there a rationale for it?
16:15:07 <sgallagh> Yes, I just found the Knowledge Base article for it: https://access.redhat.com/solutions/7041624
16:15:09 <dcavalca> in general, without frame pointers profiling is a lot less useful; continuous profiling in particular becomes essentially impossible, as is troubleshooting issues on end user systems as they happen
16:16:30 <Son_Goku> sgallagh: uhh, that's *interesting*
16:17:00 <Son_Goku> I wonder if one of the reasons why there isn't more pressure is because golang forcibly enables frame pointers on all arches for Go builds?
16:17:27 <sgallagh> I'm not going to engage in speculation. It won't lead to a useful outcome :)
16:17:31 <Son_Goku> lol
16:17:44 <Son_Goku> anyway, the article itself doesn't exactly provide a real rationale
16:18:00 <dcavalca> sgallagh: something that would be useful to know is which workloads are negatively affected by frame pointers and how
16:18:03 <Son_Goku> it just handwaves it as not "enterprise" which is a very odd statement
16:18:18 <sgallagh> There was considerable debate/discussion internally. I don't personally agree with the decision, but I understand that there were tradeoffs to be made in either direction.
16:18:20 <dcavalca> obviously without naming specific customers, but if there's something that can be improved I'd love to get some eyes on it
16:19:04 <Son_Goku> right
16:19:05 <sgallagh> Rather than rehashing the RHEL decision, we should discuss if there's anything we might want to do in Fedora ELN around this.
16:19:31 <Son_Goku> well, can we get some kind of workload sample for ELN that could be analyzed for perf and such?
16:19:32 <sgallagh> I'm wary of diverging too far from RHEL, but I had an idea to bounce off of you
16:19:36 <Son_Goku> okay?
16:19:41 <Son_Goku> what's the idea?
16:21:00 <sgallagh> As you know, RHEL/CentOS Stream 10 will be branching from Fedora in February. I propose that we enable frame pointers in Fedora ELN (now targeted at RHEL 11). We then commit to re-evaluate as we go into the Fedora release that will be branched from to create RHEL 11.
16:21:22 <Son_Goku> that seems reasonable
16:21:32 <dcavalca> ah, you mean enabling them after it branches? yeah that's fine by me
16:21:41 <dcavalca> and by then we should have s390x sorted out as well upstream
16:21:44 <Son_Goku> yup
16:21:51 <sgallagh> So if RHEL decides not to enable frame pointers for RHEL 11, we have a full cycle to work out any hiccoughs, but we still get 2.5 years of frame pointers to tinker with.
16:22:08 <Son_Goku> well I hope by then they'll keep it
16:22:13 <Son_Goku> it's really nice having working flamegraphs
16:22:17 <tdawson> That sounds reasonable to me.
16:22:26 <sgallagh> So do I
16:22:41 <sgallagh> (Hope they keep it)
16:23:11 <sgallagh> My hope is that improvements we get to profiling will lead to greater performance increases than the hit we take from having them
16:23:15 <Son_Goku> we could encode this in redhat-rpm-config as a rhel >= 11 enable now, that should make it automatically work?
16:23:25 <sgallagh> I believe so, yes
16:23:39 <Son_Goku> sgallagh: the fact we've gotten somewhere between 30% and 60% perf improvements in the gtk stack alone from only having it for six months is pretty amazing
16:23:45 <sgallagh> Though we probably would also want to consider a mass-rebuild to apply it
16:24:07 <Son_Goku> yes, but we can have that ready so we don't forget when we do the mass build to flip everything over to rhel 11 :)
16:24:22 <yselkowitz> but the mass rebuild is before or after branching?
16:24:30 <Son_Goku> after
16:24:44 <sgallagh> There's no implied mass-rebuild when we switch to RHEL 11.
16:24:57 <sgallagh> Generally, we'd just piggy-back on whenever Fedora does one.
16:25:05 <Son_Goku> right
16:25:12 <sgallagh> But if we want this to apply immediately, we could plan for it
16:25:18 <yselkowitz> right, but f40 branching and eln->11 should coincide?
16:25:22 <Son_Goku> yes
16:25:36 <yselkowitz> but isn't the f40 mass rebuild *before* f40 branches?
16:25:41 <Son_Goku> oh you're right
16:26:22 <Son_Goku> https://fedorapeople.org/groups/schedule/f-40/f-40-all-tasks.html
16:26:28 <Son_Goku> the mass build is in January
16:28:15 <Son_Goku> sgallagh: can we trigger a build of our own after the rhel macro is bumped?
16:28:17 <sgallagh> In the interest of time, let's table the mass-rebuild portion and just decide on the general plan
16:28:30 <sgallagh> Any time we want; we can just bump the ELN buildroot number.
16:28:35 <Son_Goku> ah cool
16:28:37 <sgallagh> That's why it's there
16:29:41 <Son_Goku> but yeah in terms of a plan, my thought was we can modify redhat-rpm-config now to incorporate it with rhel >= 11
16:30:10 <Son_Goku> and then we will get it automatically in February after the bump and rebuild
16:30:19 <sgallagh> So, just to make it clear: the obvious potential issue with this plan is that if RHEL opts to continue without frame pointers in RHEL 11, we'll have been doing 2+ years of testing against builds that aren't entirely representative.
16:30:45 <sgallagh> I personally think the benefits outweigh the risks there, but I want to make that clear so we know what we're agreeing to do :)
16:30:49 <Son_Goku> sure
16:30:53 <dcavalca> yeah I think that's fine
16:31:03 <Son_Goku> I think this is the kind of thing where ELN can provide real value
16:31:28 <Son_Goku> because otherwise it's too hard to evaluate what it looks like to incorporate Fedora improvements into RHEL
16:31:55 <Son_Goku> we should never be resorting to the excuse of "not enterprise" to not include something in RHEL
16:33:58 <sgallagh> There are definitely markets (Financial Services?) where a 2% performance hit would be a deal-breaker. I can understand that. I wish this didn't have to be a compile-time option, but you defend the country with the army you have
16:34:46 <sgallagh> Anyway, let me record the plan, assuming no one wants to disagree?
16:34:50 <Son_Goku> yeah, though the spin I would give is that you can eke more of a gain out with the instrumentation ;)
16:35:03 <Son_Goku> but yeah, let's go with it
16:36:18 <sgallagh> #agreed Fedora ELN will enable frame pointers in February, following the branching from Fedora for RHEL 10. We will re-evaluate the inclusion of frame pointers at the start of the Fedora release from which RHEL 11 will branch (giving us six months to test without them, if that is the decision).
16:36:31 <Son_Goku> ack
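
The gating idea discussed above (encode the change in redhat-rpm-config as "rhel >= 11" so it takes effect automatically once the ELN macro is bumped) can be sketched roughly as follows. This is only an illustration of the decision logic in Python, not the actual redhat-rpm-config macro; the function name `frame_pointers_enabled` and the convention that a plain Fedora build has no RHEL major version are assumptions made for the example.

```python
from typing import Optional


def frame_pointers_enabled(rhel_major: Optional[int]) -> bool:
    """Hypothetical model of a 'rhel >= 11' gate for frame pointers.

    rhel_major is None for a plain Fedora Linux build (assumed here to
    mean the RHEL version macro is not defined) and an integer for
    ELN/RHEL builds.
    """
    if rhel_major is None:
        # Fedora Linux already builds with frame pointers.
        return True
    # ELN/RHEL: only enable once the target is RHEL 11 or later.
    return rhel_major >= 11


if __name__ == "__main__":
    for version in (None, 10, 11):
        print(version, frame_pointers_enabled(version))
```

With a gate like this, nothing changes while ELN still targets RHEL 10; the behaviour flips only after the version bump, and packages pick it up as they are rebuilt.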
16:36:49 <sgallagh> #topic Content Resolver Workloads: what do they mean?
16:37:08 <sgallagh> Like a double-rainbow: no one is quite sure what a "workload" is anymore.
16:37:12 <Son_Goku> yeah
16:37:21 <sgallagh> @Son_Goku, would you mind elaborating?
16:37:24 <Son_Goku> sure
16:37:49 <Son_Goku> this was triggered by my pull request: https://github.com/minimization/content-resolver-input/pull/972
16:38:29 <Son_Goku> I considered it quite reasonable to track cmake in the VFX platform definition because that is a "workload" as we classically defined it: things that people care about that make up a thing
16:38:49 <Son_Goku> but increasingly, it doesn't look like there _are_ workloads in content resolver anymore
16:39:14 <yselkowitz> cmake is... complicated, not sure I would extrapolate from that
16:39:17 <Son_Goku> it seems to be mostly subsystem team stuff, which doesn't really match up with what's actually going on
16:39:30 <yselkowitz> basically, everyone needs it but nobody wants to own it
16:39:38 <Son_Goku> yselkowitz: no, it's not really complicated... cmake is important as a vfx platform development interface
16:39:48 <Son_Goku> see that's the problem, workloads aren't supposed to be about ownership
16:39:58 <tdawson> Why not?
16:40:10 <sgallagh> @Son_Goku But they're also the only mechanism we have for figuring ownership out
16:40:29 <sgallagh> Or, maybe more accurately, figuring out when something is unowned
16:40:31 <yselkowitz> I agree that vfx is a sensible starting place for looking for a maintainer, but it's not necessarily the only one
16:40:45 <Son_Goku> again, that's not the point I'm making here
16:41:02 <Son_Goku> the point I'm making is that workloads and maintainers are orthogonal concepts
16:41:13 <Son_Goku> and at some point, someone conflated them and started using it that way
16:41:15 <sgallagh> I think I'm missing that point, then.
16:41:35 <tdawson> Not to me ... to me ... if you put a package in your workload, then you are saying that you are responsible for it.
16:41:47 <Son_Goku> because you're not using it as a "workload"
16:41:56 <Son_Goku> you're using it as a "collection"
16:42:14 <Son_Goku> the sst stuff? those are collections
16:42:21 <Son_Goku> kde and vfx? those are workloads
16:42:38 <sgallagh> I think I see where you're coming from.
16:42:55 <Son_Goku> hopefully, I don't know if I have more words to try to clarify it :P
16:42:57 <sgallagh> But I'm not sure what you're proposing we do about it.
16:43:22 <sgallagh> You're right that the effective meaning of the term has become an "owned collection of packages"
16:43:22 <Son_Goku> I'm not proposing anything right now, I'm asking what we should do about this conflation because it's confusing
16:43:53 <sgallagh> Changing the existing terminology is actually harder than it seems, due to how the CR is actually implemented.
16:44:04 <sgallagh> But we can probably try to document it better.
16:44:51 <sgallagh> In the specific case of VFX, is the development of CMake itself part of that workload?
16:44:57 <Son_Goku> we also can encode some policy stuff here: naming the yaml files a specific way encodes particular team maintenance, and an omission of that pattern encodes regular workloads
16:45:18 <Son_Goku> sgallagh: cmake features are aggressively used as part of development of vfx components as such
16:45:36 <tdawson> Son_Goku: I think I understand what you are thinking for your definition of collections and workloads ... but I honestly don't see that as "my" definition of workload.
16:46:22 <Son_Goku> upstream, the aswf have asked about having cmake freshened regularly so that they can build stuff properly with things like new cuda and c++ features
16:47:04 <Son_Goku> the kde workload also really depends on cmake, though it's less aggressive at adopting new cmake features which makes it less of a problem
16:47:43 <Son_Goku> the way I view it "workloads" should track components that are needed for the explicit success of the workload
16:48:08 <tdawson> Ok ... and ... where did you get that definition?
16:48:21 <Son_Goku> from Adam Samalik :)
16:48:30 <Son_Goku> when minimization first started
16:48:41 <tdawson> Ahh ... ok
16:49:07 <Son_Goku> I'm fine with people disagreeing with that statement, but this entanglement of maintained collection vs workload is making it hard for me to judge what I should be doing here
16:49:41 <tdawson> Personally, I agree with your idea about naming ... and I myself don't touch anything that starts with sst_
16:49:42 <Son_Goku> and there are other problems caused by this entanglement, but I don't want to go into it now because it'll take all day :)
16:51:32 <Son_Goku> anyway, I have no proposals or solutions, I wanted to bring this up mostly to discuss it and get it on everyone's radar
16:51:36 <sgallagh> OK, so is there any action to take here right now, or should we move this to a separate discussion?
16:51:38 <sgallagh> Thanks
16:52:35 <tdawson> So, looking at the files in your pull request, there is nothing about it that says it is a workload, other than the yaml label "document:" ... and it looks like that is so hard coded, it also still has "feedback-pipeline" in it.
16:53:13 <Son_Goku> there's also going to be some community work around bringing in VFX components into Fedora and I would absolutely like to track them in ELN too
16:53:33 <sgallagh> FWIW, `feedback-pipeline-workload` and `content-resolver-workload` are synonyms in the code.
16:53:34 <tdawson> Anyway ... I see your point.
16:54:22 <tdawson> I'm fine moving on.  Nothing more from me.
16:55:00 <sgallagh> OK, thanks for bringing this up, Son_Goku
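
The naming idea raised in this topic (files named with a team pattern such as the `sst_` prefix denote team-maintained collections, and files without it denote regular workloads) could be expressed as a simple filename check. This is a minimal sketch, not something Content Resolver implements; the function name `classify_config` and the example filenames are hypothetical.

```python
from pathlib import PurePath


def classify_config(filename: str) -> str:
    """Classify a config file by the naming convention floated in the meeting.

    Files whose name starts with 'sst_' are treated as team-maintained
    collections; everything else is treated as a regular workload.
    """
    name = PurePath(filename).name
    if name.startswith("sst_"):
        return "collection"
    return "workload"


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for path in ("configs/sst_example_team.yaml", "configs/vfx-platform.yaml"):
        print(path, "->", classify_config(path))
```

A check like this only encodes the distinction in file names; it does not change how the `document:` field is interpreted.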
16:55:04 <sgallagh> #topic Open Floor
16:55:15 <sgallagh> We've got about five minutes left if anyone has any other topics
16:55:23 <tdawson> Nothing from me
16:55:42 <Son_Goku> nothing from me
16:55:50 <dcavalca> all good, thanks everyone
16:57:26 <sgallagh> #endmeeting