16:00:07 #startmeeting ELN (2023-11-03)
16:00:07 Meeting started Fri Nov 3 16:00:07 2023 UTC.
16:00:07 This meeting is logged and archived in a public location.
16:00:07 The chair is sgallagh. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:00:07 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:00:07 The meeting name has been set to 'eln_(2023-11-03)'
16:00:07 #meetingname eln
16:00:07 The meeting name has been set to 'eln'
16:00:07 #topic init process
16:00:07 .hi
16:00:08 sgallagh: sgallagh 'Stephen Gallagher'
16:01:09 Who do we have here today?
16:01:55 .hi
16:01:56 neil: neil 'Neil Hanlon'
16:02:30 .hi
16:02:31 dcavalca: dcavalca 'Davide Cavalca'
16:04:56 OK, small crowd today I guess
16:05:07 I guess we will get started.
16:05:19 #topic Agenda
16:05:37 Sorry for being late ... I'm here.
16:05:44 #info Agenda Item: Sysprof, frame-pointers and divergence between Fedora Linux and Fedora ELN
16:05:57 #info Agenda Item: Content Resolver Workloads: what do they mean?
16:07:21 We had a request to discuss https://pagure.io/fesco/issue/3068 as well, but I've added new updates there that people likely haven't had a chance to read yet. If there are things to discuss there, we can do it in Open Floor.
16:08:04 Similar answer for https://github.com/fedora-eln/eln/issues/170 which we ended up discussing a bit in the Matrix channel
16:08:12 Does anyone have other topics for today?
16:09:59 #topic Sysprof, frame-pointers and divergence between Fedora Linux and Fedora ELN
16:10:36 #link https://github.com/minimization/content-resolver-input/pull/1007
16:12:00 tl;dr There's a proposal to add sysprof to ELN (and RHEL 10 by extension), but there is a question as to whether this is worthwhile since ELN doesn't support frame pointers.
16:12:32 I think having sysprof is great, for what it's worth
16:12:57 For a little background: Fedora ELN diverges from Fedora Linux in this situation because RHEL management has made (and, I believe, publicized) the decision not to enable frame pointers in RHEL 10
16:13:11 actually I don't think that was the case
16:13:20 Which part?
16:13:39 some (maybe helpful) context on frame pointers from the Fedora side: Daan ran the benchmarks again, and for x86_64 and aarch64 the impact is around or less than 1%; ppc64le already enables frame pointers by default (well, not exactly, but the outcome is the same) and has for a long time; we're currently looking into enabling backchain for s390x
16:13:39 (the arch-equivalent of frame pointers) and working with the upstream s390x folks on the necessary kernel enablement work on that, which should be landing soon
16:13:56 sgallagh: the part of RHEL management deciding something
16:14:20 the way you explained it to me was that status quo was being kept for now in ELN for a later decision
16:14:21 There's a recorded decision internally. It's decided.
16:14:37 is there a rationale for it?
16:15:07 Yes, I just found the Knowledge Base article for it: https://access.redhat.com/solutions/7041624
16:15:09 in general, without frame pointers profiling is a lot less useful; continuous profiling in particular becomes essentially impossible, as is troubleshooting issues on end user systems as they happen
16:16:30 sgallagh: uhh, that's *interesting*
16:17:00 I wonder if one of the reasons why there isn't more pressure is because golang forcibly enables frame pointers on all arches for Go builds?
16:17:27 I'm not going to engage in speculation. It won't lead to a useful outcome :)
16:17:31 lol
16:17:44 anyway, the article itself doesn't exactly provide a real rationale
16:18:00 sgallagh: something that would be useful to know is which workloads are negatively affected by frame pointers and how
16:18:03 it just handwaves it as not "enterprise" which is a very odd statement
16:18:18 There was considerable debate/discussion internally. I don't personally agree with the decision, but I understand that there were tradeoffs to be made in either direction.
16:18:20 obviously without naming specific customers, but if there's something that can be improved I'd love to get some eyes on it
16:19:04 right
16:19:05 Rather than rehashing the RHEL decision, we should discuss if there's anything we might want to do in Fedora ELN around this.
16:19:31 well, can we get some kind of workload sample for ELN that could be analyzed for perf and such?
16:19:32 I'm wary of diverging too far from RHEL, but I had an idea to bounce off of you
16:19:36 okay?
16:19:41 what's the idea?
16:21:00 As you know, RHEL/CentOS Stream 10 will be branching from Fedora in February. I propose that we enable frame pointers in Fedora ELN (now targeted at RHEL 11). We then commit to re-evaluate as we go into the Fedora release that will be branched from to create RHEL 11.
16:21:22 that seems reasonable
16:21:32 ah, you mean enabling them after it branches? yeah that's fine by me
16:21:41 and by then we should have s390x sorted out as well upstream
16:21:44 yup
16:21:51 So if RHEL decides not to enable frame pointers for RHEL 11, we have a full cycle to work out any hiccoughs, but we still get 2.5 years of frame pointers to tinker with.
16:22:08 well I hope by then they'll keep it
16:22:13 it's really nice having working flamegraphs
16:22:17 That sounds reasonable to me.
16:22:26 So do I
16:22:41 (Hope they keep it)
16:23:11 My hope is that improvements we get to profiling will lead to greater performance increases than the hit we take from having them
16:23:15 we could encode this in redhat-rpm-config as a rhel >= 11 enable now, that should make it automatically work?
16:23:25 I believe so, yes
16:23:39 sgallagh: the fact we've gotten somewhere between 30~60% perf improvements in the gtk stack alone from only having it for six months is pretty amazing
16:23:45 Though we probably would also want to consider a mass-rebuild to apply it
16:24:07 yes, but we can have that ready so we don't forget when we do the mass build to flip everything over to rhel 11 :)
16:24:22 but the mass rebuild is before or after branching?
16:24:30 after
16:24:44 There's no implied mass-rebuild when we switch to RHEL 11.
16:24:57 Generally, we'd just piggy-back on whenever Fedora does one.
16:25:05 right
16:25:12 But if we want this to apply immediately, we could plan for it
16:25:18 right, but f40 branching and eln->11 should coincide?
16:25:22 yes
16:25:36 but isn't the f40 mass rebuild *before* f40 branches?
16:25:41 oh you're right
16:26:22 https://fedorapeople.org/groups/schedule/f-40/f-40-all-tasks.html
16:26:28 the mass build is in January
16:28:15 sgallagh: can we trigger a build of our own after the rhel macro is bumped?
16:28:17 In the interest of time, let's table the mass-rebuild portion and just decide on the general plan
16:28:30 Any time we want; we can just bump the ELN buildroot number.
16:28:35 ah cool
16:28:37 That's why it's there
16:29:41 but yeah in terms of a plan, my thought was we can modify redhat-rpm-config now to incorporate it with rhel >= 11
16:30:10 and then we will get it automatically in February after the bump and rebuild
16:30:19 So, just to make it clear: the obvious potential issue with this plan is that if RHEL opts to continue without frame pointers in RHEL 11, we'll have been doing 2+ years of testing against builds that aren't entirely representative.
16:30:45 I personally think the benefits outweigh the risks there, but I want to make that clear so we know what we're agreeing to do :)
16:30:49 sure
16:30:53 yeah I think that's fine
16:31:03 I think this is the kind of thing where ELN can provide real value
16:31:28 because otherwise it's too hard to evaluate what it looks like to incorporate Fedora improvements into RHEL
16:31:55 we should never be resorting to the excuse of "not enterprise" to not include something in RHEL
16:33:58 There are definitely markets (Financial Services?) where a 2% performance hit would be a deal-breaker. I can understand that. I wish this didn't have to be a compile-time option, but you defend the country with the army you have
16:34:46 Anyway, let me record the plan, assuming no one wants to disagree?
16:34:50 yeah, though the spin I would give is that you can eke more of a gain out with the instrumentation ;)
16:35:03 but yeah, let's go with it
16:36:18 #agreed Fedora ELN will enable frame pointers in February, following the branching from Fedora for RHEL 10. We will re-evaluate the inclusion of frame pointers at the start of the Fedora release from which RHEL 11 will branch (giving us six months to test without them, if that is the decision).
16:36:31 ack
16:36:49 #topic Content Resolver Workloads: what do they mean?
16:37:08 Like a double-rainbow: no one is quite sure what a "workload" is anymore.
16:37:12 yeah
16:37:21 @Son_Goku, would you mind elaborating?
16:37:24 sure
16:37:49 this was triggered by my pull request: https://github.com/minimization/content-resolver-input/pull/972
16:38:29 I considered it quite reasonable to track cmake in the VFX platform definition because that is a "workload" as we classically defined it: things that people care about that make up a thing
16:38:49 but increasingly, it doesn't look like there _are_ workloads in content resolver anymore
16:39:14 cmake is... complicated, not sure I would extrapolate from that
16:39:17 it seems to be mostly subsystem team stuff, which doesn't really match up with what's actually going on
16:39:30 basically, everyone needs it but nobody wants to own it
16:39:38 yselkowitz: no, it's not really complicated... cmake is important as a vfx platform development interface
16:39:48 see that's the problem, workloads aren't supposed to be about ownership
16:39:58 Why not?
16:40:10 @Son_Goku But they're also the only mechanism we have for figuring ownership out
16:40:29 Or, maybe more accurately, figuring out when something is unowned
16:40:31 I agree that vfx is a sensible starting place for looking for a maintainer, but it's not necessarily the only one
16:40:45 again, that's not the point I'm making here
16:41:02 the point I'm making is that workloads and maintainers are orthogonal concepts
16:41:13 and at some point, someone conflated them and started using it that way
16:41:15 I think I'm missing that point, then.
16:41:35 Not to me ... to me ... if you put a package in your workload, then you are saying that you are responsible for it.
16:41:47 because you're not using it as a "workload"
16:41:56 you're using it as a "collection"
16:42:14 the sst stuff? those are collections
16:42:21 kde and vfx? those are workloads
16:42:38 I think I see where you're coming from.
16:42:55 hopefully, I don't know if I have more words to try to clarify it :P
16:42:57 But I'm not sure what you're proposing we do about it.
16:43:22 You're right that the effective meaning of the term has become an "owned collection of packages"
16:43:22 I'm not proposing anything right now, I'm asking what we should do about this conflation because it's confusing
16:43:53 Changing the existing terminology is actually harder than it seems, due to how the CR is actually implemented.
16:44:04 But we can probably try to document it better.
16:44:51 In the specific case of VFX, is the development of CMake itself part of that workload?
16:44:57 we also can encode some policy stuff here: naming the yaml files a specific way encodes particular team maintenance, and an omission of that pattern encodes regular workloads
16:45:18 sgallagh: cmake features are aggressively used as part of development of vfx components as such
16:45:36 Son_Goku: Although I think I understand what you are thinking for your definition of collections and workloads ... but I honestly don't see that as "my" definition of workload.
16:46:22 upstream, the aswf have asked about having cmake freshened regularly so that they can build stuff properly with things like new cuda and c++ features
16:47:04 the kde workload also really depends on cmake, though it's less aggressive at adopting new cmake features which makes it less of a problem
16:47:43 the way I view it "workloads" should track components that are needed for the explicit success of the workload
16:48:08 Ok ... and ... where did you get that definition?
16:48:21 from Adam Samalik :)
16:48:30 when minimization first started
16:48:41 Ahh ... ok
16:49:07 I'm fine with people disagreeing with that statement, but this entanglement of maintained collection vs workload is making it hard for me to judge what I should be doing here
16:49:41 Personally, I agree with your idea about naming ... and I myself don't touch anything that starts with sst_
16:49:42 and there are other problems caused by this entanglement, but I don't want to go into it now because it'll take all day :)
16:51:32 anyway, I have no proposals or solutions, I wanted to bring this up mostly to discuss it and get it on everyone's radar
16:51:36 OK, so is there any action to take here right now, or should we move this to a separate discussion?
16:51:38 Thanks
16:52:35 So, looking at the files in your pull request, there is nothing about it that says it is a workload, other than the yaml label "document:" ... and it looks like that is so hard-coded, it also still has "feedback-pipeline" in it.
16:53:13 there's also going to be some community work around bringing in VFX components into Fedora and I would absolutely like to track them in ELN too
16:53:33 FWIW, `feedback-pipeline-workload` and `content-resolver-workload` are synonyms in the code.
16:53:34 Anyway ... I see your point.
16:54:22 I'm fine moving on. Nothing more from me.
16:55:00 OK, thanks for bringing this up, Son_Goku
16:55:04 #topic Open Floor
16:55:15 We've got about five minutes left if anyone has any other topics
16:55:23 Nothing from me
16:55:42 nothing from me
16:55:50 all good, thanks everyone
16:57:26 #endmeeting