17:00:13 #startmeeting ELN SIG 2022-03-11
17:00:13 Meeting started Fri Mar 11 17:00:13 2022 UTC.
17:00:13 This meeting is logged and archived in a public location.
17:00:13 The chair is sgallagh. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
17:00:13 Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:00:13 The meeting name has been set to 'eln_sig_2022-03-11'
17:00:13 #meetingname eln
17:00:13 The meeting name has been set to 'eln'
17:00:13 #topic Init Process
17:00:27 .hello2
17:00:28 sgallagh: sgallagh 'Stephen Gallagher'
17:00:53 .hi
17:00:56 dcavalca: dcavalca 'Davide Cavalca'
17:01:10 .hi
17:01:11 fweimer: fweimer 'Florian Weimer'
17:01:24 .hi
17:01:25 cyberpear: cyberpear 'James Cassell'
17:04:07 Thanks for joining, folks. I wasn't sure how many would, since I forgot to send out the agenda.
17:04:18 #topic Status Report
17:04:48 We had a long (about two weeks) span where we didn't get a valid compose, but I resolved that two days ago.
17:05:15 Unfortunately, something changed yesterday and now our compose is getting canceled before it can finish due to a timeout during the image-build phase.
17:05:19 So that's fun.
17:05:45 I'm in the process of tracking down where that timeout value is set and extending it, though it may not be today.
17:06:12 #info Composes are timing out and failing, but we did get a good one on Mar. 9
17:06:46 Additionally, jforbes and I gave a recorded talk and Q&A session on ELN to the Fedora Council yesterday.
17:06:58 I don't think the recording is published yet, but look for it in the coming days.
17:07:59 They will be published to the Fedora YouTube channel: https://www.youtube.com/channel/UCnIfca4LPFVn8-FjpPVc1ow
17:08:29 #info ELN SIG spoke to the Fedora Council yesterday, see recording at https://www.youtube.com/channel/UCnIfca4LPFVn8-FjpPVc1ow when available
17:08:42 And my last update:
17:08:46 I didn't even know there was a Fedora YouTube channel
17:09:19 * cyberpear turns on notifications
17:10:41 I've been actively working on getting ELN-Extras up and running.
17:11:10 A first draft of the changes to DistroBuildSync to enable this is available as a merge request at https://gitlab.com/redhat/centos-stream/ci-cd/distrosync/distrobuildsync/-/merge_requests/37
17:11:45 #info ELN-Extras is in progress, but slow going due to other commitments. Reviews requested for https://gitlab.com/redhat/centos-stream/ci-cd/distrosync/distrobuildsync/-/merge_requests/37
17:12:03 That's all I have to report this week.
17:12:26 Does anyone have something they'd like to discuss/ask/announce?
17:13:15 Could we discuss the x86-64-v3 switch? Do we have quorum?
17:13:39 "Quorum" is rather slushy here, but I'd say it's fine to discuss it at least.
17:13:54 #topic x86_64-v3
17:14:19 I'll also ping Conan Kudo since I know this is important to him.
17:14:19 To recap, the goal is to switch redhat-rpm-config and gcc so that everything is built to the x86-64-v3 ISA baseline on x86_64 (basically AVX2).
17:14:27 * Eighth_Doctor waves
17:14:53 is x86_64-v3 a good idea given that Intel is making CPUs that don't support AVX2?
17:15:44 The most recent efficiency cores support AVX2.
17:15:56 fweimer: Do you have data that demonstrates a significant improvement caused by this change?
17:16:25 Are we looking at a major increase in performance? On certain workloads or across the board, etc.?
17:16:27 Code is smaller because the VEX encoding is more efficient. Other kinds of benchmarks are of course difficult.
17:17:06 By "code is smaller" do you mean physical space on disk?
17:17:29 Yes, and there's also slightly reduced i-cache pressure, it runs faster.
17:17:49 as a data point, we still have plenty of non-AVX2 systems in production
17:17:59 though by the time RHEL 11 comes out that might be less of an issue in practice
17:18:00 But do you run ELN on them?
17:18:01 Any kind of numbers on what "faster" looks like? Even if they're artificial.
17:18:17 fweimer: not yet, but working on it
17:18:20 dcavalca: A change made to ELN today would impact RHEL 10, not 11
17:18:25 Well, this would be RHEL 10 timeline, not 11 right?
17:18:31 oh sure lol
17:18:39 for some reason I had in my head that we were already at 11 :)
17:18:44 fweimer: ELN is RHELish, so we should _always_ treat such changes as if someone were to roll that out today on RHEL
17:19:09 Eighth_Doctor: not quite
17:19:20 More to the point, we want people to be testing against ELN early so they have fewer surprises from the next RHEL
17:19:37 Right, and we had basically one single blocker for this change in RHEL 9, and I do not expect this to be an issue for RHEL 10.
17:20:04 AMD didn't support AVX2 until 2015 on some processors, and it wasn't supported fully until Ryzen in 2017
17:20:04 ELN is RHEL nextish, so we should always treat such changes as if someone were to roll them out on the next RHEL. rather large difference given the release cycle of RHEL
17:20:11 fweimer: Can you elaborate on that?
17:21:07 The RHEL 9 blog post https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
17:21:24 has a few bullet items under the Recommendations section.
17:22:04 I believe these bullets will not apply anymore next year or so.
17:22:08 also forcing AVX means that virtualizing RHEL is way more painful
17:22:43 On the flip side, we ensure that our customers configure their hypervisors correctly, and ISVs can assume that these instructions are always available.
17:23:02 yes, but RHEL deliberately scopes out developer use-cases
17:23:12 this is a huge problem that makes RHEL more difficult
17:23:16 Today, anyone shipping AVX2-only software, even for specialized applications, will have to shoulder the task of cleaning out broken hypervisor configurations.
17:23:44 e.g. no testing for VMware Fusion, Parallels, or VirtualBox, which are common for developers
17:23:45 If we force x86-64-v3, we solve the issue for the ecosystem.
17:23:52 fweimer: out of curiosity, is shipping AVX2-only software common nowadays?
17:23:53 and two out of three break on x86-64-v3
17:24:40 fweimer: I know we can do this with KVM, but what about other supported hypervisors?
17:24:40 which means Vagrant breaks pretty badly
17:24:46 dcavalca: I think it's very difficult today.
17:25:05 Shipping AVX2-only software, I mean.
17:25:58 I don't want this to become a pile-on, so that's why I was asking fweimer to illuminate some of the positives so that we can weigh them against the negatives.
17:26:31 the main things I know of that use AVX are codecs
17:26:48 but RHEL doesn't ship any codec implementations that take advantage of that right now
17:27:05 Eighth_Doctor: machine learning
17:27:16 yes, but RHEL ships no machine learning software
17:27:26 so that's pretty much irrelevant
17:27:40 Eighth_Doctor: but RHEL supports a whole lot of users developing and deploying it
17:27:41 we don't even have machine learning software in Fedora
17:27:46 e.g. no pytorch, tensorflow, etc.
17:27:58 jforbes: yes, but that doesn't require RHEL _itself_ to be built against AVX2
17:28:15 that just requires the compiler to be capable of using those instructions, which has been possible for several RHEL releases
17:28:29 I do this today for some things now
17:29:00 Does guarantee that their customers will be able to run it though.
17:29:37 if they're already going to be compiling it or getting it with a container or a HPC stack, it's already definitely compiled that way
17:30:15 But RHEL business case is much less interesting from an ELN perspective, and certainly any discussion here is likely irrelevant to RHEL 10 planning. There are some other possible advantages to raising the baseline
17:30:26 I think jforbes means that if they install e.g. tensorflow on a RHEL 10 machine with this baseline, they know it will run
17:30:31 is the idea that by virtue of requiring AVX2 in RHEL 10, we'd make it easier for folks to ship AVX2 software knowing that they won't have to worry about the underlying hardware supporting it?
17:30:41 Sorry, why is the RHEL business case not relevant to ELN?
17:30:51 Whereas if they install it on a RHEL 9 machine today, they need to know ahead of time if the CPU supports AVX2 or they'll get unintuitive failures
17:31:02 Unfortunately, it is difficult to test those without a platform to test it. ELN would be an interesting test bed
17:31:04 The RHEL business case is absolutely relevant
17:31:28 dcavalca: That was the statement I was just making, yes.
17:31:32 dcavalca: Yes, that's the main benefit that I see.
17:31:57 Most code that ships with RHEL is not expected to run that much because it's purely administrative overhead.
17:32:20 (Language interpreters are the exception, of course.)
17:32:37 fweimer: it is relevant in that we don't do anything for ELN without an idea that there will be a business case, it is not relevant in that discussions among the ELN SIG are going to have very little impact on the business case. Results of ELN, very much so
17:33:23 jforbes: so you're saying that this is all for naught and I should just not bother discussing this, because it's going to be done anyway?
17:33:35 I don't think that's what he said at all
17:33:55 * dcavalca is finding this discussion useful fwiw
17:34:27 from my perspective, I don't have any infrastructure that will be able to run a RHEL 10 with x86_64-v3
17:34:33 and I doubt I'll have it in three years time either
17:35:03 Eighth_Doctor: No, I was not saying that. I was saying that a better case can be made as a result of testing.
17:35:06 I think the questions we need to ask ourselves are these: "What percentage of users would we alienate if we assume RHEL 10 would use this as the baseline?" and "Is that an acceptable number?"
17:35:32 I'd also recommend thinking about how to communicate this, should the change go ahead
17:35:33 we can basically assume hardware floor of x86_64-v3 is the same floor as what Windows 11 has
17:35:48 refreshing hardware can take a while, and folks will want to know well in advance that they should plan for it
17:36:04 because the AVX instructions became "universal" (i.e. on both AMD and Intel) in 2017 platforms
17:36:05 sgallagh: Not doing it means alienating IHVs in general (not just CPU vendors).
17:36:21 sgallagh: not just users in absolute, but usecases in general
17:36:24 I would say Windows 11 is a good bit more forgiving than that, originally that was the case, but they opened it up a bit more
17:36:31 Eighth_Doctor: I don't think that's true, Microsoft hasn't even adopted x86-64-v2 yet.
17:36:37 dcavalca: Right
17:36:38 e.g. I'd expect this would disproportionately impact embedded/appliance scenarios
17:36:54 Their requirements were all about security hardware
17:36:55 which tend to use older/cheaper/crappier CPUs
17:36:59 yeah, I can say extremely confidently that my fleet would be ruined by this change
17:37:21 jforbes: it has the same impact, since Ryzen v1 and 7th gen Intel are the CPU floors
17:38:01 I do think it is an interesting case to have a build (even if a side tag) where this could be tested/benchmarked/compared against our current floor
17:38:11 Stephen Gallagher, jforbes: RHEL for Edge would be in serious trouble with x86_64-v3
17:38:16 Eighth_Doctor: AVX2 is 4th generation Core.
17:38:22 (Haswell)
17:38:33 fweimer: it was introduced then, but not universally in Intel's lineup
17:38:39 that doesn't happen until 8th generation
17:39:00 and then it goes away (sort of) in 12th generation where Intel wants AVX not available on E-cores
17:39:07 only P-cores
17:39:11 Eighth_Doctor: Not every Intel CPU in production is supported by RHEL.
17:39:20 fweimer: I think the concern is that often folks will qualify a specific CPU for a project or a product, and keep it throughout its lifetime so they don't have to recertify it
17:39:20 IHVs certify only a subset.
17:39:29 I could definitely see some of these being impacted by this change
17:39:47 fweimer: yes, but IHVs aren't the entirety of the RHEL ecosystem
17:40:15 I don't know if RH has the ability to survey customers to see if/how they'd be impacted, but if you do I'd recommend trying to gather some data around this
17:40:16 Right, RHEL has never supported a lot of the older bits. Even with drivers turned on, there is code that drops certain pci ids
17:40:26 integrators, providers, ISVs, etc. also matter
17:40:35 from a community standpoint, as I said before if we do this, we should communicate it far and wide as early as possible
17:40:48 dcavalca: Having ELN images/composes/container images would help with that.
17:41:01 Although ld.so --help works as a quick check, too.
17:41:01 indeed, a lot of CentOS people are going to be left in the lurch too
17:41:02 Conan Kudo: Could you elaborate on the Edge issue? Is that just because of the usage of lightweight, budget CPUs?
17:41:06 but I was avoiding mentioning that since that doesn't matter to folks in this context
17:41:26 Stephen Gallagher: it's that and that machines have elongated hardware life cycles
17:41:37 How often are Intel chips used in Edge vs. ARM?
17:41:49 sgallagh: lightweight, budget CPUs that were picked a decade ago
17:41:53 Eighth_Doctor: CentOS Stream wouldn't see this until Fedora 40ish
17:42:01 I still have hardware out at customers that was deployed in 2012 running today with the latest hardware
17:42:08 err latest software
17:42:15 Intel Atom is fairly common for things that don't need battery power
17:42:26 yup
17:42:27 Intel is in a decent amount of edge bits
17:42:29 Eighth_Doctor: But these long-term systems wouldn't upgrade the OS during their life-time, would they?
17:42:40 e.g. stuff like sat decoders or point of sale systems or ATM machines
17:42:41 yes they do
17:42:42 we upgrade the OS regularly over the lifetime
17:42:55 e.g. we've had stuff start on Ubuntu 10.04 that run 20.04 now
17:43:23 I'm not sure we see that much for deeply-embedded RHEL.
17:43:23 I don't think we've ever had to cut off hardware in the time I've worked here
17:44:38 My preference is if you could have this conversation with Red Hat. 8-)
17:44:56 I do not think ELN should set RHEL policy in this way.
17:45:36 it kind of does though? pretending it doesn't is weird to me
17:45:47 well, ELN is what is expected to show up in the next RHEL, so it does effectively set expectations at least
17:46:47 Right, but if you don't want something in RHEL, that has to be communicated to RHEL engineering.
17:46:58 Information needs to flow both ways, indeed.
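[Editor's note] The 17:41:01 message points to `ld.so --help` as a quick way to see which x86-64 levels a machine supports. As a rough alternative for scripted checks across a fleet, the sketch below classifies a set of CPU flags into a microarchitecture level. It is a minimal illustration, not a tool mentioned in the meeting: the function names are hypothetical, and the flag spellings assume the names Linux uses in /proc/cpuinfo (for example LZCNT reported as "abm" and SSE3 as "pni").

```python
from pathlib import Path

# Required flags per level, using Linux /proc/cpuinfo spellings (assumption:
# LZCNT shows up as "abm", SSE3 as "pni", OSXSAVE support as "xsave").
LEVEL_FLAGS = [
    {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"},           # baseline x86-64
    {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},        # x86-64-v2
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},  # x86-64-v3
]


def x86_64_level(flags):
    """Return the highest level (0 to 3) whose flags, and all lower levels' flags, are present."""
    present = set(flags)
    level = 0
    for required in LEVEL_FLAGS:
        if not required <= present:
            break  # a missing flag at this level caps the result
        level += 1
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Hypothetical helper: read the first 'flags' line from a Linux cpuinfo file."""
    for line in Path(path).read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux host, `x86_64_level(read_cpu_flags())` returning less than 3 would suggest the machine could not run an x86-64-v3 build; inside a VM the result reflects what the hypervisor exposes, which is the configuration concern raised earlier in the discussion.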
17:47:08 And I think the points about RHEL for Edge are important ones that we should take to the RHEL Program Management team to discuss.
17:47:53 Either way, I am interested in the data from this. I wonder if doing a sidetag build/compose would be worth the effort
17:47:57 I increasingly wonder why we're so worried about having flavor build composes anyway
17:48:15 For ELN or for RHEL?
17:48:29 because of the ODCS stuff, it should be possible for us to do x86_64-v2 and x86_64-v3 in parallel and produce artifacts for both
17:48:33 in general, but especially for ELN and RHEL
17:48:40 honestly, I think it'd be useful for Fedora too
17:48:56 Well, in the case of RHEL, it's an issue of QA and support.
17:49:06 I remember when we did have those multi-flavor things for x86_32 and while it was painful due to immature technology, we did have them
17:49:09 Do you mean multiple RPM builds? Like ppc64p7?
17:49:09 It would essentially be adding new architectures to the support matrix
17:49:10 Eighth_Doctor: That is a much longer discussion, and I have recommended several times how dnf might be enhanced to support such a thing
17:49:25 jforbes: OpenMandriva developed patches to do it
17:49:34 since they ship znver1 and x86_64
17:49:53 since I switched OMV to DNF years ago, and helped make that support work alongside bero
17:50:12 they are not upstream because PRs to rpm around all this stuff are stalled
17:50:38 fweimer: yes
17:50:48 Ideally, it should look at flags on the system. I say 'dnf install foo' and it will select the best match. If I have avx2 support on my system and the package has an avx2 build, it will install it, otherwise it falls back to whatever the best match is.
17:50:56 but yes, I already know how to enhance RPM and DNF to do this properly
17:51:23 I did it once for Ryzen for OpenMandriva, and there's no reason I couldn't do it again for generic x86_64-vX levels
17:51:25 It was one of the better features of conary from a package management standpoint.
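[Editor's note] The "best match" behavior described at 17:50:48 can be sketched as a small selection function. This is only an illustration of the idea as stated in the log: it is not DNF's, RPM's, or OpenMandriva's implementation, the `Build` type and `select_best_build` name are hypothetical, and "best" is assumed here to mean the installable build that uses the most flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """Hypothetical description of one build of a package."""
    name: str
    flags: frozenset = frozenset()  # CPU flags this build relies on; empty means baseline


def select_best_build(builds, system_flags):
    """Pick the build using the most flags among those the system fully supports.

    Returns None when no build is installable (for example, only an AVX2
    build exists and the system lacks AVX2).
    """
    available = set(system_flags)
    candidates = [b for b in builds if b.flags <= available]
    if not candidates:
        return None
    # Assumption: more matched flags means a more specialized, preferred build.
    return max(candidates, key=lambda b: len(b.flags))


builds = [
    Build("foo-baseline"),
    Build("foo-avx2", frozenset({"avx2"})),
    Build("foo-avx512", frozenset({"avx2", "avx512f"})),
]
print(select_best_build(builds, {"avx2"}).name)
```

With only the baseline and AVX2 builds installable on an AVX2-only system, the AVX2 build wins; a system with neither flag falls back to the baseline build, which matches the fallback behavior described in the message.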
17:52:15 indeed
17:52:29 We're coming up on the top of the hour, so I'd like to try to record some #info and #actions here.
17:52:38 But then ISVs would have to support multiple builds, too. They really hate that.
17:52:52 fweimer: no, they don't, they just have to declare what they support
17:52:55 that's not different from today
17:53:03 #chair fweimer jforbes Conan_Kudo dcavalca
17:53:03 Current chairs: Conan_Kudo dcavalca fweimer jforbes sgallagh
17:53:31 If you could add your primary concerns as #info, that would be very helpful.
17:53:38 fweimer: under such a system, they can support multiple builds, they certainly do not have to
17:53:48 I will also take it on to discuss with the RHEL Program about Edge
17:54:10 #action sgallagh to discuss the impact of x86_64-v3 on RHEL for Edge with RHEL Pgm
17:54:43 If I want to build for x86_64 I can. If I want to build a version that specifically uses avx2 or avx512 I can do that as well. If I build all 3, it should automatically select the best version for my system
17:55:20 #info Important points raised: v3 was not universally available until 2017, which means existing hardware may need replacement to run
17:55:28 jforbes: The point is that if you want to build for v3, you need to deliver a lower baseline as well unless the OS has cleaned the pipes for you.
17:55:30 #info non-AVX2 hardware is still in production, and widely deployed in long-lifecycle environments (embedded/POS/appliances/etc.) that could be impacted by this change
17:55:31 If I am building software that gets no advantage from avx at all, I just do a single build of the baseline. The idea is flags are a "preference" not a hard arch
17:55:34 #info Concerns about ISV and integrator cases for x86_64-v3 for RHEL. Would prefer to see us look into supporting multiple flavors for x86_64 in parallel like done in the past with ppc64 with p7 and x86_32 with i586 and i686
17:55:47 #info Edge devices frequently use low-power Intel chips such as Atom and many (most?) don't have v3 support
17:56:30 And the flags should be extensible to the point that people really could add flags for a variety of reasons. It doesn't all have to be hardware support
17:56:47 #info Increasing the baseline will lead to smaller code (both in less disk space and higher efficiency)
17:56:51 #info On the other hand, multiple builds in the OS do not clean the pipes for ISVs, which would have to deliver multiple builds as well to avoid support issues.
17:56:54 jforbes: yeah, ideally we could have dynamic hardware provide/supplement/requires magic around ISA extensions
17:57:20 #info The guaranteed availability of AVX2 extensions would benefit high-performance computing, particularly machine-learning.
17:57:26 that would be massively useful for ARM and RISC-V, both of which have massively more fragmented ISA bases
17:58:32 #info OpenMandriva has an implementation of x86_64 flavors we can use as a basis to understand implementing a similar system for supporting x86_64-vX levels in parallel.
17:59:54 If I had to guess, I would expect that very little real benefit would exist for many packages, so the number of things which needed to support multiple builds would be small. An ISV could even make a flag a hard requirement if they didn't want to support older builds
18:00:19 yup
18:01:18 OK, we are officially over time now. Any last #info?
18:01:22 While it works as a preference by default (often meaning users don't even have to know their system, we can detect and DTRT), a flag can also be a hard req.
18:01:38 nothing else from me.
18:01:45 nothing else from me
18:02:00 nope, this was a good discussion, thanks folks
18:02:43 Indeed, thank you to everyone who participated in it.
18:03:06 And especially to those who were working towards a possible compromise solution, which I think has merit.
18:03:44 #endmeeting