17:00:13 <sgallagh> #startmeeting ELN SIG 2022-03-11
17:00:13 <zodbot> Meeting started Fri Mar 11 17:00:13 2022 UTC.
17:00:13 <zodbot> This meeting is logged and archived in a public location.
17:00:13 <zodbot> The chair is sgallagh. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
17:00:13 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:00:13 <zodbot> The meeting name has been set to 'eln_sig_2022-03-11'
17:00:13 <sgallagh> #meetingname eln
17:00:13 <zodbot> The meeting name has been set to 'eln'
17:00:13 <sgallagh> #topic Init Process
17:00:27 <sgallagh> .hello2
17:00:28 <zodbot> sgallagh: sgallagh 'Stephen Gallagher' <sgallagh@redhat.com>
17:00:53 <dcavalca> .hi
17:00:56 <zodbot> dcavalca: dcavalca 'Davide Cavalca' <dcavalca@fb.com>
17:01:10 <fweimer> .hi
17:01:11 <zodbot> fweimer: fweimer 'Florian Weimer' <fweimer@redhat.com>
17:01:24 <cyberpear> .hi
17:01:25 <zodbot> cyberpear: cyberpear 'James Cassell' <fedoraproject@cyberpear.com>
17:04:07 <sgallagh> Thanks for joining, folks. I wasn't sure how many would, since I forgot to send out the agenda.
17:04:18 <sgallagh> #topic Status Report
17:04:48 <sgallagh> We had a long (about two weeks) span where we didn't get a valid compose, but I resolved that two days ago.
17:05:15 <sgallagh> Unfortunately, something changed yesterday and now our compose is getting canceled before it can finish due to a timeout during the image-build phase.
17:05:19 <sgallagh> So that's fun.
17:05:45 <sgallagh> I'm in the process of tracking down where that timeout value is set and extending it, though it may not be today.
17:06:12 <sgallagh> #info Composes are timing out and failing, but we did get a good one on Mar. 9
17:06:46 <sgallagh> Additionally, jforbes and I gave a recorded talk and Q&A session on ELN to the Fedora Council yesterday.
17:06:58 <sgallagh> I don't think the recording is published yet, but look for it in the coming days.
17:07:59 <sgallagh> They will be published to the Fedora YouTube channel: https://www.youtube.com/channel/UCnIfca4LPFVn8-FjpPVc1ow
17:08:29 <sgallagh> #info ELN SIG spoke to the Fedora Council yesterday, see recording at https://www.youtube.com/channel/UCnIfca4LPFVn8-FjpPVc1ow when available
17:08:42 <sgallagh> And my last update:
17:08:46 <jforbes> I didn't even know there was a Fedora YouTube channel
17:09:19 * cyberpear turns on notifications
17:10:41 <sgallagh> I've been actively working on getting ELN-Extras up and running.
17:11:10 <sgallagh> A first draft of the changes to DistroBuildSync to enable this is available as a merge request at https://gitlab.com/redhat/centos-stream/ci-cd/distrosync/distrobuildsync/-/merge_requests/37
17:11:45 <sgallagh> #info ELN-Extras is in progress, but slow going due to other commitments. Reviews requested for https://gitlab.com/redhat/centos-stream/ci-cd/distrosync/distrobuildsync/-/merge_requests/37
17:12:03 <sgallagh> That's all I have to report this week.
17:12:26 <sgallagh> Does anyone have something they'd like to discuss/ask/announce?
17:13:15 <fweimer> Could we discuss the x86-64-v3 switch?  Do we have quorum?
17:13:39 <sgallagh> "Quorum" is rather slushy here, but I'd say it's fine to discuss it at least.
17:13:54 <sgallagh> #topic x86_64-v3
17:14:19 <sgallagh> I'll also ping Conan Kudo since I know this is important to him.
17:14:19 <fweimer> To recap, the goal is to switch redhat-rpm-config and gcc so that everything is built to the x86-64-v3 ISA baseline on x86_64 (basically AVX2).
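[A minimal illustrative sketch, not part of the meeting discussion: building to the x86-64-v3 baseline in practice means compiling with GCC's -march=x86-64-v3 option, which predefines the corresponding feature macros such as __AVX2__, __FMA__, and __BMI2__. The actual redhat-rpm-config change is not shown here.]

    /* baseline.c -- illustrative only; compile with: gcc -march=x86-64-v3 -O2 baseline.c */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__AVX2__) && defined(__FMA__) && defined(__BMI2__)
        /* GCC predefines these macros when the target level includes the
           x86-64-v3 feature set (AVX, AVX2, BMI1/2, FMA, MOVBE, ...). */
        puts("compiled for (at least) x86-64-v3");
    #else
        puts("compiled for a lower x86-64 level");
    #endif
        return 0;
    }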
17:14:27 * Eighth_Doctor waves
17:14:53 <Eighth_Doctor> is x86_64-v3 a good idea given that Intel is making CPUs that don't support AVX2?
17:15:44 <fweimer> The most recent efficiency cores support AVX2.
17:15:56 <sgallagh> fweimer: Do you have data that demonstrates a significant improvement caused by this change?
17:16:25 <sgallagh> Are we looking at a major increase in performance? On certain workloads or across the board, etc.?
17:16:27 <fweimer> Code is smaller because the VEX encoding is more efficient. Other kinds of benchmarks are of course difficult.
17:17:06 <sgallagh> By "code is smaller" do you mean physical space on disk?
17:17:29 <fweimer> Yes, and there's also slightly reduced i-cache pressure, so it runs faster.
17:17:49 <dcavalca> as a data point, we still have plenty of non-AVX2 systems in production
17:17:59 <dcavalca> though by the time RHEL 11 comes out that might be less of an issue in practice
17:18:00 <fweimer> But do you run ELN on them?
17:18:01 <sgallagh> Any kind of numbers on what "faster" looks like? Even if they're artificial.
17:18:17 <dcavalca> fweimer: not yet, but working on it
17:18:20 <sgallagh> dcavalca: A change made to ELN today would impact RHEL 10, not 11
17:18:25 <jforbes> Well, this would be RHEL 10 timeline, not 11 right?
17:18:31 <dcavalca> oh sure lol
17:18:39 <dcavalca> for some reason I had in my head that we were already at 11 :)
17:18:44 <Eighth_Doctor> fweimer: ELN is RHELish, so we should _always_ treat such changes as if someone were to roll that out today on RHEL
17:19:09 <jforbes> Eighth_Doctor: not quite
17:19:20 <sgallagh> More to the point, we want people to be testing against ELN early so they have fewer surprises from the next RHEL
17:19:37 <fweimer> Right, and we had basically one single blocker for this change in RHEL 9, and I do not expect this to be an issue for RHEL 10.
17:20:04 <Eighth_Doctor> AMD didn't support AVX2 until 2015 on some processors, and it wasn't supported fully until Ryzen in 2017
17:20:04 <jforbes> ELN is RHEL-next-ish, so we should always treat such changes as if someone were to roll them out on the next RHEL. Rather large difference given the release cycle of RHEL
17:20:11 <sgallagh> fweimer: Can you elaborate on that?
17:21:07 <fweimer> The RHEL 9 blog post https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
17:21:24 <fweimer> has a few bullet items under the Recommendations section.
17:22:04 <fweimer> I believe these bullets will not apply anymore in a year or so.
17:22:08 <Eighth_Doctor> also forcing AVX means that virtualizing RHEL is way more painful
17:22:43 <fweimer> On the flip side, we ensure that our customers configure their hypervisors correctly, and ISVs can assume that these instructions are always available.
17:23:02 <Eighth_Doctor> yes, but RHEL deliberately scopes out developer use-cases
17:23:12 <Eighth_Doctor> this is a huge problem that makes RHEL more difficult
17:23:16 <fweimer> Today, anyone shipping AVX2-only software, even for specialized applications, will have to shoulder the task of cleaning out broken hypervisor configurations.
17:23:44 <Eighth_Doctor> e.g. no testing for VMware Fusion, Parallels, or VirtualBox, which are common for developers
17:23:45 <fweimer> If we force x86-64-v3, we solve the issue for the ecosystem.
17:23:52 <dcavalca> fweimer: out of curiosity, is shipping AVX2-only software common nowadays?
17:23:53 <Eighth_Doctor> and two out of three break on x86-64-v3
17:24:40 <jforbes> fweimer: I know we can do this with KVM, but what about other supported hypervisors?
17:24:40 <Eighth_Doctor> which means Vagrant breaks pretty badly
17:24:46 <fweimer> dcavalca: I think it's very difficult today.
17:25:05 <fweimer> Shipping AVX2-only software, I mean.
17:25:58 <sgallagh> I don't want this to become a pile-on, so that's why I was asking fweimer to illuminate some of the positives so that we can weigh them against the negatives.
17:26:31 <Eighth_Doctor> the main things I know of that use AVX are codecs
17:26:48 <Eighth_Doctor> but RHEL doesn't ship any codec implementations that take advantage of that right now
17:27:05 <jforbes> Eighth_Doctor: machine learning
17:27:16 <Eighth_Doctor> yes, but RHEL ships no machine learning software
17:27:26 <Eighth_Doctor> so that's pretty much irrelevant
17:27:40 <jforbes> Eighth_Doctor: but RHEL supports a whole lot of users developing and deploying it
17:27:41 <Eighth_Doctor> we don't even have machine learning software in Fedora
17:27:46 <Eighth_Doctor> e.g. no pytorch, tensorflow, etc.
17:27:58 <Eighth_Doctor> jforbes: yes, but that doesn't require RHEL _itself_ to be built against AVX2
17:28:15 <Eighth_Doctor> that just requires the compiler to be capable of using those instructions, which has been possible for several RHEL releases
17:28:29 <Eighth_Doctor> I do this today for some things now
17:29:00 <jforbes> It does guarantee that their customers will be able to run it, though.
17:29:37 <Eighth_Doctor> if they're already going to be compiling it or getting it with a container or an HPC stack, it's already definitely compiled that way
17:30:15 <jforbes> But the RHEL business case is much less interesting from an ELN perspective, and certainly any discussion here is likely irrelevant to RHEL 10 planning. There are some other possible advantages to raising the baseline
17:30:26 <sgallagh> I think jforbes means that if they install e.g. tensorflow on a RHEL 10 machine with this baseline, they know it will run
17:30:31 <dcavalca> is the idea that by virtue of requiring AVX2 in RHEL 10, we'd make it easier for folks to ship AVX2 software knowing that they won't have to worry about the underlying hardware supporting it?
17:30:41 <fweimer> Sorry, why is the RHEL business case not relevant to ELN?
17:30:51 <sgallagh> Whereas if they install it on a RHEL 9 machine today, they need to know ahead of time if the CPU supports AVX2 or they'll get unintuitive failures
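[An illustrative sketch, not from the meeting: the kind of runtime dispatch an application shipping AVX2 code paths has to carry today when the OS baseline does not guarantee AVX2; with an x86-64-v3 baseline, the check and the fallback path could be dropped. It uses GCC/Clang's __builtin_cpu_supports builtin; the function names are hypothetical.]

    #include <stdio.h>

    static void work_avx2(void)    { puts("AVX2 code path"); }      /* hypothetical */
    static void work_generic(void) { puts("generic code path"); }   /* hypothetical */

    int main(void)
    {
        /* Without a guaranteed baseline, the application must detect AVX2 at
           runtime and fall back, or it fails unintuitively (SIGILL) on older CPUs. */
        if (__builtin_cpu_supports("avx2"))
            work_avx2();
        else
            work_generic();
        return 0;
    }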
17:31:02 <jforbes> Unfortunately, it is difficult to test those without a platform to test it.  ELN would be an interesting test bed
17:31:04 <sgallagh> The RHEL business case is absolutely relevant
17:31:28 <sgallagh> dcavalca: That was the statement I was just making, yes.
17:31:32 <fweimer> dcavalca: Yes, that's the main benefit that I see.
17:31:57 <fweimer> Most code that ships with RHEL is not expected to run that much because it's purely administrative overhead.
17:32:20 <fweimer> (Language interpreters are the exception, of course.)
17:32:37 <jforbes> fweimer: it is relevant in that we don't do anything for ELN without an idea that there will be a business case; it is not relevant in that discussion among the ELN SIG is going to have very little impact on the business case. Results of ELN, very much so
17:33:23 <Eighth_Doctor> jforbes: so you're saying that this is all for naught and I should just not bother discussing this, because it's going to be done anyway?
17:33:35 <sgallagh> I don't think that's what he said at all
17:33:55 * dcavalca is finding this discussion useful fwiw
17:34:27 <Eighth_Doctor> from my perspective, I don't have any infrastructure that will be able to run a RHEL 10 with x86_64-v3
17:34:33 <Eighth_Doctor> and I doubt I'll have it in three years time either
17:35:03 <jforbes> Eighth_Doctor: No, I was not saying that. I was saying that a better case can be made as a result of testing.
17:35:06 <sgallagh> I think the questions we need to ask ourselves are these: "What percentage of users would we alienate if we assume RHEL 10 would use this as the baseline?" and "Is that an acceptable number?"
17:35:32 <dcavalca> I'd also recommend thinking about how to communicate this, should the change go ahead
17:35:33 <Eighth_Doctor> we can basically assume the hardware floor of x86_64-v3 is the same floor as what Windows 11 has
17:35:48 <dcavalca> refreshing hardware can take a while, and folks will want to know well in advance that they should plan for it
17:36:04 <Eighth_Doctor> because the AVX instructions became "universal" (i.e. on both AMD and Intel) in 2017 platforms
17:36:05 <fweimer> sgallagh: Not doing it means alienating IHVs in general (not just CPU vendors).
17:36:21 <dcavalca> sgallagh: not just users in absolute, but usecases in general
17:36:24 <jforbes> I would say Windows 11 is a good bit more forgiving than that; originally that was the case, but they opened it up a bit more
17:36:31 <fweimer> Eighth_Doctor: I don't think that's true, Microsoft hasn't even adopted x86-64-v2 yet.
17:36:37 <sgallagh> dcavalca: Right
17:36:38 <dcavalca> e.g. I'd expect this would disproportionately impact embedded/appliance scenarios
17:36:54 <jforbes> Their requirements were all about security hardware
17:36:55 <dcavalca> which tend to use older/cheaper/crappier CPUs
17:36:59 <Eighth_Doctor> yeah, I can say extremely confidently that my fleet would be ruined by this change
17:37:21 <Eighth_Doctor> jforbes: it has the same impact, since Ryzen v1 and 7th gen Intel are the CPU floors
17:38:01 <jforbes> I do think it is an interesting case to have a build (even if a side tag) where this could be tested/benchmarked/compared against our current floor
17:38:11 <Eighth_Doctor> Stephen Gallagher, jforbes: RHEL for Edge would be in serious trouble with x86_64-v3
17:38:16 <fweimer> Eighth_Doctor: AVX2 is 4th generation Core.
17:38:22 <fweimer> (Haswell)
17:38:33 <Eighth_Doctor> fweimer: it was introduced then, but not universally in Intel's lineup
17:38:39 <Eighth_Doctor> that doesn't happen until 8th generation
17:39:00 <Eighth_Doctor> and then it goes away (sort of) in 12th generation, where Intel doesn't want AVX available on E-cores
17:39:07 <Eighth_Doctor> only P-cores
17:39:11 <fweimer> Eighth_Doctor: Not every Intel CPU in production is supported by RHEL.
17:39:20 <dcavalca> fweimer: I think the concern is that often folks will qualify a specific CPU for a project or a product, and keep it throughout its lifetime so they don't have to recertify it
17:39:20 <fweimer> IHVs certify only a subset.
17:39:29 <dcavalca> I could definitely see some of these being impacted by this change
17:39:47 <Eighth_Doctor> fweimer: yes, but IHVs aren't the entirety of the RHEL ecosystem
17:40:15 <dcavalca> I don't know if RH has the ability to survey customers to see if/how they'd be impacted, but if you do I'd recommend trying to gather some data around this
17:40:16 <jforbes> Right, RHEL has never supported a lot of the older bits. Even with drivers turned on, there is code that drops certain PCI IDs
17:40:26 <Eighth_Doctor> integrators, providers, ISVs, etc. also matter
17:40:35 <dcavalca> from a community standpoint, as I said before if we do this, we should communicate it far and wide as early as possible
17:40:48 <fweimer> dcavalca: Having ELN images/composes/container images would help with that.
17:41:01 <fweimer> Although ld.so --help works as a quick check, too.
17:41:01 <Eighth_Doctor> indeed, a lot of CentOS people are going to be left in the lurch too
17:41:02 <sgallagh> Conan Kudo: Could you elaborate on the Edge issue? Is that just because of the usage of lightweight, budget CPUs?
17:41:06 <Eighth_Doctor> but I was avoiding mentioning that since that doesn't matter to folks in this context
17:41:26 <Eighth_Doctor> Stephen Gallagher: it's that, and that these machines have elongated hardware life cycles
17:41:37 <sgallagh> How often are Intel chips used in Edge vs. ARM?
17:41:49 <dcavalca> sgallagh: lightweight, budget CPUs that were picked a decade ago
17:41:53 <jforbes> Eighth_Doctor: CentOS stream wouldn't see this until Fedora 40ish
17:42:01 <Eighth_Doctor> I still have hardware out at customers that was deployed in 2012 running today with the latest hardware
17:42:08 <Eighth_Doctor> err latest software
17:42:15 <dcavalca> Intel Atom is fairly common for things that don't need battery power
17:42:26 <Eighth_Doctor> yup
17:42:27 <jforbes> Intel is in a decent amount of edge bits
17:42:29 <fweimer> Eighth_Doctor: But these long-term systems wouldn't upgrade the OS during their life-time, would they?
17:42:40 <dcavalca> e.g. stuff like sat decoders or point of sale systems or ATM machines
17:42:41 <Eighth_Doctor> yes they do
17:42:42 <Eighth_Doctor> we upgrade the OS regularly over the lifetime
17:42:55 <Eighth_Doctor> e.g. we've had stuff start on Ubuntu 10.04 that run 20.04 now
17:43:23 <fweimer> I'm not sure we see that much for deeply-embedded RHEL.
17:43:23 <Eighth_Doctor> I don't think we've ever had to cut off hardware in the time I've worked here
17:44:38 <fweimer> My preference is if you could have this conversation with Red Hat. 8-)
17:44:56 <fweimer> I do not think ELN should set RHEL policy in this way.
17:45:36 <Eighth_Doctor> it kind of does though? pretending it doesn't is weird to me
17:45:47 <dcavalca> well, ELN is what is expected to show up in the next RHEL, so it does effectively set expectations at least
17:46:47 <fweimer> Right, but if you don't want something in RHEL, that has to be communicated to RHEL engineering.
17:46:58 <sgallagh> Information needs to flow both ways, indeed.
17:47:08 <sgallagh> And I think the points about RHEL for Edge are important ones that we should take to the RHEL Program Management team to discuss.
17:47:53 <jforbes> Either way, I am interested in the data from this. I wonder if doing a sidetag build/compose would be worth the effort
17:47:57 <Eighth_Doctor> I increasingly wonder why we're so worried about having flavor build composes anyway
17:48:15 <sgallagh> For ELN or for RHEL?
17:48:29 <Eighth_Doctor> because of the ODCS stuff, it should be possible for us to do x86_64-v2 and x86_64-v3 in parallel and produce artifacts for both
17:48:33 <Eighth_Doctor> in general, but especially for ELN and RHEL
17:48:40 <Eighth_Doctor> honestly, I think it'd be useful for Fedora too
17:48:56 <sgallagh> Well, in the case of RHEL, it's an issue of QA and support.
17:49:06 <Eighth_Doctor> I remember when we did have those multi-flavor things for x86_32 and while it was painful due to immature technology, we did have them
17:49:09 <fweimer> Do you mean multiple RPM builds? Like ppc64p7?
17:49:09 <sgallagh> It would essentially be adding new architectures to the support matrix
17:49:10 <jforbes> Eighth_Doctor: That is a much longer discussion, and I have recommended several times how dnf might be enhanced to support such a thing
17:49:25 <Eighth_Doctor> jforbes: OpenMandriva developed patches to do it
17:49:34 <Eighth_Doctor> since they ship znver1 and x86_64
17:49:53 <Eighth_Doctor> since I switched OMV to DNF years ago, and helped make that support work alongside bero
17:50:12 <Eighth_Doctor> they are not upstream because the PRs to rpm around all this stuff are stalled
17:50:38 <Eighth_Doctor> fweimer: yes
17:50:48 <jforbes> Ideally, it should look at flags on the system. I say 'dnf install foo' and it will select the best match. If I have avx2 support on my system and the package has an avx2 build, it will install it; otherwise it falls back to whatever the best match is.
17:50:56 <Eighth_Doctor> but yes, I already know how to enhance RPM and DNF to do this properly
17:51:23 <Eighth_Doctor> I did it once for Ryzen for OpenMandriva, and there's no reason I couldn't do it again for generic x86_64-vX levels
17:51:25 <jforbes> It was one of the better features of conary from a package management standpoint.
17:52:15 <Eighth_Doctor> indeed
17:52:29 <sgallagh> We're coming up on the top of the hour, so I'd like to try to record some #info and #actions here.
17:52:38 <fweimer> But then ISVs would have to support multiple builds, too. They really hate that.
17:52:52 <Eighth_Doctor> fweimer: no, they don't, they just have to declare what they support
17:52:55 <Eighth_Doctor> that's not different from today
17:53:03 <sgallagh> #chair fweimer jforbes Conan_Kudo dcavalca
17:53:03 <zodbot> Current chairs: Conan_Kudo dcavalca fweimer jforbes sgallagh
17:53:31 <sgallagh> If you could add your primary concerns as #info, that would be very helpful.
17:53:38 <jforbes> fweimer: under such a system, they can support multiple builds; they certainly do not have to
17:53:48 <sgallagh> I will also take it on to discuss with the RHEL Program about Edge
17:54:10 <sgallagh> #action sgallagh to discuss the impact of x86_64-v3 on RHEL for Edge with RHEL Pgm
17:54:43 <jforbes> If I want to build for x86_64 I can. If I want to build a version that specifically uses avx2 or avx512 I can do that as well. If I build all 3, it should automatically select the best version for my system
17:55:20 <sgallagh> #info Important points raised: v3 was not universally available until 2017, which means existing hardware may need replacement to run it
17:55:28 <fweimer> jforbes: The point is that if you want to build for v3, you need to deliver a lower baseline as well unless the OS has cleaned the pipes for you.
17:55:30 <dcavalca> #info non-AVX2 hardware is still in production, and widely deployed in long-lifecycle environments (embedded/POS/appliances/etc.) that could be impacted by this change
17:55:31 <jforbes> If I am building software that gets no advantage from avx at all, I just do a single build of the baseline. The idea is flags are a "preference" not a hard arch
17:55:34 <Eighth_Doctor> #info Concerns about ISV and integrator cases for x86_64-v3 for RHEL. Would prefer to see us look into supporting multiple flavors for x86_64 in parallel, as was done in the past with ppc64 (ppc64p7) and x86_32 (i586 and i686)
17:55:47 <sgallagh> #info Edge devices frequently use low-power Intel chips such as Atom and many (most?) don't have v3 support
17:56:30 <jforbes> And the flags should be extensible to the point that people really could add flags for a variety of reasons. It doesn't all have to be hardware support
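[A hypothetical sketch of the preference-based selection jforbes describes; this is not dnf's actual behavior or API. Given several builds of the same package annotated with the CPU feature they need, pick the most specialized one the host supports and fall back to the baseline build otherwise. Package names and the "needs" field are made up for illustration.]

    #include <stdio.h>
    #include <string.h>

    struct build { const char *nevra; const char *needs; };   /* needs: "" means baseline */

    static int host_supports(const char *feature)
    {
        if (feature[0] == '\0') return 1;                      /* baseline always runs */
        if (!strcmp(feature, "avx512f")) return __builtin_cpu_supports("avx512f");
        if (!strcmp(feature, "avx2"))    return __builtin_cpu_supports("avx2");
        return 0;                                              /* unknown flag: skip */
    }

    int main(void)
    {
        /* Hypothetical candidates for "dnf install foo", most specialized first. */
        const struct build candidates[] = {
            { "foo-1.0-1.x86_64_avx512", "avx512f" },
            { "foo-1.0-1.x86_64_avx2",   "avx2"    },
            { "foo-1.0-1.x86_64",        ""        },
        };

        for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
            if (host_supports(candidates[i].needs)) {
                printf("would install: %s\n", candidates[i].nevra);
                break;
            }
        }
        return 0;
    }

[The "hard requirement" case jforbes mentions later would simply be a candidate list without the baseline entry.]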
17:56:47 <sgallagh> #info Increasing the baseline will lead to smaller code (both less disk space and higher efficiency)
17:56:51 <fweimer> #info On the other hand, multiple builds in the OS do not clean the pipes for ISVs, which would have to deliver multiple builds as well to avoid support issues.
17:56:54 <Eighth_Doctor> jforbes: yeah, ideally we could have dynamic hardware provide/supplement/requires magic around ISA extensions
17:57:20 <sgallagh> #info The guaranteed availability of AVX2 extensions would benefit high-performance computing, particularly machine-learning.
17:57:26 <Eighth_Doctor> that would be massively useful for ARM and RISC-V, both of which have massively more fragmented ISA bases
17:58:32 <Eighth_Doctor> #info OpenMandriva has an implementation of x86_64 flavors we can use as a basis for implementing a similar system to support x86_64-vX levels in parallel.
17:59:54 <jforbes> If I had to guess, I would expect that very little real benefit would exist for many packages, so the number of things which needed to support multiple builds would be small. An ISV could even make a flag a hard requirement if they didn't want to support older builds
18:00:19 <Eighth_Doctor> yup
18:01:18 <sgallagh> OK, we are officially over time now. Any last #info?
18:01:22 <jforbes> While it works as a preference by default (often meaning users don't even have to know their system; we can detect and DTRT), a flag can also be a hard req.
18:01:38 <jforbes> nothing else from me.
18:01:45 <Eighth_Doctor> nothing else from me
18:02:00 <dcavalca> nope, this was a good discussion, thanks folks
18:02:43 <sgallagh> Indeed, thank you to everyone who participated in it.
18:03:06 <sgallagh> And especially to those who were working towards a possible compromise solution, which I think has merit.
18:03:44 <sgallagh> #endmeeting