2024-11-21 17:31:59 <@tflink:fedora.im> !startmeeting fedora-ai-ml
2024-11-21 17:31:59 <@meetbot:fedora.im> Meeting started at 2024-11-21 17:31:59 UTC
2024-11-21 17:32:00 <@meetbot:fedora.im> The Meeting name is 'fedora-ai-ml'
2024-11-21 17:32:02 <@mystro256:fedora.im> !hi
2024-11-21 17:32:03 <@zodbot:fedora.im> None (mystro256)
2024-11-21 17:32:12 <@mystro256:fedora.im> hello none
2024-11-21 17:32:13 <@trix:fedora.im> !hi
2024-11-21 17:32:13 <@zodbot:fedora.im> Tom Rix (trix)
2024-11-21 17:32:23 <@tflink:fedora.im> ok, that other meeting ended suddenly \o/
2024-11-21 17:32:25 <@tflink:fedora.im> !hi
2024-11-21 17:32:26 <@zodbot:fedora.im> Tim Flink (tflink)
2024-11-21 17:33:12 <@tflink:fedora.im> !info today's agenda (living document): https://board.net/p/fedora-aiml-sig-meeting-agenda
2024-11-21 17:33:35 <@tflink:fedora.im> we have a lot of stuff to cover today so let's get started
2024-11-21 17:33:44 <@tflink:fedora.im> !topic F42 Planning
2024-11-21 17:33:51 <@man2dev:fedora.im> ! Hi
2024-11-21 17:34:02 <@man2dev:fedora.im> !Hi
2024-11-21 17:34:21 <@mystro256:fedora.im> I think it's caps sensitive
2024-11-21 17:34:21 <@tflink:fedora.im> Mohammadreza Hendiani: it's just lowercase
2024-11-21 17:34:37 <@man2dev:fedora.im> ! hi
2024-11-21 17:34:40 <@tflink:fedora.im> F42 is coming sooner than I'd like :)
2024-11-21 17:34:43 <@trix:fedora.im> it doesn't like being yelled at !?!?
2024-11-21 17:34:50 <@tflink:fedora.im> mass rebuild starts on 2025-01-15
2024-11-21 17:35:03 <@tflink:fedora.im> F42 branch is 2025-02-04
2024-11-21 17:35:06 <@trix:fedora.im> i'd like to have everything settled before then
2024-11-21 17:35:26 <@tflink:fedora.im> F42 beta freeze is 2025-02-18
2024-11-21 17:35:58 <@tflink:fedora.im> !info F42 mass rebuild starts on 2025-01-15
2024-11-21 17:36:07 <@tflink:fedora.im> !info F42 branch is 2025-02-04
2024-11-21 17:36:14 <@trix:fedora.im> big things for me will be 6.3 (assuming it's there) and llvm-rocm (working on now)
2024-11-21 17:36:52 <@tflink:fedora.im> do we want to leave llvm-rocm and the pending llvm18 problem for a separate topic?
2024-11-21 17:37:07 <@trix:fedora.im> if you want.
2024-11-21 17:37:16 <@tflink:fedora.im> !info ROCm 6.3 is planned for F42
2024-11-21 17:37:52 <@mystro256:fedora.im> might be able to get 6.4 as an update if it comes in time
2024-11-21 17:38:17 <@mystro256:fedora.im> no idea what the GA date is though
2024-11-21 17:38:19 <@trix:fedora.im> i would rather not do 6.4 if it is close.
2024-11-21 17:38:31 <@mystro256:fedora.im> yeah 6.3 for sure, 6.4 maybe
2024-11-21 17:38:47 <@tflink:fedora.im> Tom Rix: is there a pytorch update planned? I'm not sure when the next release for that is
2024-11-21 17:39:31 <@trix:fedora.im> oh man hard questions.. i am spending my time on the llvm problem, so pytorch has not gotten any luv.
2024-11-21 17:39:40 <@tflink:fedora.im> fair enough
2024-11-21 17:40:09 <@trix:fedora.im> all of rocm will fall over from the llvm problem, including pytorch.
2024-11-21 17:40:58 <@tflink:fedora.im> !info F42 pytorch release is as of yet unknown
2024-11-21 17:42:13 <@tflink:fedora.im> so it looks like the planned feature set for F42 is: rocm 6.3 (maybe 6.4 but not planning on it) and maybe SDL3?
2024-11-21 17:42:41 <@tflink:fedora.im> are there other features that folks are planning on?
2024-11-21 17:43:32 <@man2dev:fedora.im> Oh, I wanted to propose the SDL3 thing, but considering that it's not stable yet, I don't think it's a very good idea, but maybe packaging it as. As for the proposal, I couldn't figure out how to submit a wiki page, so I never did it.
2024-11-21 17:44:21 <@tflink:fedora.im> Mohammadreza Hendiani: let us know if you want help with submitting a feature for F42. it sounds like SDL3 may not be ready in time for F42, though
2024-11-21 17:44:45 <@man2dev:fedora.im> Yeah
2024-11-21 17:45:17 <@tflink:fedora.im> cool, let us know if that changes in time for F42 features and someone can help you with the wiki bits as needed
2024-11-21 17:45:30 <@trix:fedora.im> ollama is a possibility.
2024-11-21 17:45:54 <@tflink:fedora.im> !info ollama is possible for F42 but not yet sure if that work will be finished in time
2024-11-21 17:46:09 <@trix:fedora.im> i'd like to have ollama+apu shiny in F42.
2024-11-21 17:46:29 <@tflink:fedora.im> are you planning to include APU support as part of the ROCm 6.3 feature?
2024-11-21 17:46:42 <@trix:fedora.im> yes, apu is already in
2024-11-21 17:46:54 <@tflink:fedora.im> cool
2024-11-21 17:47:09 <@trix:fedora.im> maybe we add whatever new one comes along in 6.3, 1152?
2024-11-21 17:47:38 <@trix:fedora.im> we have 1035, 1103 and 1151, 3 generations of laptop apus.
2024-11-21 17:48:25 <@trix:fedora.im> another feature we have that is in now is removing the split libs, everything is out of the usual space. /usr/lib64
2024-11-21 17:48:55 <@tflink:fedora.im> which makes building stuff much less crazy. I'm glad to see that
2024-11-21 17:48:59 <@man2dev:fedora.im> What?
2024-11-21 17:49:38 <@tflink:fedora.im> the new compression feature in llvm means that the split libs (gfx11, gfx10 etc.) are gone for the rocm packages
2024-11-21 17:50:08 <@trix:fedora.im> yes. that is the gem the amd compiler guys gave us.
2024-11-21 17:50:53 <@tflink:fedora.im> in terms of the writeups for known features, @trix is doing rocm but I think that's the only F42 feature for now
2024-11-21 17:51:26 <@trix:fedora.im> yup, i can do the writeup stuff i just yakked about.
2024-11-21 17:51:54 <@tflink:fedora.im> !info trix will be writing up the F42 change proposal for ROCm
2024-11-21 17:52:15 <@tflink:fedora.im> ok, anything F42 related other than the llvm fun?
2024-11-21 17:52:35 <@trix:fedora.im> testing, but that is related.
2024-11-21 17:52:55 <@tflink:fedora.im> I figured we'd finish up non-llvm F42 planning, talk about llvm and then get to testing
2024-11-21 17:53:04 <@trix:fedora.im> coolio
2024-11-21 17:53:42 <@tflink:fedora.im> !topic rocm and llvm18
2024-11-21 17:54:16 <@tflink:fedora.im> as I understand it, there are two related issues here
2024-11-21 17:54:42 <@tflink:fedora.im> 1. F41 was a bit of a disaster and rocm still isn't working w/o updates-testing in F41 due to the late llvm change
2024-11-21 17:54:51 <@trix:fedora.im> yes
2024-11-21 17:55:09 <@trix:fedora.im> s/bit/flaming bag/
2024-11-21 17:55:14 <@tflink:fedora.im> 2. ROCm 6.3 will not support llvm19 and llvm18 will be orphaned by the llvm maintainers for F42
2024-11-21 17:55:36 <@trix:fedora.im> yes
2024-11-21 17:55:56 <@trix:fedora.im> llvm20 will come in around F42 beta 2
2024-11-21 17:55:58 <@mystro256:fedora.im> yeah 6.3 is likely going to be llvm 18 based on upstream's feedback
2024-11-21 17:56:12 <@mystro256:fedora.im> 6.4 might be 19, but it's unknown
2024-11-21 17:56:22 <@tflink:fedora.im> so, at a minimum, someone will need to take on the llvm18 compat packages once they're orphaned by the llvm folks
2024-11-21 17:56:48 <@trix:fedora.im> that is an option.
2024-11-21 17:56:55 <@tflink:fedora.im> unless we can somehow convince the llvm folks not to orphan them but I wouldn't get your hopes up there
2024-11-21 17:57:12 <@tflink:fedora.im> the other option is to start bundling llvm and stop depending on system llvm
2024-11-21 17:57:37 <@trix:fedora.im> the bundled llvm is what i have been working on.
2024-11-21 17:58:00 <@trix:fedora.im> as a fallback to the first problem.
2024-11-21 17:58:09 <@tflink:fedora.im> I think we'd have to get a waiver from FESCo to bundle llvm like that but I think we have a pretty good case for it
2024-11-21 17:58:13 <@trix:fedora.im> now as a primary to the second problem.
2024-11-21 17:58:38 <@tflink:fedora.im> llvm changes have blown up ROCm for two releases in a row and I don't see that being fixed until FESCo stops accepting llvm version changes so late
2024-11-21 17:59:28 <@trix:fedora.im> on the orphan question, is there someone that wants to pick up llvm18 ?
2024-11-21 17:59:44 <@trix:fedora.im> i will not.
2024-11-21 18:00:54 <@mystro256:fedora.im> well the problem is what is the advantage of using llvm18 over a fork
2024-11-21 18:01:09 <@mystro256:fedora.im> if llvm18 is abandoned, we might as well build the fork
2024-11-21 18:01:16 <@tflink:fedora.im> technical adherence to policy
2024-11-21 18:01:29 <@mystro256:fedora.im> did we ask fesco?
2024-11-21 18:01:34 <@tflink:fedora.im> not yet, no
2024-11-21 18:01:42 <@mystro256:fedora.im> someone should
2024-11-21 18:01:56 <@mystro256:fedora.im> can someone volunteer to open a ticket?
2024-11-21 18:01:59 <@tflink:fedora.im> I can do that unless someone else wants to
2024-11-21 18:02:24 <@mystro256:fedora.im> I just keep forgetting, so if you can, please
2024-11-21 18:02:46 <@tflink:fedora.im> !action tflink to submit ticket to FESCo about bundling llvm for ROCm
2024-11-21 18:03:27 <@tflink:fedora.im> so I think that we're mostly in a holding pattern on this until the FESCo question is answered
2024-11-21 18:03:34 <@tflink:fedora.im> is there anything else on this topic for today?
2024-11-21 18:03:46 <@man2dev:fedora.im> Yes
2024-11-21 18:03:58 <@tflink:fedora.im> I don't understand
2024-11-21 18:04:16 <@tflink:fedora.im> are you saying that there is more on this topic or agreeing with the fact that we need to submit a ticket to fesco
2024-11-21 18:04:17 <@man2dev:fedora.im> Testing infra: testfarm research result:
2024-11-21 18:04:25 <@tflink:fedora.im> that's not llvm related
2024-11-21 18:04:51 <@tflink:fedora.im> but it is the next topic if there's nothing more on llvm for today
2024-11-21 18:05:01 <@man2dev:fedora.im> Oh, I thought you were talking overall
2024-11-21 18:05:48 <@tflink:fedora.im> !topic HW Testing
2024-11-21 18:06:02 <@tflink:fedora.im> this might get a bit messy, it sounds like there are 3 of us working on this independently
2024-11-21 18:06:07 <@tflink:fedora.im> who wants to go first?
2024-11-21 18:06:17 <@man2dev:fedora.im> - [Testing Farm GitLab](https://gitlab.com/testing-farm)
2024-11-21 18:06:17 <@man2dev:fedora.im> Testing Farm, primarily supported by AWS infrastructure, provides a robust platform for managing and executing tests with customizable hardware. It is widely used by upstream projects like Systemd and Cockpit to ensure seamless integration and reliable testing workflows.
Relevant resources include:
2024-11-21 18:06:17 <@man2dev:fedora.im>
2024-11-21 18:06:17 <@man2dev:fedora.im>  Proposal to Enhance Testing Efficiency Using Testing Farm and Associated Tools
2024-11-21 18:06:17 <@man2dev:fedora.im> - [Testing Farm YouTube Guide](https://www.youtube.com/watch?v=F7C82Fwdvis)
2024-11-21 18:06:17 <@man2dev:fedora.im> - [Testing Farm](https://testing-farm.io)
2024-11-21 18:06:25 <@man2dev:fedora.im>      testing-farm reserve --compose Fedora-Rawhide
2024-11-21 18:06:25 <@man2dev:fedora.im>      testing-farm reserve --compose Fedora-Rawhide --hardware virtualization.is-virtualized=false
2024-11-21 18:06:25 <@man2dev:fedora.im>
2024-11-21 18:06:25 <@man2dev:fedora.im>    - Utilize Testing Farm reservations ([docs](https://gitlab.com/testing-farm)) for experiments, e.g.:
2024-11-21 18:06:27 <@man2dev:fedora.im>    - Automate upstream CI testing through Packit ([docs](https://packit.dev/docs/configuration/upstream/tests)), with results available on the
2024-11-21 18:06:27 <@man2dev:fedora.im>   - Has a wide variety of interfaces, from the API to the testing-farm cli tool and the tmt cli tool, to integrate testing workflows, preferably for the upstream projects or downstream in the Fedora SRC repo
2024-11-21 18:06:27 <@man2dev:fedora.im> ### Integrate:
2024-11-21 18:06:27 <@man2dev:fedora.im>      + TMT ([docs](https://tmt.readthedocs.io/en/stable/)) to manage tests with FMF metadata.
2024-11-21 18:06:27 <@man2dev:fedora.im>      + ([Fedora CI Matrix](#fedora-ci:fedoraproject.org))
2024-11-21 18:06:27 <@man2dev:fedora.im>      + ([Fedora CI docs](https://docs.fedoraproject.org/en-US/ci))
2024-11-21 18:06:27 <@man2dev:fedora.im>    [Packit dashboard](https://dashboard.packit.dev/jobs/testing-farm).
2024-11-21 18:06:32 <@man2dev:fedora.im>
2024-11-21 18:06:32 <@man2dev:fedora.im>    - [Testing Farm status page](https://status.testing-farm.io).
2024-11-21 18:06:32 <@man2dev:fedora.im>
2024-11-21 18:06:32 <@man2dev:fedora.im>   ### currently used by:
2024-11-21 18:06:32 <@man2dev:fedora.im>    - Collaborate with Testing Farm to request additional resources and ensure tests work for packages like rocm.
2024-11-21 18:06:32 <@man2dev:fedora.im>    systemd and cockpit:
2024-11-21 18:06:32 <@man2dev:fedora.im>    - cockpit: Performs automated testing with FMF files, as used by Cockpit ([example FMF file](https://github.com/cockpit-project/starter-kit/blob/main/test/browser/main.fmf)).
2024-11-21 18:06:34 <@man2dev:fedora.im> ### Use case
2024-11-21 18:06:34 <@man2dev:fedora.im>
2024-11-21 18:06:34 <@man2dev:fedora.im> - Expand and standardize workflows across upstream projects like AMD's fork of llvm or rocm ...
2024-11-21 18:06:51 <@tflink:fedora.im> that's a lot of text to dump in a meeting
2024-11-21 18:07:19 <@man2dev:fedora.im> The main resource for anyone wanting to get a good grasp on the topic is the youtube video https://www.youtube.com/watch?v=F7C82Fwdvis
2024-11-21 18:07:55 <@tflink:fedora.im> unless something significant has changed in the last year, testing farm is not an option for HW specific testing unless we're talking about nvidia in AWS
2024-11-21 18:08:31 <@tflink:fedora.im> I'm trying to wrap my head around what all you're proposing, though
2024-11-21 18:08:50 <@man2dev:fedora.im> It's now integrated into the fedora ci in some places
2024-11-21 18:09:56 <@man2dev:fedora.im> Sorry, I tried to sum up all the main points and links into one text
2024-11-21 18:10:33 <@man2dev:fedora.im> It's now integrated into the fedora ci in some places
2024-11-21 18:10:33 <@man2dev:fedora.im> > <@tflink:fedora.im> unless something significant has changed in the last year, testing farm is not an option for HW specific testing unless we're talking about nvidia in AWS
2024-11-21 18:10:33 <@man2dev:fedora.im>
2024-11-21 18:10:40 <@tflink:fedora.im> outside of the "this is what testing farm is" part, it sounds like a
general proposal to use testing farm for the testing that we want to do for ai-ml in Fedora?
2024-11-21 18:10:49 <@tflink:fedora.im> or am I missing something?
2024-11-21 18:11:39 <@man2dev:fedora.im> Yes, we can actually use additional resources which are necessary for our use cases.
2024-11-21 18:12:09 <@man2dev:fedora.im> From the ci sig https://matrix.to/#/!cfWVeczGVJbiKSlrwi:fedoraproject.org
2024-11-21 18:12:23 <@tflink:fedora.im> have those features been added recently? last I checked, there was little to no support for HW specific testing in testing farm
2024-11-21 18:12:37 <@tflink:fedora.im> hopes and dreams, yes. production code and systems, not as much
2024-11-21 18:12:42 <@man2dev:fedora.im> And I also researched different ways of how it can be integrated into our workflow.
2024-11-21 18:13:29 <@tflink:fedora.im> "hopes and dreams, yes. production code and systems, not as much" I didn't mean that to sound as disrespectful as it sounds. I know how hard it is to get systems like that working and was just trying to express that it's hard to use those things until the supporting bits are in production
2024-11-21 18:13:52 <@trix:fedora.im> my workflow is manual, i'd like to stop that.
2024-11-21 18:14:19 <@trix:fedora.im> fedora as a project i don't believe has hw testing
2024-11-21 18:14:20 <@man2dev:fedora.im> My main source of information was the youtube video and they seem to indicate that they do provide hardware-specific builds. But maybe that's false
2024-11-21 18:14:28 <@tflink:fedora.im> I'm not against using TF but I still have concerns that it's not anywhere close to supporting our usecases
2024-11-21 18:15:04 <@man2dev:fedora.im> It has an api and CLI so it does support manual workflow
2024-11-21 18:15:57 <@trix:fedora.im> to test, i have to manually build and run a bunch of -test subpackages, then manually test applications like blender and torch.
2024-11-21 18:16:23 <@trix:fedora.im> so the 'testing' for each release is me doing that for a week or two.
2024-11-21 18:16:54 <@man2dev:fedora.im> I haven't tested it, I'm just bringing it up because they indicate that they can accommodate any computational need that Fedora Project might have as long as the need is valid and is within reason. That's the main reason I thought it might be useful
2024-11-21 18:17:51 <@trix:fedora.im> llvm monkey wrenched that testing in F40 and F41
2024-11-21 18:18:15 <@tflink:fedora.im> I feel like we just set off a chaos grenade and we're starting to talk past each other
2024-11-21 18:18:23 <@trix:fedora.im> yes.
2024-11-21 18:18:37 <@man2dev:fedora.im> There are various ways of triggering the build, for example on each PR in the upstream project
2024-11-21 18:18:46 <@trix:fedora.im> let's pass the stick.. who wants to talk ?
2024-11-21 18:18:54 <@man2dev:fedora.im> If they add packit
2024-11-21 18:18:59 <@trix:fedora.im> to get to something we can use in F42
2024-11-21 18:19:21 <@tflink:fedora.im> we're talking about solutions before we're talking about what is needed
2024-11-21 18:20:04 <@tflink:fedora.im> well, half of the conversation is around solutions
2024-11-21 18:20:14 <@trix:fedora.im> we have a testing gap. we build for 20 gpus, only 7900 gets any testing and that is manual.
2024-11-21 18:22:18 <@tflink:fedora.im> sorry, struggling with summarizing everything for notes
2024-11-21 18:22:49 <@tflink:fedora.im> !info there is a proposal to start using testing farm for Fedora ai-ml testing
2024-11-21 18:24:46 <@tflink:fedora.im> we have less than 10 minutes left, I propose the following: we discuss the needs we have for the next several minutes and leave the discussion of solutions to a later meeting or another venue (matrix or discourse)
2024-11-21 18:25:06 <@tflink:fedora.im> any objections?
2024-11-21 18:25:11 <@trix:fedora.im> nope
2024-11-21 18:25:27 <@man2dev:fedora.im> No
2024-11-21 18:25:41 <@tflink:fedora.im> !info due to lack of time in this meeting, we will discuss solutions in another venue and leave the discussion to what is needed in this meeting
2024-11-21 18:25:54 <@tflink:fedora.im> !topic ai-ml HW testing needs
2024-11-21 18:26:15 <@tflink:fedora.im> !info there is a huge gap between what we're currently building and what sees regular testing
2024-11-21 18:26:48 <@tflink:fedora.im> !info the level of manual testing we currently have is not sustainable and should be automated if at all possible
2024-11-21 18:27:36 <@tflink:fedora.im> as a more fleshed out point: Tom Rix is one of the only people testing bits right now and mostly on one subset of ROCm (gfx1100)
2024-11-21 18:27:55 <@tflink:fedora.im> as I understand it, the wishlist for automated testing is:
2024-11-21 18:28:15 <@tflink:fedora.im> 1. run the rocm self tests on packaging changes (including dependencies)
2024-11-21 18:28:31 <@tflink:fedora.im> 2. regularly rebuild rocm and the bits that depend on it to find build errors early
2024-11-21 18:28:50 <@trix:fedora.im> 2 is already done.
2024-11-21 18:29:00 <@tflink:fedora.im> ah, I should rephrase that
2024-11-21 18:29:27 <@tflink:fedora.im> 2. regularly rebuild rocm and the bits that depend on it to find build errors early in an automated way that doesn't require Tom Rix to do it by hand
2024-11-21 18:29:56 <@trix:fedora.im> https://copr.fedorainfracloud.org/coprs/g/rocm-packagers-sig/RH/
2024-11-21 18:29:58 <@tflink:fedora.im> 3. expand our testing matrix to cover at least the most commonly used HW
2024-11-21 18:30:22 <@trix:fedora.im> i set up a copr to include rocm and its whatrequires
2024-11-21 18:30:30 <@trix:fedora.im> hsakmt
2024-11-21 18:31:04 <@tflink:fedora.im> but that still requires you to do the packaging and submission by hand, no?
I thought you were looking for an automated setup so that's happening more often. did I misunderstand?
2024-11-21 18:31:44 <@trix:fedora.im> copr auto builds when someone makes a commit, what else is there to do ?
2024-11-21 18:32:09 <@tflink:fedora.im> dunno, it sounds like i misunderstood what you were looking for, though :)
2024-11-21 18:32:43 <@trix:fedora.im> it's ok if we want 2.
2024-11-21 18:32:55 <@trix:fedora.im> building stuff is the easy part.
2024-11-21 18:33:02 <@tflink:fedora.im> very true
2024-11-21 18:33:21 <@trix:fedora.im> what hw do we want to test ?
2024-11-21 18:33:49 <@tflink:fedora.im> my thought was to start with gfx1100 since that's the easiest and expand from there
2024-11-21 18:34:21 <@tflink:fedora.im> it'd depend heavily on what HW we can get our hands on
2024-11-21 18:35:05 <@trix:fedora.im> ok, automate gfx1100 first.
2024-11-21 18:35:09 <@tflink:fedora.im> since we're over time and it seems like there is still confusion here, I propose that we move the conversation around what we want to do for testing to discourse
2024-11-21 18:35:47 <@trix:fedora.im> yes.
2024-11-21 18:36:01 <@tflink:fedora.im> for what it's worth, I also have a proposed solution to all of this that I've been working on but that will wait for another day
2024-11-21 18:36:14 <@tflink:fedora.im> !action tflink to start conversation on discourse about testing desires and requirements
2024-11-21 18:36:17 <@trix:fedora.im> no worries, problem will still be here.
2024-11-21 18:37:08 <@tflink:fedora.im> ok, moving on to open floor if there's nothing else on this topic
2024-11-21 18:37:14 <@tflink:fedora.im> !topic open floor
2024-11-21 18:37:25 <@tflink:fedora.im> is there any topic we didn't get to that needs to be discussed today?
2024-11-21 18:38:14 <@trix:fedora.im> i'm good.
2024-11-21 18:38:24 <@tflink:fedora.im> ok, I'll end the meeting for now.
we can always schedule something for next week if there is a need
2024-11-21 18:38:32 <@tflink:fedora.im> thanks for coming, everyone. I'll post minutes shortly
2024-11-21 18:38:35 <@tflink:fedora.im> !endmeeting
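
Editor's note: the `!topic`, `!info` and `!action` commands in this log are what the posted minutes are built from. As a rough illustration only, the sketch below groups those commands by topic from a log with one message per line. It is not meetbot's actual implementation; the `summarize` function, the regular expressions and the `SAMPLE` excerpt are assumptions made for this example, and the line format is inferred from this log alone.

```python
import re

# One log entry: "YYYY-MM-DD HH:MM:SS <@nick:server> message"
# (format inferred from this log; an assumption, not a meetbot spec).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) <@([^:>]+):[^>]+> (.*)$"
)
# Only the three commands that carry minutes content in this meeting.
CMD_RE = re.compile(r"^!(topic|info|action)\s+(.+)$")


def summarize(log_text):
    """Group !info and !action items under the most recent !topic."""
    current = ("(no topic)", [])
    minutes = [current]
    for raw in log_text.splitlines():
        entry = LINE_RE.match(raw.strip())
        if not entry:
            continue  # continuation lines and blank lines are skipped
        body = entry.group(3).strip()
        cmd = CMD_RE.match(body)
        if not cmd:
            continue  # ordinary chat message, not a meeting command
        kind, text = cmd.groups()
        if kind == "topic":
            current = (text, [])
            minutes.append(current)
        else:
            current[1].append((kind, text))
    # Drop the placeholder section if nothing was recorded before a topic.
    return [(t, items) for t, items in minutes if items or t != "(no topic)"]


# A short excerpt copied from the log above.
SAMPLE = """\
2024-11-21 17:33:44 <@tflink:fedora.im> !topic F42 Planning
2024-11-21 17:34:40 <@tflink:fedora.im> F42 is coming sooner than I'd like :)
2024-11-21 17:35:58 <@tflink:fedora.im> !info F42 mass rebuild starts on 2025-01-15
2024-11-21 17:53:42 <@tflink:fedora.im> !topic rocm and llvm18
2024-11-21 18:02:46 <@tflink:fedora.im> !action tflink to submit ticket to FESCo about bundling llvm for ROCm
"""

if __name__ == "__main__":
    for topic, items in summarize(SAMPLE):
        print(topic)
        for kind, text in items:
            print(f"  [{kind}] {text}")
```

Messages that wrap onto a second line without a timestamp are skipped by this sketch, so a real tool would need to join continuation lines first.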