2025-07-17 16:32:46 <@tflink:fedora.im> !startmeeting fedora-ai-ml-sig
2025-07-17 16:32:47 <@meetbot:fedora.im> Meeting started at 2025-07-17 16:32:46 UTC
2025-07-17 16:32:47 <@meetbot:fedora.im> The Meeting name is 'fedora-ai-ml-sig'
2025-07-17 16:32:52 <@tflink:fedora.im> !topic roll call
2025-07-17 16:33:01 <@tflink:fedora.im> Who all's here for the AI/ML SIG meeting?
2025-07-17 16:33:02 <@tflink:fedora.im> !hi
2025-07-17 16:33:03 <@zodbot:fedora.im> Tim Flink (tflink)
2025-07-17 16:33:13 <@xanderlent:fedora.im> !hi
2025-07-17 16:33:14 <@zodbot:fedora.im> Alexander Lent (xanderlent)
2025-07-17 16:35:17 <@tflink:fedora.im> just the two of us? I know I was late but I'm still a bit surprised
2025-07-17 16:36:11 <@trix:fedora.im> !hi
2025-07-17 16:36:12 <@zodbot:fedora.im> Tom Rix (trix)
2025-07-17 16:36:31 <@trix:fedora.im> sorry i'm late, i have been doing stuff beyond my control
2025-07-17 16:36:43 <@tflink:fedora.im> no worries, it happens
2025-07-17 16:36:46 <@tflink:fedora.im> let's get this party started :)
2025-07-17 16:37:02 <@trix:fedora.im> so in the beyond my control, python 3.14 and pytorch
2025-07-17 16:37:11 <@tflink:fedora.im> !topic pytorch et al. broken by python 3.14
2025-07-17 16:37:32 <@tflink:fedora.im> how much more is broken than just pytorch?
2025-07-17 16:37:36 <@trix:fedora.im> will i have to do something heroic to pytorch ?
2025-07-17 16:38:10 <@trix:fedora.im> pytorch is the start of a lot of other torch things. it's just me doing all of it.
2025-07-17 16:38:26 <@tflink:fedora.im> hopefully it won't come to needing something heroic
2025-07-17 16:39:07 <@tflink:fedora.im> but I'm not sure I have any good solutions.
I can think of a few ways to fix it long term but that will require a lot of work - maybe more than fixing pytorch
2025-07-17 16:39:15 <@trix:fedora.im> time is getting short and hero-ing needs to start
2025-07-17 16:39:55 <@man2dev:fedora.im> !hi
2025-07-17 16:39:56 <@zodbot:fedora.im> Mohammadreza Hendiani (man2dev)
2025-07-17 16:39:57 <@tflink:fedora.im> F43 branch is in a little less than a month, right?
2025-07-17 16:39:59 <@trix:fedora.im> as branch is happening soon
2025-07-17 16:40:03 <@trix:fedora.im> yes.
2025-07-17 16:40:03 <@tflink:fedora.im> August 14, I think
2025-07-17 16:40:14 <@xanderlent:fedora.im> I'd be happy to help with some of the torch stuff since torch is a common dependency for huggingface libraries.
2025-07-17 16:40:26 <@tflink:fedora.im> August 12 is the F43 branch date - I was close
2025-07-17 16:40:28 <@xanderlent:fedora.im> Anything that depends on PyTorch is currently FTI in rawhide. For the one huggingface package that we already have in rawhide, I temporarily dropped the torch integration to solve the FTI.
2025-07-17 16:40:43 <@tflink:fedora.im> !info pytorch is broken with python 3.14 and the fix will not be trivial
2025-07-17 16:40:59 <@trix:fedora.im> it's a bad problem. upstream has not moved to 3.14 so at best it's a lot of hacking out the stuff that depends on 3.14.
2025-07-17 16:41:07 <@tflink:fedora.im> !info F43 branch will be on 2025-08-12 and there isn't much time left to fix pytorch for F43
2025-07-17 16:41:19 <@trix:fedora.im> and tie off the problems that causes.
2025-07-17 16:41:52 <@trix:fedora.im> pytorch is a very very big project now, this is getting harder and harder to do.
2025-07-17 16:42:43 <@trix:fedora.im> i really don't want to do this every python release.
2025-07-17 16:43:19 <@tflink:fedora.im> yeah, I was wondering how often this was going to end up being a problem given Fedora's track record of early python adoption
2025-07-17 16:43:23 <@trix:fedora.im> I can see that F42 works just fine with the latest pytorch.
2025-07-17 16:43:59 <@trix:fedora.im> would it be ok to NOT use rawhide as the release
2025-07-17 16:44:03 <@tflink:fedora.im> do you have any feel for the odds that a working pytorch will ship in F43?
2025-07-17 16:44:08 <@man2dev:fedora.im> Does upstream pytorch even support any sort of linux packaging like deb?
2025-07-17 16:44:18 <@tflink:fedora.im> upstream is wheel only, AFAIK
2025-07-17 16:45:02 <@trix:fedora.im> wheel is a contrived release model, pytorch does not build well on any bare distro. maybe ubuntu is better, i don't really know.
2025-07-17 16:45:30 <@xanderlent:fedora.im> My understanding is that if we want to only target stable versions of Fedora with pytorch, we'd have to keep it in a COPR or other external repo. 😞
2025-07-17 16:45:35 <@tflink:fedora.im> I wonder how hard it would be to build pytorch as a flatpak and whether that would end up being usable
2025-07-17 16:45:53 <@trix:fedora.im> From the tag on the 3.14 issue, the upstream will have a fix in 2.9
2025-07-17 16:46:04 <@trix:fedora.im> 2.8 is not out yet.
2025-07-17 16:46:24 <@tflink:fedora.im> I was going to guess sept/oct for 2.9 but it sounds like it might be even later
2025-07-17 16:46:32 <@trix:fedora.im> I have been updating the -next part to try to take that.
2025-07-17 16:46:50 <@trix:fedora.im> but having to build pytorch-next on F42
2025-07-17 16:47:39 <@trix:fedora.im> So pytorch could, without heroics and risk of breakage, ship sometime in the F43 life
2025-07-17 16:47:55 <@tflink:fedora.im> I'm not sure it's worth heroics, TBH
2025-07-17 16:48:27 <@tflink:fedora.im> if heroics work, that just brings demands for more heroics next time this happens
2025-07-17 16:48:43 <@trix:fedora.im> there are other heroics in my job going on, so i would rather just not.
2025-07-17 16:49:24 <@tflink:fedora.im> has anyone tried talking to the python folks to see if they have a possible solution?
2025-07-17 16:49:33 <@tflink:fedora.im> pytorch can't be the first package that's run into this problem
2025-07-17 16:49:42 <@xanderlent:fedora.im> I completely agree it's not worth fighting with upstream being on a different release schedule; is there any project wide guidance we have for situations like this?
2025-07-17 16:49:58 <@tflink:fedora.im> other than compat packages? not that I'm aware of
2025-07-17 16:50:11 <@tflink:fedora.im> and I don't consider compat packages to be an acceptable solution
2025-07-17 16:50:33 <@tflink:fedora.im> that's a bunch of fiddly work
2025-07-17 16:51:05 <@tflink:fedora.im> I can talk to the python folk to see if they have any suggestions unless someone else wants to do it
2025-07-17 16:52:31 <@trix:fedora.im> i also don't have the energy or time to fight another entrenched position that updating the tools before the actual release is a feature of fedora.
2025-07-17 16:53:06 <@tflink:fedora.im> I take that as a "no, you should do it" :-D
2025-07-17 16:53:08 <@trix:fedora.im> we already had to take a clang fork on because of clang update.
2025-07-17 16:53:25 <@trix:fedora.im> thankfully i get paid to do that now.
2025-07-17 16:53:29 <@tflink:fedora.im> !action tflink to ask Fedora python SIG about possible solutions to the pytorch+py3.14 problem
2025-07-17 16:53:45 <@trix:fedora.im> i am not getting paid to do pytorch.
2025-07-17 16:53:57 <@tflink:fedora.im> it sounds like we're pretty much stuck on this for now, though. anything else or should we move on?
2025-07-17 16:54:16 <@trix:fedora.im> move on, thanks for listening to me
2025-07-17 16:54:24 <@tflink:fedora.im> np, thanks for working on all this
2025-07-17 16:54:32 <@tflink:fedora.im> !topic Progress on Intel NPU
2025-07-17 16:54:48 <@tflink:fedora.im> Alexander Lent: the floor is yours
2025-07-17 16:55:00 <@xanderlent:fedora.im> Sure, thanks
2025-07-17 16:55:52 <@xanderlent:fedora.im> Building the compiler-in-driver component was what I was trying to do separately and failing
2025-07-17 16:55:52 <@xanderlent:fedora.im> So, I took some time to re-implement upstream's preferred compiler build process in an RPM spec file.
2025-07-17 16:56:12 <@xanderlent:fedora.im> It turns out even with implementing the upstream preferred process in a spec file, I get the same failures.
2025-07-17 16:56:20 <@tflink:fedora.im> oof
2025-07-17 16:56:26 <@xanderlent:fedora.im> So I think it's something about what's being injected into the RPM build environment
2025-07-17 16:56:37 <@xanderlent:fedora.im> Which is substantial progress, despite the fact that it's still failing. At least it fails in exactly the same way.
2025-07-17 16:56:53 <@tflink:fedora.im> you can override most rpm-injected things, AFAIK
2025-07-17 16:57:28 <@tflink:fedora.im> but you'd either have to know which flag is causing the problem or be willing to do trial-and-error
2025-07-17 16:57:33 <@xanderlent:fedora.im> Indeed, Tom Rix had suggested that in the main chat. Haven't yet gotten to implementing it due to time constraints.
2025-07-17 16:57:59 <@tflink:fedora.im> no worries, from what you've said, the NPU stuff is far from a trivial task
2025-07-17 16:58:16 <@trix:fedora.im> clang forks are hard, it's a commitment to get started and keep going.
2025-07-17 16:58:34 <@tflink:fedora.im> !info work on the intel NPU driver continues but is still a work in progress
2025-07-17 16:58:50 <@trix:fedora.im> is there any intel person that maybe could do this ?
2025-07-17 16:59:55 <@xanderlent:fedora.im> It seems like they're leaving it up to distributions to package it outside of their official Debian packages. I'm not sure if they ended up assisting Canonical with the snap packages or if that was entirely on the Canonical side.
2025-07-17 17:00:33 <@tflink:fedora.im> fun
2025-07-17 17:00:35 <@xanderlent:fedora.im> Upstream has been helpful when I have filed issues, but they would prefer we build RPMs with CPack, which as far as I know is not compatible with the Fedora model.
2025-07-17 17:01:09 <@xanderlent:fedora.im> (also they have a whole thing where cmake downloads a bunch of dependencies and again that's not allowed in Fedora builds for good reason)
2025-07-17 17:01:11 <@tflink:fedora.im> yeah, anything shipped with Fedora has to be built in koji with mock
2025-07-17 17:01:22 <@tflink:fedora.im> on top of the network isolation
2025-07-17 17:01:42 <@trix:fedora.im> do you know if the amd npu has similar problems and/or interest in doing its packaging ?
2025-07-17 17:02:22 <@xanderlent:fedora.im> My understanding is that the official way to build RPMs from the AMD NPU source is by running the scripts there which use CPack in part.
2025-07-17 17:02:22 <@xanderlent:fedora.im> I did have some luck manually translating those scripts into Fedora RPMs though.
2025-07-17 17:02:59 <@xanderlent:fedora.im> To be fair to AMD/Xilinx upstream, their build process worked without changes on Fedora and produced RPMs that just worked.
2025-07-17 17:03:34 <@trix:fedora.im> If you have real interest, I could try to mediate and put you in contact with the xilinx team.
2025-07-17 17:04:54 <@xanderlent:fedora.im> I already sent a patch which was accepted that disabled the kernel module build. 😀
2025-07-17 17:04:54 <@xanderlent:fedora.im> I don't think I'll have the bandwidth for that in time for the 43 cycle, but I would definitely be interested in working with them to simplify the build process (maybe make it all into CMake instead of shell scripts).
2025-07-17 17:05:11 <@xanderlent:fedora.im> So maybe something we can pursue in the medium term
2025-07-17 17:05:33 <@trix:fedora.im> ok, just reach out when you have time.
2025-07-17 17:05:43 <@xanderlent:fedora.im> Will do, thanks!
2025-07-17 17:05:52 <@xanderlent:fedora.im> So unless there are any other concerns about NPU, I think we can move on.
2025-07-17 17:05:55 <@tflink:fedora.im> cool, anything else on this?
2025-07-17 17:06:30 <@tflink:fedora.im> looks like no. moving on to the next topic
2025-07-17 17:06:40 <@tflink:fedora.im> !topic Stalled Huggingface Libraries
2025-07-17 17:06:54 <@xanderlent:fedora.im> Sure, this is also mine
2025-07-17 17:08:33 <@xanderlent:fedora.im> The stall is two things:
2025-07-17 17:08:33 <@xanderlent:fedora.im> First, a lot of the huggingface libraries have a hard or soft dependency on pytorch. I'm happy to help put whatever cycles I have towards that.
2025-07-17 17:08:33 <@xanderlent:fedora.im> Second, I've been having some issues cleaning up the various libraries for review. One of the big issues is that the tests for one package often depend on another package and vice versa.
2025-07-17 17:09:03 <@trix:fedora.im> yes, that sucks, i looked at it a bit ago.
2025-07-17 17:09:06 <@tflink:fedora.im> for bootstrapping, I think you can either leave the tests out for now or just comment them out until things are reviewed
2025-07-17 17:09:20 <@xanderlent:fedora.im> Thanks for the suggestion
2025-07-17 17:09:55 <@tflink:fedora.im> it's a reasonably common problem when bootstrapping
2025-07-17 17:10:39 <@xanderlent:fedora.im> So overall, I've been thinking about maybe putting the huggingface stuff on the back burner and readdressing it during the 44 cycle.
2025-07-17 17:10:39 <@xanderlent:fedora.im> That would let me focus more on the actual NPU driver and also helping with the torch situation.
2025-07-17 17:10:58 <@xanderlent:fedora.im> That said, if anyone wants to help clean up packages, I've got a couple prototypes in a GitHub repository & COPR that I can link.
2025-07-17 17:11:41 <@tflink:fedora.im> !info progress has been made on packaging the huggingface libraries but they can't really proceed while pytorch is broken
2025-07-17 17:11:56 <@trix:fedora.im> yes.
2025-07-17 17:12:42 <@xanderlent:fedora.im> I think that's pretty much all for me on the huggingface topic for now.
2025-07-17 17:13:13 <@tflink:fedora.im> sorry that you keep hitting all these roadblocks
2025-07-17 17:13:22 <@tflink:fedora.im> thank you for working on all of this, though
2025-07-17 17:13:32 <@trix:fedora.im> as AI mostly runs on python, if fedora wants better ai, then it needs better python.
2025-07-17 17:13:57 <@tflink:fedora.im> that's not really fair to the python folks
2025-07-17 17:14:10 <@tflink:fedora.im> for better or worse, the project's definition of "better" just doesn't match our use cases
2025-07-17 17:14:49 <@xanderlent:fedora.im> I think this is definitely a difficult situation with the upstream schedule mismatch.
2025-07-17 17:14:49 <@xanderlent:fedora.im> I know some of the big distros (us/Ubuntu) were able to get gnome and KDE on a cycle that worked for twice a year releases...
2025-07-17 17:15:02 <@trix:fedora.im> it's a trade off, and ai is on the losing side. it will always be on the losing side.
2025-07-17 17:16:04 <@tflink:fedora.im> it's the downside of being on the leading edge when the AI stuff is very much not keeping up on that front
2025-07-17 17:16:39 <@tflink:fedora.im> anyhow, moving on to open floor unless there is more on this topic
2025-07-17 17:17:09 <@tflink:fedora.im> !topic open floor
2025-07-17 17:17:22 <@tflink:fedora.im> any other topics that folks want to bring up?
2025-07-17 17:19:35 <@tflink:fedora.im> !info The next AI/ML SIG meeting will be on Thursday, July 31 @ 16:30 UTC
2025-07-17 17:20:18 <@tflink:fedora.im> I know I said 5 minutes but since there aren't many folks here, I don't think there are more topics coming
2025-07-17 17:20:34 <@trix:fedora.im> it's ok.
2025-07-17 17:20:39 <@tflink:fedora.im> if there are, we can continue the conversation in #ai-ml:fedoraproject.org
2025-07-17 17:20:43 <@tflink:fedora.im> thanks for coming, everyone
2025-07-17 17:20:53 <@tflink:fedora.im> !endmeeting