<@tflink:fedora.im>
16:32:46
!startmeeting fedora-ai-ml-sig
<@meetbot:fedora.im>
16:32:47
Meeting started at 2025-07-17 16:32:46 UTC
<@meetbot:fedora.im>
16:32:47
The Meeting name is 'fedora-ai-ml-sig'
<@tflink:fedora.im>
16:32:52
!topic roll call
<@tflink:fedora.im>
16:33:01
Who all's here for the AI/ML SIG meeting?
<@tflink:fedora.im>
16:33:02
!hi
<@zodbot:fedora.im>
16:33:03
Tim Flink (tflink)
<@xanderlent:fedora.im>
16:33:13
!hi
<@zodbot:fedora.im>
16:33:14
Alexander Lent (xanderlent)
<@tflink:fedora.im>
16:35:17
just the two of us? I know I was late but I'm still a bit surprised
<@trix:fedora.im>
16:36:11
!hi
<@zodbot:fedora.im>
16:36:12
Tom Rix (trix)
<@trix:fedora.im>
16:36:31
sorry i'm late, i have been doing stuff beyond my control
<@tflink:fedora.im>
16:36:43
no worries, it happens
<@tflink:fedora.im>
16:36:46
let's get this party started :)
<@trix:fedora.im>
16:37:02
so, in the beyond-my-control category: python 3.14 and pytorch
<@tflink:fedora.im>
16:37:11
!topic pytorch et al. broken by python 3.14
<@tflink:fedora.im>
16:37:32
how much more is broken than just pytorch?
<@trix:fedora.im>
16:37:36
will i have to do something heroic to pytorch?
<@trix:fedora.im>
16:38:10
pytorch is the start of a lot of other torch things. it's just me doing all of it.
<@tflink:fedora.im>
16:38:26
hopefully it won't come to needing something heroic
<@tflink:fedora.im>
16:39:07
but I'm not sure I have any good solutions. I can think of a few ways to fix it long term but that will require a lot of work - maybe more than fixing pytorch
<@trix:fedora.im>
16:39:15
time is getting short and hero-ing needs to start
<@man2dev:fedora.im>
16:39:55
!hi
<@zodbot:fedora.im>
16:39:56
Mohammadreza Hendiani (man2dev)
<@tflink:fedora.im>
16:39:57
F43 branch is in a little less than a month, right?
<@trix:fedora.im>
16:39:59
as branch is happening soon
<@trix:fedora.im>
16:40:03
yes.
<@tflink:fedora.im>
16:40:03
August 14, I think
<@xanderlent:fedora.im>
16:40:14
I'd be happy to help with some of the torch stuff since torch is a common dependency for huggingface libraries.
<@tflink:fedora.im>
16:40:26
August 12 is the F43 branch date - I was close
<@xanderlent:fedora.im>
16:40:28
Anything that depends on PyTorch is currently FTI in rawhide. For the one huggingface package that we already have in rawhide, I temporarily dropped the torch integration to solve the FTI.
<@tflink:fedora.im>
16:40:43
!info pytorch is broken with python 3.14 and the fix will not be trivial
<@trix:fedora.im>
16:40:59
it's a bad problem. upstream has not moved to 3.14, so at best it's a lot of hacking out the stuff that depends on 3.14.
<@tflink:fedora.im>
16:41:07
!info F43 branch will be on 2025-08-12 and there isn't much time left to fix pytorch for F43
<@trix:fedora.im>
16:41:19
and tie off the problems that causes.
<@trix:fedora.im>
16:41:52
pytorch is a very very big project now, this is getting harder and harder to do.
<@trix:fedora.im>
16:42:43
i really don't want to do this every python release.
<@tflink:fedora.im>
16:43:19
yeah, I was wondering how often this was going to end up being a problem given Fedora's track record of early python adoption
<@trix:fedora.im>
16:43:23
I can see that F42 works just fine with the latest pytorch.
<@trix:fedora.im>
16:43:59
would it be ok to NOT use rawhide as the release
<@tflink:fedora.im>
16:44:03
do you have any feel for the odds that a working pytorch will ship in F43?
<@man2dev:fedora.im>
16:44:08
Does upstream pytorch even support any sort of linux packaging like deb?
<@tflink:fedora.im>
16:44:18
upstream is wheel only, AFAIK
<@trix:fedora.im>
16:45:02
wheel is a contrived release model, pytorch does not build well on any bare distro. maybe ubuntu is better, i don't really know.
<@xanderlent:fedora.im>
16:45:30
My understanding is that if we want to only target stable versions of Fedora with pytorch, we'd have to keep it in a COPR or other external repo. 😞
<@tflink:fedora.im>
16:45:35
I wonder how hard it would be to build pytorch as a flatpak and whether that would end up being usable
<@trix:fedora.im>
16:45:53
From the tag on the 3.14 issue, the upstream will have a fix in 2.9
<@trix:fedora.im>
16:46:04
2.8 is not out yet.
<@tflink:fedora.im>
16:46:24
I was going to guess sept/oct for 2.9 but it sounds like it might be even later
<@trix:fedora.im>
16:46:32
I have been updating the -next part to try to take that.
<@trix:fedora.im>
16:46:50
but having to build pytorch-next on F42
<@trix:fedora.im>
16:47:39
So pytorch could, without heroics and risk of breakage, ship sometime in the F43 life
<@tflink:fedora.im>
16:47:55
I'm not sure it's worth heroics, TBH
<@tflink:fedora.im>
16:48:27
if heroics work, that just brings demands for more heroics next time this happens
<@trix:fedora.im>
16:48:43
there are other heroics in my job going on, so i would rather just not.
<@tflink:fedora.im>
16:49:24
has anyone tried talking to the python folks to see if they have a possible solution?
<@tflink:fedora.im>
16:49:33
pytorch can't be the first package that's run into this problem
<@xanderlent:fedora.im>
16:49:42
I completely agree it's not worth fighting with upstream being on a different release schedule; is there any project-wide guidance we have for situations like this?
<@tflink:fedora.im>
16:49:58
other than compat packages? not that I'm aware of
<@tflink:fedora.im>
16:50:11
and I don't consider compat packages to be an acceptable solution
<@tflink:fedora.im>
16:50:33
that's a bunch of fiddly work
<@tflink:fedora.im>
16:51:05
I can talk to the python folk to see if they have any suggestions unless someone else wants to do it
<@trix:fedora.im>
16:52:31
i also don't have the energy or time to fight another entrenched position that updating the tools before the actual release is a feature of fedora.
<@tflink:fedora.im>
16:53:06
I take that as a "no, you should do it" :-D
<@trix:fedora.im>
16:53:08
we already had to take on a clang fork because of a clang update.
<@trix:fedora.im>
16:53:25
thankfully i get paid to do that now.
<@tflink:fedora.im>
16:53:29
!action tflink to ask Fedora python SIG about possible solutions to the pytorch+py3.14 problem
<@trix:fedora.im>
16:53:45
i am not getting paid to do pytorch.
<@tflink:fedora.im>
16:53:57
it sounds like we're pretty much stuck on this for now, though. anything else or should we move on?
<@trix:fedora.im>
16:54:16
move on, thanks for listening to me
<@tflink:fedora.im>
16:54:24
np, thanks for working on all this
<@tflink:fedora.im>
16:54:32
!topic Progress on Intel NPU
<@tflink:fedora.im>
16:54:48
Alexander Lent: the floor is yours
<@xanderlent:fedora.im>
16:55:00
Sure, thanks
<@xanderlent:fedora.im>
16:55:52
Building the compiler-in-driver component separately was what I had been trying (and failing) to do.
<@xanderlent:fedora.im>
16:55:52
So, I took some time to re-implement upstream's preferred compiler build process in an RPM spec file.
<@xanderlent:fedora.im>
16:56:12
It turns out even with implementing the upstream preferred process in a spec file, I get the same failures.
<@tflink:fedora.im>
16:56:20
oof
<@xanderlent:fedora.im>
16:56:26
So I think it's something about what's being injected into the RPM build environment
<@xanderlent:fedora.im>
16:56:37
Which is substantial progress, despite the fact that it's still failing. At least it fails in exactly the same way.
<@tflink:fedora.im>
16:56:53
you can override most rpm-injected things, AFAIK
<@tflink:fedora.im>
16:57:28
but you'd either have to know which flag is causing the problem or be willing to do trial-and-error
<@xanderlent:fedora.im>
16:57:33
Indeed, Tom Rix had suggested that in the main chat. Haven't yet gotten to implementing it due to time constraints.
<@tflink:fedora.im>
16:57:59
no worries, from what you've said, the NPU stuff is far from a trivial task
<@trix:fedora.im>
16:58:16
clang forks are hard, it's a commitment to get started and keep going.
<@tflink:fedora.im>
16:58:34
!info work on the intel NPU driver continues but is still a work in progress
<@trix:fedora.im>
16:58:50
is there anyone at intel who could maybe do this?
<@xanderlent:fedora.im>
16:59:55
It seems like they're leaving it up to distributions to package it outside of their official Debian packages. I'm not sure if they ended up assisting Canonical with the snap packages or if that was entirely on the Canonical side.
<@tflink:fedora.im>
17:00:33
fun
<@xanderlent:fedora.im>
17:00:35
Upstream has been helpful when I have filed issues, but they would prefer we build RPMs with CPack, which as far as I know is not compatible with the Fedora model.
<@xanderlent:fedora.im>
17:01:09
(also they have a whole thing where cmake downloads a bunch of dependencies and again that's not allowed in Fedora builds for good reason)
<@tflink:fedora.im>
17:01:11
yeah, anything shipped with Fedora has to be built in koji with mock
<@tflink:fedora.im>
17:01:22
on top of the network isolation
<@trix:fedora.im>
17:01:42
do you know if the amd npu has similar problems and/or interest in doing its packaging?
<@xanderlent:fedora.im>
17:02:22
My understanding is that the official way to build RPMs from the AMD NPU source is by running the scripts there which use CPack in part.
<@xanderlent:fedora.im>
17:02:22
I did have some luck manually translating those scripts into Fedora RPMs though.
<@xanderlent:fedora.im>
17:02:59
To be fair to AMD/Xilinx upstream, their build process worked without changes on Fedora and produced RPMs that just worked.
<@trix:fedora.im>
17:03:34
If you have real interest, I could try to mediate and put you in contact with the xilinx team.
<@xanderlent:fedora.im>
17:04:54
I already sent a patch which was accepted that disabled the kernel module build. 😀
<@xanderlent:fedora.im>
17:04:54
I don't think I'll have the bandwidth for that in time for the 43 cycle, but I would definitely be interested in working with them to simplify the build process (maybe make it all into CMake instead of shell scripts).
<@xanderlent:fedora.im>
17:05:11
So maybe something we can pursue in the medium term
<@trix:fedora.im>
17:05:33
ok, just reach out when you have time.
<@xanderlent:fedora.im>
17:05:43
Will do, thanks!
<@xanderlent:fedora.im>
17:05:52
So unless there are any other concerns about NPU, I think we can move on.
<@tflink:fedora.im>
17:05:55
cool, anything else on this?
<@tflink:fedora.im>
17:06:30
looks like no. moving on to the next topic
<@tflink:fedora.im>
17:06:40
!topic Stalled Huggingface Libraries
<@xanderlent:fedora.im>
17:06:54
Sure, this is also mine
<@xanderlent:fedora.im>
17:08:33
The stall is two things:
<@xanderlent:fedora.im>
17:08:33
First, a lot of the huggingface libraries have a hard or soft dependency on pytorch. I'm happy to help put whatever cycles I have towards that.
<@xanderlent:fedora.im>
17:08:33
Second, I've been having some issues cleaning up the various libraries for review. One of the big issues is that the tests for one package often depend on another package, and vice versa.
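Illustrative sketch (not from the meeting): a "soft" dependency on PyTorch usually means a library imports torch only if it is available and otherwise switches its torch integration off. The helper names `optional_import` and `integration_enabled` below are hypothetical; this is one common Python idiom, not how any particular huggingface package is known to implement it.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import an optional dependency by name, or return None if unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and an installed one whose import raises ImportError
        # both end up here instead of crashing the caller.
        return None


def integration_enabled(name: str) -> bool:
    """True when the optional integration for `name` can be switched on."""
    return optional_import(name) is not None
```

A library would call something like `integration_enabled("torch")` at import time and register its torch-specific code paths only when that returns True, so the rest of the package stays usable while pytorch itself is uninstallable.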
<@trix:fedora.im>
17:09:03
yes, that sucks, i looked at it a bit ago.
<@tflink:fedora.im>
17:09:06
for bootstrapping, I think you can either leave the tests out for now or just comment them out until things are reviewed
<@xanderlent:fedora.im>
17:09:20
Thanks for the suggestion
<@tflink:fedora.im>
17:09:55
it's a reasonably common problem when bootstrapping
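Illustrative sketch (not from the meeting): besides leaving the tests out or commenting them out until review is done, as suggested above, another common bootstrapping approach is to skip only the tests that need a sibling package that is not installed yet. This uses the standard-library `unittest.skipUnless`; the package name `hypothetical_sibling_pkg` and the test classes are hypothetical placeholders.

```python
import importlib
import importlib.util
import unittest

# Hypothetical name of a sibling package that is still under review and
# therefore not installed in the build root during bootstrapping.
SIBLING = "hypothetical_sibling_pkg"


def has_module(name: str) -> bool:
    """True if `name` can be found on the current import path."""
    return importlib.util.find_spec(name) is not None


class SiblingIntegrationTest(unittest.TestCase):
    # Reported as skipped (not failed) until the sibling package is installed.
    @unittest.skipUnless(has_module(SIBLING), f"{SIBLING} not installed (bootstrap)")
    def test_uses_sibling(self):
        sibling = importlib.import_module(SIBLING)
        self.assertEqual(sibling.__name__, SIBLING)


class StandaloneTest(unittest.TestCase):
    # Tests with no cross-package dependency keep running during bootstrap.
    def test_standalone(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])
```

Run it with `python -m unittest`; once the sibling package is installed, the skipped test starts running without editing the test file.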
<@xanderlent:fedora.im>
17:10:39
So overall, I've been thinking about maybe putting the huggingface stuff on the back burner and readdressing it during the 44 cycle.
<@xanderlent:fedora.im>
17:10:39
That would let me focus more on the actual NPU driver and also helping with the torch situation.
<@xanderlent:fedora.im>
17:10:58
That said, if anyone wants to help clean up packages, I've got a couple prototypes in a GitHub repository & COPR that I can link.
<@tflink:fedora.im>
17:11:41
!info progress has been made on packaging the huggingface libraries but they can't really proceed while pytorch is broken
<@trix:fedora.im>
17:11:56
yes.
<@xanderlent:fedora.im>
17:12:42
I think that's pretty much all for me on the huggingface topic for now.
<@tflink:fedora.im>
17:13:13
sorry that you keep hitting all these roadblocks
<@tflink:fedora.im>
17:13:22
thank you for working on all of this, though
<@trix:fedora.im>
17:13:32
as AI mostly runs on python, if fedora wants better ai, then it needs better python.
<@tflink:fedora.im>
17:13:57
that's not really fair to the python folks
<@tflink:fedora.im>
17:14:10
for better or worse, the project's definition of "better" just doesn't match our use cases
<@xanderlent:fedora.im>
17:14:58
I think this is definitely a difficult situation with the upstream schedule mismatch.
<@xanderlent:fedora.im>
17:14:58
I know some of the big distros (us/Ubuntu) were able to get GNOME and KDE on a cycle that worked for twice-a-year releases...
<@trix:fedora.im>
17:15:02
it's a trade-off, and ai is on the losing side. it will always be on the losing side.
<@tflink:fedora.im>
17:16:04
it's the downside of being on the leading edge when the AI stuff is very much not keeping up on that front
<@tflink:fedora.im>
17:16:39
anyhow, moving on to open floor unless there is more on this topic
<@tflink:fedora.im>
17:17:09
!topic open floor
<@tflink:fedora.im>
17:17:22
any other topics that folks want to bring up?
<@tflink:fedora.im>
17:19:35
!info The next AI/ML SIG meeting will be on Thursday, July 31 @ 16:30 UTC
<@tflink:fedora.im>
17:20:18
I know I said 5 minutes but since there aren't many folks here, I don't think there are more topics coming
<@trix:fedora.im>
17:20:34
it's ok.
<@tflink:fedora.im>
17:20:39
if there are, we can continue the conversation in #ai-ml:fedoraproject.org
<@tflink:fedora.im>
17:20:43
thanks for coming, everyone
<@tflink:fedora.im>
17:20:53
!endmeeting