<@Zlopez:matrix.org>
16:00:25
!startmeeting Infrastructure (2024-05-23)
<@meetbot:fedora.im>
16:00:28
Meeting started at 2024-05-23 16:00:25 UTC
<@meetbot:fedora.im>
16:00:28
The Meeting name is 'Infrastructure (2024-05-23)'
<@Zlopez:matrix.org>
16:00:36
!meetingname infrastructure
<@Zlopez:matrix.org>
16:00:41
!chair nirik zlopez nb bodanel dtometzki jnsamyak lenkaseg patrikp
<@Zlopez:matrix.org>
16:00:48
!info Agenda is at: https://board.net/p/fedora-infra
About our team: https://docs.fedoraproject.org/en-US/cpe/
Fedora Infra documentation: https://docs.fedoraproject.org/en-US/infra
<@Zlopez:matrix.org>
16:00:56
!topic namaste
<@Zlopez:matrix.org>
16:00:59
!hi
<@zodbot:fedora.im>
16:01:03
Michal Konecny (zlopez)
<@jnsamyak:matrix.org>
16:01:15
!hi
<@zodbot:fedora.im>
16:01:20
Samyak Jain (jnsamyak) - he / him / his
<@Zlopez:matrix.org>
16:01:23
Welcome everyone to today's Fedora Infrastructure meeting
<@leo:fedora.im>
16:01:28
hey!
<@Zlopez:matrix.org>
16:01:38
I'm your chair for today
<@lenkaseg:fedora.im>
16:01:39
!hi
<@zodbot:fedora.im>
16:01:44
Lenka Segura (lenkaseg)
<@nirik:matrix.scrye.com>
16:02:17
morning
<@Zlopez:matrix.org>
16:05:19
So let's see if there is anybody new here
<@Zlopez:matrix.org>
16:05:36
!topic New folks introductions
<@Zlopez:matrix.org>
16:05:40
!info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
Getting Started Guide: https://docs.fedoraproject.org/en-US/infra/gettingstarted/
<@Zlopez:matrix.org>
16:07:01
It doesn't seem that there is anybody new here today
<@Zlopez:matrix.org>
16:07:17
So let's look at the parade of chairs
<@Zlopez:matrix.org>
16:07:19
!topic Next chair
<@Zlopez:matrix.org>
16:07:26
!info magic eight ball says:
<@Zlopez:matrix.org>
16:07:30
!info chair 2024-05-23 zlopez
<@Zlopez:matrix.org>
16:07:35
!info chair 2024-05-30 ???
<@Zlopez:matrix.org>
16:08:08
It seems that the chair for next week is unoccupied
<@phsmoura:fedora.im>
16:08:21
I can take it
<@Zlopez:matrix.org>
16:08:21
Who wants to sit in that comfortable chair?
<@Zlopez:matrix.org>
16:08:29
Sold!
<@Zlopez:matrix.org>
16:08:38
!info chair 2024-05-30 phsmoura
<@Zlopez:matrix.org>
16:09:35
!info chair 2024-06-06 ???
<@Zlopez:matrix.org>
16:09:49
Does anybody else want the comfortable chair?
<@Zlopez:matrix.org>
16:10:04
This one is nice: it has two 06s in the date :-)
<@nirik:matrix.scrye.com>
16:11:31
I guess I could if no one else wants it... or we could wait and see next week. ;)
<@Zlopez:matrix.org>
16:12:28
Let's leave it for next week
<@Zlopez:matrix.org>
16:12:49
So next topic for this awesome meeting is ...
<@Zlopez:matrix.org>
16:12:53
!topic announcements and information
<@Zlopez:matrix.org>
16:13:00
!info CPE Infra&Releng EU-hours team has a Monday through Thursday 30 minute meeting going through tickets at 0800 UTC in https://matrix.to/#/#meeting-3:fedoraproject.org
<@Zlopez:matrix.org>
16:13:05
!info CPE Infra&Releng NA-hours team has a Monday through Thursday 30 minute meeting going through tickets at 1800 UTC in #fedora-meeting-3
<@Zlopez:matrix.org>
16:13:16
!info Friday 24th May is Red Hat Recharge Day
<@Zlopez:matrix.org>
16:13:38
Plenty of folks will be unavailable during Red Hat Recharge Day
<@nirik:matrix.scrye.com>
16:14:12
!info MBS has been retired along with message tagging service
<@nirik:matrix.scrye.com>
16:14:34
!info Monday May 27th is a holiday in the US.
<@Zlopez:matrix.org>
16:16:14
Anything else to announce here?
<@nirik:matrix.scrye.com>
16:17:42
oh, one last one:
<@nirik:matrix.scrye.com>
16:18:09
!info nirik will be out the week of June 10th - 14th. :)
<@Zlopez:matrix.org>
16:18:24
Good for nirik
<@nirik:matrix.scrye.com>
16:18:46
Hopefully things will be quiet.
<@Zlopez:matrix.org>
16:19:12
I will try to keep the ship afloat with others :-)
<@Zlopez:matrix.org>
16:19:46
Let's move to another topic :-)
<@Zlopez:matrix.org>
16:19:47
!topic Oncall
<@Zlopez:matrix.org>
16:19:53
!info https://fedoraproject.org/wiki/Infrastructure/Oncall
https://docs.fedoraproject.org/en-US/cpe/day_to_day_fedora/
<@Zlopez:matrix.org>
16:20:02
!info leo is on call from 2024-05-16 to 2024-05-23
!info nirik is on call from 2024-05-23 to 2024-05-30
<@Zlopez:matrix.org>
16:20:57
!info ??? is on call from 2024-05-30 to 2024-06-06
<@Zlopez:matrix.org>
16:21:24
Do we have somebody to take the last week?
<@Zlopez:matrix.org>
16:22:42
If not, we at least have somebody covering next week
<@Zlopez:matrix.org>
16:22:57
And we can figure out the week after that at the next meeting
<@Zlopez:matrix.org>
16:23:04
!info Summary of last week: (from current oncall)
<@Zlopez:matrix.org>
16:23:12
leo: The floor is yours
<@leo:fedora.im>
16:23:18
zero oncall pings ;)
<@Zlopez:matrix.org>
16:23:33
That was quick :-)
<@nirik:matrix.scrye.com>
16:23:37
nice
<@Zlopez:matrix.org>
16:23:42
!topic Monitoring discussion [nirik]
<@Zlopez:matrix.org>
16:23:47
!info https://nagios.fedoraproject.org/nagios
Go over existing out items and fix
<@Zlopez:matrix.org>
16:24:00
I saw proxy14 misbehaving again
<@nirik:matrix.scrye.com>
16:24:34
yeah, I bumped its max again
<@nirik:matrix.scrye.com>
16:24:48
I guess I should push that to all of them.
<@nirik:matrix.scrye.com>
16:25:06
it fills up workers and stops processing
<@Zlopez:matrix.org>
16:25:44
So max processes are not enough?
<@nirik:matrix.scrye.com>
16:25:47
https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=proxy14.fedoraproject.org&plugin=apache&timespan=86400&action=show_selection&ok_button=OK
<@nirik:matrix.scrye.com>
16:26:15
looks like it spiked to 1.4k/sec...
<@nirik:matrix.scrye.com>
16:26:33
and max was 2500 total. I bumped it to 3500
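A back-of-the-envelope sketch of why a spike like this can fill up the workers, assuming the "max" here is the Apache worker limit (for example `MaxRequestWorkers`) and that the 1.4k/sec figure is requests per second at a roughly steady rate. By Little's law, the average number of busy workers equals the arrival rate times the average time each request holds a worker. The average request duration was not measured in the meeting, so the sketch only computes the hold time at which each limit would saturate.

```python
# Rough worker-capacity check based on Little's law (L = lambda * W).
# Assumptions: "max" is the Apache worker limit, and 1400/sec is a steady
# request rate. The real average request duration is unknown.

def busy_workers(rate_per_sec: float, avg_hold_sec: float) -> float:
    """Average number of concurrently busy workers for a given rate and hold time."""
    return rate_per_sec * avg_hold_sec


def max_hold_time(rate_per_sec: float, worker_limit: int) -> float:
    """Longest average per-request hold time a worker limit can sustain at this rate."""
    return worker_limit / rate_per_sec


PEAK_RATE = 1400  # requests/sec, the spike seen on proxy14 (assumed unit)

for limit in (2500, 3500):
    print(f"limit={limit}: saturates if requests average more than "
          f"{max_hold_time(PEAK_RATE, limit):.2f}s each")
```

With these numbers the old limit of 2500 saturates once requests average about 1.79 s, and the new limit of 3500 raises that to 2.5 s. Slow clients or slow backends during the spike would push the hold time up, which is one plausible reading of "it fills up workers and stops processing".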
<@nirik:matrix.scrye.com>
16:27:17
open to other ideas if that doesn't do it
<@Zlopez:matrix.org>
16:27:24
It's strange that there were so many connections at once
<@nirik:matrix.scrye.com>
16:27:44
yeah, I think it's mirrorlist requests for epel7.
<@nirik:matrix.scrye.com>
16:28:10
https://data-analysis.fedoraproject.org/csv-reports/images/epel-stacked.png
<@nirik:matrix.scrye.com>
16:28:23
smooge talked about this at the last epel meeting
<@Zlopez:matrix.org>
16:29:02
What's strange is that it doesn't share the load between all the proxies
<@nirik:matrix.scrye.com>
16:29:06
yeah, although it might just be NA ones
<@nirik:matrix.scrye.com>
16:30:35
should look and see if any others spike at the same time
<@nirik:matrix.scrye.com>
16:31:38
anyhow, for monitoring: some certs we need to renew, pagure-stg01 being off which we need to fix, some badges alerts and a mysql backup selinux thing.
<@nirik:matrix.scrye.com>
16:32:19
not sure if the badges ones are due to slow processing, or if we need to change something for the new deployment
<@nirik:matrix.scrye.com>
16:32:34
CRIT: no fedbadges messages in 259200 seconds
<@Zlopez:matrix.org>
16:32:50
Is this the production or staging deployment?
<@nirik:matrix.scrye.com>
16:32:56
RABBITMQ_QUEUE CRITICAL - messages CRITICAL (122632), messages_ready OK (122557) messages_unacknowledged OK (75) consumers OK (3)
<@nirik:matrix.scrye.com>
16:33:15
prod
<@Zlopez:matrix.org>
16:33:37
That is strange
<@nirik:matrix.scrye.com>
16:34:08
I think the first one might need adjusting, and the second is just slower processing
<@Zlopez:matrix.org>
16:34:15
It seems that it's either not processing, or not processing fast enough
<@nirik:matrix.scrye.com>
16:34:34
it does go down over time, then back up
<@Zlopez:matrix.org>
16:35:46
Could be related to some specific event in Fedora infra
<@Zlopez:matrix.org>
16:37:21
I think we can now move on
<@Zlopez:matrix.org>
16:38:20
So what should we do next? Add a thumbs-up to what you want to do
<@nirik:matrix.scrye.com>
16:38:27
yeah, sorry, was looking for a graph of fedbadges. ;)
<@Zlopez:matrix.org>
16:38:28
Backlog refinement
<@Zlopez:matrix.org>
16:38:36
RHEL 7 EOL
<@Zlopez:matrix.org>
16:38:54
Learning topic, if we can come up with something
<@Zlopez:matrix.org>
16:39:03
Finish the meeting early
<@aggraxis:fedora.im>
16:39:16
I have a dumb question
<@aggraxis:fedora.im>
16:39:24
re: epel 7 specifically
<@Zlopez:matrix.org>
16:39:33
Let's ask
<@aggraxis:fedora.im>
16:40:12
At my office we fire off reposyncs nightly for the main repos we consume, including epel 7, 8, and 9
<@aggraxis:fedora.im>
16:40:27
would that contribute to the issue we're seeing on the proxy?
<@aggraxis:fedora.im>
16:40:45
I wanna say it's at like 0100-ish UTC
<@Zlopez:matrix.org>
16:41:12
Maybe adding more workers on the fedbadges side will help to process it faster
<@nirik:matrix.scrye.com>
16:41:14
nope. it shouldn't be related. ;)
<@nirik:matrix.scrye.com>
16:41:50
that should just be one metalink request (or 3) and then directly to the mirrors that are pointed at
<@aggraxis:fedora.im>
16:42:16
ok cool. and it skips over stuff it already has, being efficient and happy along the way
<@nirik:matrix.scrye.com>
16:42:18
Over a longer time you can see it spiked before and caught up eventually:
<@nirik:matrix.scrye.com>
16:42:40
yeah, should be no problem...
<@Zlopez:matrix.org>
16:43:21
I didn't notice that the graph only covers a few hours. If it recovers, it's probably just related to some event that produces a large number of messages
<@nirik:matrix.scrye.com>
16:43:56
could be. I know Aurélien B has been trying to optimise it... but its queries are written poorly.
<@Zlopez:matrix.org>
16:44:30
Yeah, it would need a rewrite in the future, but at least it's now working on RHEL 9
<@nirik:matrix.scrye.com>
16:44:33
i.e., it does a query for 'hey, give me every build user X has ever done' and then it filters that in Python down to what it really wants...
<@Zlopez:matrix.org>
16:45:15
That is not a good query, it looks like `SELECT * FROM builds;`
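To illustrate the difference being described, here is a minimal sketch using Python's built-in `sqlite3` with a hypothetical `builds` table. The schema, data, and function names are made up for illustration; they are not the real datanommer or fedbadges schema or code. The first function mirrors the pattern described above (fetch every build for a user, filter in Python); the second pushes the filter and the count into SQL, so the database can use an index and return a single row.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the real datanommer/fedbadges schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE builds (id INTEGER PRIMARY KEY, username TEXT, package TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO builds (username, package, status) VALUES (?, ?, ?)",
    [
        ("alice", "python-foo", "complete"),
        ("alice", "python-foo", "failed"),
        ("alice", "bar", "complete"),
        ("bob", "python-foo", "complete"),
    ],
)
conn.execute("CREATE INDEX builds_user_pkg ON builds (username, package)")


def count_builds_in_python(user: str, package: str) -> int:
    # Anti-pattern: pull every build the user has ever done, then filter client-side.
    rows = conn.execute(
        "SELECT package, status FROM builds WHERE username = ?", (user,)
    ).fetchall()
    return sum(1 for pkg, status in rows if pkg == package and status == "complete")


def count_builds_in_db(user: str, package: str) -> int:
    # Let the database filter and aggregate, so only one row crosses the wire.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM builds "
        "WHERE username = ? AND package = ? AND status = 'complete'",
        (user, package),
    ).fetchone()
    return count


print(count_builds_in_python("alice", "python-foo"))  # 1
print(count_builds_in_db("alice", "python-foo"))      # 1
```

Both functions give the same answer; the difference is how much data is read and transferred, which matters far more on a database in the terabyte range than on this toy table.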
<@nirik:matrix.scrye.com>
16:45:35
yeah. :( and the datanommer db is... large.
<@Zlopez:matrix.org>
16:46:16
How big is it now? Around 50 TB?
<@aggraxis:fedora.im>
16:46:21
oh wow
<@nirik:matrix.scrye.com>
16:46:59
ha. no, it's not that big... let me see.
<@nirik:matrix.scrye.com>
16:47:23
something around 1.3TB...
<@nirik:matrix.scrye.com>
16:47:47
similar to the koji db
<@Zlopez:matrix.org>
16:47:49
Oh, I was really off :-D
<@Zlopez:matrix.org>
16:48:05
But it's still big for a db
<@Zlopez:matrix.org>
16:48:46
Let me just switch the topic as we are doing open floor anyway
<@Zlopez:matrix.org>
16:48:47
!topic Open Floor
<@nirik:matrix.scrye.com>
16:49:42
so, I just looked and other proxies also do see a spike around the same time... so it seems like proxy14 either doesn't process them as fast or gets more for some reason.
<@nirik:matrix.scrye.com>
16:50:53
so, oh well, will try higher limit for now. ;)
<@Zlopez:matrix.org>
16:52:04
Hopefully that will fix the issue, at least for now
<@nirik:matrix.scrye.com>
16:52:33
perhaps once rhel7 goes eol the high requests there will stop/go down... I guess we will see.
<@Zlopez:matrix.org>
16:52:51
It should happen soon
<@nirik:matrix.scrye.com>
16:53:23
well, even at eol there's no certainty that they will stop using it. :)
<@aggraxis:fedora.im>
16:53:35
I hope so. We're pushing hard for EL7 retirement. I think I have two infrastructure related items left, then I can retire the repo managers and the template VMs.
<@nirik:matrix.scrye.com>
16:53:38
I mean they should... but...
<@nirik:matrix.scrye.com>
16:53:54
yeah, we have a few things left too. Hopefully we will make it in time.
<@nirik:matrix.scrye.com>
16:54:57
24 VMs left
<@aggraxis:fedora.im>
16:55:10
Ran into a nasty dependency issue where I tried to take this last asset straight to EL9, only to find out it has the flexera license manager in it, and THAT depends on redhat-lsb-core. straight up won't launch without it. so a lot of other stuff that uses flexLM will have issues going to 9. fortunately, the package is still in EL8, so we migrated to that instead.
<@nirik:matrix.scrye.com>
16:55:37
There's a package in epel9 that should help with that...
<@aggraxis:fedora.im>
16:56:17
I'll look into it. I kept getting a weird message about whatever command it wanted not existing.
<@Zlopez:matrix.org>
16:56:30
We are trying to work on them as fast as we can
<@nirik:matrix.scrye.com>
16:56:47
lsb_release is the package. It just provides an lsb_release command...
<@aggraxis:fedora.im>
16:57:06
I tried that, but had no luck
<@nirik:matrix.scrye.com>
16:57:21
it's not everything because LSB is... dead and bad these days, but it provides the script things use. ;)
<@nirik:matrix.scrye.com>
16:57:26
ah bummer.
<@nirik:matrix.scrye.com>
16:57:49
well, 8 is ok for now too.
<@aggraxis:fedora.im>
16:58:07
good news is someone will fix that eventually. my oddball edge case isn't going to matter for now. (it was the teradici pcoip management console)
<@aggraxis:fedora.im>
16:59:49
we use it to send configuration data and firmware to our teradici-based zero clients. we don't put actual hosts on people's desks, just these little dumb clients that reach back to a vmware horizon vdi broker
<@aggraxis:fedora.im>
17:01:00
anyhow I've hijacked the meeting sorry lol
<@Zlopez:matrix.org>
17:01:27
We are at the end of the time allotted for this meeting; feel free to continue the discussion in another channel :-)
<@Zlopez:matrix.org>
17:01:32
!endmeeting