<@meetbot:fedora.im>
08:30:08
HTML Minutes: https://meetbot.fedoraproject.org/meeting-3_matrix_fedoraproject-org/2024-06-20/cpe-infra-releng-daily-standup.2024-06-20-08.01.html
<@lenkaseg:fedora.im>
16:00:50
!startmeeting Infrastructure (2024-06-20)
<@meetbot:fedora.im>
16:00:50
Meeting started at 2024-06-20 16:00:50 UTC
<@meetbot:fedora.im>
16:00:50
The Meeting name is 'Infrastructure (2024-06-20)'
<@lenkaseg:fedora.im>
16:00:59
!topic namaste
<@lenkaseg:fedora.im>
16:00:59
!info About our team: https://docs.fedoraproject.org/en-US/cpe/
<@lenkaseg:fedora.im>
16:00:59
!info Fedora Infra documentation: https://docs.fedoraproject.org/en-US/infra
<@lenkaseg:fedora.im>
16:00:59
!info Agenda is at: https://board.net/p/fedora-infra
<@lenkaseg:fedora.im>
16:00:59
!chair nirik zlopez nb bodanel dtometzki jnsamyak lenkaseg patrikp
<@lenkaseg:fedora.im>
16:00:59
meetingname infrastructure
<@lenkaseg:fedora.im>
16:01:14
!meetingname infrastructure
<@meetbot:fedora.im>
16:01:14
The Meeting Name is now infrastructure
<@nirik:matrix.scrye.com>
16:01:34
morning
<@lenkaseg:fedora.im>
16:01:41
Hello everybody!
<@Zlopez:matrix.org>
16:01:42
!hi
<@zodbot:fedora.im>
16:01:44
Michal Konecny (zlopez)
<@lenkaseg:fedora.im>
16:03:47
!topic New folks introductions
<@lenkaseg:fedora.im>
16:03:47
!info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
<@lenkaseg:fedora.im>
16:03:47
!info Getting Started Guide: https://docs.fedoraproject.org/en-US/infra/gettingstarted/
<@lenkaseg:fedora.im>
16:05:08
Any new folks today?
<@lenkaseg:fedora.im>
16:06:22
seems not!
<@lenkaseg:fedora.im>
16:06:28
!topic Next chair
<@lenkaseg:fedora.im>
16:06:35
!info magic eight ball says:
<@lenkaseg:fedora.im>
16:06:35
!info chair 2024-06-20 lenkaseg
<@lenkaseg:fedora.im>
16:06:35
!info chair 2024-06-27 nirik
<@lenkaseg:fedora.im>
16:06:35
!info chair 2024-07-11 pcreech
<@nirik:matrix.scrye.com>
16:06:53
looks like we are pretty booked up. :)
<@pcreech:matrix.org>
16:07:01
:D
<@lenkaseg:fedora.im>
16:07:04
yep, that's nice!
<@Zlopez:matrix.org>
16:07:17
Aren't we missing one week?
<@leo:fedora.im>
16:07:31
hey!
<@lenkaseg:fedora.im>
16:07:32
any volunteer for chair 2024-07-18?
<@Zlopez:matrix.org>
16:07:38
Oh right, that's 4th July
<@nirik:matrix.scrye.com>
16:07:45
07-04 is a holiday in the us... I think we decided to cancel?
<@lenkaseg:fedora.im>
16:08:05
I think in cz it's holiday as well?
<@Zlopez:matrix.org>
16:08:12
@nirik I think it's in announcements :-)
<@nirik:matrix.scrye.com>
16:08:25
ok, fair. :)
<@lenkaseg:fedora.im>
16:08:43
!info chair 2024-07-04 cancelled due to holidays
<@Zlopez:matrix.org>
16:09:08
lenkaseg: No, the 6th July is holiday
<@lenkaseg:fedora.im>
16:09:19
ah true
<@Zlopez:matrix.org>
16:09:31
And 5th as well :-D
<@lenkaseg:fedora.im>
16:09:39
Let's get to the next topic...
<@lenkaseg:fedora.im>
16:09:46
!topic announcements and information
<@lenkaseg:fedora.im>
16:09:52
!info CPE Infra&Releng NA-hours team has a Monday through Thursday 30 minute meeting going through tickets at 1800 UTC in #fedora-meeting-3
<@lenkaseg:fedora.im>
16:09:52
!info CPE Infra&Releng EU-hours team has a Monday through Thursday 30 minute meeting going through tickets at 0800 UTC in https://matrix.to/#/#meeting-3:fedoraproject.org
<@lenkaseg:fedora.im>
16:09:52
!info Fedora infra meeting is cancelled for 2024-07-04 due to US Holidays
<@nirik:matrix.scrye.com>
16:10:12
!info rhel7 eol is coming up fast: 2024-06-30
<@lenkaseg:fedora.im>
16:11:15
10 days to go...are we ready?
<@nirik:matrix.scrye.com>
16:11:52
getting closer... :)
<@Zlopez:matrix.org>
16:11:55
!info There are issues with signing, if you see signing of packages stuck, please let us know in #admin:fedoraproject.org
<@Zlopez:matrix.org>
16:12:45
I would like to go through the current state of the rhel7 EOL today
<@lenkaseg:fedora.im>
16:14:05
Let's get there, but first:
<@lenkaseg:fedora.im>
16:14:17
!topic Oncall
<@lenkaseg:fedora.im>
16:14:17
!info https://fedoraproject.org/wiki/Infrastructure/Oncall
<@lenkaseg:fedora.im>
16:14:17
!info https://docs.fedoraproject.org/en-US/cpe/day_to_day_fedora/
<@lenkaseg:fedora.im>
16:14:28
!info leo is on call from 2024-06-14 to 2024-06-20
<@lenkaseg:fedora.im>
16:14:28
!info zlopez is on call from 2024-06-21 to 2024-06-27
<@lenkaseg:fedora.im>
16:14:39
!info Summary of last week: (from current oncall)
<@lenkaseg:fedora.im>
16:14:43
over to leo!
<@leo:fedora.im>
16:15:15
i think i had one oncall ping about signing, but it was 2am for me
<@leo:fedora.im>
16:15:26
that’s a bit it
<@leo:fedora.im>
16:15:37
that’s about it
<@Zlopez:matrix.org>
16:15:42
Let me assign myself to oncall
<@lenkaseg:fedora.im>
16:16:13
leo: Thanks for the report!
<@Zlopez:matrix.org>
16:16:21
!oncall
<@zodbot:fedora.im>
16:16:23
The following people are oncall:
<@zodbot:fedora.im>
16:16:23
● @Zlopez:matrix.org (zlopez) Current Time for them: 18:16 (Europe/Prague)
<@zodbot:fedora.im>
16:16:23
If they do not respond, please file a ticket (https://pagure.io/fedora-infrastructure/issues)
<@Zlopez:matrix.org>
16:16:29
Done
<@lenkaseg:fedora.im>
16:16:47
Do we have a volunteer for 2024-06-27 to 2024-07-04?
<@nirik:matrix.scrye.com>
16:17:14
I can take it if no one else...
<@lenkaseg:fedora.im>
16:17:29
thanks nirik!
<@lenkaseg:fedora.im>
16:17:42
!info nirik is on call from 2024-06-28 to 2024-07-04
<@lenkaseg:fedora.im>
16:17:54
!topic Monitoring discussion [nirik]
<@lenkaseg:fedora.im>
16:17:59
!info https://nagios.fedoraproject.org/nagios
<@lenkaseg:fedora.im>
16:17:59
!info Go over existing out items and fix
<@nirik:matrix.scrye.com>
16:18:29
So, proxy31 seems to be alerting a lot, looks like general network stuff tho, so not much we can do about it off hand...
<@nirik:matrix.scrye.com>
16:18:45
otherwise we have just buildhw-x86-16 down
<@Zlopez:matrix.org>
16:19:12
I'm not sure what is happening there. I looked at the nrpe log and saw plenty of ssl handshake errors
<@Zlopez:matrix.org>
16:19:23
On proxy31
<@nirik:matrix.scrye.com>
16:19:37
just pinging over the vpn link there gives packet loss.
<@nirik:matrix.scrye.com>
16:19:50
so I think it's just a network problem between it and iad2.
<@Zlopez:matrix.org>
16:19:54
But we had some issue with ssh today, but nils was able to solve it by invalidating kerberos cache on batcave01
<@nirik:matrix.scrye.com>
16:20:14
oh? issue on batcave01 ssh?
<@Zlopez:matrix.org>
16:20:42
It was ssh from batcave01, it either took too long or just rejected you
<@nirik:matrix.scrye.com>
16:20:54
from batcave01 to proxy31?
<@Zlopez:matrix.org>
16:21:03
So the playbooks were failing with unreachable machines
<@Zlopez:matrix.org>
16:21:12
From batcave01 to everything
<@nirik:matrix.scrye.com>
16:21:26
huh, sounds like sssd was having some problem?
<@Zlopez:matrix.org>
16:21:58
Yeah, the ssh was failing on gssapi-key authentication, publickey worked fine
<@Zlopez:matrix.org>
16:22:25
Invalidating the kerberos cache helped
<@nirik:matrix.scrye.com>
16:22:57
we shouldn't be using gssapi normally I thought...
<@Zlopez:matrix.org>
16:23:16
And I saw every proxy today alerting about TicketKey age, but it got resolved by itself after a few hours
<@nirik:matrix.scrye.com>
16:23:25
but ok, we can discuss more out of meeting? not sure what could have happened aside sssd issues.
<@Zlopez:matrix.org>
16:23:38
Could it be that something was changed in ssh policy?
<@nirik:matrix.scrye.com>
16:23:47
that's updated by a cron from batcave01, so if ssh wasn't working it would have caused that too.
<@nirik:matrix.scrye.com>
16:24:11
I don't think anything would have changed... but I could be wrong
<@Zlopez:matrix.org>
16:24:29
You are right, the ticketkey alert was resolved after we fixed the ssh
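A minimal sketch of how the gssapi-versus-publickey difference described above could be checked from a single host. It only builds the `ssh` command lines and does not run them. The host name `proxy31.example` is a placeholder, and treating "gssapi-key" as OpenSSH's `gssapi-with-mic` method is an assumption, not something stated in the meeting.

```python
import shlex

# Methods this sketch knows how to probe. Mapping the "gssapi-key" mentioned
# in the discussion to gssapi-with-mic is an assumption.
SUPPORTED_METHODS = ("publickey", "gssapi-with-mic")


def ssh_probe_command(host, method):
    """Build an ssh command line that allows only one authentication method."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    use_gssapi = "yes" if method == "gssapi-with-mic" else "no"
    return [
        "ssh",
        "-o", "BatchMode=yes",                       # never prompt, fail instead
        "-o", "ConnectTimeout=10",                   # give up quickly on slow hosts
        "-o", f"GSSAPIAuthentication={use_gssapi}",
        "-o", f"PreferredAuthentications={method}",  # try only this method
        host,
        "true",                                      # harmless remote command
    ]


if __name__ == "__main__":
    # proxy31.example is a hypothetical host name used for illustration.
    for method in SUPPORTED_METHODS:
        print(shlex.join(ssh_probe_command("proxy31.example", method)))
```

Running each printed command by hand and comparing exit codes would show whether only the GSSAPI path is failing, which would match what was observed before the kerberos cache was invalidated.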
<@Zlopez:matrix.org>
16:25:43
And I noticed that robosignatory playbook is saying that autosign02 is unreachable
<@nirik:matrix.scrye.com>
16:26:28
yes, this is normal. I marked to reply to that
<@nirik:matrix.scrye.com>
16:26:38
I just haven't had any time yet to reply to emails/tickets today. :)
<@Zlopez:matrix.org>
16:26:44
Ok, thanks
<@Zlopez:matrix.org>
16:27:05
I got to e-mails about 1500 my time, didn't have time till then :-D
<@nirik:matrix.scrye.com>
16:27:38
I have a pile to reply to (since I was out yesterday due to us holiday)
<@Zlopez:matrix.org>
16:27:51
Just take your time
<@Zlopez:matrix.org>
16:28:19
I don't have anything else related to monitoring, I'm trying to watch nagios more
<@nirik:matrix.scrye.com>
16:28:44
yeah, hopefully we can get alerts down so things are more calm...
<@Zlopez:matrix.org>
16:28:44
Just didn't get the kerberos authentication working with flatpaked browser
<@nirik:matrix.scrye.com>
16:29:47
we can move on from nagios I think...
<@lenkaseg:fedora.im>
16:30:24
let's move on to RHEL7 EOL?
<@Zlopez:matrix.org>
16:30:33
+1
<@lenkaseg:fedora.im>
16:30:47
!topic RHEL7 EOL
<@lenkaseg:fedora.im>
16:30:47
!info Check the progress on tickets related to RHEL7 EOL (end of June)
<@lenkaseg:fedora.im>
16:30:47
!ticket https://pagure.io/fedora-infrastructure/issue/11815
<@lenkaseg:fedora.im>
16:31:05
!ticket 11821
<@zodbot:fedora.im>
16:31:06
**fedora-infrastructure #11821** (https://pagure.io/fedora-infrastructure/issue/11821):**rhel7 - sundries servers**
<@zodbot:fedora.im>
16:31:06
● **Opened:** 3 months ago by zlopez
<@zodbot:fedora.im>
16:31:06
● **Last Updated:** 6 hours ago
<@zodbot:fedora.im>
16:31:06
● **Assignee:** zlopez
<@nirik:matrix.scrye.com>
16:31:25
I saw there was packaging stuff still here. ;(
<@nirik:matrix.scrye.com>
16:31:34
I can try and look today/tomorrow on it...
<@Zlopez:matrix.org>
16:31:59
I tried to build the `translate-toolkit` in `epel9-infra` and it's missing `python-panda`
<@Zlopez:matrix.org>
16:32:21
Or `python3.i686`
<@nirik:matrix.scrye.com>
16:32:39
that's pretty crazy.
<@Zlopez:matrix.org>
16:33:02
In COPR I was able to build it as they are using `CodeReady builder` repository to provide `python3.i686` package
<@Zlopez:matrix.org>
16:33:33
The python-panda has around 20 dependencies that are missing in EPEL9 :/
<@nirik:matrix.scrye.com>
16:34:02
koji buildroot repos are not multiarch... it's pretty strange to require a 32bit python tho...
<@Zlopez:matrix.org>
16:34:32
I can still try to build that, but for now I will probably disable the role, just to see what else will fail
<@Zlopez:matrix.org>
16:35:16
<@nirik:matrix.scrye.com>
16:35:23
ok.
<@Zlopez:matrix.org>
16:35:34
This is the dependency I'm struggling with
<@Zlopez:matrix.org>
16:36:25
I'm stuck on the web-builder role for pretty long :-D
<@Zlopez:matrix.org>
16:36:48
Just to get one dependency in place
<@Zlopez:matrix.org>
16:37:41
I think we can move to next ticket, I will continue on this one tomorrow
<@nirik:matrix.scrye.com>
16:37:55
ok, I can look and see if I can see anything too...
<@lenkaseg:fedora.im>
16:38:32
!ticket 8281
<@zodbot:fedora.im>
16:38:33
**fedora-infrastructure #8281** (https://pagure.io/fedora-infrastructure/issue/8281):**Create fedora-cloud and fedora-testing Google Cloud projects**
<@zodbot:fedora.im>
16:38:33
● **Opened:** 4 years ago by bgilbert
<@zodbot:fedora.im>
16:38:33
● **Last Updated:** 4 years ago
<@zodbot:fedora.im>
16:38:33
● **Closed: Fixed** 4 years ago by dustymabe
<@zodbot:fedora.im>
16:38:33
● **Assignee:** Not Assigned
<@lenkaseg:fedora.im>
16:38:38
ah sorry
<@lenkaseg:fedora.im>
16:38:50
!ticket 8213
<@zodbot:fedora.im>
16:38:51
**fedora-infrastructure #8213** (https://pagure.io/fedora-infrastructure/issue/8213):**fedmsg -> fedora-messaging migration tracker**
<@zodbot:fedora.im>
16:38:51
● **Opened:** 4 years ago by kevin
<@zodbot:fedora.im>
16:38:51
● **Last Updated:** a month ago
<@zodbot:fedora.im>
16:38:51
● **Assignee:** zlopez
<@Zlopez:matrix.org>
16:39:24
We can skip this one, once everything is moved from RHEL7, we can retire fedmsg completely
<@lenkaseg:fedora.im>
16:39:34
alright!
<@lenkaseg:fedora.im>
16:39:43
!ticket 8455
<@zodbot:fedora.im>
16:39:44
**fedora-infrastructure #8455** (https://pagure.io/fedora-infrastructure/issue/8455):**Move mailman to newer release of Fedora or CentOS**
<@zodbot:fedora.im>
16:39:44
● **Opened:** 4 years ago by smooge
<@zodbot:fedora.im>
16:39:44
● **Last Updated:** 2 hours ago
<@zodbot:fedora.im>
16:39:44
● **Assignee:** zlopez
<@Zlopez:matrix.org>
16:40:18
I got SMTP port open in Data Center, but I'm still seeing issues on mailman
<@nirik:matrix.scrye.com>
16:40:21
so, I had this to reply to too...
<@lenkaseg:fedora.im>
16:40:42
btw, maybe let's close ticket 8213?
<@nirik:matrix.scrye.com>
16:40:45
there were alerts on mailman01.stg yesterday... so I gave it more memory and that seemed to make it happier
<@nirik:matrix.scrye.com>
16:41:06
perhaps low memory was the cause of the issues you were seeing?
<@Zlopez:matrix.org>
16:41:09
I'm able to send e-mail from mailman01.stg to my inbox, and from batcave to lists.stg.fedoraproject.org, but it's failing when handling the message
<@Zlopez:matrix.org>
16:41:50
I will probably recreate the machine with more CPUs and RAM, just to test if this isn't the issue
<@Zlopez:matrix.org>
16:42:10
Because the worker is just being killed during handling of the message
<@nirik:matrix.scrye.com>
16:42:27
The vm now has 32gb memory and seems like it's happier, but I am not sure if this would fix the problem you were seeing... needs testing. ;)
<@Zlopez:matrix.org>
16:42:46
The machine now has only 2 CPU and 16 GB RAM and one CPU is 100% most of the time
<@nirik:matrix.scrye.com>
16:43:16
I gave it 32gb yesterday. :)
<@Zlopez:matrix.org>
16:43:26
It's possible that I changed the host_vars and didn't redeploy it yet
<@nirik:matrix.scrye.com>
16:43:56
I manually adjusted it... ;) but if you want to redeploy you could.
<@Zlopez:matrix.org>
16:44:06
Oh, you are right :-D
<@Zlopez:matrix.org>
16:44:40
If you did that yesterday, then the issue will be somewhere else
<@nirik:matrix.scrye.com>
16:44:47
we should also think about scheduling the prod rollout if we want to do it next week... but I guess we can't know until we are sure it's ready
<@Zlopez:matrix.org>
16:45:14
There are errors like this `[CRITICAL] WORKER TIMEOUT (pid:926547)`
<@nirik:matrix.scrye.com>
16:45:15
well, did you test since yesterday? is it still failing handling the message?
<@Zlopez:matrix.org>
16:45:31
I got the port open today, so the testing is from today
<@nirik:matrix.scrye.com>
16:45:47
ok then, it's likely not fixed by more memory
<@Zlopez:matrix.org>
16:46:02
I will investigate more tomorrow and see if I can find anything
<@Zlopez:matrix.org>
16:46:39
I don't see anything strange on SMTP level, as the postfix is processing the message without any error
<@nirik:matrix.scrye.com>
16:46:59
yeah, I think that's a gunicorn error.
<@Zlopez:matrix.org>
16:47:13
Yeah, it is
<@Zlopez:matrix.org>
16:47:53
I will try to turn the debugging on for gunicorn, maybe that will show me more info
<@nirik:matrix.scrye.com>
16:48:18
https://github.com/benoitc/gunicorn/issues/1801
<@nirik:matrix.scrye.com>
16:48:30
but I think that's just saying to increase the timeout
<@Zlopez:matrix.org>
16:49:29
I will look at it tomorrow and maybe I will figure it out, but the mailman upgrade is getting closer :-)
<@nirik:matrix.scrye.com>
16:49:34
and indeed might mean too few cpus
<@Zlopez:matrix.org>
16:50:08
It depends if it is working well with threads or needs a CPU for itself
<@nirik:matrix.scrye.com>
16:50:23
yeah, not sure.
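A minimal sketch of the gunicorn settings touched on above (worker timeout, worker count, debug logging), written as a standalone `gunicorn.conf.py`. The values are illustrative assumptions, not the configuration deployed on mailman01.stg, and how Mailman passes settings to gunicorn on that host may differ from a plain Python config file.

```python
# gunicorn.conf.py -- illustrative values only, not the deployed configuration
import multiprocessing

# A worker that stays silent longer than this many seconds is killed and
# restarted, which gunicorn logs as "[CRITICAL] WORKER TIMEOUT".
# The gunicorn default is 30 seconds; 120 is an assumed test value.
timeout = 120

# Rule of thumb from the gunicorn documentation: (2 x cores) + 1 workers.
# On a 2 CPU machine this gives 5.
workers = multiprocessing.cpu_count() * 2 + 1

# More verbose logging while investigating the timeouts.
loglevel = "debug"
```

If the timeouts persist with a longer `timeout`, that would point toward the CPU-bound worker theory rather than a limit that is simply too short.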
<@Zlopez:matrix.org>
16:51:00
I think we can continue to next ticket
<@lenkaseg:fedora.im>
16:51:25
Zlopez: is there some urgent one from those in RHEL7 eol?
<@Zlopez:matrix.org>
16:53:08
Yeah, fedimg one
<@lenkaseg:fedora.im>
16:53:21
!ticket 11803
<@zodbot:fedora.im>
16:53:24
**fedora-infrastructure #11803** (https://pagure.io/fedora-infrastructure/issue/11803):**rhel7 EOL - Fedimg**
<@zodbot:fedora.im>
16:53:24
● **Opened:** 4 months ago by zlopez
<@zodbot:fedora.im>
16:53:24
● **Last Updated:** 3 hours ago
<@zodbot:fedora.im>
16:53:24
● **Assignee:** humaton
<@Zlopez:matrix.org>
16:53:31
There is now a related request from Jeremy Cline for setup of an AWS machine
<@nirik:matrix.scrye.com>
16:53:45
yeah, I'll try and get to that today...
<@nirik:matrix.scrye.com>
16:53:54
well, an aws account/role... not a machine.
<@Zlopez:matrix.org>
16:53:55
After that it will be supporting the AWS cloud, which was missing
<@Zlopez:matrix.org>
16:54:13
My bad account/role
<@nirik:matrix.scrye.com>
16:54:27
yep. looking possible before the deadline. ;)
<@naraiank:fedora.im>
16:55:12
!Hi
<@Zlopez:matrix.org>
16:55:29
I would like to know the state of people.fp.o
<@nirik:matrix.scrye.com>
16:55:50
well, it's blocked on planet...
<@nirik:matrix.scrye.com>
16:56:09
phsmoura has the needed changes ready for fasjson/noggin, but needed some help getting tests working
<@Zlopez:matrix.org>
16:56:13
!11822
<@Zlopez:matrix.org>
16:56:19
!ticket 11822
<@nirik:matrix.scrye.com>
16:56:23
if someone could assist with tests there that would be great.
<@zodbot:fedora.im>
16:56:25
**fedora-infrastructure #11822** (https://pagure.io/fedora-infrastructure/issue/11822):**rhel7 - Fedora people**
<@zodbot:fedora.im>
16:56:25
● **Opened:** 3 months ago by zlopez
<@zodbot:fedora.im>
16:56:25
● **Last Updated:** 3 hours ago
<@zodbot:fedora.im>
16:56:25
● **Assignee:** lenkaseg
<@Zlopez:matrix.org>
16:56:28
Hello The Exorcist
<@naraiank:fedora.im>
16:56:33
I'm from Fedora Testing team.
<@naraiank:fedora.im>
16:56:41
Hi Zlopez
<@Zlopez:matrix.org>
16:57:02
And right now we need somebody for testing :-D
<@nirik:matrix.scrye.com>
16:57:30
https://github.com/fedora-infra/noggin/pull/1410
<@lenkaseg:fedora.im>
16:57:49
you were thinking it so strongly that you summoned The Exorcist!
<@nirik:matrix.scrye.com>
16:57:56
https://github.com/fedora-infra/fasjson/pull/720
<@naraiank:fedora.im>
16:58:22
I'm not into infrastructure / server testing so far, but would like to participate if the opportunity is given.
<@Zlopez:matrix.org>
16:59:33
I see that one is waiting for Aurélien B review and the second one is waiting for change from phsmoura
<@phsmoura:fedora.im>
17:00:10
yep, I think it's the last one, the PR in freeipa fas was approved already
<@lenkaseg:fedora.im>
17:00:43
We're out of time for today, so I'm gonna end the meeting here, but feel free to continue the discussion!
<@lenkaseg:fedora.im>
17:00:44
!endmeeting