15:00:05 #startmeeting Infrastructure (2019-02-27)
15:00:05 Meeting started Thu Feb 28 15:00:05 2019 UTC.
15:00:05 This meeting is logged and archived in a public location.
15:00:05 The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:05 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:00:05 The meeting name has been set to 'infrastructure_(2019-02-27)'
15:00:05 #meetingname infrastructure
15:00:05 #topic aloha
15:00:05 #chair nirik pingou puiterwijk relrod smooge tflink threebean cverna mkonecny mizdebsk
15:00:05 The meeting name has been set to 'infrastructure'
15:00:05 Current chairs: cverna mizdebsk mkonecny nirik pingou puiterwijk relrod smooge tflink threebean
15:00:15 morning
15:00:27 .hello2
15:00:28 mizdebsk: mizdebsk 'Mikolaj Izdebski'
15:00:30 .hello2
15:00:31 pingou: pingou 'Pierre-YvesChibon'
15:00:32 .hello smooge
15:00:34 smooge: smooge 'Stephen J Smoogen'
15:00:38 .hello zlopez
15:00:39 mkonecny: zlopez 'Michal Konečný'
15:00:46 hello
15:02:43 #topic New folks introductions
15:02:43 #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
15:02:43 #info Getting Started Guide: https://fedoraproject.org/wiki/Infrastructure/GettingStarted
15:02:47 o/
15:02:58 Hello any new people this week? And what would you like to say?
15:04:36 #topic announcements and information
15:04:36 #info nirik will have sparse hours due to house move
15:04:36 #info Mass update/reboots planned for 2019-02-26 -> 2019-03-01
15:04:36 #info Beta Freeze Begins 2019-03-05
15:04:36 #info Pagure 3.2.0/3.2.1 was deployed to stg/prod/pkgs.
15:04:37 #info Staging Koji sync planned for 2019-03-08 (ticket 7600)
15:04:39 #info Test gating temporarily disabled in Bodhi, awaiting a 3.13.3 release for https://github.com/fedora-infra/bodhi/issues/3044
15:05:13 #info Build infrastructure updated/rebooted on Friday because of crunch before freeze
15:05:32 any other announcements for this week?
15:05:33 Fri ?
15:05:42 if anyone has any objections for staging koji sync, please comment in the ticket or let me know
15:05:44 another round?
15:06:11 yeah.. we freeze on Tuesday and could not update/reboot koji last night due to trying to get a working compose
15:06:13 which failed
15:06:44 oh build infra
15:06:50 misread that, sorry
15:07:13 ah ok. fudge
15:07:23 the email to say we are doing this never got sent
15:08:16 .hello2
15:08:17 bowlofeggs: bowlofeggs 'Randy Barlow'
15:08:36 ok sent that
15:08:52 #info Test gating temporarily disabled in Bodhi, awaiting a 3.13.3 release for https://github.com/fedora-infra/bodhi/issues/3044
15:09:06 smooge: i added a topic to gobby a few min ago, fyi
15:09:14 yeah I put it in the above
15:09:16 oh you did have that info
15:09:18 hahahaha
15:09:20 sorry
15:09:27 I do this live people!
15:09:29 i'd blame coffee, but i have coffee
15:09:41 #topic Oncall
15:09:41 #info smooge is on call from 2019-02-14 -> 2019-02-21
15:09:41 #info puiterwijk is on call from 2019-02-21 -> 2019-02-28
15:09:41 #info smooge is on call from 2019-02-28 -> 2019-03-07
15:09:41 #info ?????? is on call from 2019-03-07 -> 2019-03-14
15:09:42 #info ?????? is on call from 2019-03-14 -> 2019-03-21
15:09:44 #info Summary of last week: (from smooge )
15:09:46 and I missed that
15:10:04 You mean from Patrick? Not much special, just some pings.
15:10:16 puiterwijk, did a lot of ping work this last week
15:10:27 thank you puiterwijk
15:12:20 +1
15:12:28 zodbot: alias add oncall "echo smooge (Stephen Smoogen) is oncall. Please file a ticket if you don't hear from me ( https://pagure.io/fedora-infrastructure/issues ) My current hours are 1100 UTC to 1900 UTC Monday through Friday"
15:12:28 smooge: Kneel before zod!
15:12:46 I am moving my hours 1 hour later.. I will sleep in again
15:12:59 is anyone able to take oncall next week?
15:13:54 the week of the 7th?
15:14:07 7th -> 14th
15:14:16 smooge, i can take it if needed
15:14:29 thanks mizdebsk
15:14:39 ah i cant' do taht weekend
15:14:44 but i could do the business days
15:14:53 ah mizdebsk has it
15:15:13 i can do 14-21
15:15:24 bowlofeggs: note that also in smooge's alias, there's no expectation you are available over weekends. Other times people just file issues :)
15:15:34 ah ok
15:15:45 well i'd be like extra unavailable, like wouldn't even see tickets
15:15:49 no interweb
15:15:56 but bars
15:16:06 bears
15:16:10 haha yeah
15:16:12 #topic Monitoring discussion
15:16:12 #info https://nagios.fedoraproject.org/nagios
15:16:12 #info Go over existing out items and fix
15:16:26 smooge: am i on for the 14th-21st?
15:17:07 bowlofeggs, you said you can do it so I added you there
15:17:12 cool
15:17:12 sorry for not confirming
15:17:34 so on our monitoring.. we have a bunch of red services which I will try to clean up.
15:17:54 The one I am worried about is notifs-backend01.phx2.fedoraproject.org
15:17:54 Check fedmsg-hub consumers backlog
15:17:55 This service has 1 comment associated with it This service problem has been acknowledged
15:17:57 UNKNOWN 02-28-2019 15:16:00 0d 18h 20m 21s 3/3 UNKNOWN: fedmsg consumer FMNConsumer not found
15:18:30 this came up after the reboot but I don't know enough about fedmsg or notifs-backend to diagnose/fix
15:18:41 smooge: rerun playbook basically.
15:18:42 can someone help me with this later?
15:18:47 ok will do
15:18:54 Basically, the permissions on monitoring are... funny at times
15:20:00 pkgs02 swap should clear up after the next reboot
15:20:13 #topic Tickets discussion
15:20:14 #info https://pagure.io/fedora-infrastructure/report/Meetings%20ticket
15:20:30 mizdebsk, this is your ticket
15:21:21 .ticket 7588
15:21:22 mizdebsk: Issue #7588: OpenShift app monitoring with Nagios - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/7588
15:21:46 like we discussed on one of previous meetings, we need a monitoring for apps in openshift
15:22:01 yeah.. it was quite clear last weekend
15:22:02 prometheus ?
15:22:03 or at least i need, not sure how others feel with production apps with no monitoring
15:22:05 last week
15:22:14 +1
15:22:28 for me a simple check for number of running pods is sufficient
15:23:02 chris7871, we are currently using nagios, it is integrated with notification system
15:23:18 i think prometheus would need much more work than adding a simple nagios plugin
15:23:32 I would be glad to get notification about failure with log :-)
15:23:39 but i am wondering whether we need something more than just checking number of running pods
15:23:44 mizdebsk: we do have prometheus running in openshift though. Automatically configured and what not :)
15:24:06 if yes then it could make sense to package and deploy one of existing nagios plugins for monitoring openshift
15:24:06 Prometheus is checking things like how often pods get restarted, how many pods are dead, whether the control plane is okay, etc
15:24:38 smooge found one nagios plugin that can check various things, it would need a serviceaccount with enough read-only privileges
15:24:42 i didn't know there was a nagios plugin for openshift monitoring
15:25:25 someone in ubuntu land made one
15:25:35 on the other hand, if there's not much interest in getting nagios to work with openshift then from my pov it would be much simpler to write a custom plugin myself and put it in ansible.git
15:26:03 so i would like to hear your opinions about nagios + openshift
15:26:22 mizdebsk, there is interest. I think packaging up that git repo and testing it makes sense. I am hoping to do so next week
15:26:35 1. make something quick until a different monitoring is implemented and deployed? or 2. invest more time in a proper nagios plugin
15:27:06 ok, thx smooge
15:27:07 puiterwijk, can we make prometheus send out emails to our pages?
15:27:15 and irc
15:27:22 smooge: we should be able to set that up, yeah. Otherwise, we can have nagios monitor it...
15:28:35 ok either way. mizdebsk if I don't have a PoC by next Thursday meeting.. you have complete permission to do what is needed :)
15:28:54 ok
15:29:01 nothing else from my side on this topic
15:29:20 #topic Priorities for next week?
15:29:20 #info please put tickets needing to be focused on here
15:29:44 #info Beta Freeze on Tuesday, composers willing
15:30:43 #info upgrade/fix of ci box on Monday
15:31:03 #info reboot/updates of build systems on Friday (will need extra eyes)
15:31:19 those are the items I know about. any other items needed focus on?
15:32:16 we will be pushing the fedora-messaging effort next week, not sure how well it will go with the freeze
15:32:27 but I guess we will find out :-)
15:32:52 just ask for freeze exceptions and do extra coordination with mboddu/adamw
15:33:03 robot says no
15:33:05 sorry what?
15:33:10 aka if it breaks them then no
15:33:25 freeze is a perfect time to do work in staging
15:33:34 but yeah trying to push it during the freeze seems unfortunate timing
15:33:39 oh man, freeze already? crazy
15:33:45 gotta bust out my heavy cota
15:33:46 everyone who wanted to migrate to messaging would need freeze exceptions to do it
15:33:47 *coat
15:34:04 cverna: we'll be playing in stg which isn't covered by freeze
15:34:11 oh ok
15:34:16 sorry I thought this was prod
15:34:38 cverna, skip what I said.. if this is stg you have a good time
15:34:59 I think the end goal is to put it in prod, and it would be nice not to have to wait for the end of the freezw
15:35:17 freeze but we will see how that goes no need to worry too much :)
15:35:40 how long is the freeze ? 2 or 3 weeks ?
15:35:49 cverna, until a beta is out the door
15:35:58 depends if its slips or not
15:36:16 cverna: once we have all our ducks ready in stg, it's easier to ask for freeze break for prod
15:36:20 ok even better :)
15:36:28 starting with the out layer and working inword
15:36:52 we'll have patch that people can review and proof that things should work :)
15:37:07 pingou: yes that 's why I think no need to worry to much but I think it is good to share that info :)
15:37:14 ok anything else for priorities next week. I think the non-stage work is 1 week out
15:37:44 oh I have one freeze related question but that can wait for Open floor :)
15:37:57 ok next
15:38:04 #topic Discuss: Is the Fedora pastebin still useful? - relrod
15:38:04 #info how many users are using it? [3000 posts a day from 350 ips]
15:38:06 * mboddu is happy to help
15:38:18 this is just to answer an item.. that was a question from last meeting
15:38:34 we avg 3000 posts a day from around 350 different ips a day
15:38:54 next one is a short one too
15:38:56 #topic Lots of cron job noise.
15:38:56 #info filing tickets on each one
15:38:56 #info should we assign to 'owner' to fix?
15:39:46 I am going through the daily 200+ emails sysadmin-* gets from various cron jobs. I will be filing tickets on ones which show up each day and will try to clean them up because I am tired of them
15:40:12 smooge++
15:40:47 oh wow, that's not a small undertaking, thanks for doing this smooge
15:41:05 please look out for them and if you see one your service has.. fix it and close it. If you are a bystander/apprentice/extra time look and see what script in ansible is doing it and suggest a fix
15:41:17 that is all
15:41:32 topic Discuss: Future of fedora-packages - cverna
15:41:32 #info ongoing thread on the infra list https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org/thread/VT54S4LBEGTC6SAGHHPZ3VZA6K5GDOQ7/
15:41:40 cverna, your turn
15:41:59 not much on this just raising awareness of the thread on the mailing list
15:43:07 Overall I propose looking at elasticsearch to replace fedora-packages
15:43:23 s/looking/to look*
15:43:40 we woooould need the one part of bodhi that uses it to use something to get that info
15:43:47 bodhi uses packages to search package names in the web UI
15:43:50 not a major feature
15:43:52 but still
15:44:08 my keyboard has been doing this thing lately where it sticks on keys for some reasons
15:44:12 that's why those ooooooo's
15:44:14 hahahah
15:44:27 bowlofeggs: yes it would make more sense if bodhi queried elasticsearch directly
15:44:30 you need to eat less honey baklava at the keyboard. its what brings the bears
15:44:36 haha
15:44:45 src.fp.o would be a good candidate too
15:44:50 and ants
15:44:50 yeah as long as i have some place to get it that works for me
15:45:30 cverna, how much data does fedpkg get/have/keep these days? When I was looking at elasticsearch in the past we were looking to replace ALL the things because everyone wanted something
15:45:47 I want to use packages to guess the name of the package in Anitya
15:46:45 smooge just for fedora-packages currently we store ~24000 documents ( one per package )
15:46:49 cverna: i also do use the web UI
15:46:53 what would replace that?
15:47:08 the main reason i use the web UI is to find out, in one place, what versions of a package are in which releases
15:47:27 bodhi can kiiinda answer it, except rawhide (soon to be fixed) and it's UI is not as nice for it
15:47:28 bowlofeggs: that's another question that don't have a good answer :S
15:47:36 but maybe i can just make bodhi['s UI better at that
15:47:55 ok that is a lot smaller than I was building out for
15:48:01 the problem is that bodhi can only search and you have to use your brain to collate teh answers
15:48:17 packages does taht collation for you, which is nice (but not critical i suppose)
15:48:27 ryan pointed out that src.fp.o could be a good candidate too
15:48:29 it's really just that it presents the data in an easier to consume format for my brain ☺
15:48:30 ok I would like to get the next item in before we close out the meeting.
15:48:53 I recommend everyone with an interest to read the thread and we come up with a way forward at next meeting?
15:49:07 i'm not opposed to removing it
15:49:07 yes thanks smooge
15:49:11 just things to consider
15:49:15 +1 smooge
15:49:19 #topic bodhi-3.13.3 deployment date - bowlofeggs
15:49:19 #info 3.13.3 to address https://github.com/fedora-infra/bodhi/issues/3044
15:49:19 #info test gating is disabled until 3.13.3 is released
15:49:19 #info do we want to deploy 3.13.3 this late in the week, or wait for Monday?
15:49:23 anyway it will stop working when f29 is eol
15:49:24 alright
15:49:34 so we have 1 year or so
15:49:36 OK I would really like to wait until Monday
15:49:45 so right now test gating is disabled in bodhi, because greenwave is returning HTTP 500's on about 1/14 requests
15:49:55 yeah i myself would also suggest monday
15:50:04 i cannot be around on this weekend to react if something goes weird
15:50:04 unless it would get pulled in when I do yum update tomorrow on the bodhi boexes
15:50:13 it won't get pulled in
15:50:19 i only tag to stg until i want to deploy
15:50:34 but i wanted to ask the ops for their preference
15:50:44 sounds like monday is preferred and that's also what i recommend
15:50:58 ok then I would prefer to Monday as I think this weekend will be a lot of 'why did this brreaaaaakkk omg '
15:51:10 #info decided to update on Monday
15:51:12 i generally don't like to do deployments on thursday/friday or even wednesday unless it's critical
15:51:16 imo, this is not critical
15:51:23 few packages use gating today
15:51:26 bowlofeggs, I can help with this after 15:00 UTC on Monday
15:51:35 cool
15:51:50 i don't expect problems, but then again, i never do and there are problems sometimes ☺
15:52:04 ok last item for this week
15:52:14 #topic Open Flush
15:52:30 cverna, you said you had something?
15:53:01 yes how do we get a system off the freeze list ? I think OSBS should not be impacted by the freeze
15:53:24 why not? it is used to build the release
15:53:34 cverna, if something is producing somehting for the release it is mission critical
15:53:47 mizdebsk smooge we release container every 2 weeks
15:54:07 so we should freeze OSBS every week
15:54:13 right, but doesn't QA work on the branch containers during the freeze?
15:54:24 and QA doesnt' QA the containers the rest of the cycle?
15:54:26 and the base image is not built by OSBS
15:54:31 ah
15:54:32 at least currently
15:54:35 cverna, don't you want to use it for building base image?
15:54:38 yeah i was thinkin gabout the base container
15:54:48 later but currently it does not make sense
15:55:08 do we not consider our container applications to be part of the release then?
15:55:14 seems like a releng question actually
15:55:15 if OSBS is broken that does not impact the container release at all
15:55:49 we don't mass rebuild layered image maybe we should
15:55:51 should we ask releng if they consider the artifacts from OSBS to be part of the release to inform the question?
15:56:19 bowlofeggs, +1
15:56:35 sounds good, anyway the question was is there a process to ask to remove a system from the freeze list
15:56:44 cverna, so to get it off the list, I would make an email case that the artifacts it creates are not in the items that QA or Releng needs to be stable. Get those groups to say 'sure thing' and we can make it official
15:56:56 cverna, it should be fine to discuss it here, on the list or in a ticket
15:57:04 there is no other process for that
15:57:23 I would prefer on the list or a ticket for something that is easy for people to go back to in 2 years
15:57:25 ok thanks that's sound like a good plan
15:57:45 thank you for bringing it up and giving the reasons
15:57:46 yes the list might be a better medium
15:58:03 is there anything else on this?
15:58:12 not from me
15:58:33 any other items? I can close this out with 30 seconds to spare...
15:58:50 thank you all for coming and helping each other
15:58:56 #endmeeting