18:00:43 #startmeeting Fedora Infrastructure Ops Daily Standup Meeting
18:00:43 Meeting started Tue May 26 18:00:43 2020 UTC.
18:00:43 This meeting is logged and archived in a public location.
18:00:43 The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:43 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:43 The meeting name has been set to 'fedora_infrastructure_ops_daily_standup_meeting'
18:00:43 #chair cverna mboddu nirik smooge
18:00:43 #meetingname fedora_infrastructure_ops_daily_standup_meeting
18:00:43 #info meeting is 30 minutes MAX. At the end of 30, it stops
18:00:43 Current chairs: cverna mboddu nirik smooge
18:00:43 The meeting name has been set to 'fedora_infrastructure_ops_daily_standup_meeting'
18:00:44 #info agenda is at https://board.net/p/fedora-infra-daily
18:00:45 #topic Tickets needing review
18:00:46 #info https://pagure.io/fedora-infrastructure/issues?status=Open&priority=1
18:00:49 someone brought the post-it?
18:00:55 * mboddu is kinda here
18:00:55 * siddharthvipul is here to observe.. don't mind it :)
18:01:02 s/it/him sigh
18:01:04 * pingou will take note
18:01:18 siddharthvipul: we do not mind you observing, but feel free to participate :)
18:01:21 * cverna waves
18:01:23 +1
18:01:31 pingou: definitely :D
18:01:45 nice pile of tickets today due to long weekend and me filing iad2 ones. ;)
18:01:52 .ticket 8904
18:01:53 nirik: Issue #8904: Please provide copr frontend/keygen backups from 2020-05-08 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8904
18:02:01 does someone want to edit tickets today
18:02:02 ?
18:02:11 * pingou
18:02:23 This one lets move to waiting on assignee, low-trouble, medium-gain, groomed.
18:02:31 and we can do it later when we are not crazy busy
18:02:46 pagure.issue.edit -- pingou edited the priority fields of ticket fedora-infrastructure#8904 https://pagure.io/fedora-infrastructure/issue/8904
18:02:48 .ticket 8941
18:02:49 nirik: Issue #8941: get ipad01/02.iad2 replicating with ipa01/02.phx2 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8941
18:02:52 they say they're fine with waiting
18:02:58 this one puiterwijk is going to look into. hurray.
18:03:08 This one lets move to waiting on assignee, medium-trouble, medium-gain, groomed.
18:03:17 .8942
18:03:23 pagure.issue.tag.added -- pingou tagged ticket fedora-infrastructure#8941: groomed, medium-gain, and medium-trouble https://pagure.io/fedora-infrastructure/issue/8941
18:03:24 .ticket 8942
18:03:24 pagure.issue.edit -- pingou edited the priority fields of ticket fedora-infrastructure#8941 https://pagure.io/fedora-infrastructure/issue/8941
18:03:27 nirik: Issue #8942: rabbitmq cluster in iad2 not clustering - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8942
18:03:36 this one abompard was going to look into. :)
18:03:46 \ó/
18:03:51 I had to get sudo working, but that should be the case now (non yubikey)
18:03:53 more help!
18:03:59 This one lets move to waiting on assignee, medium-trouble, medium-gain, groomed.
18:04:08 .8943
18:04:14 sigh
18:04:16 pagure.issue.tag.added -- pingou tagged ticket fedora-infrastructure#8942: groomed, medium-gain, and medium-trouble https://pagure.io/fedora-infrastructure/issue/8942
18:04:17 pagure.issue.edit -- pingou edited the priority fields of ticket fedora-infrastructure#8942 https://pagure.io/fedora-infrastructure/issue/8942
18:04:18 .ticket 8943
18:04:20 nirik: Issue #8943: sigul rpm for epel8/python3 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8943
18:04:27 this one also puiterwijk is going to look into.
18:04:32 This one lets move to waiting on assignee, medium-trouble, medium-gain, groomed.
18:04:43 pagure.issue.tag.added -- pingou tagged ticket fedora-infrastructure#8943: groomed, medium-gain, and medium-trouble https://pagure.io/fedora-infrastructure/issue/8943
18:04:43 pagure.issue.edit -- pingou edited the priority fields of ticket fedora-infrastructure#8943 https://pagure.io/fedora-infrastructure/issue/8943
18:04:46 .ticket 8944
18:04:48 nirik: Issue #8944: odcs: choose new deployment os - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8944
18:05:00 after discussion on ticket we are going to try for rhel8...
18:05:10 so I set up some rhel8 instances just a few minutes ago.
18:05:31 so, I think we can just close this one now and if we need to reevaluate open a new one/reopen
18:05:33 med-med-groomed?
18:05:39 ok :)
18:05:53 closed as?
18:06:06 fixed I guess, since we decided the question in the title
18:06:18 .ticket 8945
18:06:19 nirik: Issue #8945: mbs fails to deploy in iad2 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8945
18:06:29 pagure.issue.edit -- pingou edited the close_status and status fields of ticket fedora-infrastructure#8944 https://pagure.io/fedora-infrastructure/issue/8944
18:06:29 pagure.issue.comment.added -- pingou commented on ticket fedora-infrastructure#8944: "odcs: choose new deployment os" https://pagure.io/fedora-infrastructure/issue/8944#comment-654446
18:06:37 this needs looking into. I am not sure who to ping about mbs these days... anyone have ideas?
18:06:50 the koji folks
18:07:00 so Mike McLean and Tomas Kopececk?
18:07:15 * pingou hopes the spelling isn't too far from the reality
18:07:16 well, they are taking it over, but are they also helping us run the old instance we have?
18:07:26 but we can try pinging them sure.
18:07:35 anyone who can get it working is fine with me.
18:07:38 if they aren't, then I honestly don't know
18:07:50 We can try pinging them, then if not we go back to the guys who deployed it for us
18:08:00 mprahl, contyk ?
18:08:12 * pingou needs to step out, can someone take over the tickets/tagging?
18:08:12 ok, so ping them, and med/med/groomed
18:08:19 * mboddu can take over
18:08:34 pingou: will you be back? had a thing for you... but later is fine.
18:08:39 thanks mboddu
18:08:51 I think its high-trouble and high-gain?
18:08:53 Hmm.
18:09:01 yeah, probably pretty important
18:09:07 Yup
18:09:08 Well, Mike McLean would be the contact person.
18:09:23 contyk: ok, will give him a ring...
18:09:24 pagure.issue.tag.added -- mohanboddu tagged ticket fedora-infrastructure#8945: groomed, high-gain, and high-trouble https://pagure.io/fedora-infrastructure/issue/8945
18:09:25 pagure.issue.edit -- mohanboddu edited the priority fields of ticket fedora-infrastructure#8945 https://pagure.io/fedora-infrastructure/issue/8945
18:09:34 Thanks contyk
18:09:37 .ticket 8946
18:09:38 nirik: Issue #8946: copr backend needs larger volume, but we miss AWS some permissions - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8946
18:10:00 I can try and do this later today if I can find time.
18:10:16 waiting on assignee, medium trouble, high gain, groomed
18:10:25 ack
18:10:35 .ticket 8947
18:10:40 nirik: Issue #8947: Rawhide builds are not pushed to stable - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8947
18:10:46 pagure.issue.tag.added -- mohanboddu tagged ticket fedora-infrastructure#8946: groomed, high-gain, and medium-trouble https://pagure.io/fedora-infrastructure/issue/8946
18:10:48 pagure.issue.edit -- mohanboddu edited the priority fields of ticket fedora-infrastructure#8946 https://pagure.io/fedora-infrastructure/issue/8946
18:11:00 so, I am not sure what our service level is here... I mean I know we want it to be fast, but...
18:11:19 And also, its random
18:11:27 I had a quick look: the celery worker in openshift gets OOM-killed
18:11:33 By the time we got to it, its fixed :(
18:11:42 which I think explains why it is random and sometimes slow
18:11:53 ah...
18:12:00 it depends on the os-node the pod is allocated
18:12:01 cverna: Can we close the ticket and ask them to reopen if they notice it again?
18:12:04 so hopefully the bigger pods in iad2 will help this.
18:12:12 * pingou back, sorry
18:12:30 * mboddu can hand over the duty to pingou if he wants to
18:12:35 yeah the celery worker seems to take a lot of mem, like 22% of the os-node
18:12:39 I think we should explain that it's an OOM issue most likely and we are moving to bigger pods during the move, so not much we can do right now
18:12:43 mboddu: go for it, I'm catching up
18:12:50 pingou: Okay
18:12:55 so I want to look at it to understand if this is normal or not
18:12:55 unless you can see a way to clear its memory or something?
18:13:10 +1... so lets leave it open for cverna to look?
18:13:18 Okay
18:13:26 cverna: Can you comment on the ticket with your findings?
18:13:33 yeah I need a bit more time to look at it
18:13:39 Okay
18:13:44 so waiting on assignee, med/med, groomed?
18:13:58 +1
18:14:06 ack, but cverna is kinda working on it, so assign it to him?
18:14:22 he's not working on it _now_... so no, leave unassigned.
18:14:27 Okay
18:14:33 then he assigns it when he actually sits down to work on it. :)
18:14:42 pagure.issue.tag.added -- mohanboddu tagged ticket fedora-infrastructure#8947: groomed, medium-gain, and medium-trouble https://pagure.io/fedora-infrastructure/issue/8947
18:14:43 .ticket 8949
18:14:43 pagure.issue.edit -- mohanboddu edited the priority fields of ticket fedora-infrastructure#8947 https://pagure.io/fedora-infrastructure/issue/8947
18:14:44 nirik: Issue #8949: bodhi times out on update with many (almost 300) packages - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8949
18:14:52 so, this might in fact be related. ;)
18:14:52 yeah I'll assign it to myself when I can focus on it
18:15:08 this may be memory related, trying to load too many things
18:15:08 cverna: what do you want to do with this one?
18:15:11 nirik: I feel so
18:15:21 but, wild guess
18:15:21 no, different issue: OpenShift haproxy kills the request because it takes too long
18:15:33 I have changed the timeout to 120s and it seems to help
18:15:53 This can be assigned to me since I'm looking at it now :)
18:16:10 ok.
18:16:14 pagure.issue.assigned.added -- cverna assigned ticket fedora-infrastructure#8949 to cverna https://pagure.io/fedora-infrastructure/issue/8949
18:16:27 and move to assignee, etcetc
18:16:27 cverna: You got to it before me :)
18:16:31 pagure.issue.tag.added -- cverna tagged ticket fedora-infrastructure#8949: low-trouble and medium-gain https://pagure.io/fedora-infrastructure/issue/8949
18:16:32 pagure.issue.edit -- cverna edited the priority fields of ticket fedora-infrastructure#8949 https://pagure.io/fedora-infrastructure/issue/8949
18:16:41 .ticket 8950
18:16:42 nirik: Issue #8950: OpenShift build stuck — a node (or just Docker?) needs restarting, I think - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8950
18:16:49 and... possibly more related. ;)
18:16:54 :D
18:17:01 shall I restart docker on all the nodes?
18:17:01 I can take this one if it's just about restarting docker
18:17:11 or sure, pingou can or whoever wants
18:17:33 do we want to restart on all or see if we can pinpoint which one?
18:17:37 yeah we could maybe add a daily cron job that does a docker restart
18:17:40 low trouble, medium gain, groomed?
18:17:43 I'd just do them all for now.
18:17:47 mboddu: ack
18:17:52 nirik: roger on it
18:17:57 mboddu: assign to me
18:18:03 pagure.issue.tag.added -- mohanboddu tagged ticket fedora-infrastructure#8950: groomed, low-trouble, and medium-gain https://pagure.io/fedora-infrastructure/issue/8950
18:18:04 pagure.issue.edit -- mohanboddu edited the priority fields of ticket fedora-infrastructure#8950 https://pagure.io/fedora-infrastructure/issue/8950
18:18:18 pagure.issue.assigned.added -- mohanboddu assigned ticket fedora-infrastructure#8950 to pingou https://pagure.io/fedora-infrastructure/issue/8950
18:18:24 pingou: Done
18:18:25 .ticket 8951
18:18:26 nirik: Issue #8951: yubikey auth isn't working in iad2 - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8951
18:18:35 puiterwijk: was going to look at this one too.
18:18:46 waiting on assignee, med/med/groomed
18:18:53 High gain?
18:19:09 well, otp works... so there's somewhat of a work around
18:19:12 but sure
18:19:28 hm, what's the difference b/w os_infra_nodes and os_nodes?
18:19:30 pagure.issue.tag.added -- mohanboddu tagged ticket fedora-infrastructure#8951: groomed, high-gain, and medium-trouble https://pagure.io/fedora-infrastructure/issue/8951
18:19:47 note that noggin doesn't support yubikey atm
18:19:53 Its security, so, its always a high gain for me :)
18:19:58 so we may lose that in a soonish future
18:20:16 infra nodes are ones that run routers/infra jobs, normal nodes can run other non infra tagged tasks
18:20:28 oh fudge sorry.. was focusing on something
18:20:42 so, thats all the needs-reviews in infra.
18:20:51 Now the releng side
18:21:00 nirik: so I want to restart os_nodes then, correct?
18:21:07 pingou: yep.
18:21:10 thanks
18:21:11 mboddu: go for it
18:21:24 .releng 9473
18:21:26 mboddu: Issue #9473: Fedora Python Classroom Lab container images not available @ candidate-registry.fedoraproject.org - releng - Pagure.io - https://pagure.io/releng/issue/9473
18:21:38 cverna: Any thoughts?
18:21:44 I didn't get a chance to look at it
18:21:45 pagure.issue.comment.added -- cverna commented on ticket fedora-infrastructure#8949: "bodhi times out on update with many (almost 300) packages" https://pagure.io/fedora-infrastructure/issue/8949#comment-654468
18:21:54 I didn't think we made any containers from labs/spins?
18:22:03 * cverna clicks
18:23:05 mboddu: candidate-registry is garbage collected; I think we delete all images that are older than 30 days
18:23:23 so that is likely the reason why this image is not there anymore
18:23:42 nirik: this is just a normal layered container image available in dist-git
18:23:52 ah... ok
18:23:58 pagure.issue.tag.removed -- pingou removed the groomed, low-trouble, and medium-gain tags from ticket fedora-infrastructure#8950 https://pagure.io/fedora-infrastructure/issue/8950
18:23:59 pagure.issue.assigned.reset -- pingou reset the assignee of ticket fedora-infrastructure#8950 https://pagure.io/fedora-infrastructure/issue/8950
18:24:00 pagure.issue.edit -- pingou edited the priority fields of ticket fedora-infrastructure#8950 https://pagure.io/fedora-infrastructure/issue/8950
18:24:01 pagure.issue.comment.added -- pingou commented on ticket fedora-infrastructure#8950: "OpenShift build stuck — a node (or just Docker?) needs restarting, I think" https://pagure.io/fedora-infrastructure/issue/8950#comment-654470
18:24:04 rahg!
18:24:05 Okay, I will comment on the ticket
18:24:31 pagure.issue.tag.added -- pingou tagged ticket fedora-infrastructure#8950: groomed, low-trouble, and medium-gain https://pagure.io/fedora-infrastructure/issue/8950
18:24:32 pagure.issue.assigned.added -- pingou assigned ticket fedora-infrastructure#8950 to pingou https://pagure.io/fedora-infrastructure/issue/8950
18:24:33 pagure.issue.edit -- pingou edited the priority fields of ticket fedora-infrastructure#8950 https://pagure.io/fedora-infrastructure/issue/8950
18:25:08 .releng 9472
18:25:09 mboddu: Issue #9472: update stuck because bodhi thinks it's not signed again - releng - Pagure.io - https://pagure.io/releng/issue/9472
18:25:23 nirik: Is it the same as the other ticket in infra?
18:25:43 no. that was a rawhide one, this is a f32 one.
18:25:43 mboddu: for some context https://pagure.io/ContainerSIG/container-sig/issue/33
18:26:17 yeah this is upstream https://github.com/fedora-infra/bodhi/issues/4032
18:26:54 I can manually fix the update, but I have an upstream fix since yesterday
18:27:36 Okay, I will comment on the ticket and add the groomed tag
18:27:57 cverna: Can you fix this manually for now?
18:28:00 mboddu: I'll fix it now, it takes 2 min
18:28:08 Thanks cverna++
18:28:26 I'll comment on the ticket how to fix that
18:28:58 cverna: Sure and close the ticket once its fixed as well
18:29:17 .releng 9469
18:29:18 mboddu: Issue #9469: Block nuvola-app-google-calendar from koji - releng - Pagure.io - https://pagure.io/releng/issue/9469
18:29:44 cverna: maybe the howto repo?
18:29:44 So, I fixed a bunch of pdc entries few days back but it seems some of them are still sneaky and got missed
18:29:52 we should also note how to check if a build is signed
18:30:20 yeah I usually check the tags in koji
18:30:22 Generally I will look at build history to check if a build is signed or not
18:30:40 I usually call koji write-signed-build. :)
18:30:47 if its not signed, that errors.
18:31:16 Coming back to 9469, I will work on them tomorrow, as I got EOL work going on.
18:31:23 So, adding groomed tag to it
18:31:26 sounds good. +1
18:31:51 I know we are over time, but I wanted to bring up a quick item...
18:31:59 mboddu: oh, did you have anymore?
18:32:13 nirik: well, unretirement stuff, not important
18:32:17 nirik: Go ahead
18:32:19 ok.
18:32:49 so, resultsdb is currently in the qa network... but we need it. it's currently f31 I think...
18:33:03 so, should we just move it into our normal prod network with the move?
18:33:15 and should we keep it at f31? or try and upgrade?
18:33:51 it also currently uses qa-db01... but if we move it into our prod network we can just use db01...
18:34:01 or should I ask this on the list? :)
18:34:13 I'd be ok w/ it in the main network
18:34:22 Maybe check with qa before you do that, but generally +1
18:34:25 I would like us not to make a decision on the OS
18:34:42 nirik: can you give me a date/deadline for this?
18:34:45 the fewer changes the better right now. ;)
18:34:49 I'd like to apply a little more pressure on this
18:34:55 can we run it in OpenShift ?
18:35:14 well, the virthost it's on will be turned off the week of june 8th? :)
18:35:20 so we have to move it before then...
18:35:26 cverna: I don't know
18:35:45 and I am not sure we have time... but I'll go with whatever people want
18:36:00 technically yes, but it'll require some changes to the clients that load data in it
18:36:14 ok so yeah we don't have time :)
18:36:28 nirik: thanks I'll raise this again
18:36:41 worst case I think I know the answer and how to proceed
18:36:42 I guess it's clear we are moving it...
18:36:51 I just don't want us to do it
18:36:53 I just want to know where it would make the most sense to move it to.
18:37:03 lol
18:37:09 a then b vs b then a :)
18:37:43 I think the chances of someone else doing it are... low.
18:37:56 I'd like them to at least be there
18:38:04 even if it's only to look over our shoulders
18:38:15 threebean did say he was willing to help with it on the move week...
18:39:38 so, give it more time and ask around a bit more?
18:40:10 let's give it until early next week
18:40:21 well, thats cutting it very very very close
18:40:24 but ok
18:40:32 thanks
18:40:46 I'm hoping to send out later today a link to a doc for testing/validating things in iad2.
18:41:06 most things are up and running there, in various levels of working.
18:41:21 we have overrun our 30min time slot, do we want to close the meeting ? and then we can continue the convo if needed
18:41:34 yeah, lets end, thats fine.
18:41:43 thanks cverna
18:41:46 #endmeeting
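
Note (not part of the logged meeting): the triage outcome of each ticket is recorded in the bot's pagure.issue.tag.added notifications (groomed, plus a trouble tag and a gain tag). The sketch below is one possible way to collect those tags per ticket from a log like this one. It assumes the notification wording stays exactly as it appears above; the names TAG_EVENT, split_tags and collect_tags are made up for this illustration and are not part of MeetBot or Pagure.

```python
import re
from collections import defaultdict

# Pattern for the notification text as it appears in this log, e.g.
# "pagure.issue.tag.added -- pingou tagged ticket fedora-infrastructure#8941:
#  groomed, medium-gain, and medium-trouble https://pagure.io/..."
# (assumption: the wording of these messages does not change)
TAG_EVENT = re.compile(
    r"pagure\.issue\.tag\.added -- (?P<user>\S+) tagged ticket "
    r"(?P<project>[\w-]+)#(?P<issue>\d+): (?P<tags>.+?) https://"
)


def split_tags(text):
    """Split 'a, b, and c' or 'a and b' into ['a', 'b', 'c']."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", text.strip())
    return [p for p in parts if p]


def collect_tags(log_text):
    """Return {'project#issue': sorted list of tags added during the meeting}.

    Only tag.added events are read; tag.removed events are ignored, so a tag
    that was removed and never re-added would still be listed.
    """
    tags = defaultdict(set)
    for m in TAG_EVENT.finditer(log_text):
        key = f"{m['project']}#{m['issue']}"
        tags[key].update(split_tags(m["tags"]))
    return {k: sorted(v) for k, v in tags.items()}


if __name__ == "__main__":
    sample = (
        "18:03:23 pagure.issue.tag.added -- pingou tagged ticket "
        "fedora-infrastructure#8941: groomed, medium-gain, and medium-trouble "
        "https://pagure.io/fedora-infrastructure/issue/8941\n"
        "18:16:31 pagure.issue.tag.added -- cverna tagged ticket "
        "fedora-infrastructure#8949: low-trouble and medium-gain "
        "https://pagure.io/fedora-infrastructure/issue/8949\n"
    )
    for ticket, ticket_tags in collect_tags(sample).items():
        print(ticket, ", ".join(ticket_tags))
```

Run against the full log, this would give one line per groomed ticket, which can be compared with the med/med/groomed calls made in the discussion.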