18:00:10 #startmeeting Infrastructure (2015-08-27)
18:00:10 Meeting started Thu Aug 27 18:00:10 2015 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:10 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:10 #meetingname infrastructure
18:00:10 #topic aloha
18:00:10 #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk pbrobinson
18:00:10 The meeting name has been set to 'infrastructure'
18:00:10 Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:11 #topic New folks introductions / Apprentice feedback
18:00:16 hello everyone.
18:00:17 hi
18:00:18 hello everyone
18:00:20 * pingou
18:00:21 here
18:00:35 * tflink shows up for the right meeting this time
18:00:36 Hello!
18:00:37 * rahulrrixe_ here
18:00:42 tflink: \ó/
18:00:43 * relrod here
18:00:43 any new folks like to introduce themselves?
18:00:50 or apprentices with questions or comments ?
18:01:00 * threebean is here
18:01:04 * sayan is here
18:01:15 is here
18:01:27 is here
18:01:32 * roshi is here
18:01:36 hello
18:02:03 hola
18:02:21 * prth_ is here
18:02:36 ok, if no new folks with introductions or apprentices with questions, we can move on to info / announcements dump...
18:02:50 #topic announcements and information
18:02:50 #info Bunch more work on bodhi2 rollout, help on easyfix tickets appreciated - lmacken/threebean
18:02:50 #link https://github.com/fedora-infra/bodhi/pulse/monthly
18:02:50 #link https://fedoraproject.org/easyfix
18:02:51 #info Mass update/reboot cycle next week: monday/cloud, tuesday/build, wed/the rest - kevin/patrick/smooge
18:02:52 #info Week after next is freeze (2015-09-08) - everyone
18:02:54 #info May swap lockbox01 out for new batcave01 on friday if it's ready - kevin
18:02:56 #info Apprentice wiki pages updated. Look today! - kevin
18:02:58 #info taskotron, blockerbugs updated for bodhi2. most issues fixed - tflink
18:03:00 #info large pagure update on its way (bunch of bug fixes and small features) - pingou & all
18:03:02 anything in there folks would like to discuss more?
18:03:15 SmootherFrOgZ: are you around today?
18:03:39 what apprentice wiki pages?
18:03:58 https://fedoraproject.org/wiki/Infrastructure_Apprentice
18:04:14 got it.
18:04:31 nirik - just a comment, read up on the new quest, looks do-able...will start in on it soon
18:04:33 so, SmootherFrOgZ isn't around for fas3 talk... lets defer that one and move on.
18:04:49 aikidouke: cool. :)
18:05:08 we need to think of some more things like that... that are ongoing with lots of small parts so many people can chip away at them
18:05:13 nirik, on the cloud reboots? how many of the old instances are still up?
18:05:35 sorry - one other q - is it ok to add a link to ansible resources on the wiki?
18:05:55 smooge: let me see. There's 13. but let me kill the jenkins ones now that they are done and the old copr stuff...
18:05:58 aikidouke: sure!
18:06:07 ok ty
18:06:34 tflink: there's still a qadevel instance there... that can be terminated right?
18:06:47 and did we ever figure out about testdays?
18:06:48 nirik: yeah, I thought that had already been terminated
18:07:00 it is a ghost
18:07:40 i thought we were fine to turn it off but apparently folks were still relying on it
18:08:05 I don't see any upcoming events in the instance, though
18:08:29 well, we can leave it for a few more days... but we should sort it.
18:08:35 I guess we could try and copy all data off it.
18:08:51 apparently, I don't understand how important it is?
18:09:05 if it's holding up the cloud decommissioning, just turn it off
18:09:09 well, me either. :)
18:09:38 ok, so there's 2 instances left really... testdays and shogun-ca
18:09:39 iirc, it doesn't have any data on it we don't already have in the wiki
18:09:39 if it's that critical, I'll charge adam and mike in units of beer to get the new one up and running
18:10:01 yeah, the data isn't really important. the app is more so
18:10:05 * roshi readies his emergency case of PBR
18:10:08 we can leave them up for a bit as we need time to reinstall nodes and add them to the new cloud.
18:10:21 (you know, in case the hipsters attack, have to have something to barter with)
18:10:24 ha
18:10:26 roshi: those count as negative beers and I'm ashamed you have them stashed away
18:10:30 oh, that makes sense
18:10:38 :D
18:10:39 roshi: wait, are you calling me a hipster?
18:10:42 ok, lets move on and if we need to we can kill them when we get to those compute nodes.
18:10:54 :p
18:11:04 if it's holding up the migration, just terminate it and let us know
18:11:11 k
18:11:32 ok, on to some trac ticket review fun. ;)
18:11:34 #topic TRAC tickets review - p_klos
18:11:34 #link https://fedorahosted.org/fedora-infrastructure/ticket/833
18:11:45 833
18:11:55 Do we have IDS? Is that actual?
18:11:57 ok, so this is something we have talked about over the years.
18:11:58 no.
18:12:01 Any tasks to be done with IDS?
18:12:07 we deployed one a number of years ago, but it sucked.
18:12:21 and/or couldn't keep up with traffic.
18:12:30 I understand.
18:12:44 Ticket to be closed?
18:12:48 lmacken: you around?
18:13:04 hang on. lets see what lmacken says... since this was his ticket.
18:13:12 yes, yes :)
18:13:55 but I guess we can come back to it when he's around. :)
18:14:10 #link https://fedorahosted.org/fedora-infrastructure/ticket/1055
18:14:18 Fedora Search
18:14:24 ah, the saga of the search engine. ;)
18:14:39 22 months ago was suggestion that we should revisit that :)
18:14:44 so, the last one we tried was dpsearch...
18:15:10 * tflink is somewhat interested but has no cycles to spare
18:15:13 it got very slow, but it was indexing things very... badly
18:15:16 yeahhhhhh.
18:15:16 whoosh might be worth looking into
18:15:34 especially once they get support for more backends
18:15:54 emphasis on _might_, though. not sure it'd scale far enough
18:16:10 I think we looked at that one, but decided it needed too much frontend work? not sure.
18:16:19 yeah, whoosh only does indexing
18:16:32 no frontend, no crawling
18:16:36 https://fedoraproject.org/wiki/Infrastructure/Search
18:16:36 Elasticsearch? Did you tried??
18:17:06 Maybe we should make some new assumptions?
18:17:10 we haven't. At the time we last tried it wasn't packaged... and I am not sure how thrilled I am with java, but perhaps it's doable these days
18:17:19 :D
18:17:41 Ok, I will take that and see what can be done.
18:18:03 so yeah, I think perhaps a new discussion on list/testing some options again....
18:18:15 I'd be happy to work with someone on this project again, but I got somewhat discouraged last time because I couldn't find any good options and I was just burning a lot of cycles on essentially nothing because of that.
18:18:25 elasticsearch is also not in epel... it's fedora only I am pretty sure.
18:18:26 * roshi is also interested but doesn't have cycles
18:18:26 i think solr is the default answer for custom search engines nowadays
18:18:58 do we have a wiki dump somewhere?
18:19:14 nope, but we could look at making one.
18:19:23 tflink: I think mediawiki has a built-in thing to export everything. Not sure if we disable it though.
18:19:44 docs was one place that they really would like a search
18:19:54 but yes, the wiki default search is... horrible.
18:20:50 so, perhaps interested parties could talk outside meeting and try and move this forward and come back with suggestions?
18:20:56 ok, so maybe we could make a new Search Saga if there ar people interested in?
18:20:57 :)
18:20:57 are *
18:21:04 I'd say definitely reduce scope to start with to just docs and wiki
18:21:21 p_klos: yep. ;) post to list and gather interested folks and go from there. :)
18:21:30 docs for a single fedora release would probably be a good place to start
18:21:38 nirik: I will :)
18:21:39 i'm interested as well
18:21:52 great. ;)
18:21:58 tflink: yeah, that too.
18:22:02 #link https://fedorahosted.org/fedora-infrastructure/ticket/1421
18:22:03 define some sample queries that we should expect, some example results - to gauge efficacy
18:22:09 i could maybe spin up a vm and mirror the docs site with wget and try a few things
18:22:33 Ok, so the next
18:22:45 6 years since last significant activity ;)
18:22:53 yeah, I am not sure the status here.
18:23:03 we are no longer using mod_auth_pg anywhere.
18:23:22 but I do not know for sure if we are using https always.
18:23:49 me too. But we are working on FAS3.
18:24:13 Maybe we should better think how to configure it properly from security POV?
18:24:14 I can investigate and update the ticket.
18:24:22 I think it likely already is.
18:24:22 nirik: thanks :)
18:24:26 just need to confirm that
18:24:50 #info nirik will update ticket after meeting
18:24:53 #link https://fedorahosted.org/fedora-infrastructure/ticket/1509
18:25:05 one more 6 years old
18:25:13 yeah, I do not know the status here.
18:25:16 nb: you around?
18:25:30 there's actually several of these tickets.
18:26:10 oh, I didn't do such intensive investigation to find them...
18:26:19 done *
18:26:36 I can ping nb out of band and get an update on these.
18:27:01 I know he was talking to people, but not sure where it was...
18:27:05 #link https://fedorahosted.org/fedora-infrastructure/ticket/1590
18:27:45 So the next. In my opinion it's good idea in that ticket
18:28:02 maybe we could come back to it?
18:28:25 so it doesn't look like this was ever applied...
18:28:42 Yes, It died 6 years ago
18:28:59 but the script changed in the mean time.
18:29:06 * p_klos thinks why there are so many "6" in today's tickets...
18:29:32 So forget that past work and make some new ;)
18:30:07 so, we need this patch rebased on the current script
18:30:50 https://git.fedorahosted.org/cgit/fedora-web.git/tree/fedorapeople.org/make-people-page.sh
18:31:26 updated ticket
18:31:52 cool... thanks for collecting these p_klos. :) great to get old stuff cleaned out.
18:32:02 #topic bodhi2 rollout retrospective - tflink
18:32:08 No problem :) There will be more next week ;)
18:32:20 p_klos: indeed. Until someday we finally finish them all.
18:33:01 nirik: or just will put them to the Fedora Museum :D
18:33:08 ok, so bodhi2 rolled out...
18:33:17 * tflink wanted to gather feedback for the bodhi2 rollout while the memories were still fresh
18:33:25 lmacken and threebean have been in massive bugfix mode since then
18:34:00 one thing I wonder: should we have announced it in stg before to gather some early feedbacks?
18:34:37 well, if we had lots of time yeah.
18:34:42 it's been a pretty rocky transition for QA. lmacken and threebean have been responsive on tickets but there were some late nights for kparal and I, attempting to prepare/fix things
18:34:48 true we were a little tight on time
18:35:08 we could have rolled it out slower... but the problem is that it had been so many years with no release we were diverging more and more from bodhi1 and spinning wheels trying to fix it.
18:35:29 sure, but 24 hours from first api client to production rollout?
18:36:08 at least we were all aware of the rollout (vs pkgdb2) so we are improving already :)
18:36:12 yeah, I had hoped the api would be done before then. ;(
18:36:16 that's been my primary complaint - there was no time for testing and little time for preparation
18:36:28 yeah, that was the roughest part. we shoulda/coulda allowed more time for python-fedora to land.
18:36:54 the data returned from python-fedora is not the same as in bodhi1 and in some cases, the meaning of settings has changed
18:37:05 tflink: once the api was there was stg ok to test with? or did you run into problems with it not being setup enough?
18:37:34 nirik: it helped some but it was too old to test blockerbugs with and didn't have enough data to catch all the taskotron issues we ran into with production
18:37:35 #info allow more time after a changed api is testable to deployment
18:37:55 #info try and have a more fully populated stg instance/setup for testing
18:37:55 we thought we'd finished the python-fedora preparation over the weekend but then discovered bugs on sunday (new release) and again on monday (new release) and again on tuesday (new release). wednesday was our scheduled date for cutover..
18:38:49 yeah, we only had one date scheduled, which was the server cutover.. but we didn't have a scheduled time by which python-fedora had to be ready. and no policy or specific plan for bumping the server cutover if python-fedora was "slipping"
18:39:17 #info set dates for api testable, packages testing, packages stable _and_ server cutovers
18:39:30 the message I got was "there are tests for the client and folks using python-fedora won't have to change anything, so we're OK"
18:39:35 yeah, one date was probably poor there
18:40:02 tflink: that was the plan. but we should've allowed a month to verify that, instead of 24 hours to discover it wasn't true ;p
18:40:18 the other suggestion I have is to have some dialog with downstream when making big changes
18:40:45 so the next big release is FAS3
18:40:56 which means: announce it on devel@ when we push it to stg
18:41:29 threebean: then why was I dismissed and discounted when I had concerns about the timeline?
18:41:30 tflink: yeah. although do we have a good way to identify all the downstreams? of course you guys in this case...
18:41:49 pingou: yeah
18:41:51 yeah, reaching everyone would be close to impossible
18:41:59 Please note that even if we announce, and people have multiple weeks, that doesn't always mean they'll test it. For Ipsilon, I had given several weeks (I think even months?) notice that it was in stg, and only two days before going to prod, someone told me a pretty crucial part (the API) was plain broken
18:42:03 leave time to test the clients/consumers
18:42:06 in this case, f-e-k and f-g-k were probably going to break even with a month's notice
18:42:28 I think they were broken before in bodhi1 (at least f-e-k was)
18:42:38 puiterwijk: but then at least we can tell them it's their fault :)
18:42:53 it is never their fault.
18:42:56 longer stg/announce time at least has a chance for more testing
18:43:01 pingou: fair enough. But they could've just followed the master branch, couldn't they? :)
18:43:12 ^^
18:43:33 puiterwijk: that doesn't always work - half of testing against bodhi/koji is the data they contain and the coupling between various data sources
18:43:39 Anyway, was just saying, that you will always have people complain, but I do agree that more time than a day with the tools available might be useful
18:44:49 should we have something similar to the Fedora rollouts? If we are rolling out something big there needs to be a go/no go meeting before we commit to it and can have a way that people can safely say "We need to run this in testing for 1 week."
18:45:15 I don't know if the idea works for our kind of work..
18:45:17 perhaps. release readiness meeting before rollout
18:45:20 smooge: might make sense for FAS3
18:45:28 tflink: why? it was a mistake, certainly.
18:46:12 tflink: note that my "followed the master branch" remark was just a joke, that was not meant to be serious
18:46:26 smooge: for big things, I think that might be a good idea, yeah
18:46:41 puiterwijk: yeah, but there are valid parts to it
18:47:19 might be good to try with fas3...
18:47:33 are there many downstreams with fas?
18:47:45 tflink: more than bodhi
18:47:46 centos maybe
18:47:49 not that it's the only thing - thinking out loud
18:47:58 well, not necessarily downstream instances.
18:47:59 rpmfusion
18:48:10 centos
18:48:12 it's unclear to me how much change there is in fas3... the api should mostly be the same...
18:48:16 but we have more services that integrate with FAS's API (like how taskotron is a downstream for bodhi's API)
18:48:23 nirik: hopefully improved :)
18:48:31 but yeah, lots of scripts to test, etc...
18:48:39 nirik: there are some. SmootherFrOgZ has submitted some python-fedora patches related to it.
18:48:46 I don't know how many of those live outside infrastructure.
18:48:48 threebean: yeah, that's the part i wasn't sure about - the only things I thought of were ipsilon and the pam stuff used in infra
18:48:52 nirik: I think that the API for fas3 is ENTIRELY different, and no backwards compatibility is included...
18:48:53 anyhow, back to bodhi2...
18:49:10 https://github.com/fedora-infra/python-fedora/compare/feature/fas3_support
18:49:26 we can/should talk about fas3 at some point... but not now?
18:49:26 tflink: (I bet it's an order of magnitude more than bodhi.. ;p)
18:49:36 * tflink learned something today
18:49:36 * threebean nods
18:49:40 * puiterwijk agrees
18:49:49 what else could we have done better with bodhi2?
18:50:02 I have one minor one: have moved to pagure.io before rollout.
18:50:12 some of our users are sensitive to github for whatever reason
18:50:13 at least mirrored
18:50:42 use case stories of how it is used. I think the "OMG ALL MY WORKFLOWS ARE BROKEN." was an underthread because things were different
18:50:52 #info consider being on pagure.io or fedorahosted before rollout
18:51:06 yeah. the same counter positions hold for this as they did for API testing. lmacken and I worked as much as we could before the rollout and decided we didn't have time to mess with a pagure move at that point. (fwiw)
18:51:21 yeah, understandable...
18:51:21 it definitely came up.
18:51:48 but there were good things, too - lmacken and threebean have been around a lot and responsive for questions/issues
18:52:01 it's sometimes been like pulling teeth to get some people to report issues. Not just due to github, but because people are lazy and would rather post to a list or on irc than file an issue.
18:52:10 it was going to be a somewhat painful migration, no matter when it happened
18:52:17 #info threebean and lmacken are rockstars. :)
18:52:52 :p
18:53:09 any other retrospective items?
18:53:57 nothing else from me
18:54:18 oh, minor one:
18:54:36 we should look at what impact a new app might have on other apps... like bodhi2 put more load on datagrepper...
18:54:44 lmacken++
18:54:54 threebean++
18:55:07 is bodhi2 what makes datagrepper swapping?
18:55:23 pingou: we think so... because the bodhi frontpage does a bunch of queries.
18:55:31 k
18:56:00 or... it could be people's profile pages?
18:56:13 or... did we setup robots.txt? wonder if it's crawlers.
18:56:22 anyhow, running low on time here...
18:56:27 crawler could be fun :D
18:56:30 #topic lockbox01 -> batcave01 migration - kevin
18:56:33 nirik: crawlers won't do queries against datagrepper. And the homepage is very light
18:56:36 * relrod can add looking at the datagrepper/bodhi interaction to his list, since most of that was my work I think.
18:56:49 I just wanted to note that I am working on a new system to replace lockbox01
18:56:53 relrod: I already suggested we should look at using statscache for that
18:57:02 puiterwijk: or that ;)
18:57:08 but I've had so many fires this week I haven't gotten as far as I would like.
18:57:14 nirik: looking forward to ssh to the batcave :)
18:57:16 but I would really like to try and migrate before freeze
18:57:26 so, the change is coming. ;)
18:57:40 * nyazdani notes that the bodhi frontpage is a good use-case for statscache
18:57:42 * pingou would like to push umdl2 before freeze
18:57:53 nirik: do you need any help?
18:58:22 p_klos: sure. Look at the roles/batcave/tasks/main.yml for a list of things that need to be ported from puppet... many of them are just small scripts/crons.
18:58:40 I was hoping to migrate tomorrow, but likely now it will be next week sometime.
18:58:49 also we have a bunch of update/reboots next week. ;)
18:58:54 #topic Open Floor
18:59:00 ok, anything for open floor real quick?
18:59:04 * puiterwijk would like an update to Ipsilon, so I can get rid of some of the hotfixes, and get OpenID Connect.
18:59:19 nirik: I will, but can't make a promise that I'll do everything for tomorrow :(
18:59:20 * vk would like to introduce himself - was late for the meeting start
18:59:33 we have this week, next week, and freeze the week after. :) also day before freeze is a holiday.
18:59:51 p_klos: no problem, any you want to do just send me patches. ;)
19:00:07 vk: welcome. ;) are you more interested in sysadmin or application development, or both?
19:00:37 both, like 50% sysadmin, 40% development
19:01:09 10% doc writing?
19:01:11 :p
19:01:15 vk: and lasting 10%? :)
19:01:28 cool. Do see folks in #fedora-admin and #fedora-apps for ways to get started after the meeting. ;)
19:01:35 I'm going to send the patch for that fedora people ticket we just discussed (add people git repos to the page); also I saw at least 2 easyfixes I can do (rsync setup and migration from puppet to Ansible)
19:01:49 err, 60% sysadmin :)
19:01:54 vk: great!
19:02:25 vk: darkserver migration from puppet to ansible is quite done ;) I've attached v2 patch after nirik's review
19:02:33 also, I sent a "Hello World" email to the infrastructure list today, but for some reason it end up in moderation q (something to do with mailman 2 -> 3 migration I think)
19:02:42 p_klos: I saw, just haven't had a chance to look yet.
19:02:46 so would be good if someone can approve it, it has more info about me
19:02:50 vk: can do.
19:02:58 nirik: no problem ;)
19:03:19 p_klos: ack, there're more easyfixes - I'm sure I can find something else
19:03:25 nirik: thank you sir
19:03:31 no problem. ;)
19:03:41 if nothing else for open floor, will close out the meeting in a minute.
19:03:46 vk: sure! If not, we'll help you :)
19:04:05 vk: there are a lot of documentation to be written ;P
19:04:42 p_klos: well, writing docs is a challenge. Writing docs for something that was not written/implemented by you is challenge^2
19:05:08 * SmootherFrOgZ is here now if needed still
19:05:13 p_klos: but I'll take a note of it, I think I saw some tickets where I can be useful
19:05:40 vk: Docs are important of course, but I was just joking ;)
19:05:51 p_klos: also, how do I get read-only access to the Ansible repo? On the wiki I found that I can use bastion - but I don't have access there for obvious reasons
19:06:19 vk: we can add you to the apprentice group after the meeting. See me over there...
19:06:26 ok, thanks for coming everyone.
19:06:27 vk: ask nirik on #fedora-admin for adding you to fi-apprentice group after the meeting
19:06:29 #endmeeting