18:00:24 #startmeeting Infrastructure (2017-07-27)
18:00:24 Meeting started Thu Jul 27 18:00:24 2017 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:24 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:24 The meeting name has been set to 'infrastructure_(2017-07-27)'
18:00:24 #meetingname infrastructure
18:00:24 The meeting name has been set to 'infrastructure'
18:00:24 #topic aloha
18:00:24 #chair smooge relrod nirik abadger1999 dgilmore threebean pingou puiterwijk pbrobinson
18:00:24 Current chairs: abadger1999 dgilmore nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:27 hey
18:00:27 Hello all
18:00:35 hi everyone
18:01:06 here
18:01:12 * doteast here
18:01:20 * pingou around but not really here
18:02:05 morning
18:02:39 #topic New folks introductions
18:02:47 Hi do we have any new folks here?
18:02:51 * cverna waves
18:04:04 ok looks like not
18:04:15 #topic announcements and information
18:04:15 #info PHX2 Colo Trip coming up, Aug 14-18
18:04:15 #info Major outage planned for Aug14->18
18:04:15 #info FLOCK at Cape Code Aug29->Sep01
18:04:16 #info Fedora F27 Rebuild
18:04:28 Any other announcements from people?
18:04:51 it's Cape Cod.
18:04:51 * nirik has nothing off hand.
18:05:15 clime, it's in gobby you can fix it :)
18:05:27 ok
18:05:28 ah, but while we are there it will cape code. ;) so much coding to do
18:05:49 (yes, this is a joke, that's a typo)
18:06:20 fixed
18:06:33 :)
18:06:45 thanks :). I was leaving it that way until someone did so because I found my mistype funny
18:07:04 .hello bowlofeggs
18:07:04 nirik, how is the rebuild going?
18:07:04 bowlofeggs: bowlofeggs 'Randy Barlow'
18:07:11 .hello jcline
18:07:12 bowlofeggs: jcline 'Jeremy Cline'
18:07:18 I haven't seen much on it this round
18:07:57 it's in the 'r's committing
18:08:29 probably finish that later today... then just needs to finish up all the builds.
18:08:35 I saw a couple of weird errors on lists like arm32 running out of memory for no reason and some ppc items.. do we need much of a re-rebuild?
18:08:37 probably take another day or so
18:08:50 ok cool
18:09:01 the ppc one we might rebuild failed things again for
18:09:06 (once it's fixed)
18:09:13 there was a pagure pkgs thing also this week. But I don't know the details on it
18:09:32 it's waiting on mass-rebuild to be done
18:10:01 but I just ran into what is I think a bug in pkgdb, which is going to be required to be fixed for the migration to work properly
18:10:07 * pingou will investigate
18:10:28 https://bugzilla.redhat.com/show_bug.cgi?id=1475636 is the ppc binutils bug
18:11:38 ok thanks
18:11:54 ok next up
18:11:57 #topic (2017-07-27) Service Level Expectations (SLE)
18:11:57 #info What are SLEs?
18:11:57 #info Why do we need them?
18:11:57 #info Who sets them?
18:11:57 #info How are they followed?
18:11:58 #info Where do they affect things?
18:12:00 #info When do we put them in place?
18:12:02 #info https://pagure.io/fedora-infrastructure/issue/6140
18:12:04 #info https://fedoraproject.org/wiki/Infrastructure/ServiceLevelExpectations
18:12:06 #info https://confluence.cornell.edu/display/itsmp/Service+Level+Expectations
18:12:18 yeah, not too much discussion on list on this...
18:12:22 Hi nirik I put in questions just to frame a possible discussion
18:12:25 I guess I will try and work on it some more.
18:12:36 I can answer some of those questions. :)
18:13:03 SLE is a service level expectation... it's letting users/consumers of a service know what kind of service to expect.
18:13:55 ie, if the service was down at 3am, would people wake to fix it? would it be fixed the next morning? if on a weekend or holiday would it be fixed the next business day?
18:14:24 we can also use these with other projects we work with...
18:14:53 ie, say we use centos ci and something is broken on our side of that, what expectations do they have for us to fix things when, etc.
18:15:14 I set out some broad outline on the wiki page using domains...
18:15:25 but there's more to fill in, which I can work on this coming week
18:15:41 we will need/want to redo our status app to reflect this
18:16:03 and possibly outage pages from apps, etc.
18:16:27 any further thoughts on this?
18:17:50 so we could key items to nagios alerts
18:18:03 that also.
18:18:28 have all continue to go to irc... but pages only for some things at some time periods
18:18:29 with 24x7x52 vs 8x5x50 vs meh you got the bits didn't ya
18:19:21 yeah agreed
18:19:41 so there would definitely be some work to adjust to this, but in the end I hope it would be nicer for our users to know and us to not have to treat everything as urgent all the time
18:21:35 we might also at the same time work on the consistent app naming setup...
18:21:36 agreed there too
18:21:48 now let's not go overboard
18:22:04 https://pagure.io/fedora-infrastructure/issue/5644 is that ticket
18:22:42 sorry, SLE not SLA right?
18:22:49 correct
18:23:08 an agreement requires two entities
18:23:15 and money and signing things
18:23:21 I see ;)
18:23:34 There's no such relationship between us and our community...
18:23:45 * doteast kinda like the E more than the A
18:24:14 even in outside of community context
18:24:27 if we fail to meet expectations, then that's something we would revisit when it happened... and depending on why adjust the expectation or add more resources or something
18:24:44 sweet
18:24:54 Knowing some in our community.. we will always fail their expectations.
18:24:57 and that is ok
18:25:19 ah, you can't win them all, even in business settings I guess
18:25:33 sure... but these are our expectations.
18:25:58 if we say app X will be addressed in 4 hours, but we don't... we need to adjust that or make sure we notify people harder or whatever.
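The routing idea discussed above (every alert keeps going to IRC, while pages go out only for some services during some time periods) could be sketched roughly as follows. This is only an illustration under stated assumptions: the tier names, the working-hours window, and the `should_page` and `route_alert` helpers are hypothetical, and nothing here is an agreed policy or an actual Nagios configuration.

```python
from datetime import datetime, time
from enum import Enum


class Tier(Enum):
    """Hypothetical SLE tiers, loosely following the levels mentioned in the meeting."""
    ALWAYS = "24x7"          # page at any hour
    BUSINESS_HOURS = "8x5"   # page only on weekdays during working hours
    BEST_EFFORT = "none"     # never page; IRC notification only


# Assumed working-hours window in UTC; the meeting did not define one.
WORK_START = time(13, 0)
WORK_END = time(21, 0)


def should_page(tier: Tier, when: datetime) -> bool:
    """Return True if an alert for a service in `tier` should page someone."""
    if tier is Tier.ALWAYS:
        return True
    if tier is Tier.BUSINESS_HOURS:
        is_weekday = when.weekday() < 5  # Monday=0 .. Friday=4
        return is_weekday and WORK_START <= when.time() < WORK_END
    return False


def route_alert(tier: Tier, when: datetime) -> list:
    """All alerts go to IRC; a page is added only when the tier allows it."""
    channels = ["irc"]
    if should_page(tier, when):
        channels.append("page")
    return channels
```

With these assumed values, an alert at 03:00 UTC on a Thursday would page only for an `ALWAYS` service, while a `BUSINESS_HOURS` service would show up in IRC and wait for the working day.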
18:26:20 ah that goes to #info Who sets them?
18:26:21 * doteast nods
18:26:24 probably when apps are in their RFE process we can ask what expectations should be...
18:27:00 you mean RFR?
18:27:04 "this is a fedoracommunity app we are just playing around with" -> no monitoring, whoever runs it responsible, no SLE at all.
18:27:11 yeah, RFR, sorry. ;)
18:27:20 request for resources
18:27:49 "this is mirror lists for end users" -> monitored, will be acked in 15min, will be fixed asap, all hands on deck until it's working again.
18:28:47 .thisdoesntwork would be a good domain for apps in RFR ;)
18:28:59 can we get a 1 minute SLE on puiterwijk? ☺
18:29:04 no
18:29:07 haha
18:29:29 he requires same reverse SLE
18:29:32 jcline pointed out that first we would need monitoring on puiterwijk
18:29:46 and then monitoring on us
18:29:54 since he's not here we can assign him that. ;)
18:30:04 who monitors the monitors?
18:30:05 but who monitors the monitoring?
18:30:07 hahaha
18:30:07 so much more on this?
18:30:25 or time to move onto clime's topic?
18:30:28 i'm a +1 to defining clear SLEs
18:30:45 * nirik has nothing else on this unless there's questions.
18:30:57 cool far fetched idea: the page that shows when an app is down could also state that app's SLE
18:31:01 I'll try and add to it and get another round of review
18:31:09 (i think the proxy does that?)
18:31:14 bowlofeggs: yeah, we would want to adjust our status page for that
18:31:25 and yeah, there's an outage html page for down apps
18:31:27 i think that might be slightly different
18:31:34 #topic possible future Fedora Infrastructure support for COPR - clime
18:31:37 yeah i mean the page you see when you go to bodhi
18:31:37 yeah, sorry, but both should be adjusted.
18:32:15 right, so we will have COPR solely from Fedora packages soon
18:32:21 like this week
18:32:25 I hope
18:32:43 clime: what does it mean for them to be solely from fedora packages?
18:32:56 like, the install of copr itself?
18:33:01 or it hosts only fedora packages?
18:33:02 bowlofeggs: currently we deploy COPR from @copr/copr
18:33:05 ah i see
18:33:07 cool
18:33:15 which is good for hotfixes
18:33:26 cool.
18:33:37 we have an infra repo that we can use for hotfixes like that too
18:33:43 but one of the conditions for getting support of FI was to install everything from Fedora
18:33:52 oh that would be cool
18:34:05 I would really like to have an option to hotfix things quickly :)
18:34:22 but anyway I am happy that we are ready
18:34:44 the reason we have the koji tag is that the koji builds are immutable once done, etc.
18:35:17 right, so I would like to ask what next progress should be
18:35:31 but I guess I will need to discuss it with puiterwijk
18:35:39 once he is here
18:35:43 so, looking back at the RFR ticket... did we ever finish discussing/deciding what parts we want of copr to be in cloud and what parts outside?
18:35:58 actually I would like to move to OpenShift
18:36:08 hopefully our cloud will be improved a lot soon...
18:36:19 not sure if it is related or not but it would be cool
18:36:33 so dev instance first
18:36:34 clime: so, builds are done in build containers (like images are done in openshift)?
18:36:49 yup
18:36:57 nirik: that would be cool
18:37:17 I think that could be interesting... but not sure how much work it might be. it changes a lot of what copr does...
18:37:22 well I guess there will be some work included
18:38:23 but that is just the builder part right? the frontend/backend/keyserver/distgit would all be pretty much the same?
18:38:38 nirik: yes, exactly
18:38:49 nirik: unless they would also need to run in pods
18:39:16 well, we currently have no good story for persistent storage in openshift... and some of those need a lot of that
18:39:17 but that wouldn't normally affect things...
18:39:35 I see.
18:39:55 maybe I could cooperate on investigating this
18:40:04 wasn't there an idea to go for glusterfs for that....
18:40:09 all the options are kinda poor for us...
18:40:20 * doteast is hazy on those matters
18:40:22 but I no pretty much none so far
18:40:26 *know
18:40:28 yeah, we could use glusterfs...
18:40:42 but we have had problems with performance in the past....
18:41:11 and folks who run openshift in production don't use it, so hard to say how effective it will be
18:41:49 i'm excited about openshift coming to an infra near me
18:41:50 we could use nfs, but it requires us to manually make volumes to be used
18:42:05 anyhow...
18:42:09 nfs can also have problems with some apps, depending on what they do
18:42:43 so, in the next month or so we should get new cloud hardware and install RHOSP 10+ with HA.
18:43:05 we can move copr as it is now to that and get it much better supported hopefully.
18:43:21 that would be cool
18:43:23 I think a way to use openshift for builders at least (or the entire thing) would be pretty cool tho down the road
18:43:49 Can I talk about this to somebody?
18:44:00 relrod maybe
18:44:01 which part? :)
18:44:09 builders in openshift
18:44:36 and the first one as well
18:44:37 I think we are all feeling our way on openshift right now, perhaps the list would be good or just admin irc channel?
18:44:52 I would say the list would be good to capture it
18:45:05 alright
18:45:07 Yeah I'm not the best person to talk to about it just yet. Still learning the ropes.
18:46:04 That's everything I have got.
18:46:10 what about nearer future? once copr is installable from fedora rpms, can at least the frontend be made "supported" service (so that other apps can depend on it)?
18:46:17 https://docs.openshift.com/container-platform/3.4/creating_images/custom.html#creating-images-custom may be of note
18:46:29 mizdebsk: once it moves to the new openstack yes.
18:46:40 the current one is not HA, has issues
18:47:07 it is working pretty well in the end
18:47:11 i meant having frontend on kvm virthosts, with backend/builders still in cloud
18:47:12 but yeah
18:47:55 could be better for sure
18:48:02 we could do that. (see above where I mention we haven't decided that yet)
18:48:31 I think frontend would be easy to move out if we wanted.
18:48:51 backend makes more sense to stay in cloud IMHO.
18:49:02 OK we are coming up to the top of the hour and I would like to do the apprentice questions before we end
18:49:09 (it has storage there, it also doesn't need public ips to talk to builders)
18:49:22 so, anything else please bring up on list. ;)
18:49:29 okay
18:49:33 thank you.
18:49:58 I think a plan to finish making this 'supported' will be part of the list thread
18:50:04 ok next up
18:50:15 #topic Apprentice Open office hours
18:50:24 Hi any apprentices or new people with questions?
18:51:04 no questions, just a note - please let me/us know if there is more to help out with.
18:51:12 I'm new to both the project and the OS recently
18:51:19 hello ole88
18:53:58 as a new person, are you interested in sysadmin or coding?
18:54:15 bgray, yes.. we do need to put some more or clear up the easyfix
18:54:48 smooge: thanks!
18:55:00 bgray, have you looked at our docs?
18:55:21 I am a sysadmin at work and we are moving toward RHEL, but run Windows
18:55:31 those are probably the place we need the most work on these days even if it is a "hey this doc is from 1984.. does this still work?"
18:55:44 cool ole88 that can be a big leap
18:55:47 I am also learning python in my spare time
18:55:49 smooge: i have. they are very good!
18:56:07 I have prior experience (years ago) with c, c++ and cobol
18:56:11 There's an issue about updating our mirrormanager docs... that could be a nice one for someone new.
18:56:16 bgray, well I am going to put in an easy fix ticket for them to get updated/reviewed
18:56:16 smooge: ah, so outdated in places?
18:56:36 bgray, if the data at the top is over a year old they are probably outdated
18:56:39 I also use PowerShell and dotnetcore on my Fedora laptop
18:56:45 smooge: :)
18:57:04 ole88, that is a level of work I have not been able to get working yet.
18:57:48 I can help out there
18:57:51 cool
18:57:58 well we are at the top of the hour
18:58:07 #topic Open Floor
18:58:19 I would like to thank everyone for coming to the meeting today
18:58:26 thanks smooge
18:58:36 thanks everyone
18:58:36 thanks for running it, smooge!
18:58:39 thank you, smooge
18:58:48 we will have another meeting in a week and we will also have a mailing list for people to talk on
18:58:51 see you later
18:58:54 #endmeeting