18:00:12 <smooge> #startmeeting Infrastructure (2017-04-27)
18:00:12 <zodbot> Meeting started Thu Apr 27 18:00:12 2017 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:12 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:12 <zodbot> The meeting name has been set to 'infrastructure_(2017-04-27)'
18:00:12 <smooge> #meetingname infrastructure
18:00:13 <zodbot> The meeting name has been set to 'infrastructure'
18:00:13 <smooge> #topic aloha
18:00:13 <smooge> #chair smooge relrod nirik abadger1999 dgilmore threebean pingou puiterwijk pbrobinson
18:00:13 <zodbot> Current chairs: abadger1999 dgilmore nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:16 <smooge> hello all
18:00:20 <marc84> hi
18:00:38 <bt0> hi
18:00:40 <nirik> morning everyone.
18:01:55 <smooge> #topic New folks introductions
18:02:06 <smooge> Hello are there any new people this week?
18:02:16 * cverna is around
18:03:34 <nirik> gnu people? ;)
18:04:02 <smooge> ok looks like I should have sent that email out earlier :)
18:04:15 <smooge> #topic announcements and information
18:04:16 <smooge> #info beta freeze will start 2017-05-17
18:04:16 <smooge> #info infra hackfest in RDU 2017-05-08 to 2017-05-12 - everyone
18:04:16 <smooge> #info Fedora Infrastructure weathered large outage last Friday. Good job everyone
18:04:16 <smooge> #info mass update/reboot cycle next week (2017-05-02/03) - everyone
18:04:17 <smooge> #info bodhi 2.6.0 released. A few issues so look for 2.6.1 soon - bowlofeggs
18:04:18 <smooge> #info Moved production resultsdb database to separate machine to help with performance issues - tflink
18:04:32 <smooge> Looks like this week was mostly recovery and next week will be mostly reboots
18:04:42 <smooge> The week after that will be mostly meetings
18:04:51 <smooge> Finally we will have a freeze
18:05:38 <smooge> Any other items to put in the old stuff done list?
18:05:48 <smooge> If not it will be time to hand it over to nirik
18:05:57 <smooge> #topic Apprentice work day scheduling/topic - kevin
18:06:15 <nirik> yeah, so I was thinking we should schedule an apprentice workday again...
18:06:20 <nirik> and come up with a topic
18:06:32 <nirik> The last one we did was docs... we could do that again, or something else.
18:06:53 <nirik> Possibly we could look at all our apps and triage issues?
18:07:08 <nirik> or clean up ansible playbooks from ansible lint output?
18:07:14 <nirik> or ...your idea here.
18:07:33 <Skeer> some ansible sounds nice
18:07:39 <capitanocrunch> good to know...maybe that moving stage IP easyfix?
18:07:51 <Skeer> Any python related things that might qualify in the easyfix realm?
18:08:05 <nirik> As for time, perhaps week of the 22nd? we would be in freeze then...
18:08:11 <nirik> capitanocrunch: good thought yeah...
18:08:23 <nirik> Skeer: probably tons on various apps.
18:08:43 <nirik> I can open a thread on it on the list to get more folks' input...
18:08:53 <Skeer> If someone has time to laser focus a few I'd be willing to dive in
18:09:23 <capitanocrunch> +1 for thread on the list
18:09:28 <bowlofeggs> .hello bowlofeggs
18:09:29 <marc84> +1
18:09:29 <zodbot> bowlofeggs: bowlofeggs 'Randy Barlow' <randy@electronsweatshop.com>
18:09:32 <bowlofeggs> more like bowlofbrokebodhi
18:09:33 <nirik> we could also pick some poor neglected app and try and fix it up... askbot leaps to mind. ;)
18:09:50 <nirik> or packages could use love.
18:10:22 <nirik> how does the week of the 22nd sound for everyone? too soon? ok? bad time for some other reason?
18:10:41 <marc84> +1 for 22nd
18:10:46 <Skeer> +1
18:10:46 <bowlofeggs> Skeer: bodhi has some easyfix bugs and it's python
18:10:55 <cverna> 22nd sounds good
18:11:10 <Skeer> bowlofeggs: I'll head that way and ping you if I get lost ;)
18:11:17 <smooge> askbot good old askbot
18:11:38 <nirik> perhaps actually the 24th... (wed)
18:11:38 <smooge> 22nd sounds good
18:11:47 <smooge> or the 24th
18:11:54 <nirik> that way we are over the mondays...
18:11:56 <bt0> +1 for 22nd
18:12:29 <nirik> or actually, we last time did things over a week didn't we?
18:12:37 <nirik> not just a day? will have to look
18:13:12 <nirik> anyhow, that's all on this. I will post to the list and we can figure out a topic and such details. :)
18:13:28 <Skeer> sounds good
18:14:39 <smooge> #topic BDR (bidirectional Data Replication) in postgres - kevin
18:14:45 <smooge> back to you Kevin
18:14:58 <nirik> so I wanted to talk about this a bit... but not sure we have all our apps folks around...
18:15:13 <nirik> we could just wait and talk about it at the hackfest I suppose.
18:15:25 <nirik> basically:
18:16:01 <nirik> I want 2 things we don't currently have: replication (in case of disaster) and high availability (in cases where we reboot servers to apply updates, etc)
18:16:17 <nirik> there are a number of ways to do this in the postgres world.
18:16:41 <nirik> There's pgpool which is a proxy... your apps talk to it, and it talks to postgres servers on the backend.
18:17:09 <nirik> The problem with that is that it doesn't understand the full set of sql and can get confused. Also, it's another single point.
18:17:27 <nirik> There's postgres's native master/replicate stuff.
18:17:55 <nirik> But it requires you to do various things when you promote and when you demote/re-add an old spare
18:18:24 <nirik> BDR does limit you on what you can do somewhat... but it makes things super easy otherwise.
18:18:35 <nirik> You can reboot any of the nodes and they resync when they come back up
18:18:50 <nirik> You don't have to do anything weird or arcane for that.
18:19:02 <nirik> But I do understand the sql limits are annoying on the app side.
18:19:13 <nirik> So, that's it in a nutshell. :)
18:19:25 <smooge> sql limits are god's way of saying slow down
18:19:41 <puiterwijk> Well, I think the sql limits are reasonable, honestly. One is "have a primary key in all tables", which with most things already happens by default
18:19:44 <nirik> I'll try and make sure we have everyone ok with it before I deploy things.
18:19:51 <puiterwijk> (for BDR)
18:20:14 <nirik> it would help a lot if sqlalchemy could do the right thing on updates.
18:20:21 <puiterwijk> The other is no things like "CREATE TABLE ... AS SELECT ...", which you can just split into two things
18:20:47 <puiterwijk> Well, that's an alembic thing. And the alembic "bug" is easy to fix, and only needs to be done once per application
18:20:55 <smooge> both of which I think go with "slow down"
18:20:57 <puiterwijk> (but yes, we should get that added upstream)
18:21:13 <puiterwijk> smooge: I tend to disagree. They're both things that don't happen very often
18:21:51 <smooge> puiterwijk, I think we are violently agreeing. If an app HAS to do those things for some reason, it needs to slow down
18:22:05 <nirik> note also that they are trying to get all merged into postgres and should be in there hopefully in 10.
18:22:13 <puiterwijk> Ah, right
18:22:22 <smooge> door bell brb
18:22:29 <praiskup> fwiw, I'm not sure whether it is not too early for BDR, and ... if you need just "backup", you won't profit probably from BDR
18:23:04 <nirik> as I hopefully said, we don't need just backup
18:23:18 <nirik> we want HA without a bunch of manual steps
18:23:35 <nirik> and we have been running a bunch of things in stg with it for a while. ;)
18:23:40 <praiskup> ok, that's still not BDR, BDR == master <-> master
18:24:24 <nirik> yep.
18:24:48 <nirik> it is master/master, but we aren't pointing traffic to both masters.
18:25:15 <puiterwijk> nirik: which we totally could do, and maybe should :)
18:25:44 <nirik> we could... except then we would need to make sure our apps could reconnect cleanly...
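A minimal sketch of the two BDR restrictions puiterwijk describes above (every table has a primary key, and "CREATE TABLE ... AS SELECT ..." gets split into two statements). It uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; it was not run against postgres or BDR, and the `updates` / `stable_updates` tables are hypothetical names, not anything from a real app.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; table names are hypothetical.
conn = sqlite3.connect(":memory:")

# Restriction 1: every table carries an explicit primary key.
conn.execute(
    "CREATE TABLE updates ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO updates (id, title, status) VALUES (?, ?, ?)",
    [(1, "kernel", "stable"), (2, "bodhi", "testing"), (3, "koschei", "stable")],
)

# Restriction 2: instead of
#   CREATE TABLE stable_updates AS SELECT id, title FROM updates WHERE ...
# do it in two steps.
# Step 1: create the target table explicitly, with its own primary key.
conn.execute(
    "CREATE TABLE stable_updates (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)
# Step 2: copy the rows with a separate INSERT ... SELECT.
conn.execute(
    "INSERT INTO stable_updates (id, title) "
    "SELECT id, title FROM updates WHERE status = 'stable'"
)
conn.commit()

stable_rows = conn.execute(
    "SELECT id, title FROM stable_updates ORDER BY id"
).fetchall()
print(stable_rows)
```

The same two-step shape is what a migration would need to emit in place of a single CREATE TABLE ... AS SELECT; how a given migration tool is adjusted to do that is not shown here.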
18:26:09 <nirik> if we have masterA and masterB and spread load, and reboot masterA everything connected to it would need to reconnect to masterB
18:26:27 <nirik> and most of our apps seem poor at reconnects
18:27:14 <nirik> anyhow, from the sysadmin side this is a big win, IMHO, but I am happy to discuss further with anyone with reservations. ;)
18:27:17 <nirik> that's all I had on this.
18:28:52 <nirik> smooge: ... back to you?
18:28:53 <smooge> ok thanks
18:29:16 <smooge> On the i95 freeway we have a major backup
18:29:18 <smooge> #topic Apprentice Open office hours
18:29:27 <smooge> Hello Apprentices
18:29:58 <bt0> Hello
18:30:05 <Skeer> hi
18:30:36 <smooge> any open issues needing help with?
18:31:02 <nb> .hello nb
18:31:04 <zodbot> nb: nb 'Nick Bebout' <nb@nb.zone>
18:31:23 <nirik> Oh, I did some digging on an interesting old ticket... without solving it really.
18:31:38 <nirik> if someone wants to continue digging on it, might be a nice bit of fun.
18:31:41 <smooge> what was the ticket
18:31:43 * nirik looks
18:31:54 <Skeer> I need some pointers on the jenkins cleanup ticket
18:32:06 <nirik> https://pagure.io/fedora-infrastructure/issue/4211
18:32:18 <Skeer> https://pagure.io/fedora-infrastructure/issue/6003
18:32:40 <nirik> Skeer: for that, it just needs a normal logrotate config file... which we could add in ansible, but should really be added to the package
18:33:44 <Skeer> For the cleanup?
18:33:54 <smooge> Skeer, I would write the logrotate config file
18:34:08 <nirik> oh, sorry, thinking of the wrong ticket here.
18:34:24 <Skeer> lol
18:34:24 <smooge> Then put in a patch for our ansible for the time being.
18:35:00 <nirik> https://pagure.io/fedora-infrastructure/issue/6010
18:35:43 <smooge> Skeer, I would say if they have not contacted you within a reasonable time they can be killed
18:36:04 <smooge> They can get back when they show they are actually interested
18:36:20 <smooge> but I have a sinus headache
18:36:27 * nirik nods. agree
18:36:30 <bt0> some background about https://pagure.io/fedora-infrastructure/issue/5989
18:36:40 <bt0> it looks fine to me
18:37:47 <Skeer> smooge: Gotcha.. ok well I'll give it until tomorrow. Then update the ticket.
18:38:30 <smooge> Skeer, cool
18:38:47 <smooge> bt0, what do you need on that ticket?
18:38:54 <nirik> bt0: ah yeah, I can add more background there...
18:39:02 * nirik sees that it's kinda vague.
18:39:06 <Skeer> bowlofeggs: You've likely guessed but I'm totally lost on how to proceed on ticket https://pagure.io/fedora-infrastructure/issue/5932
18:39:56 <bt0> nirik, yeah please
18:42:00 <smooge> Skeer, for that one I would look at the two template files and see what really is different between them
18:42:32 <smooge> if possible I would syntax it as {% if env == 'stg' ... or whatever is correct text
18:42:41 <Skeer> I'm hung up on not knowing the correct formatting to google for help.
18:42:55 <Skeer> Like what lang are those in?
18:42:58 <smooge> jinja
18:43:19 <smooge> I believe
18:43:23 <Skeer> That's what I thought.. I found almost nothing IIRC
18:43:39 <smooge> jinja2 ansible is where I usually start my looking
18:43:51 <smooge> then I go through the existing templates and pull out the logic I need :)
18:44:11 <Skeer> Gotcha.. I need to look for existing, working templates
18:44:29 <Skeer> Jinja2 by chance?
18:44:55 <Skeer> "friendly templating language for Python"
18:46:15 <nirik> Skeer: the haproxy one might be good to look at as an example
18:46:23 <Skeer> I'm dragging the meeting out.. I'll go spelunking some more and see what I can find on that.
18:46:26 <bt0> Thanks nirik
18:46:37 <Skeer> nirik: noted :)
18:46:38 <smooge> ok thanks guys
18:46:42 <smooge> #topic Open Floor
18:46:50 <nirik> https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/haproxy/templates/haproxy.cfg
18:46:54 <mizdebsk> sorry, but i missed discussion about postgres BDR, could we still come back to this topic for a few minutes? or should just post to the mailing list?
18:47:00 <nirik> bt0: added a comment, hope it made sense. ;)
18:47:29 <nirik> mizdebsk: feel free to add or post to the list or whatever. I'm sure there's going to be more discussion. ;)
18:47:30 <smooge> mizdebsk, it is open floor so if you want to do so for 4 minutes
18:47:46 <mizdebsk> first, i quite dislike the situation we currently have with postgresql servers - staging has much different setup from production, which defeats the purpose of staging environment...
18:47:52 <mizdebsk> something may work in stg but will fail after moving to prod (we already hit this with koschei)
18:48:10 <mizdebsk> second, i have a feeling that BDR is not mature and painful to use
18:48:19 <nirik> well, yes, but I hope to fix that by rolling BDR to prod. ;)
18:48:25 <bowlofeggs> i missed the BDR topic due to some side convos i got pulled in to
18:48:26 <mizdebsk> most importantly from my pov, it does not support some features that i would like to use (such as partial unique index, or materialized view)
18:48:33 <bowlofeggs> i do have other concerns though
18:48:49 <mizdebsk> some apps are not as critical as others - the world won't end if they are not available for a few hours and they can even withstand data loss of a few days, eg. after restoring db from a bit older backup
18:48:56 <bowlofeggs> for example, BDR can get you into a deadlock situation that only human intervention can resolve
18:49:02 <mizdebsk> so i have an idea: what about having different db server for less critical apps? with no BDR and lower HA expectations
18:49:09 <nirik> sure. a dead postgres server can do
18:49:11 <nirik> that also
18:49:13 <mizdebsk> it could also run on fedora instead of rhel, to allow use of newer postgres features
18:49:25 <bowlofeggs> nirik: i mean a data deadlock
18:49:26 <nirik> mizdebsk: that's a thought indeed...
18:49:46 <bowlofeggs> like if a write is accepted by A, A goes down, B takes over and accepts a conflicting write
18:49:53 <bowlofeggs> that can't happen without BDR and it can happen with BDR
18:49:59 <nirik> many of our apps are very simple and don't really need vast features tho
18:50:06 <puiterwijk> bowlofeggs: it can happen with a master-slave HA postgres as well
18:50:15 <bowlofeggs> the slave is read only
18:50:17 <nirik> bowlofeggs: sure, but life is tradeoffs.
18:50:23 <mizdebsk> most of apps are tested with sqlite :)
18:50:24 <bowlofeggs> puiterwijk: i'm advocating to let bodhi be non-HA actually
18:50:25 <puiterwijk> bowlofeggs: not after it's promoted to master because the master is down
18:50:30 <puiterwijk> bowlofeggs: I'm not.
18:50:38 <bowlofeggs> i don't think we should promote the slave
18:50:39 <puiterwijk> I'd say that bodhi is one of the mission critical apps actually
18:50:45 <bowlofeggs> unless we never bring the dead master back
18:50:56 <bowlofeggs> it's not user facing, it's developer facing
18:51:09 <nirik> users test and add karma and comments?
18:51:17 <bowlofeggs> and it rarely has an actual outage (it has severe bugs, like right now, but rarely an actual outage)
18:51:17 <puiterwijk> bowlofeggs: fedora-easy-karma?
18:51:51 <bowlofeggs> even fedora-easy-karma i wouldn't consider mission critical
18:52:25 <puiterwijk> I'd call updates mashing/pushing mission critical.
18:52:31 <nirik> I was dreaming of a world where we would no longer need scheduled outages for updates.
18:52:43 <bowlofeggs> my opinion: i'd rather not trade data safety for bodhi to get HA, when HA isn't a frequent problem for bodhi in the first place
18:52:56 <smooge> bowlofeggs, I think the problem is you are wanting it not to be mission critical and we are being told by outside forces it is mission critical
18:53:26 <bowlofeggs> smooge: who is saying that bodhi is mission critical? obviously, it's not my call - this is just my opinion
18:53:33 <bowlofeggs> but a deadlock will bring it down too
18:53:54 * nirik notes we have seen... none of those (without human error) in stg
18:53:58 <bowlofeggs> so it's still not perfectly HA
18:54:04 <bowlofeggs> nirik: bodhi doesn't work in stg at all
18:54:26 <bowlofeggs> also, i saw another comment earlier that wasn't true about fk's
18:54:33 <nirik> sure, but I am just stating a data point
18:54:35 <puiterwijk> bowlofeggs: except it does as I said the other day, just not your account but that's because you have tables without primary key
18:54:47 <nirik> you are making it sound like it hits deadlocks all the time
18:54:48 <bowlofeggs> it's extremely common for applications to have tables without fk's because that's how m2m relationships are most commonly done
18:54:57 <bowlofeggs> all of bodhi's non-fk tables are m2m tables
18:55:45 <bowlofeggs> nirik: i'm not saying it happens all the time, but i am saying that bodhi doesn't really have true downtime today and i'd rather have data safety than HA, if it were up to me (not saying it *is* up to me)
18:56:06 <mizdebsk> koschei definitely doesn't need to be HA, so ideally i would like to move it (koschei) to fedora-based, non-bdr postgres; if db server is specific to koschei then we (sysadmin-koschei) can take care of its maintenance
18:56:08 <bowlofeggs> so i feel like i'm giving up a lot and not getting something i need in the trade
18:56:10 <puiterwijk> bowlofeggs: then you can (and probably should!) still add a primary key on the combination of the two tables - tada, a primary key that also gets you data safety
18:56:13 <nirik> well, I don't want to force things on people. :) I like reaching a consensus. ;)
18:56:38 <bowlofeggs> puiterwijk: BDR means no data safety
18:56:44 <smooge> I think we aren't going to reach it right here
18:56:53 <smooge> so let us move this to the list
18:56:59 <bowlofeggs> the pk doesn't solve the data safety problem, it's just a requirement for bdr
18:57:04 <bowlofeggs> smooge: ok
18:57:06 <puiterwijk> bowlofeggs: I disagree on that
18:57:10 * nirik does too
18:57:16 <bowlofeggs> puiterwijk: their docs say this, not me
18:57:23 <bowlofeggs> the deadlocks are documented
18:57:31 <bowlofeggs> but yes, let's talk more at the fad
18:57:35 <puiterwijk> bowlofeggs: also: the problem is that *you* aren't the one doing the database server maintenance, and needing to suddenly tell everyone everything is down for 15 minutes because you need to increase a disk
18:57:51 <bowlofeggs> again, i'm not saying it's my call, but my opinion is that bodhi is better off as is, that's all
18:57:51 <smooge> I am closing this down.. you can argue more at the fad
18:58:01 <bowlofeggs> puiterwijk: fair
18:58:14 <smooge> #endmeeting
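A rough illustration of puiterwijk's Open Floor suggestion above: give a many-to-many association table a primary key made of its two foreign key columns. This is a sketch under stated assumptions, using Python's built-in sqlite3 module as a stand-in for postgres; the `updates`, `bugs` and `update_bug` tables are hypothetical and are not bodhi's actual schema. It only shows the composite key rejecting a duplicate link and says nothing about how BDR resolves conflicting writes.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE updates (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE bugs (id INTEGER PRIMARY KEY)")

# The association table gets a composite primary key over both
# foreign key columns instead of having no primary key at all.
conn.execute(
    """
    CREATE TABLE update_bug (
        update_id INTEGER NOT NULL REFERENCES updates (id),
        bug_id INTEGER NOT NULL REFERENCES bugs (id),
        PRIMARY KEY (update_id, bug_id)
    )
    """
)
conn.execute("INSERT INTO updates (id) VALUES (1)")
conn.execute("INSERT INTO bugs (id) VALUES (10)")
conn.execute("INSERT INTO update_bug (update_id, bug_id) VALUES (1, 10)")


def link_again():
    """Try to insert the same link twice; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO update_bug (update_id, bug_id) VALUES (1, 10)")
    except sqlite3.IntegrityError:
        return False
    return True


duplicate_accepted = link_again()
link_count = conn.execute("SELECT COUNT(*) FROM update_bug").fetchone()[0]
print(duplicate_accepted, link_count)
```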