18:00:12 <smooge> #startmeeting Infrastructure (2017-04-27)
18:00:12 <zodbot> Meeting started Thu Apr 27 18:00:12 2017 UTC.  The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:12 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:12 <zodbot> The meeting name has been set to 'infrastructure_(2017-04-27)'
18:00:12 <smooge> #meetingname infrastructure
18:00:13 <zodbot> The meeting name has been set to 'infrastructure'
18:00:13 <smooge> #topic aloha
18:00:13 <smooge> #chair smooge relrod nirik abadger1999 dgilmore threebean pingou puiterwijk pbrobinson
18:00:13 <zodbot> Current chairs: abadger1999 dgilmore nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:16 <smooge> hello all
18:00:20 <marc84> hi
18:00:38 <bt0> hi
18:00:40 <nirik> morning everyone.
18:01:55 <smooge> #topic New folks introductions
18:02:06 <smooge> Hello are there any new people this week?
18:02:16 * cverna is around
18:03:34 <nirik> gnu people? ;)
18:04:02 <smooge> ok looks like I should have sent that email out earlier :)
18:04:15 <smooge> #topic announcements and information
18:04:16 <smooge> #info beta freeze will start 2017-05-17
18:04:16 <smooge> #info infra hackfest in RDU 2017-05-08 to 2017-05-12 - everyone
18:04:16 <smooge> #info Fedora Infrastructure weathered a large outage last Friday. Good job, everyone
18:04:16 <smooge> #info mass update/reboot cycle next week (2017-05-02/03) - everyone
18:04:17 <smooge> #info bodhi 2.6.0 released. A few issues so look for 2.6.1 soon - bowlofeggs
18:04:18 <smooge> #info Moved production resultsdb database to separate machine to help with performance issues - tflink
18:04:32 <smooge> Looks like this week was mostly recovery and next week will be mostly reboots
18:04:42 <smooge> The week after that will be mostly meetings
18:04:51 <smooge> Finally we will have a freeze
18:05:38 <smooge> Any other items to put in the old stuff done list?
18:05:48 <smooge> If not it will be time to hand it over to nirik
18:05:57 <smooge> #topic Apprentice work day scheduling/topic - kevin
18:06:15 <nirik> yeah, so I was thinking we should schedule an apprentice workday again...
18:06:20 <nirik> and come up with a topic
18:06:32 <nirik> The last one we did was docs... we could do that again, or something else.
18:06:53 <nirik> Possibly we could look at all our apps and triage issues?
18:07:08 <nirik> or clean up ansible playbooks from ansible lint output?
18:07:14 <nirik> or ...your idea here.
18:07:33 <Skeer> some ansible sounds nice
18:07:39 <capitanocrunch> good to know...maybe that moving stage IP easyfix?
18:07:51 <Skeer> Any python related things that might qualify in the easyfix realm?
18:08:05 <nirik> As for time, perhaps week of the 22nd? we would be in freeze then...
18:08:11 <nirik> capitanocrunch: good thought yeah...
18:08:23 <nirik> Skeer: probably tons on various apps.
18:08:43 <nirik> I can open a thread on it on the list to get more folks input...
18:08:53 <Skeer> If someone has time to laser focus a few I'd be willing to dive in
18:09:23 <capitanocrunch> +1 for thread on the list
18:09:28 <bowlofeggs> .hello bowlofeggs
18:09:29 <marc84> +1
18:09:29 <zodbot> bowlofeggs: bowlofeggs 'Randy Barlow' <randy@electronsweatshop.com>
18:09:32 <bowlofeggs> more like bowlofbrokebodhi
18:09:33 <nirik> we could also pick some poor neglected app and try and fix it up... askbot leaps to mind. ;)
18:09:50 <nirik> or packages could use love.
18:10:22 <nirik> how does the week of the 22nd sound for everyone? too soon? ok? bad time for some other reason?
18:10:41 <marc84> +1 for 22nd
18:10:46 <Skeer> +1
18:10:46 <bowlofeggs> Skeer: bodhi has some easyfix bugs and it's python
18:10:55 <cverna> 22nd sounds good
18:11:10 <Skeer> bowlofeggs: I'll head that way and ping you if I get lost ;)
18:11:17 <smooge> askbot good old askbot
18:11:38 <nirik> perhaps actually the 24th... (wed)
18:11:38 <smooge> 22nd sounds good
18:11:47 <smooge> or the 24th
18:11:54 <nirik> that way we are over the mondays...
18:11:56 <bt0> +1 for 22nd
18:12:29 <nirik> or actually, last time we did things over a week, didn't we?
18:12:37 <nirik> not just a day? will have to look
18:13:12 <nirik> anyhow, thats all on this. I will post to the list and we can figure out a topic and such details. :)
18:13:28 <Skeer> sounds good
18:14:39 <smooge> #topic BDR (Bi-Directional Replication) in postgres - kevin
18:14:45 <smooge> back to you Kevin
18:14:58 <nirik> so I wanted to talk about this a bit... but not sure we have all our apps folks around...
18:15:13 <nirik> we could just wait and talk about it at the hackfest I suppose.
18:15:25 <nirik> basically:
18:16:01 <nirik> I want 2 things we don't currently have: replication (in case of disaster) and high availability (in cases where we reboot servers to apply updates, etc.)
18:16:17 <nirik> there are a number of ways to do this in the postgres world.
18:16:41 <nirik> There's pgpool which is a proxy... your apps talk to it, and it talks to postgres servers on the backend.
18:17:09 <nirik> The problem with that is that it doesn't understand the full set of sql and can get confused. Also, it's another single point of failure.
18:17:27 <nirik> There's postgres's native master/replica stuff.
18:17:55 <nirik> But it requires you to do various things when you promote and when you demote/re-add an old spare
18:18:24 <nirik> BDR does limit you on what you can do somewhat... but it makes things super easy otherwise.
18:18:35 <nirik> You can reboot any of the nodes and they resync when they come back up
18:18:50 <nirik> You don't have to do anything weird or arcane for that.
18:19:02 <nirik> But I do understand the sql limits are annoying on the app side.
18:19:13 <nirik> So, thats it in a nutshell. :)
18:19:25 <smooge> sql limits are god's way of saying slow down
18:19:41 <puiterwijk> Well, I think the sql limits are reasonable, honestly. One is "have a primary key in all tables", which with most things already happens by default
18:19:44 <nirik> I'll try and make sure we have everyone ok with it before I deploy things.
18:19:51 <puiterwijk> (for BDR)
18:20:14 <nirik> it would help a lot if sqlalchemy could do the right thing on updates.
18:20:21 <puiterwijk> The other is no things like "CREATE TABLE ... AS SELECT ...", which you can just split into two things
18:20:47 <puiterwijk> Well, that's an alembic thing. And the alembic "bug" is easy to fix, and only needs to be done once per application
18:20:55 <smooge> both of which I think go with "slow down"
18:20:57 <puiterwijk> (but yes, we should get that added upstream)
18:21:13 <puiterwijk> smooge: I tend to disagree. They're both things that don't happen very often
18:21:51 <smooge> puiterwijk, I think we are violently agreeing. If an app HAS to do those things for some reason, it needs to slow down
18:22:05 <nirik> note also that they are trying to get it all merged into postgres and should be in there hopefully in 10.
18:22:13 <puiterwijk> Ah, right
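For illustration, a minimal, hypothetical Alembic migration sketch of the split puiterwijk describes: create the table explicitly (with a primary key, which BDR requires) and then populate it with a separate INSERT, instead of a single "CREATE TABLE ... AS SELECT ...". Table and column names are invented, not taken from any Fedora app.

    # Hypothetical Alembic migration fragment; names are illustrative only.
    import sqlalchemy as sa
    from alembic import op

    def upgrade():
        # Step 1: create the table explicitly, with a primary key (a BDR requirement).
        op.create_table(
            'update_summaries',
            sa.Column('id', sa.Integer, primary_key=True),
            sa.Column('title', sa.Unicode(255), nullable=False),
        )
        # Step 2: populate it with a separate INSERT ... SELECT,
        # rather than CREATE TABLE ... AS SELECT in one statement.
        op.execute(
            "INSERT INTO update_summaries (id, title) SELECT id, title FROM updates"
        )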
18:22:22 <smooge> door bell brb
18:22:29 <praiskup> fwiw, I'm not sure it isn't too early for BDR, and ... if you just need "backup", you probably won't profit from BDR
18:23:04 <nirik> as I hopefully said, we don't need just backup
18:23:18 <nirik> we want HA without a bunch of manual steps
18:23:35 <nirik> and we have been running a bunch of things in stg with it for a while. ;)
18:23:40 <praiskup> ok, that's still not BDR, BDR == master <-> master
18:24:24 <nirik> yep.
18:24:48 <nirik> it is master/master, but we aren't pointing traffic to both masters.
18:25:15 <puiterwijk> nirik: which we totally could do, and maybe should :)
18:25:44 <nirik> we could... except then we would need to make sure our apps could reconnect cleanly...
18:26:09 <nirik> if we have masterA and masterB and spread load, and reboot masterA everything connected to it would need to reconnect to masterB
18:26:27 <nirik> and most of our apps seem poor at reconnects
18:27:14 <nirik> anyhow, from the sysadmin side this is a big win, IMHO, but I am happy to discuss further with anyone with reservations. ;)
18:27:17 <nirik> thats all I had on this.
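As a rough sketch of the reconnect handling nirik mentions (not code from any Fedora app; the DSNs and host names are made up), an app spreading load across both masters would need something like a failover loop around its connection:

    # Hypothetical failover/reconnect sketch using psycopg2; DSNs are invented.
    import time
    import psycopg2

    DSNS = [
        "host=db-a.example.org dbname=app user=app",
        "host=db-b.example.org dbname=app user=app",
    ]

    def connect_with_failover(dsns=DSNS, retries=5, delay=2):
        """Try each server in turn; sleep and retry if none is reachable."""
        for _ in range(retries):
            for dsn in dsns:
                try:
                    return psycopg2.connect(dsn)
                except psycopg2.OperationalError:
                    continue  # this server is down or rebooting, try the next one
            time.sleep(delay)
        raise RuntimeError("no database server reachable")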
18:28:52 <nirik> smooge: ... back to you?
18:28:53 <smooge> ok thanks
18:29:16 <smooge> On the i95 freeway we have a major backup
18:29:18 <smooge> #topic Apprentice Open office hours
18:29:27 <smooge> Hello Apprentices
18:29:58 <bt0> Hello
18:30:05 <Skeer> hi
18:30:36 <smooge> any open issues needing help with?
18:31:02 <nb> .hello nb
18:31:04 <zodbot> nb: nb 'Nick Bebout' <nb@nb.zone>
18:31:23 <nirik> Oh, I did some digging on an interesting old ticket... without solving it really.
18:31:38 <nirik> if someone wants to continue digging on it, might be a nice bit of fun.
18:31:41 <smooge> what was the ticket
18:31:43 * nirik looks
18:31:54 <Skeer> I need some pointers on the jenkins cleanup ticket
18:32:06 <nirik> https://pagure.io/fedora-infrastructure/issue/4211
18:32:18 <Skeer> https://pagure.io/fedora-infrastructure/issue/6003
18:32:40 <nirik> Skeer: for that, it just needs a normal logrotate config file... which we could add in ansible, but it should really be added to the package
18:33:44 <Skeer> For the cleanup?
18:33:54 <smooge> Skeer, I would write the logrotate config file
18:34:08 <nirik> oh, sorry, thinking of the wrong ticket here.
18:34:24 <Skeer> lol
18:34:24 <smooge> Then put in a patch for our ansible for the time being.
18:35:00 <nirik> https://pagure.io/fedora-infrastructure/issue/6010
18:35:43 <smooge> Skeer, I would say if they have not contacted you within a reasonable time they can be killed
18:36:04 <smooge> They can get back when they show they are actually interested
18:36:20 <smooge> but I have a sinus headache
18:36:27 * nirik nods. agree
18:36:30 <bt0> some background about https://pagure.io/fedora-infrastructure/issue/5989
18:36:40 <bt0> it looks fine to me
18:37:47 <Skeer> smooge: Gotcha.. ok, well I'll give it until tomorrow. Then update the ticket.
18:38:30 <smooge> Skeer, cool
18:38:47 <smooge> bt0, what do you need on that ticket?
18:38:54 <nirik> bt0: ah yeah, I can add more background there...
18:39:02 * nirik sees that it's kinda vague.
18:39:06 <Skeer> bowlofeggs: You've likely guessed but I'm totally lost on how to proceed on ticket https://pagure.io/fedora-infrastructure/issue/5932
18:39:56 <bt0> nirik, yeah please
18:42:00 <smooge> Skeer, for that one I would look at the two template files and see what is really different between them
18:42:32 <smooge> if possible I would syntax it as {% if env == 'stg' ... or whatever is correct text
18:42:41 <Skeer> Im hung up on not knowing the correct formatting to google for help.
18:42:55 <Skeer> Like what lang are those in?
18:42:58 <smooge> jinja
18:43:19 <smooge> I believe
18:43:23 <Skeer> Thats what I thought.. I found almost nothing IIRC
18:43:39 <smooge> searching "jinja2 ansible" is where I usually start my looking
18:43:51 <smooge> then I go through the existing templates and pull out the logic I need :)
18:44:11 <Skeer> Gotcha.. I need to look for existing, working templates
18:44:29 <Skeer> Jinja2 by chance?
18:44:55 <Skeer> "friendly templating language for Python"
18:46:15 <nirik> Skeer: the haproxy one might be good to look at as an example
18:46:23 <Skeer> Im dragging the meeting out.. I'll go spelunking some more and see what I can find on that.
18:46:26 <bt0> Thanks nirik
18:46:37 <Skeer> nirik: noted :)
18:46:38 <smooge> ok thanks guys
18:46:42 <smooge> #topic Open Floor
18:46:50 <nirik> https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/haproxy/templates/haproxy.cfg
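As a small, hypothetical illustration of the single-template approach discussed above, rendered through Python's jinja2 library (the variable name `env`, its values, and the host names are assumptions, not the actual playbook variables):

    # Hypothetical example: one template serving both environments via jinja2.
    from jinja2 import Template

    TEMPLATE = 'server_name = {% if env == "staging" %}app.stg.example.org{% else %}app.example.org{% endif %}'

    for env in ("staging", "production"):
        print(env, "->", Template(TEMPLATE).render(env=env))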
18:46:54 <mizdebsk> sorry, but i missed discussion about postgres BDR, could we still come back to this topic for a few minutes? or should i just post to the mailing list?
18:47:00 <nirik> bt0: added a comment, hope it made sense. ;)
18:47:29 <nirik> mizdebsk: feel free to add or post to the list or whatever. I'm sure there's going to be more discussion. ;)
18:47:30 <smooge> mizdebsk, it is open floor, so feel free to do so for 4 minutes
18:47:46 <mizdebsk> first, i quite dislike the situation we currently have with postgresql servers - staging has much different setup from production, which defeats the purpose of staging environment...
18:47:52 <mizdebsk> something may work in stg but will fail after moving to prod (we already hit this with koschei)
18:48:10 <mizdebsk> second, i have a feeling that BDR is not mature and painful to use
18:48:19 <nirik> well, yes, but I hope to fix that by rolling BDR to prod. ;)
18:48:25 <bowlofeggs> i missed the BDR topic due to some side convos i got pulled in to
18:48:26 <mizdebsk> most importantly from my pov, it does not support some features that i would like to use (such as partial unique index, or materialized view)
18:48:33 <bowlofeggs> i do have other concerns though
18:48:49 <mizdebsk> some apps are not as critical as others - the world won't end if they are not available for a few hours and they can even withstand data loss of a few days, eg. after restoring db from a bit older backup
18:48:56 <bowlofeggs> for example, BDR can get you into a deadlock situation that only human intervention can resolve
18:49:02 <mizdebsk> so i have an idea: what about having different db server for less critical apps? with no BDR and lower HA expectations
18:49:09 <nirik> sure. a dead postgres server can do
18:49:11 <nirik> that also
18:49:13 <mizdebsk> it could also run on fedora instead of rhel, to allow use of newer postgres features
18:49:25 <bowlofeggs> nirik: i mean a data deadlock
18:49:26 <nirik> mizdebsk: thats a thought indeed...
18:49:46 <bowlofeggs> like if a write is accepted by A, A goes down, B takes over and accepts a conflicting write
18:49:53 <bowlofeggs> that can't happen without BDR and it can happen with BDR
18:49:59 <nirik> many of our apps are very simple and don't really need vast features tho
18:50:06 <puiterwijk> bowlofeggs: it can happen with a master-slave HA postgres as well
18:50:15 <bowlofeggs> the slave is read only
18:50:17 <nirik> bowlofeggs: sure, but life is tradeoffs.
18:50:23 <mizdebsk> most of apps are tested with sqlite :)
18:50:24 <bowlofeggs> puiterwijk: i'm advocating to let bodhi be non-HA actually
18:50:25 <puiterwijk> bowlofeggs: not after it's promoted to master because the master is down
18:50:30 <puiterwijk> bowlofeggs: I'm not.
18:50:38 <bowlofeggs> i don't think we should promote the slave
18:50:39 <puiterwijk> I'd say that bodhi is one of the mission critical apps actually
18:50:45 <bowlofeggs> unless we never bring the dead master back
18:50:56 <bowlofeggs> it's not user facing, it's developer facing
18:51:09 <nirik> users test and add karma and comments?
18:51:17 <bowlofeggs> and it rarely has an actual outage (it has severe bugs, like right now, but rarely an actual outage)
18:51:17 <puiterwijk> bowlofeggs: fedora-easy-karma?
18:51:51 <bowlofeggs> even fedora-easy-karma i wouldn't consider mission critical
18:52:25 <puiterwijk> I'd call updates mashing/pushing mission critical.
18:52:31 <nirik> I was dreaming of a world where we would no longer need scheduled outages for updates.
18:52:43 <bowlofeggs> my opinion: i'd rather not trade data safety for bodhi to get HA, when HA isn't a frequent problem for bodhi in the first place
18:52:56 <smooge> bowlofeggs, I think the problem is you are wanting it not to be mission critical and we are being told by outside forces it is mission critical
18:53:26 <bowlofeggs> smooge: who is saying that bodhi is mission critical? obviously, it's not my call - this is just my opinion
18:53:33 <bowlofeggs> but a deadlock will bring it down too
18:53:54 * nirik notes we have seen... none of those (without human error) in stg
18:53:58 <bowlofeggs> so it's still not perfectly HA
18:54:04 <bowlofeggs> nirik: bodhi doesn't work in stg at all
18:54:26 <bowlofeggs> also, i saw another comment earlier that wasn't true about pk's
18:54:33 <nirik> sure, but I am just stating a data point
18:54:35 <puiterwijk> bowlofeggs: except it does, as I said the other day, just not your account, but that's because you have tables without a primary key
18:54:47 <nirik> you are making it sound like it hits deadlocks all the time
18:54:48 <bowlofeggs> it's extremely common for applications to have tables without pk's because that's how m2m relationships are most commonly done
18:54:57 <bowlofeggs> all of bodhi's tables without pk's are m2m tables
18:55:45 <bowlofeggs> nirik: i'm not saying it happens all the time, but i am saying that bodhi doesn't really have true downtime today and i'd rather have data safety than HA, if it were up to me (not saying it *is* up to me)
18:56:06 <mizdebsk> koschei definitely doesn't need to be HA, so ideally i would like to move it (koschei) to fedora-based, non-bdr postgres; if db server is specific to koschei then we (sysadmin-koschei) can take care of its maintenance
18:56:08 <bowlofeggs> so i feel like i'm giving up a lot and not getting something i need in the trade
18:56:10 <puiterwijk> bowlofeggs: then you can (and probably should!) still add a primary key on the combination of the two columns - tada, a primary key that also gets you data safety
18:56:13 <nirik> well, I don't want to force things on people. :) I like reaching a consensus. ;)
18:56:38 <bowlofeggs> puiterwijk: BDR means no data safety
18:56:44 <smooge> I think we aren't going to reach it right here
18:56:53 <smooge> so let us move this to the list
18:56:59 <bowlofeggs> the pk doesn't solve the data safety problem, it's just a requirement for bdr
18:57:04 <bowlofeggs> smooge: ok
18:57:06 <puiterwijk> bowlofeggs: I disagree on that
18:57:10 * nirik does too
18:57:16 <bowlofeggs> puiterwijk: their docs say this, not me
18:57:23 <bowlofeggs> the deadlocks are documented
18:57:31 <bowlofeggs> but yes, let's talk more at the fad
18:57:35 <puiterwijk> bowlofeggs: also: the problem is that *you* aren't the one doing the database server maintenance and suddenly needing to tell everyone everything is down for 15 minutes because you need to increase a disk
18:57:51 <bowlofeggs> again, i'm not saying it's my call, but my opinion is that bodhi is better off as is, that's all
18:57:51 <smooge> I am closing this down.. you can argue more at the fad
18:58:01 <bowlofeggs> puiterwijk: fair
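For reference, puiterwijk's suggestion above amounts to a composite primary key over the association table's two foreign-key columns. A minimal SQLAlchemy sketch with invented table names (not bodhi's actual schema):

    # Hypothetical many-to-many association table with a composite primary key.
    import sqlalchemy as sa

    metadata = sa.MetaData()

    updates = sa.Table('updates', metadata,
                       sa.Column('id', sa.Integer, primary_key=True))
    builds = sa.Table('builds', metadata,
                      sa.Column('id', sa.Integer, primary_key=True))

    # The two foreign-key columns together form the primary key, which
    # satisfies BDR's "every table needs a primary key" requirement.
    updates_builds = sa.Table(
        'updates_builds', metadata,
        sa.Column('update_id', sa.Integer, sa.ForeignKey('updates.id'), primary_key=True),
        sa.Column('build_id', sa.Integer, sa.ForeignKey('builds.id'), primary_key=True),
    )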
18:58:14 <smooge> #endmeeting