18:00:10 <nirik> #startmeeting Infrastructure (2015-03-26)
18:00:10 <zodbot> Meeting started Thu Mar 26 18:00:10 2015 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:10 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:10 <nirik> #meetingname infrastructure
18:00:10 <nirik> #topic aloha
18:00:10 <nirik> #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk
18:00:10 <zodbot> The meeting name has been set to 'infrastructure'
18:00:10 <zodbot> Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pingou puiterwijk relrod smooge threebean
18:00:10 <nirik> #topic New folks introductions / Apprentice feedback
18:00:23 * oddshocks waves
18:00:27 * puiterwijk is here
18:00:36 * lmacken 
18:00:43 * threebean 
18:01:22 * relrod here
18:01:27 * pingou here
18:01:36 <andreasch> here
18:01:50 * adrianr here
18:02:14 <nirik> any new folks like to introduce themselves or apprentices have questions?
18:02:45 <nirik> ok, on to info dump...
18:02:56 <nirik> #topic announcements and information
18:02:56 <nirik> #info Got all our iscsi storage moved from old to new filer with no downtime! - kevin
18:02:56 <nirik> #info Got new rbac-playbook installed and working - patrick and tim
18:02:56 <nirik> #info New FMN release deployed to production.  New features and bugfixes. - ralph
18:02:57 <nirik> #info Hooked some email lists up to fmn (scm-commits, meetingminutes) - ralph
18:02:58 <nirik> #info Turned off old emails from dist-git and pkgdb - ralph
18:02:59 <nirik> #info FAS3.0 status https://fedoraproject.org/wiki/User:Laxathom/Drafts:FAS3.0
18:03:20 <pingou> #info progit became pagure: http://blog.pingoured.fr/index.php?post/2015/03/25/Progit-is-dead%2C-long-live-pagure
18:03:25 <dgilmore> hola
18:03:26 <pingou> #info pagure progresses nicely
18:03:34 <amrzaki> hello
18:03:35 <nirik> threebean: is it worth a reminder about the pkgdb/git/koji no longer sending emails? ie, devel-announce or something?
18:04:03 <threebean> yeah, might as well.  I can send that to devel announce today.
18:04:07 <nirik> ok, cool.
18:04:10 <nirik> welcome amrzaki
18:04:28 <amrzaki> i'm new to infra team
18:04:44 <amrzaki> and i want to have a role in the team :)
18:05:02 <nirik> amrzaki: great. Are you more interested in sysadmin? or application development related tasks?
18:05:12 <Corey84> .fas corey84
18:05:13 <zodbot> Corey84: corey84 'Corey84' <sheldon.corey@gmail.com>
18:05:50 <amrzaki> system admin
18:06:14 <nirik> amrzaki: excellent. See me in #fedora-admin after the meeting and I will point you in the right direction to get started...
18:06:32 <amrzaki> ok thanks <nirik>
18:06:50 <nirik> so, we don't have any discussion topics or 'learn about' sign up in the gobby this week. ;) Is there anything anyone would like to discuss and/or would someone like to teach about one of our applications or tools?
18:06:59 <lmacken> pingou: pagure.org and .io are available :)
18:07:11 <adrianr> I wanted to give an update about the MM2 status
18:07:21 <dgilmore> nirik: I did say koji is not sending emails, weeks ago when I turned it off
18:07:26 <nirik> adrianr: sounds good.
18:07:33 <nirik> #topic Mirrormanager2 status
18:07:42 <pingou> lmacken: oh nice :)
18:07:44 <nirik> dgilmore: yeah, people forget tho...and we did just turn off pkgdb and git.
18:08:02 <adrianr> I would say the crawler is now working
18:08:16 <nirik> adrianr: it's still crashing from time to time tho right?
18:08:23 <adrianr> however, only with libcurl from F21
18:08:36 <adrianr> RHEL7.1 libcurl lets it crash after 1 minute
18:08:44 <pingou> :/
18:08:48 <adrianr> total runtime right now seems to be at least 7 hours
18:09:16 <nirik> thats not too bad.
18:09:16 <adrianr> there is a bug report for the libcurl problem
18:09:24 <nirik> do you know what the current time for the current crawler is?
18:09:32 * pingou wanted to ask
18:09:43 <adrianr> no, but it takes some time, let me look
18:09:58 <adrianr> https://bugzilla.redhat.com/show_bug.cgi?id=1204825
18:10:34 <nirik> I'll note that _all_ mirrorlist servers are mm2 now.
18:10:43 <nirik> and have been more stable it seems like
18:10:56 <lmacken> that's great
18:10:56 <adrianr> unfortunately there are no real code changes which could explain why it works with F21 libcurl but not with RHEL7.1 libcurl
18:11:31 <pingou> nirik: :awesome: :)
18:11:42 <nirik> adrianr: yeah, puzzling.
18:11:49 <oddshocks> Question: Is there some reference sheet for the meaning of all the keys found in the MM pickles?
18:11:49 <Corey84> post meeting I need to chat with someone about my login creds they seem to have disappeared
18:12:11 <nirik> Corey84: for fas? or ?
18:12:14 <oddshocks> There's all this stuff like this http://paste.fedoraproject.org/203355/73935231/
18:12:18 <oddshocks> and I have no idea what that stuff means
18:12:22 <adrianr> no I don't know about the pickles
18:12:47 <threebean> oddshocks: I doubt it.  but creating such a doc would be really worthwhile.
18:12:53 <Corey84> i hit the mm page and fas says  no dice on login  (  invalid user or  creds don't allow access to this page)
18:13:04 <threebean> pingou: do you still have your visualization of the mm2 table schema around somewhere?
18:13:11 <pingou> threebean: I do
18:13:22 <nirik> Corey84: odd. Nothing should have changed...
18:13:22 <oddshocks> I feel like I'm kind of floundering on this here, and whatever docs people have would be great
18:13:35 <pingou> threebean: https://github.com/fedora-infra/mirrormanager2/blob/master/doc/mirrormanager.png
18:13:41 <pingou> (as well as the dia file)
18:13:47 <Corey84> nirik,  hence the mention but post mtg  not trying to hijack the mtg
18:13:47 <adrianr> current crawler kills all crawls which do not finish within two hours I think
18:14:15 <pingou> adrianr: but you added the time-out right?
18:14:35 <dgilmore> I notice that mm says my mirror was crawled recently. it shouldnt be crawled as its not public and I use report mirror
18:14:52 <threebean> oddshocks: maybe take a look at the mirrorlist daemon code?  it's the thing that reads in the pickles and uses them to assign requests to mirrors.  if a pickle is bad, it's going to fail when loaded by that code.  so if you could read through it and try to extract some of its assumptions... that could serve as the basis for some tests.
18:14:57 <dgilmore> ignore me
18:15:08 <adrianr> there is a time-out in the code but it seems not to work yet
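The two-hour cutoff adrianr describes could be sketched roughly as follows. This is an illustration only, not the MirrorManager crawler code: the function name `overdue_crawls`, the `started` mapping, and the host names are hypothetical, and the two-hour limit is taken from the remark above (which was itself hedged with "I think").

```python
from datetime import datetime, timedelta

# Assumed limit, taken from the "two hours" mentioned in the meeting.
CRAWL_LIMIT = timedelta(hours=2)


def overdue_crawls(started, now, limit=CRAWL_LIMIT):
    """Return the hosts whose crawl has been running longer than `limit`.

    `started` maps a host name to the datetime its crawl began.
    The result is sorted so the output is deterministic.
    """
    return sorted(host for host, begin in started.items() if now - begin > limit)


if __name__ == "__main__":
    now = datetime(2015, 3, 26, 18, 0)
    started = {
        "mirror-a.example.org": now - timedelta(hours=3),     # over the limit
        "mirror-b.example.org": now - timedelta(minutes=30),  # still fine
    }
    print(overdue_crawls(started, now))
```

A supervising process could call something like this periodically and terminate the crawls it returns; how the termination itself is done is left out here.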
18:15:12 <smooge> /ignore dgilmore
18:15:45 <adrianr> but, with F21 libcurl, the MM2 crawler seems to work and is pretty stable
18:15:50 <pingou> oddshocks: some of the cache will change as the data changes, for example host_netblock_cache
18:15:51 <oddshocks> threebean: yesss thank you! that will help immensely
18:16:12 <adrianr> the newest code changes also re-enabled the xmlrpc interface
18:16:21 <pingou> oddshocks: others will change as the DB changes, I guess asn_host_cache or host_country_allowed_cache (although I would expect this last one to be pretty stable)
18:17:23 <adrianr> looking at the code I found out that report_mirror marks a mirror as up to date as long as all directories are there
18:17:35 <nirik> adrianr: so what big parts are left? backend (ie, scanning dirs and making pkl) and frontend (but thats very minor)
18:17:40 <oddshocks> pingou: So what'd be an example of a change that could break the pickle?
18:17:43 <adrianr> as far as I understand the code it does not care about the content of the directory
18:17:57 <dgilmore> adrianr: not really
18:17:58 <nirik> adrianr: that explains a lot of weird behavior I have seen over the years. ;(
18:18:22 <pingou> oddshocks: maybe we could check that we always get some (stable) values, for example things should not change for the FP mirrors (or the dell ones), maybe we could check these are always there
18:18:24 <nirik> report_mirror is for private mirrors right? because we can't crawl them? but do public ones use it too?
18:18:32 <dgilmore> adrianr: in the name of making things faster it validates only parts of the tree, and assumes the rest is right
18:18:40 <dgilmore> nirik: its for all mirrors
18:18:52 <adrianr> but required for private mirrors
18:18:57 <pingou> oddshocks: after that, the idea (imho) is to set up a system where we can add tests as we find bad pickles, to prevent the same error/problem from happening again
18:19:00 <nirik> IMHO we should reject it for all except private
18:19:08 <dgilmore> nirik: it lets us know when a mirror has synced without crawling them
18:19:15 <pingou> oddshocks: so as long as we have the architecture in place, we can always expand on after :)
18:19:22 <nirik> dgilmore: yes ,but they can (and are) wrong
18:19:27 <nirik> and then users get crappy mirrors.
18:19:27 <adrianr> report_mirror is helpful (kind of)
18:19:28 <dgilmore> its required for private mirrors
18:19:48 <dgilmore> nirik: mm also makes some other assumptions
18:19:51 <nirik> people just stick it at the end of their rsync... but that does not mean it worked.
18:20:02 <dgilmore> and doesnt drop mirrors out if it is unsure of content status
18:20:18 <oddshocks> pingou: Awesome, that was going to be my next question. :) Do you mean some sort of alert system that a pickle has a problem we haven't seen before, and so someone needs to write a new test? Or some sort of automated system?
18:20:27 <nirik> if we can crawl everything in 7 hours, why not move to a 'once the crawler says you have it you do'
18:20:28 <adrianr> MM1 and MM2 have preliminary code for a canary scan, maybe this would be good to activate
18:20:57 <dgilmore> nirik: its kinda false, and causes mirrors to drop in and out
18:20:59 <pingou> adrianr: +1 on that
18:21:18 <pingou> dgilmore: do you think you could find 1h at one point to write down all this?
18:21:19 <nirik> why? not sure I follow
18:21:32 <threebean> oddshocks, pingou: start small.  just a python function that reads in a pickle and raises an exception if it is a 'bad' one.  later we can integrate it into unit tests, or nagios, or the mm2 code itself if it proves useful.
18:21:40 <dgilmore> pingou: sure, but I do not think any of it is new
18:21:54 <pingou> dgilmore: it is not, but it's also not documented anywhere
18:21:54 <oddshocks> threebean: got it
18:22:02 <dgilmore> maybe I just have a more intimate knowledge
18:22:12 <nirik> perhaps this is all a discussion for the list.
18:22:19 <adrianr> I think the crawler could be distributed on multiple machines and we could crawl each category separately, this could decrease crawl time a lot
18:22:29 <pingou> oddshocks: threebean my idea was that we could integrate that in MM2 itself so that we avoid pushing bad pickle to the mirrorlist servers
18:22:38 <adrianr> but it would need a lot of resources
18:22:42 <nirik> adrianr: does it keep crawling over and over? or ?
18:22:48 <dgilmore> adrianr: we need to be sure crawling does not overwhelm mirrors
18:22:57 <threebean> pingou: sounds good.  let's write the core of it first before worrying about using it to gate stuff.
18:23:02 <pingou> +1
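A minimal sketch of the "start small" function threebean suggests: read a pickle and raise an exception if it looks bad. The three key names come from the discussion above; treating the pickle as a dict, the choice of checks, and the names `BadPickleError`, `check_data`, and `check_pickle` are assumptions for illustration, not the mirrorlist daemon's actual expectations.

```python
import pickle

# Key names mentioned in the meeting. The real pickle has more keys;
# requiring exactly these three is an assumption for illustration.
EXPECTED_KEYS = ("host_netblock_cache", "asn_host_cache", "host_country_allowed_cache")


class BadPickleError(Exception):
    """Raised when a mirrorlist pickle fails a sanity check."""


def check_data(data, expected_keys=EXPECTED_KEYS):
    """Raise BadPickleError if the loaded object looks wrong, else return it."""
    if not isinstance(data, dict):
        raise BadPickleError(f"expected a dict, got {type(data).__name__}")
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise BadPickleError(f"missing keys: {', '.join(missing)}")
    return data


def check_pickle(path, expected_keys=EXPECTED_KEYS):
    """Load the pickle at `path` and run the sanity checks on its content."""
    try:
        with open(path, "rb") as handle:
            data = pickle.load(handle)
    except (OSError, EOFError, pickle.UnpicklingError) as err:
        raise BadPickleError(f"cannot load {path}: {err}") from err
    return check_data(data, expected_keys)
```

Keeping `check_data` separate from the file loading means new checks (for example, that known stable mirrors are always present) can be added as plain functions on the loaded dict and later reused from unit tests, nagios, or the MM2 code.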
18:23:02 <adrianr> dgilmore: that is right, yes
18:23:07 <nirik> could we be smart... crawl everything, stop. If new mirror or changes to mirror, recrawl it, if content changes, recrawl
18:23:37 <nirik> dunno... I guess that fails.
18:23:48 <adrianr> the crawler could be much smarter, that is true, but that requires a lot of code changes
18:23:50 <dgilmore> perhaps we can do full crawls weekly, but only crawl for known changed content
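The policy floated here (full crawls weekly, otherwise only crawl mirrors with known changes) could be expressed as a small decision function. This is a sketch of the idea only; the name `crawl_kind`, its parameters, and the "full"/"partial" labels are hypothetical and do not exist in the crawler.

```python
from datetime import datetime, timedelta

# "Weekly", as suggested in the meeting.
FULL_CRAWL_INTERVAL = timedelta(days=7)


def crawl_kind(last_full_crawl, now, content_changed, mirror_changed):
    """Decide what crawl a mirror needs: "full", "partial", or None.

    last_full_crawl is None for a mirror that has never been crawled.
    content_changed means the master content changed since the last crawl.
    mirror_changed means the mirror's own configuration changed.
    """
    if last_full_crawl is None or mirror_changed:
        return "full"      # new or reconfigured mirror: check everything
    if now - last_full_crawl >= FULL_CRAWL_INTERVAL:
        return "full"      # periodic full crawl is due
    if content_changed:
        return "partial"   # only look at the content known to have changed
    return None            # nothing to do


if __name__ == "__main__":
    now = datetime(2015, 3, 26, 18, 0)
    print(crawl_kind(now - timedelta(days=2), now, True, False))
```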
18:23:53 <nirik> but I do think we can be more strict and make sure users don't get bad mirrors as much
18:24:16 <adrianr> right now the crawler starts every 12 hours (but crashes after the first minute)
18:24:19 <nirik> I think there's a lot of cases where report_mirror is being run, but it's wrong and they aren't up to date.
18:24:22 <pingou> +1 on a smarter crawler but -1 on waiting on it to push MM2 :)
18:24:24 <dgilmore> nirik: is it really a big issue. I rarely get a bad mirror
18:24:47 <adrianr> pingou: I was not thinking of delaying MM2 for that
18:24:52 <dgilmore> pingou: we need working crawling
18:24:56 <nirik> dgilmore: I've had a number of complaints. But yeah, in the end you usually get what you want, but it just looks bad to have to hit several mirrors first
18:25:03 <pingou> adrianr: I thought it was running fine with the libcurl from f21?
18:25:03 <nirik> pingou: agreed.
18:25:05 <dgilmore> if it crashes after a minute thats no good
18:25:20 <dgilmore> but we shouldnt hold up moving to mm2 on it
18:25:23 <adrianr> pingou: yes, but only in my home directory and started manually
18:25:42 <pingou> adrianr: should we try to recompile the f21 libcurl on epel7 and install it on the machine?
18:26:03 <nirik> well, we have 2.5weeks until the next window we could deploy it on
18:26:05 <adrianr> pingou: I don't know what the right solution is, that or LD_PRELOAD
18:26:12 <nirik> I don't think we are going to get it out before tuesday.
18:26:16 <nirik> so it will need to be after beta
18:26:27 <nirik> so, hopefully curl can be fixed.
18:26:57 <adrianr> so, crawler works, I have not seen errors in umdl (yet) but also not looked very closely,
18:27:07 <adrianr> frontend seems to work and xmlrpc
18:27:33 <adrianr> the next steps would be a new release and then trying to move the installation from staging to the production systems
18:27:34 <dgilmore> will people need to have a newer report_mirror?
18:27:36 <nirik> umdl and making sure it makes good pkls is the last big part IMHO
18:27:38 <adrianr> to be ready to switch
18:27:53 <adrianr> no, I am using a very old report_mirror for my testing on my mirror
18:28:00 <adrianr> never updated it in years
18:28:08 <pingou> dgilmore: we tried to be backward compat on this :)
18:28:14 <dgilmore> adrianr: cool. I think  I use an ancient version
18:28:16 <pingou> adrianr: thank you for that :D
18:28:18 <nirik> adrianr: BTW, thank you a lot for all your work on this. It's really appreciated.
18:28:31 <pingou> adrianr++ for all the testing on MM2, have a cookie :)
18:28:37 <adrianr> thanks
18:28:54 <nirik> adrianr++
18:28:55 <nirik> :)
18:28:56 <dgilmore> pingou: good, there will be mirrors not running rhel/fedora and likely not know/notice they need a new version
18:29:18 <nirik> perhaps we could move report mirror to a fedmsg thing someday.
18:29:31 <nirik> would require giving out certs or something tho
18:29:48 <adrianr> I am personally not sure if it is good to switch before F22, but we can try it and fix errors as they come
18:29:59 <threebean> ...and whitelisting tons of ips in iptables..
18:30:07 <nirik> threebean: yeah, ok, nevermind. ;)
18:30:11 <pingou> a parallel bus?
18:30:15 <nirik> adrianr: well, we can see.
18:30:20 <threebean> -1.  this is a good fit for 'http'
18:30:22 <dgilmore> nirik: yeah, I would like to get to a point where we can have a mirroring script that uses fedmsg to know when to sync and report back
18:30:31 <nirik> dgilmore: we have such a script already.
18:30:31 <threebean> if we want it on the bus we could do the github2fedmsg style http->fedmsg bridge.
18:30:38 <nirik> adrianr wrote one. ;)
18:30:40 <pingou> threebean: I heard about this, it's this new trendy thing, right?
18:30:50 <dgilmore> and would let us automate implementing mirroring tiers
18:30:57 <nirik> https://fedoraproject.org/wiki/Infrastructure/Mirroring#Mirror_Frequency
18:31:01 <dgilmore> nirik: not a complete one
18:31:15 <dgilmore> nirik: what we have is only part of it
18:31:19 <adrianr> I am still interested in getting SSH keys into MM2 to get fedmsg triggered push mirroring
18:31:46 <nirik> dgilmore: whats missing?
18:32:34 <nirik> adrianr: wonder if that would be finally a use case for ssh certs.
18:32:37 <dgilmore> nirik: part of its us, in making messages for releases
18:32:48 <dgilmore> nirik: but having mirrors report back
18:33:06 <dgilmore> nirik: so when tier 1 mirrors are done, tier 2 know to kickoff
18:33:20 <nirik> dgilmore: ah right... releases, yes
18:33:40 <threebean> we could have report_mirror publish a fedmsg on our side.
18:33:59 <nirik> adrianr: then mm generates a cert and signs it with a ca, and mirrors can trust certs with that ca, and we can also restrict what those certs could be used for.
18:34:02 <threebean> i.e., tier1 mirror runs report_mirror, pings the mm2 server.  mm2 server publishes a message which makes its way to the tier2s.
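The flow threebean outlines, combined with dgilmore's "when tier 1 mirrors are done, tier 2 know to kickoff", could be modelled like this. It is a toy in-memory sketch: `SyncTracker`, the topic strings, and the host names are hypothetical, and `publish` is a stand-in callable rather than a real fedmsg call.

```python
class SyncTracker:
    """Track report_mirror check-ins from tier 1 mirrors on the mm2 side."""

    def __init__(self, tier1_hosts, publish):
        self.pending = set(tier1_hosts)  # tier 1 hosts that have not reported yet
        self.publish = publish           # stand-in for publishing to the bus
        self.announced = False           # whether "all done" was already sent

    def report(self, host):
        """Record a check-in and publish the resulting message(s)."""
        self.pending.discard(host)
        # Hypothetical topic names, for illustration only.
        self.publish({"topic": "mirror.synced", "host": host})
        if not self.pending and not self.announced:
            self.announced = True
            self.publish({"topic": "tier1.complete"})


if __name__ == "__main__":
    messages = []
    tracker = SyncTracker(["t1-a.example.org", "t1-b.example.org"], messages.append)
    tracker.report("t1-a.example.org")
    tracker.report("t1-b.example.org")
    for message in messages:
        print(message)
```

Tier 2 mirrors would subscribe to the completion message and start their sync when it arrives. The sketch trusts each check-in as-is, so it only helps when the reported sync actually succeeded.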
18:34:11 <adrianr> but there is again the problem that we do not know for sure what the mirror has
18:34:24 <nirik> true.
18:34:35 <dgilmore> threebean: we could. I really want to get the tier 1 mirrors all using a common setup to sync contents
18:34:38 <adrianr> but that would help for the case that everything worked
18:34:55 <threebean> right.  if we're going to do the work to allow tier1 mirrors to publish to our bus.. we could instead invest it in making report_mirror more detailed, or reliable, or however you want to call it.
18:35:00 <dgilmore> I do not think we can solve it all today
18:35:10 <nirik> threebean: sounds reasonable.
18:35:16 <nirik> dgilmore: nope, but good ideas. ;)
18:35:33 <dgilmore> threebean: assuming that the tier 1 mirrors use report_mirror
18:35:39 * threebean nods
18:35:43 <threebean> dgilmore: there's your common setup.  ;)
18:35:51 <dgilmore> threebean: but we could make that part of the automated mirroring setup
18:36:30 <dgilmore> I think that today a lot of the tier 2 and 3 mirrors pull directly from dl.fp.o and not the tier 1 mirrors
18:36:38 <nirik> yep. I agree.
18:36:44 <dgilmore> but I do not have cold hard facts to back up my feeling
18:36:55 <nirik> I think it's the case, but yeah.
18:37:04 <nirik> anyhow...
18:37:08 <nirik> #topic Open Floor
18:37:12 <nirik> any items for open floor?
18:37:39 <threebean> just a status report thing.. i'm trying to get a bunch of new-hotness issues solved and ready to deploy before freeze.
18:37:49 <threebean> mostly cleaning up error reporting and related things.
18:37:50 <nirik> cool.
18:38:09 <nirik> I'm going to see about applying non rebooting updates, so we are updated before freeze.
18:38:43 <nirik> ok, thanks for coming everyone. See you in #fedora-admin, #fedora-apps, and #fedora-noc
18:38:46 <nirik> #endmeeting