20:00:24 #startmeeting infrastructure
20:00:24 Meeting started Thu Sep 16 20:00:24 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:24 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:29 #meetingname infrastructure
20:00:29 The meeting name has been set to 'infrastructure'
20:00:33 #topic who's here
20:00:35 * lmacken
20:00:35 who's here?
20:00:38 hey
20:01:30 * mmcgrath waits a bit
20:02:10 mmcgrath: Wanna have open floor first? :-)
20:02:26 abadger1999: actually that would be great, I'm needing to get tickets in place for the beta release.
20:02:33 We can end with it too just in case
20:02:36 * mdomsch
20:02:38 #topic Open Floor
20:02:41 abadger1999: you have something?
20:02:45 yeah
20:02:46 https://fedoraproject.org/wiki/LATAM_Infrastructure
20:03:01 Talked to the latam infra people a few weeks ago and been forgetting to bring it up.
20:03:14 Yeah I had a conversation or two with them as well
20:03:14 I had them list some of the things that they need.
20:03:25 basically got just far enough to tell them not to auth against us except for using ssh keys.
20:03:36 I don't know how we can satisfy them but I figure knowing what the issues are is the first step.
20:03:38 which is not particularly helpful for what they're trying to do unfortunately :(
20:03:55 here
20:04:06 abadger1999: did they want us to host their DNS?
20:04:26 mmcgrath: They want to get it so that it's not just Rodrigo being the contact.
20:04:44 mmcgrath: I think that they're for us hosting it... figured that would be pretty easy to do.
20:04:56 yeah it's a transfer, and something we've done several times before.
20:05:02 it is, however, time consuming for some reason.
20:05:05 it just takes a while.
20:05:14 a looooong while
20:05:23 1 year if you are in Malaysia
20:05:27 abadger1999: can you give a roundup of what all you talked about and what they're wanting to do?
20:05:49 Easy stuff: get away from single points of person failure
20:06:04 Like transferring DNS to fedora project so that one person can't take away the domain.
20:06:25 Social stuff - integrate better into fedora.
20:06:29 do they have a team of sysadmins?
20:06:39 or something similar at least?
20:06:52 ie: right now latam infra and community is pretty isolated from the North Americans/Europeans.
20:06:56 Yes.
20:07:08 All volunteers so they don't have as much time as we do.
20:07:12 Nor the hardware we do.
20:07:21 * dgilmore turns up
20:07:29 But gomix nushio dbruno are all on the sysadmin team.
20:07:30 are they wanting to make websites for non-latam people?
20:07:51 Not sure -- They want to make web apps for non-latam people.
20:07:54 or are they just focusing on it, but would like better access to the rest of the community for... idea sharing? I'm not sure what word I want to use there.
20:08:02 knowledge pool is probably better.
20:08:12 timpus -- events platform for all of the ambassadors everywhere.
20:08:23 for instance.
20:08:41 So they're more than just websites/documents.
20:09:01 and they're looking to host that for the larger ambassador community?
20:09:07 Right.
20:09:24 I'm generally for that, I know this is probably a tough pill for some to swallow and might look weird.
20:09:39 but if we can properly empower teams like that to host their stuff, it lowers the barriers for them to create those apps
20:09:49 Yep. I agree.
20:09:50 while allowing us to keep the high quality architecture we currently have.
20:10:04 i'm just not sure of how to make it all smooth.
20:10:22 Like how to make the events platform auth against fas in a way that doesn't compromise security.
20:10:25 so we don't end up committing to a bunch of... side apps? I'm not sure how to say that without seeming negative because I'm really for teams being able to provide for themselves where they are able.
20:10:39 abadger1999: yeah, that's the big 'gotcha' right now
20:11:18 gomix and nushio will be at fudcon tempe so it might be good to have some plans around figuring out what we can do there.
20:11:27 yeah
20:11:27 But also figuring out options right now would be good.
20:12:02 abadger1999: one thing I wanted to think about is if there's any sort of auth mechanism where the password itself never leaves the browser.
20:12:05 Like SSL auth for their sites or something.
20:12:07 but the encrypted form would?
20:12:20 I'm not sure how sensitive encrypted passwords should be considered.
20:12:28 Hello everyone, this is Jason Brown
20:12:31 just something else I thought was worth investigating.
20:12:38 ninjazjb: hello Jason, glad you could make it
20:12:49 mmcgrath: There is -- but you still have to be careful about replay attacks or simply a MITM causing something different than you expect to happen.
20:12:59 Thanks
20:13:00 mmcgrath: i'd feel more comfortable with using ssl auth
20:13:41 abadger1999: yeah, I guess a replay could cause other non-official sites to get jacked at that point
20:13:45 anyway, a conversation for another time.
20:13:49 abadger1999: what else you got?
20:14:11 That's it from me for now -- just wanted to get us thinking about it before fudcon.
20:14:14 not that i think they would do it but there would be the potential to harvest passwords which would take constant code audits to make sure it doesn't happen
20:14:18 And point out the wiki page with the brainstorming
20:14:48 abadger1999: thanks
20:14:58 Ok, if no one has anything else on that, we'll get down to the F14beta business.
20:15:40 ok, let's do it
20:15:47 #topic Fedora 14 Beta.
20:16:11 https://fedorahosted.org/fedora-infrastructure/report/9
20:16:18 .ticket 2392
20:16:19 mmcgrath: #2392 (New website) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2392
20:16:30 * mmcgrath tries to summon sijis
20:16:57 we can skip that one for now
20:17:00 .ticket 2393
20:17:01 mmcgrath: #2393 (Verify Mirror Space) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2393
20:17:03 I'll get this one
20:17:26 .ticket 2394
20:17:30 mmcgrath: #2394 (Release day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2394
20:17:46 I'll nab that
20:17:51 actually
20:17:59 smooge: do you want to do the release day coordination this time?
20:18:47 we'll come back to that
20:18:53 .ticket 2392
20:18:54 mmcgrath: #2392 (New website) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2392
20:19:00 sijis: will we have a fancy new beta site?
20:19:16 yep. i think we definitely will
20:19:17 or the old beta site? what's the plan there?
20:19:27 well. for GA == new site
20:19:35 for Beta = existing site
20:19:42 mmcgrath, is that putting in new tickets or doing overall tickets
20:19:43 you guys sure you don't want to release that a bit earlier than the actual release day?
20:19:49 smooge: one sec
20:19:57 np slow typing
20:20:46 sijis: ok, well I do look forward to the new site. Are you going to be point person for this release?
20:20:50 mmcgrath: i don't think we'll have the site completely finished for beta
20:20:56 yup
20:21:11 I'd be happy to see the new website live a few days ahead of the release...
20:21:17 even if it's not done by beta
20:21:30 build momentum for the actual release day
20:21:31 sijis: can you accept that ticket?
20:21:32 a week before release :)?
20:21:34 mdomsch: yeah
20:21:36 ok
20:21:39 and not risk blowing things up on release day
20:21:45 mdomsch: that's my main concern.
20:21:52 mmcgrath: will do
20:21:57 sijis: thanks
20:22:08 you mean ticket 2392 or another one?
20:22:20 sijis: 2392
20:22:25 we'll move on to the next ticket :)
20:22:28 .ticket 2394
20:22:29 mmcgrath: #2394 (Release day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2394
20:22:36 smooge: would you like to do this?
20:22:53 It's basically just making sure everything gets done prior to us sending the announcement out.
20:22:53 taking
20:23:06 for example, the website should be up and ready, all the links should work
20:23:09 that sort of thing.
20:23:18 smooge: sweet
20:23:20 ok
20:23:25 * sijis will make sure links work this time :)
20:23:52 smooge: the only downside for you is I think you'll have to get up early because release time is 8:00 am your time.
20:24:01 I am of the opinion that for some of our audience a web page with a long list of href's is all we ever need :/
20:24:06 the website should generally get started around 7:30 your time because it takes a while to sync, that sort of thing.
20:24:20 ok, well it'll be good to have someone else go through that process for a change anyway
20:24:24 ah ok so that day I need to be up at 0400
20:24:24 smooge: any questions?
20:24:33 :)
20:24:33 to get coffee into system
20:24:43 when is it currently planned?
20:24:53 October?
20:25:02 Or are we talking beta
20:25:06 September 28th
20:25:10 crap
20:25:12 this one's the beta
20:25:13 I can't do that
20:25:20 I am in RDU that day for class.
20:25:22 that's ok, that's why we discuss these things :)
20:25:25 I'll grab that one
20:25:32 I will be up though :)
20:25:51 ok, next ticket
20:25:57 .ticket 2395
20:25:58 mmcgrath: #2395 (Verify releng permissions) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2395
20:26:04 smooge: you want to get that one?
20:26:12 taking
20:26:34 .ticket 2396
20:26:35 mmcgrath: #2396 (Add MirrorManager repository redirects) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2396
20:26:38 mdomsch: got it?
20:26:47 yup
20:27:14 excellent
20:27:19 .ticket 2397
20:27:20 mmcgrath: #2397 (Infrastructure Change Freeze.) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2397
20:27:26 I'll accept this one, we're already in the freeze.
20:27:32 enjoy all the new infrastructure-list traffic :)
20:27:42 and last
20:27:44 .ticket 2398
20:27:45 mmcgrath: #2398 (Lessons Learned) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2398
20:27:48 that's for after the release.
20:28:21 ok got it
20:28:36 and that's that
20:28:40 Oxf13: ping
20:28:47 mmcgrath: hi
20:28:49 is he on a plane
20:28:54 no he is on the ground
20:29:03 Oxf13: time for your favorite question. You got any odds for chance of beta slip?
20:29:39 I have no idea, I'm out of the loop
20:30:03 yeah I know you've been busy.
20:30:13 Oxf13: who might know better?
20:30:47 jlaska/adamw of QA, dlehman of Anaconda, dgilmore/notting of releng
20:30:53 adamw: ping
20:30:54 jlaska: ping
20:30:57 mmcgrath: https://bugzilla.redhat.com/showdependencytree.cgi?id=611991&hide_resolved=1
20:31:05 mmcgrath: That might be a pretty good indicator :-/
20:31:13 mmcgrath: we have 2 bugs left
20:31:17 * jlaska just sent mail to devel list
20:31:27 the last bugs are supposed to be fixed today
20:31:27 jlaska: you don't have to commit to it but you think probably not going to slip? like 10% chance?
20:31:30 we need someone from kernel to provide guidance on bug#629719
20:31:45 once we compose that and start testing we will know better
20:31:50 mmcgrath: if we can't compose an RC on time, chances aren't good
20:32:09 k, I'll follow up again next week.
20:32:27 there are only 2 bugs remaining ... the installer issue dlehman has a handle on ... but we need kernel guidance for the remaining dmraid issue
20:32:28 sounds like there's still some unknowns.
20:32:34
20:32:43 much better shape than yesterday, but still not 0
20:32:52 jlaska: thanks
20:33:01 ok, anyone have any questions, comments or concerns wrt the beta release?
20:33:17 hmmm ...
20:33:38 alrighty :)
20:33:47 #topic Strange pkgdb / bodhi outages on app5/6
20:34:18 so I was working with abadger1999 and lmacken just before the freeze to try to figure out what on earth was going on with apps on app5 and 6.
20:34:36 for those of you that don't know, basically app5 and 6 are considered backups. they don't get live traffic because they're offsite.
20:34:47 but, if for some reason all the production app servers go down, they pick up the slack.
20:35:08 well, even with no traffic, sometimes bodhi or pkgdb would hang, sometimes for hours.
20:35:14 and then they'd recover on their own.
20:35:17 it was incredibly strange.
20:35:25 the hosts were low load, db access was fine.
20:35:40 and with both being tg apps it was extra strange, since the odds of both of them getting into that state at the same time on the same server were low
20:36:02 I'm still not sure of a root cause, but I believe some of the wsgi processes were hanging, which was causing apache to block new requests from getting in.
20:36:21 So to bandaid that, we increased the number of processes available to each.
20:36:24 and so far, good luck.
20:36:29 I haven't seen any outage
20:36:39 at least not related to that
20:36:50 we have had some from the database filling up
20:36:57 anyway, any questions or comments on that?
20:37:40 alrighty
20:37:44 #topic pkgdb caching
20:37:59 abadger1999: any issues seen since we started caching image content?
20:38:17 It's been smooth.
20:38:20 .headers https://admin.fedoraproject.org/pkgdb/appicon/show/Terminator
20:38:21 mmcgrath: apptime: D=215947, content-length: 3412, x-varnish: 2111766604, age: 0, expires: Tue, 21 Sep 2010 20:38:20 GMT, connection: close, server: Apache/2.2.3 (Red Hat), appserver: app03.phx2.fedoraproject.org, proxyserver: proxy01.phx2.fedoraproject.org, via: 1.1 varnish, cache-control: max-age=432000, date: Thu, 16 Sep 2010 20:38:20 GMT, content-type: image/png, proxytime: D=218092
20:38:29 .headers https://admin.fedoraproject.org/pkgdb/appicon/show/Terminator
20:38:29 mmcgrath: apptime: D=215947, content-length: 3412, x-varnish: 2111766630 2111766604, age: 9, expires: Tue, 21 Sep 2010 20:38:29 GMT, connection: close, server: Apache/2.2.3 (Red Hat), appserver: app03.phx2.fedoraproject.org, proxyserver: proxy01.phx2.fedoraproject.org, via: 1.1 varnish, cache-control: max-age=432000, date: Thu, 16 Sep 2010 20:38:29 GMT, content-type: image/png, proxytime: D=664
20:38:35 mmcgrath: I don't know how much it helped -- need to ask mbacovsk or someone on the other end of a slow pipe from the servers.
20:38:36 hey hey, age. that's what I like to see.
20:38:59 for me I got the time generally cut in half.
20:39:05 but it's still several seconds for a large page list.
20:39:09
20:39:12 expires headers does seem to be working properly
20:39:50 how goes varnish with this?
20:40:28 abadger1999: one thing I've noticed... expires doesn't seem to be working
20:40:31 and i'm not sure why
20:40:38 my browser has these images, it shouldn't be re-requesting them.
20:40:45 it could be related to the auth / cookie. I need to research it.
20:41:05 smooge: well, basically we have set aside a part of the pkgdb namespace (/pkgdb/appicon/show)
20:41:10 and we're doing two things with it
20:41:27 when a cookie gets sent, varnish unsets it to request the data, when it does get the data, it unsets the cookie and sends it back.
20:41:39 because cherrypy wants to set a cookie with every request.
20:42:08 ah ok
20:42:15 anyone have any questions or comments?
20:42:25 or ideas as to why firefox is ignoring the expires header :)
20:42:37 abadger1999: etagging would be helpful here. FWIW.
20:42:55 ok, that's all I've got
20:42:59 #topic Open Floor
20:43:05 anyone have anything else they'd like to discuss?
20:43:07 anything at all?
20:43:12 fas
20:43:25 smooge: hit it
20:43:28 we are having issues with the fas servers at the moment
20:43:40 oh right right
20:43:44 that was on my list and I forgot :)
20:43:54 we have an open bugzilla on it and I am trying to get the data to developers as soon as possible
20:44:14 it looks like something with swap space just not working under certain loads
20:44:39 and when swap space quits working.. OOM gets hungry
20:45:07 so we have some interesting OOPS but not much else.
20:45:28 It seems to occur on the servers rather regularly at 03:30-03:50
20:45:30 interesting
20:45:33 but not sure why
20:45:35 I'm surprised we're using swap on there at all
20:45:37 https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=fas02&plugin=swap&timespan=604800&action=show_selection&ok_button=OK
20:45:41 http grows
20:45:45 even still, it's not a lot.
20:46:00 no it isn't.. and when the problem occurs it is not like it's heavy in swap
20:46:02 smooge: are they still all rebooting at least every 24 hours?
20:46:08 just all of a sudden no more swap for you
20:46:23 well the new kernel has slowed that down a bit
20:46:39 but not sure why. I am expecting tonight to be a hit
20:46:47 k
20:46:53 smooge: thanks for following up and tracking that issue
20:46:57 2 nights ago we had all 3 reboot and looking at the db02 data
20:47:08 we had a TON of fas connections beyond normal at that time
20:47:12 not sure why yet
20:47:54 yeah
20:48:48 EOF
20:48:53 alllrighty
20:48:55 thanks :)
20:49:01 np
20:49:03 if no one has anything else, we'll close in 30
20:49:13 allergies killing me softly with sneezes
20:49:20 bummer :(
20:49:24 ....
20:49:26 that's never fun
20:49:30 As usual, the Cloud SIG meeting starts at the top of the hour for those of you who are interested. ;)
20:49:39 and that's it!
20:49:40 #endmeeting
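
Note on the pkgdb caching topic: the two .headers responses pasted at 20:38 can be checked mechanically. The following is a minimal Python sketch, not something that was run in the meeting. It assumes that a varnish cache hit shows up as a non-zero age plus two transaction ids in x-varnish, and it checks that expires equals date plus max-age. The function names and the FIRST/SECOND dictionaries are illustrative; only the header values come from the log. It says nothing about why a browser might still re-request the images.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def is_cache_hit(headers):
    """Heuristic (an assumption): a hit has age > 0 and two ids in x-varnish."""
    age = int(headers.get("age", "0"))
    ids = headers.get("x-varnish", "").split()
    return age > 0 and len(ids) == 2


def max_age_seconds(cache_control):
    """Return the max-age value from a cache-control string, or None."""
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age":
            return int(value)
    return None


def expires_matches_max_age(headers):
    """True when the expires header equals date + max-age."""
    max_age = max_age_seconds(headers.get("cache-control", ""))
    if max_age is None:
        return False
    date = parsedate_to_datetime(headers["date"])
    expires = parsedate_to_datetime(headers["expires"])
    return expires - date == timedelta(seconds=max_age)


# Header values copied from the two .headers responses in the log.
FIRST = {
    "x-varnish": "2111766604",
    "age": "0",
    "expires": "Tue, 21 Sep 2010 20:38:20 GMT",
    "cache-control": "max-age=432000",
    "date": "Thu, 16 Sep 2010 20:38:20 GMT",
}
SECOND = {
    "x-varnish": "2111766630 2111766604",
    "age": "9",
    "expires": "Tue, 21 Sep 2010 20:38:29 GMT",
    "cache-control": "max-age=432000",
    "date": "Thu, 16 Sep 2010 20:38:29 GMT",
}

if __name__ == "__main__":
    for label, headers in (("first", FIRST), ("second", SECOND)):
        print(label, "hit:", is_cache_hit(headers),
              "expires consistent:", expires_matches_max_age(headers))
```

The max-age of 432000 seconds is five days, which is the gap between the date and expires values in both responses.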