20:00:16 #startmeeting
20:00:23 * ricky
20:00:26 #topic Infrastructure -- Who's here?
20:00:31 * dgilmore
20:00:31 * ricky (oops)
20:00:32 * dgilmore
20:00:33 * dgilmore
20:00:40 * LinuxCode
20:01:10 * johe here
20:01:14 * nirik is around.
20:01:21 K. let's get started on tickets.
20:01:30 #topic Infrastructure -- Tickets
20:01:32 .tiny https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
20:01:37 * davivercillo is here
20:01:47 So the first and only meeting item is, again,
20:01:49 .ticket 1503
20:01:54 hello
20:01:55 last time we talked about it the whole meeting
20:01:59 * onekopaka is here.
20:02:01 this time let's kill it after 10 minutes max.
20:02:03 abadger1999: around?
20:02:05 mmcgrath: i think we could again
20:02:07 here
20:02:20 * abadger1999 arrives
20:02:36 did we get input on what source we have to make available? and how we have to make it available if we went with AGPL everywhere?
20:02:46 abadger1999: so give us the latest poop
20:02:48 dgilmore, spot is working on it with legal
20:02:50 I assume this is the AGPLv3 issue?
20:02:59 or I should wait until abadger1999
20:03:00 * skvidal is here
20:03:03 wasn't somebody supposed to be here?
20:03:20 LinuxCode: They were at OSCON
20:03:27 ohh
20:03:28 spot is talking to legal. So I think we don't have much to say here.
20:03:32 the person spot mentioned?
20:03:36 Unless people have new questions since last week
20:03:46 abadger1999: so no progress since last week?
20:03:52 * LinuxCode hasn't, but sees both sides of the argument
20:04:15 abadger1999: right. until we clear up the legal requirements we can't do anything
20:04:16 mmcgrath: Well, spot has the list of questions now and it has gone from him to legal. But we haven't gotten a writeup yet.
20:04:22 So we can make no progress.
20:04:45 k
20:04:56 Is Bradley Kuhn here?
20:04:59 * mmcgrath notes domsch invited him
20:05:11 I think he was OSCONing, so mdomsch moved it up a week
20:05:22 ah, k
20:05:26 I'd like to go ahead with relicensing python-fedora to LGPLv2+ since that won't be affected by whatever we decide regarding AGPL.
20:05:28 ah. so he did :)
20:05:49 abadger1999: What all are we accomplishing with that?
20:06:20 mmcgrath: Right now it's GPLv2. LGPLv2+ will make it so more people can use it.
20:06:46 for instance, if someone writes code under the Apache license, python-fedora will work under the LGPLv2+.
20:06:58 also, mdomsch would like it to change.
20:07:10 k
20:07:16 well, if that's all we have on that I'll move on
20:07:17 mirrormanager is MIT licensed. But when used with python-fedora, the combined work is GPLv2.
20:07:35 together - they fight crime!
20:07:52 with python-fedora LGPLv2+, mirrormanager remains MIT.
20:07:52 hmm.
20:08:07 skvidal: that's good, I'm pretty sure smolt is committing some....
20:08:14 abadger1999: ok, thanks for that
20:08:20 anyone have anything else on this topic before we move on?
20:08:52 * LinuxCode shakes head
20:08:59 k
20:09:13 #topic Infrastructure -- Mirrors and 7 years bad luck.
20:09:19 haha
20:09:20 you broke it
20:09:25 it's your fault
20:09:26 lol
20:09:27 So, as far as I know the mirrors are on the mend.
20:09:41 +1 can confirm that
20:09:41 I believe so.
20:09:47 had first updates today
20:10:04 jwb just did a push that finished.
20:10:09 so there's a bash update coming out.
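A minimal sketch of the update-watch loop being described here, for anyone following along: it assumes only stock yum behavior, and the five-minute interval is illustrative, not the actual procedure anyone in the meeting used.

```sh
# Poll the configured repos until the bash update becomes visible locally.
while true; do
    yum clean metadata >/dev/null
    # "yum list updates bash" prints a line like "bash.x86_64 ..." once
    # the updated package has reached the mirror this box points at.
    if yum list updates bash 2>/dev/null | grep -q '^bash\.'; then
        echo "bash update visible at $(date)"
        break
    fi
    sleep 300   # check again in five minutes
done
```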
20:10:17 we're trying to time how long that update takes to get to our workstations
20:10:25 mmcgrath, i see it on d.f.r.c, but i haven't gotten it via yum yet
20:10:26 so if any of you see a bash update available, please do ping me and let me know.
20:10:37 (and i'm doing yum clean metadata/yum update every little bit)
20:10:55 * mmcgrath verifies he also doesn't have it
20:11:21 * LinuxCode cleans up and sees if it has made it to the UK
20:11:24 yeah, no update yet.
20:11:27 So keep an eye on that.
20:11:32 Here's some of the stuff that's been done
20:11:42 1) we've put limits on the primary mirrors
20:11:56 2) we've started building our own public mirrors system which, for now, will be very similar to the old mirrors system.
20:11:59 but we control it
20:12:22 3) we've leaned harder on the various groups that we're blocking on to get our I2 mirror back up and our other primary mirror back up
20:12:26 we're supposed to have 3 of them.
20:12:43 But still no root cause, though it sounds like a combination of things.
20:13:05 To me the biggest issue isn't that the problem came up, it's that it took so long to fix and our hands were largely tied for it.
20:13:16 * davivercillo came back...
20:13:18 So we're working hard to build out our own mirrors that we can work on, monitor, etc.
20:13:34 su
20:13:38 su
20:13:42 Password:
20:13:43 yeh, was a bummer that it took that long
20:13:43 mmcgrath: how is that going to work?
20:13:52 Southern_Gentlem: sudio? :)
20:14:07 wrong window sorry
20:14:09 dgilmore: It'll just be rsync servers that mount the netapp
20:14:12 dgilmore: for now we've got sync1 and sync2 up (which are RR (round-robin) behind sync.fedoraproject.org) which we're going to dedicate to our tier0 and tier1 mirrors.
20:14:23 They mount the netapp and basically do the same thing download.fedora.redhat.com did
20:14:28 Long term though...
20:14:33 Have we decided to dedicate it, or just have connection slots reserved for tier 0/1?
20:14:46 mmcgrath: ok what about from other data centres?
20:14:58 RDU and TPA?
20:15:02 rsync's connection limiting allows us to be pretty flexible with how we do that
20:15:03 ricky: right now we're not going to tell others about it, and we might explicitly deny access to the non-tier0/1 mirrors
20:15:05 TPA?
20:15:07 notting: FYI this might interest you.
20:15:08 mmcgrath, so, the other mirrors grab from sync1 and sync2?
20:15:08 OK
20:15:13 smooge: Tampa, I think
20:15:14 LinuxCode: only tier0 and 1
20:15:17 k
20:15:21 oh I thought Tampa was gone
20:15:22 dgilmore: so the future of that is going to look like this.
20:15:26 The other mirrors should technically grab from tier 0 or 1
20:15:42 TPA's mirror has been offline since February, but it is physically in PHX2 now, just not completely hooked up.
20:15:55 ah ok
20:15:59 They're going to get it set up, get the snapmirror working again, then we'll have some servers there that mount that netapp and share.
20:16:07 it'll be similar if not identical to what we have in PHX1.
20:16:15 for me the concern is whether the limiting factor is bandwidth or disk space.
20:16:33 and if it's bandwidth, we might need additional servers in PHX2, which I understand has a much faster pipe.
20:16:38 That's all regular internet stuff.
20:16:45 mmcgrath, what about failure?
20:16:45 on the I2 side we're going to get RDU set up
20:16:56 And will we get access to the rsync servers on the non-PHX sites?
20:17:01 mmcgrath: so the same thing in RDU?
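For context on the connection limiting mentioned above, a sketch of what per-module limits and tier restrictions look like in rsyncd.conf; the module name, path, limit, and host list are all illustrative assumptions, not the real sync1/sync2 configuration.

```sh
# Append a hypothetical restricted module to the rsync daemon config.
cat >> /etc/rsyncd.conf <<'EOF'
[fedora-linux]
    path = /srv/pub/fedora/linux            # netapp mount, hypothetical path
    read only = yes
    max connections = 40                    # rsync's built-in connection limiting
    lock file = /var/run/rsyncd-fedora.lock # needed for max-connections accounting
    hosts allow = tier0.example.org tier1.example.org
    hosts deny = *
EOF
```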
20:17:01 LinuxCode: well, we'll have one in PHX and one in PHX2, so we'll be redundant in that fashion.
20:17:08 k
20:17:10 dgilmore: similar in RDU, though probably not a whole farm of servers.
20:17:13 couple of boxes in front of the netapp?
20:17:15 we'll have proper I2 access there.
20:17:16 * SmootherFrOgZ is around btw
20:17:26 but one thing I'm trying to focus on there is using ibiblio as another primary mirror.
20:17:43 Or at least work it into our SOP so it can be pushed to very quickly and easily instead of pulled from.
20:17:52 if we see one sign of problems from our primary mirrors, that can be set up and going.
20:17:55 we were lucky this last week.
20:18:03 none of the ssh vulnerabilities were actually real, for example :)
20:18:40 so that's really what it's all going to look like.
20:18:52 smooge has some concerns about IOPS on the disk trays.
20:19:04 and we may have to take a more active role in determining what kind of trays we want in the future.
20:19:05 mmcgrath: cool
20:19:13 this one was done between the storage team and netapp, with months of their research.
20:19:34 mmcgrath: it's all SATA, right?
20:19:40 yes.. the trays and how they are set up were based on the assumption they'd be FC.
20:19:41 how big are the disks?
20:19:47 and now they are 1TB SATAs
20:20:02 the issue is that the SATAs perform at 1/3 the rate FC would
20:20:13 so we have one shelf in each location?
20:20:19 smooge: I'd have hoped that months of research would have shown that though.
20:20:21 but the FC would cost 8x more
20:20:27 I think their thoughts were that our FC rates were very underutilized.
20:20:42 smooge: right, I'd expect that kind of decrease in performance
20:21:06 So the longer term future of all of this is still in question.
20:21:09 mmcgrath, for the cost, you could make more mirrors
20:21:12 mmcgrath, it could have been, but more like 3/5 of capacity
20:21:26 and I'm pretty sure our problems caused kernel.org's problems last week as well.
20:21:31 and maybe raid6+0 them
20:21:32 and his machines are f'ing crazy fast.
20:21:40 ehh raid5+0
20:21:52 raid6 be slow
20:21:58 LinuxCode, the issue comes down to the number of spindles either way
20:22:07 hmm
20:22:08 and the bandwidth of the controllers
20:22:09 LinuxCode: raid6 and raid5 with lots of disks have nearly identical read performance.
20:22:19 true that mmcgrath
20:22:36 But still, no ETA on any of that.
20:22:38 even with striping applied too?
20:22:43 anyway.. it is what it is, or what's done is done, or some other saying
20:22:57 smooge, hehe
20:22:58 I have a meeting with Eric (my primary RH contact) to find out about funding for new servers and whatnot for all of this.
20:23:05 and the scary part is we had these issues with just 1T of storage.
20:23:15 these trays were purchased so we could have closer to 8T of storage to use.
20:23:32 hmmm
20:23:39 If we find the trays can't handle it.... then I don't know what's going to happen, but I know the storage team won't be happy.
20:23:49 So anyone have any additional questions on any of this?
20:23:59 are these SAN trays?
20:24:16 netapp
20:24:21 k
20:24:50 K, so that's that.
20:25:01 #topic Infrastructure -- Oddities and messes
20:25:22 So have things seemed more in flux than normal to anyone else, or is it just me?
20:25:33 We've largely corrected the ProxyPass vs RewriteRule [P] thing
20:25:49 but I still feel there are lots of little outstanding bugs that have crept in over the last several weeks that we're still figuring out.
20:25:58 of particular concern to me at the moment is smolt.
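For reference, the two Apache proxying styles whose mismatch is mentioned above look roughly like this; the hostname and path are made up, and this is not the actual Fedora proxy configuration.

```sh
# Drop a hypothetical example config into Apache's conf.d directory.
cat > /etc/httpd/conf.d/example-proxy.conf <<'EOF'
# Style 1: mod_proxy directly
ProxyPass        /myapp http://app1.example.org/myapp
ProxyPassReverse /myapp http://app1.example.org/myapp

# Style 2: mod_rewrite with the proxy flag -- same effect, but the two
# styles are easy to let drift apart during a config merge
RewriteEngine On
RewriteRule ^/myapp/(.*)$ http://app1.example.org/myapp/$1 [P]
EOF
```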
20:26:12 but there were other things like the openvpn issue ricky discovered yesterday.
20:26:23 it seems like nagios has been having moments
20:26:31 where we get a lot of alerts
20:26:31 Are we sure that the smolt changes were necessarily from the merge?
20:26:45 smolt was one of the ones whose proxy config was complex enough that I didn't touch it much
20:27:08 ricky: I actually think the smolt issues were discovered not because of the change but because of a cache change you made.
20:27:15 Ahhh, yeah.
20:27:22 I think nagios had been checking a cached page the whole time, so even when smolt went down, nagios just didn't notice.
20:27:40 hehe
20:27:46 or at least didn't notice it unless things went horribly bad.
20:27:52 I'd like to have more people looking at it though
20:28:01 onekopaka has been doing some basic hits from the outside.
20:28:07 basically a "time smoltSendProfile -a"
20:28:10 mmcgrath: I have.
20:28:12 and the times were all over the place.
20:28:13 * sijis is sorry for being late.
20:28:18 including about a 5% failure rate.
20:28:42 Of course I hate to be spending time on something that is clearly not in Fedora's critical path, but we've got to knock it out
20:28:59 does smolt provide some debugging output that's useful?
20:29:13 LinuxCode: it's almost entirely blocking on the db.
20:29:13 as to network, dns issues
20:29:16 we even have the queries.
20:29:18 hmm
20:29:24 weird
20:29:40 mmcgrath: I can try to help you with this ...
20:29:44 Do we know which queries are causing the locked queries though?
20:30:05 ricky: not really
20:30:12 I still don't even understand why they're being locked
20:30:19 and why does locktime not mean anything?
20:30:20 is there a conn limit set up on the db end for the smolt unit?
20:30:28 locktime?
20:30:35 LinuxCode: it's not that weird, it's got 80 million rows :)
20:30:40 ricky: yeah, in the slow queries log
20:30:41 The time column in processlist is the time that the query has been in its current state
20:30:42 Do we have any reproducers? I can try with postgres, but we'd need to know whether we've gained anything or not.
20:30:51 Hm, I remember looking the slow queries one up
20:31:11 davivercillo: how's your db experience?
20:32:01 mmcgrath, so queries get processed, or a connection passes to the db server but it doesn't handle it, correct?
20:32:02 mmcgrath: not so much yet... but I can learn fast! :D
20:32:16 LinuxCode: the queries take several seconds to complete
20:32:18 for example
20:32:55 hmmm
20:33:01 I don't even have an example at the moment.
20:33:06 np
20:33:10 Ah, lock_time is the time the query spent waiting for a lock
20:33:11 but they're there.
20:33:27 So for the queries in the lock state with high times in processlist, they should have high lock_time if they're in the slow query log
20:33:33 ricky: so if a query is running on a table for 326 seconds... does that mean it was locked that whole time?
20:33:47 Depends on where the 326 number came from
20:34:26 ricky: in the slow queries log, do you see any queries that have a Lock_time above 0?
20:34:49 oh, there actually are some.
20:35:13 only 56 of 2856 though
20:35:16 So anyway
20:35:24 davivercillo: how's your python?
20:35:25 could it be that smolt sends some weird query, that then causes it to hiccup?
20:35:40 LinuxCode: nope, it's not weird queries :)
20:35:48 just a wild thought
20:35:51 it's just the size of the db
20:35:51 mmcgrath: I think that is nice...
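A rough sketch of the two checks discussed above: counting slow-log entries that actually waited on a lock, and spotting queries currently sitting in the Locked state. The log path, credentials, and log format (integer Lock_time values, as in MySQL 5.0-era logs) are assumptions, not the real db server layout.

```sh
# How many slow-log entries spent nonzero time waiting on a lock?
grep -c 'Lock_time: [1-9]' /var/log/mysqld-slow.log

# Which currently-running queries are in the Locked state, with context?
mysql -u root -p -e 'SHOW FULL PROCESSLIST\G' | grep -B 5 'State: Locked'
```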
20:35:52 hehe
20:36:01 joins + size = slowness
20:36:22 well, and that's something else we need to figure out; we've spent so much time optimizing render-stats (which is still pretty killer)
20:36:29 but we haven't looked at optimizing the sending of profiles.
20:36:30 mmcgrath, yeh, but if you do something funky + huge db = inefficient
20:36:31 mmcgrath: I did that script checkMirror.py, do you remember?
20:36:37 huge db == inefficient :)
20:36:42 :P
20:36:48 mmcgrath, haha of course
20:36:50 davivercillo: yeah, but that was smaller :)
20:36:57 but there is no way around that
20:36:58 davivercillo: ping me after the meeting, we'll go over some stuff.
20:37:01 mmcgrath: yep, I know... :P
20:37:03 if any of you are curious and want to poke around
20:37:07 mmcgrath: Ok!
20:37:09 you can get a sample db to download and import here:
20:37:21 https://fedorahosted.org/releases/s/m/smolt/smolt.gz
20:37:24 It's about 500M
20:37:27 mmcgrath, yes! thanks!
20:37:40 * thekad has been waiting to load something like that
20:37:59 Ok, I don't want to take up the rest of the meeting with smolt stuff, so we'll move on.
20:38:20 #topic Infrastructure -- Open Floor
20:38:27 Anyone have anything they'd like to discuss?
20:39:03 importing meat pies from australia?
20:39:18 * mdomsch invited Bradley Kuhn to a future meeting to talk about agplv3
20:39:25 mmcgrath, actually, about this smolt stuff, is there a ticket where we can track it?
20:39:28 we may have it cleared up by then, maybe not.
20:39:33 dgilmore: :)
20:39:33 mdomsch: have at it
20:39:38 thekad: not actually sure. I'll create one if not.
20:39:45 mmcgrath, I'd just like to know when you guys have time to help me do that new mapping of the infra
20:39:57 mdomsch: yeah, we were talking about it a bit earlier. I saw your first email but not the second email :)
20:39:57 it will probably take a few weeks, if not longer
20:40:01 dgilmore, are they mutton meat pies?
20:40:03 I've seen this topic pop up several times, but we start from scratch every time; I think we could benefit there :)
20:40:09 smooge: no
20:40:11 if that ticket still even exists
20:40:17 dgilmore, then no thank you
20:40:31 smooge: four'n'twenty pies
20:40:37 smooge: best ones ever
20:40:59 Ok, anyone have anything else they'd like to discuss?
20:41:03 * thekad is being dragged away by his 2yo daughter bi5
20:41:04 dgilmore, as long as they don't have raisins and such in them
20:41:14 mmcgrath, see above
20:41:19 to replace this
20:41:22 https://fedoraproject.org/wiki/Infrastructure/Architecture
20:41:27 was in the talk some time ago
20:41:46 LinuxCode: yeah, you were going to add docs to git.fedorapeople.org
20:41:48 there was a ticket, but not sure what happened to it
20:41:54 err git.fedorahosted.org/git/fedora-infrastructure.git :)
20:41:57 k
20:42:15 well, I will have time now, but I need you guys to explain to me exactly what's where
20:42:20 .ticket 1084
20:42:29 so I can just ask some stupid questions now and then
20:42:39 LinuxCode: Do you have some time to work on it this afternoon?
20:42:50 it's kinda late now
20:42:51 ;-p
20:42:57 21:42
20:43:01 LinuxCode: yeah, I'll add some stuff.
20:43:17 k
20:43:20 for those docs I think it's less important where stuff physically is, and more important how the pieces fit together.
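The sample database linked above can be fetched and loaded roughly like this; that the dump is plain SQL and that "smolt" is a reasonable database name are assumptions about the file, not documented facts.

```sh
# Grab the ~500M sample dump mentioned in the meeting and load it locally.
wget https://fedorahosted.org/releases/s/m/smolt/smolt.gz
gunzip smolt.gz                  # leaves a file named "smolt"
mysqladmin -u root -p create smolt
mysql -u root -p smolt < smolt   # import, assuming the dump is plain SQL
```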
20:43:21 a list would be ok
20:43:26 that'd give me a starting point
20:43:33 that's really what people are talking about when they do architecture
20:43:34 yah, of course
20:43:40 to give people a better idea
20:43:40 LinuxCode: i'll update that ticket shortly actually
20:43:45 excellent
20:43:53 I have an open floor question
20:43:53 I think starting on the proxy servers first would be a good way to go.
20:43:57 smooge: have at it
20:44:00 def
20:44:09 we'll talk another time
20:44:31 Someone was working on an inventory system earlier. Does anyone remember who it was, where it was, etc.?
20:44:37 I can't find any reference versus IRC :)
20:44:44 inventory....
20:44:51 kinda rings a bell....
20:44:51 * nirik thinks it was ocsinventory. Not sure who was doing it tho.
20:45:12 smooge: I think it was boodle
20:45:14 i saw something on the list about ipplan. is that it?
20:45:15 .any boodle
20:45:20 boodle is a tool?
20:45:25 mdomsch: you work with boodle right?
20:45:25 boodle is a person?
20:45:35 boodle is a dude(le)
20:45:40 Heh
20:45:42 http://publictest10.fedoraproject.org/ocsreports/
20:45:44 that's in my ticket
20:45:48 not sure if that helps
20:45:51 mmcgrath, yes
20:45:53 the machine ain't up
20:45:55 LinuxCode, what ticket
20:46:00 mdomsch: he was working on the inventory stuff
20:46:02 https://fedorahosted.org/fedora-infrastructure/ticket/1084
20:46:05 scroll to bottom
20:46:12 03/16/09 20:36:44 changed by boodle
20:46:13 mmcgrath, I remember; I haven't seen anything on that in a bit
20:46:16 ha
20:46:22 LinuxCode, thanks.. my browser skills FAILED
20:46:22 yeah, since about then
20:46:32 smooge, haha
20:46:33 mdomsch: I just didn't know if he was still working on it or what
20:46:34 ;-D
20:46:41 but I think smooge has an itch to get it going.
20:46:48 and it's probably best to let him scratch it :)
20:46:56 smooge, go for it
20:47:07 just put a note in that ticket so he knows
20:47:08 that'd be something useful to me actually
20:47:15 bump the ticket
20:47:16 ok cool. mdomsch, can you send me an email address so I can contact him too
20:47:19 to make those updated diagrams
20:47:24 smooge: What was the software you had experience with again?
20:47:33 exactly what he was using
20:47:35 I swear there was an inventory ticket he was working on
20:47:38 Oh
20:47:48 ocsinventory? That might have ended... a bit poorly
20:47:50 mmcgrath, probably. I have epic fail this week with searching
20:48:04 I remember one of the ones he was trying, I found bad security problems on a quick lookover
20:48:16 ricky, with the app?
20:48:23 ricky: do you know what happened with pb10?
20:48:24 Yeah, grepping my IRC logs now
20:48:24 err pt10
20:48:34 * mdomsch has to run; later
20:48:37 mdomsch: laterz
20:48:46 http://publictest10.fedoraproject.org/glpi/
20:48:49 there is that one too
20:48:55 also kinda rings a bell
20:48:57 yeah.. they tie into one another
20:48:58 I have no idea, it might have just not gotten started on reboot
20:49:05 smooge, kk
20:49:06 ocsng is the tool that polls the boxes
20:49:17 glpi is the pretty front end where you can enter data
20:49:22 .ticket 1171
20:49:30 smooge: see that ticket as well
20:49:49 mmcgrath, that's the one
20:49:49 Yeah, OCS was the security hole one
20:50:27 geez, I really failed
20:50:28 Like I was able to delete a row from some table without logging on or anything
20:50:30 .ticket 1084
20:50:37 I went looking for GLPI
20:50:41 that's the next one
20:50:43 I didn't look much closer at the security stuff after an initial look at it though.
20:50:44 ricky: nasty ;( It's in Fedora, you might note that to the maintainer.
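A quick way to test the "didn't get started on reboot" theory above, using the service tools of that era; httpd is only a guess at which service fronts the ocsreports/glpi apps on that box.

```sh
# Is the service set to start in the current runlevel, and is it up now?
chkconfig --list httpd
service httpd status

# Bring it up if not, and make sure it survives the next reboot.
service httpd start
chkconfig httpd on
```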
20:50:44 ricky, did you report that?
20:51:11 well
20:51:13 just the same
20:51:21 smooge: you want to open up an "inventory management" ticket?
20:51:26 mmcgrath: Looks like publictest10 just didn't get started on a reboot - should I start it again?
20:51:42 mmcgrath, put that down as an action please
20:51:46 ricky: sure, smooge might be able to use it
20:51:48 I will start on it right away
20:51:55 #action Smooge will create a ticket and get started on inventory management
20:52:14 ricky, we will see if the updated version has the bug and then work it out
20:52:31 * davivercillo needs to go home now! See you later!
20:52:39 OK. I just remember getting a really bad impression from that and the other code, but hopefully some of this is fixed.
20:52:46 Good Night!
20:52:54 davivercillo: ping me when you get time later
20:52:57 or tomorrow :)
20:52:59 or whenever
20:53:00 mmcgrath: for sure
20:53:13 bye
20:53:32 So we've only got 7 minutes left, anyone have anything else to discuss?
20:54:05 * ricky wonders if sijis wanted to say anything about blogs
20:54:09 sijis: anything?
20:54:13 abadger1999: or anything about zikula?
20:54:27 yeah, as you saw, the authentication part on the blogs is working.
20:54:54 Thanks for working on that plugin
20:54:55 mmcgrath: When should we get the docs people started in staging?
20:55:01 we're also able to verify that minimum group memberships are met before allowing a login
20:55:05 I think they have all of the packages in review.
20:55:17 But they're not all reviewed yet / some are blocked on licensing.
20:55:28 sijis, which groups are those? cla_done?
20:55:45 a person has to be in cla_done and one other non-cla group
20:55:46 Is http://publictest15.fedoraproject.org/cms/ really as far as they're going to take the test instance? Not trying to complain, but I'm just used to seeing slightly more complete setups in testing first
20:55:46 abadger1999: how long till the licensing is resolved, do you think?
20:56:59 there are a few minor things to work out.. but it should be ready to be tested.
20:57:12 ricky - we need to spend more time on pt15 - we largely haven't done anything with it in months. specifically we need to get all of the pieces that we have packaged, and beat on it
20:57:23 mmcgrath: I encountered problems in both packages I reviewed. One has been resolved (I just need to do a final review); the other is waiting on upstream. docs has contacted several people related to that
20:57:27 Ah, cool, so maybe not quite staging-ready yet
20:57:46 ricky: hopefully not far off
20:57:55 ianweller also encountered some major problems in one that he reviewed -- but I think it might have been optional.
20:57:56 Cool, thanks
20:58:24 ke4qqq and sparks would know for sure.
20:59:01 we still have three (and maybe four) that are blocked on licensing problems, though that includes the one that's waiting on abadger1999's final approval
20:59:01 abadger1999: hmm
20:59:08 abadger1999: what are the odds they won't be resolved?
20:59:36 ke4qqq: Want to field that? And any contingency if that happens?
20:59:59 mmcgrath: I think we'll work around it - upstream is pretty committed to fixing stuff
21:00:04 there is just a ton of stuff
21:00:23 Ok, so we're at the end of the meeting time, anyone have anything else to discuss?
21:00:31 just want to say hi before we end. Julius Serrano here.
21:00:40 jayhex: hello Julius!
21:00:43 thanks for saying hey.
21:00:43 welcome jayhex
21:00:48 jayhex: Hey, welcome!
21:01:06 jayhex: welcome.
21:01:29 Ok, if no one has anything else, we'll close in 30
21:02:06 #endmeeting