20:02:37 #startmeeting Infrastructure
20:02:37 Meeting started Thu Jan 14 20:02:37 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02:37 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:02:39 damnit
20:02:41 Who's here?
20:02:47 * a-k is
20:02:48 * wzzrd is here
20:02:57 * nirik is hanging around in the back
20:02:58 * PhrkOnLsh .
20:03:07 * skvidal is here
20:03:28 Ok, let's get started
20:03:32 #topic Infrastructure -- Tickets
20:03:51 Looks like no meeting notes so that's good.
20:04:03 I'll go over just a couple of things that happened this week worth mentioning
20:04:07 #topic Fedora hosted
20:04:11 hosted got its new memory replaced.
20:04:22 it wasn't as smooth as it could have been but it did get done and within the outage window time.
20:04:30 #topic SPOF for koji and bastion
20:04:36 koji and bastion are still SPOF right now.
20:04:47 mostly because we've not re-configured heartbeat.
20:04:54 but I'm also trying to re-think our vpn setup.
20:05:07 to be more robust with outage scenarios.
20:05:14 not allowing outbound udp makes that difficult.
20:05:32 though I suspect specific outbound udp requests could be allowed.
20:05:37 #topic /mnt/koji
20:05:57 im waiting on the try-n-buy to be approved
20:05:57 * ricky is here for a bit
20:05:59 So this one's kind of happened behind closed doors, not intentional just lots of "do you think this would work?" "how much budget?" etc, etc.
20:06:15 but yeah, dgilmore's been working on trying to get what will be the new /mnt/koji
20:06:25 and I've been working to figure out what will be the new backup of /mnt/koji.
20:06:31 this is all good for several reasons.
20:06:52 1) it'll allow us to use all of our tapes for *everything* else and do daily, weekly and monthly backups as we should be doing.
20:06:52 mmcgrath: what's the plan for the storage for the backup? Still tape or are you switching to disks?
20:06:56 2) /mnt/koji will hopefully be much faster.
20:07:16 skvidal: switching to disks. I'm hoping we'll be able to make snapshots of it for testing.
20:07:35 but dgilmore and I also talked about using bacula to backup to disk.
20:07:52 in the end though I think it will be a better solution because if /mnt/koji does die right now a horrible death.
20:07:57 it'd take a long time to get a new /mnt/koji up
20:08:10 whereas if we have disks to backup to I think the impact would be lessened.
20:08:21 I do, however, have concerns just because it's so easy to wipe disks.
20:08:31 mmcgrath: i agree
20:08:33 blowing away the /mnt/koji backup right now is a significantly more difficult task I think.
20:08:41 but at the end of the day I think it'll be worth it to us.
20:09:12 we'll be spending something like 5X what we spent for the /mnt/koji now so hopefully it'll all work out :)
20:09:26 ehh actually that's not quite true, probably closer to 4X
20:09:30 but still.
20:09:36 and ive requested extra budget for next year to grow it and hopefully help it more
20:09:39 here's to hoping it's all fast and usable and safe and backed up :)
20:09:59 anyone else have any questions or concerns on that?
20:10:29 alllrighty
20:10:34 #topic ssh_known_hosts
20:10:42 smooge: around to talk about this?
20:11:18 the smooge is likely busy :)
20:11:32 After the move our ssh_known_hosts got way out of whack and instead of fixing it, he's been redoing it from scratch
20:11:41 and including more possible names so it should be even more useful than it was.
20:11:48 here
20:11:49 sorry
20:11:55 one sec
20:12:04 smooge: no worries, just wanted a quick blurb about ssh_known_hosts so people know we're working on it.
20:12:16 do our shells automatically use it, or do we need to do something in .ssh/config for it?
20:12:28 ok sorry.
I have updated ssh_known_hosts to all systems I could find in DNS and
20:12:32 mdomsch: they'd use it automatically but you should remove your .ssh/known_hosts
20:12:36 removed ones that were no longer available
20:12:56 mdomsch: worst case, you'll get a conflict in known_hosts and you can just remove it.
20:13:07 ok
20:13:12 I have tested it on app01.stg and was able to go to hosts inside fedora. I have committed and pushed
20:13:16 but the search order is ~/.ssh/known_hosts then /etc/ssh/ssh_known_hosts
20:13:21 The only things that aren't in it are IPv6 addresses
20:13:26 smooge: excellent, thanks.
20:13:37 smooge: have you updated the SOP to match the new search order?
20:13:49 new search order?
20:13:57 and which SOP
20:14:04 thats the default search order
20:14:09 http://fedoraproject.org/wiki/Infrastructure/SOP/ssh_known_hosts
20:14:19 smooge: I think right now we just have short hostname, and actual IP of the host.
20:14:36 ah ok will fix
20:14:39 in the SOP I mean, you expanded on that so it'd be good to get it in the example :)
20:14:42 smooge: thanks
20:14:46 Ok, any other questions on that?
20:15:02 zodbot: hey look, fedbot's a big quitter.
20:15:02 ok
20:15:06 #topic Search Engine
20:15:08 a-k: take it!
20:15:19 The two search candidates I wanted to put in public test are Xapian and Nutch
20:16:00 Xapian installed fine and crawled the wiki for 90 minutes, then died
20:16:00 I think it's fixable, so I still have some hope for Xapian
20:16:01 a-k: has the pt instance worked out for you so far?
20:16:01 For Nutch, I've got Tomcat installed, but not configured yet
20:16:01 This is pt3, BTW
20:16:10 I haven't checked if Tomcat's ports are open
20:16:14 If you need more ram or disk let me know, sometimes we're stuck but sometimes not
20:16:31 Yeah, pt is working fine
20:16:44 I intended to reverse proxy Tomcat through Apache, so no need for extra open ports
20:16:46 a-k: I've never used any of them, what are the key differences?
20:17:25 Xapian is pretty flexible and customizable, so the custom keyword requirement should be satisfied, wherever that goes
20:17:38 then died a good death or a bad death?
20:17:46 Nutch is very pre-packaged, so not so flexible
20:18:05 Xapian didn't like a long URL, plus it may not like non-UTF8 so much
20:18:13 ahhh
20:18:20 interesting
20:18:26 The URL thing I think is fixable with how I do the crawl
20:18:37
20:18:45 Plus, I still intend to keep looking at other candidates on my list
20:18:51 maybe not the non-UTF8 thing though huh?
20:18:57 So these aren't necessarily the only two choices
20:19:01 sure
20:19:05 Both java solutions?
20:19:07 * mmcgrath is happy to hear progress is being made.
20:19:15 a-k: did we look at the one archive.org uses
20:19:16 ?
20:19:23 Yeah, UTF-8 could be the thing that kills most of the candidates, if that's going to be a real requirement
20:19:27 a-k: yeah what was the wiki link again?
20:19:52 why wouldn't utf-8 be a real requirement?
20:19:53 dgilmore: archive.org uses archiving, not indexing
20:20:04 Or xapian is the C++ solution. /me was thinking lucene.
20:20:08 skvidal: I think non-utf-8 is the possible requirement
20:20:13 .link http://fedoraproject.org/wiki/Infrastructure/Search
20:20:24 #link http://fedoraproject.org/wiki/Infrastructure/Search
20:20:32 a-k: thanks
20:20:36 a-k: they have a crawler http://crawler.archive.org/
20:21:00 Xapian is C, with a little Perl
20:21:08 Nutch is Java, hence Tomcat
20:21:25 dgilmore: I can look at archive.org again
20:21:47 I think that's it, unless there are more questions
20:21:54 a-k: just because I know people will ask, can you make sure that those that have been eliminated have a specific "eliminated because: " section?
20:22:14 I can do that
20:22:30 * nirik wonders if any solutions here will tie into the mailman/lists archives?
20:22:38 eliminated because: It kills kittens and makes you eat them
20:22:40 to replace pipermail.
20:22:57 why would a search engine replace pipermail?
20:23:07 smooge: he means a new archiver
20:23:20 nirik: I don't think any of the others are any more maintained
20:23:26 a-k: if you want additional help feel free to ask on the list and recruit :) This is a pretty massive project. Especially for your first one for Fedora. If you need anything feel free to ask :)
20:23:46 Sure. Thanks.
20:23:53 anyone have anything else right now?
20:24:02 sure, personally I think pipermail is ok, but there is a ticket wanting us to replace it with mhonarc or something.
20:24:20 * dgilmore is ok with pipermail
20:24:46 nirik: mhonarc is more trouble than it's worth, ime
20:25:03 mmcgrath: logging? anyone interested in working on it?
20:25:08 * nirik nods. perhaps 3.0 will come out someday with pipermail improvements.
20:25:19 I'm kinda working on it, fwiw
20:25:20 nirik: perhaps monkeys will fly out of my arse
20:25:36 skvidal, I am interested in doing it
20:25:46 the logging that is
20:25:46 I was starting to do it after I finished DNS/NTP
20:25:52 skvidal: sure
20:25:56 #topic logging
20:26:03 wzzrd, what are you doing
20:26:05 smooge: okay
20:26:07 skvidal, what did you see
20:26:11 there are a number of things to work on
20:26:13 So logging is a pretty large topic with lots of sub parts
20:26:16 first - cleaning up the logs we have
20:26:19 skvidal, or would I be too many cooks
20:26:24 let's start on what I gave wzzrd a week or two back.
20:26:29 okay
20:26:33 which is log analysis.
20:26:40 which is the wrong place to start
20:26:41 wzzrd: have you had a chance to look at some of the suggestions on the list?
20:26:44 until you have logs under control
20:26:55 skvidal: none of which will get done while this meeting is going on :)
20:27:15 i've checked out epylog a bit further
20:27:16 umm - if you don't want to discuss logging, that's fine
20:27:26 * skvidal is sorry for making the meeting longer
20:27:54 skvidal: keep your pants on we're talking about logging right now, I'm just trying to do so in a way that doesn't discourage wzzrd, a new potential sysadmin member.
20:28:20 wzzrd: any luck with it?
20:28:24 but i want to raise the matter of realtime parsing vs. cron-based once-a-day parsing, before i dive into something head over heels
20:28:35 I have very little experience with it but I know skvidal has used it as has Jeff_S and some others.
20:28:45 I don't think we need realtime parsing at this time.
20:28:51 wzzrd: realtime parsing and analysis are seldom done by the same tool
20:28:51 at least I don't think it would buy us much.
20:29:11 epylog is nice, but i think it would require the logs from a group of servers going into *one* file on the loghost if you want a single report for that group
20:29:17 wzzrd: and realtime parsing is handy if there are specific triggers you know to look for - but only useful insofar as they can raise a warning in nagios
20:29:21 but yeah, I haven't heard anyone really argue for realtime so we can just assume non-realtime for the moment :)
20:29:38 mmcgrath, skvidal: ok, no real-time
20:29:40 just making sure
20:29:40 epylog can't look at multiple files?
20:29:50 mmcgrath: yes, it can - but it requires editing its configs
20:30:06 mmcgrath: remember what I was saying on the infrastructure list about mimicking the file structure of /var/log
20:30:11 mmcgrath: I wasn't making that up :)
20:30:12 skvidal: just edits though? not like some crazy bastardization?
20:30:21 mmcgrath: significant edits
20:30:26 skvidal: I still have no idea what you're talking about with that?
20:30:37 look in /var/log on your laptop/desktop
20:30:38 you mean I should have a /var/log/messages that has all messages from all of our hosts going into it?
20:30:43 no
20:30:53 so I should have a /var/log/hosts/bastion/messages?
20:31:01 I know what is in /var/log/ on my laptop
20:31:03 you understand there are certain files that commonly exist in /var/log
20:31:05 good
20:31:10 yeah?
20:31:12 like funcd
20:31:16 which won't exist on log1
20:31:21 and those files are expected to have content consistent with a lot of log parsers
20:31:35 skvidal: epylog.conf doesn't allow it afais, and I don't think it comes with a module that allows for the parsing of multiple files
20:31:36 funcd doesn't log via syslog in that way
20:31:45 isn't that creating just the opposite of what wzzrd is talking about though?
20:31:45 wzzrd: It really does.
20:31:48 mmcgrath: no
20:31:54 you're talking about creating more files and he's needing less
20:31:59 no
20:32:01 I'm not
20:32:09 if y'all would let me explain
20:32:19 instead of peppering me with remarks
20:32:21 It might help
20:32:31 * wzzrd shuts up
20:32:31 we want logs per-host
20:32:42 but we also want logs per-by-service/group
20:32:54 so let's say all of the app servers belong to the appgroup
20:33:47 we can setup rsyslog so that if a log comes in from app01 (for example) that it gets sent to /var/log/hosts/app01/2010/01/14/ AND to /var/log/groups/app-servers/2010/01/14
20:34:10 then if we want to do log analysis for the appservers we tell epylog to look at /var/log/groups/app-servers/2010/01/14
20:34:32 if we want it to do analysis for a specific app server we tell it to look at: /var/log/hosts/app01/2010/01/14/
20:34:47 epylog doesn't understand /var/log/hosts/app* ?
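The per-host and per-group layout skvidal describes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the rsyslog configuration itself: the `GROUPS` mapping, the `log_destinations` helper, and the membership of app01 in a group named `app-servers` are hypothetical; only the two directory patterns come from the discussion.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical host-to-group mapping; the real grouping was not given in the meeting.
GROUPS = {"app01": "app-servers", "app02": "app-servers"}

BASE = PurePosixPath("/var/log")


def log_destinations(host, day):
    """Return the directories a message from `host` on `day` would be written to."""
    dated = PurePosixPath(f"{day.year:04d}") / f"{day.month:02d}" / f"{day.day:02d}"
    # Every host gets its own dated directory.
    dests = [BASE / "hosts" / host / dated]
    # Hosts that belong to a group are also copied into the group's dated directory.
    group = GROUPS.get(host)
    if group is not None:
        dests.append(BASE / "groups" / group / dated)
    return [str(d) for d in dests]


for path in log_destinations("app01", date(2010, 1, 14)):
    print(path)
```

A host with no group entry gets only its per-host directory, so a log analyzer could be pointed at either tree without touching the other.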
20:34:52 inside each of those dirs will be the syslog files normally generated by /var/log
20:35:23 mmcgrath: not when it is parsing log files - it expects the log files to be in the normal location relative to the base log path
20:35:52 I guess I'm not understanding how what is in my laptop in /var/log isn't what's in /var/log/hosts/app01/2010/01/14
20:36:02 okay
20:36:05 let's look at an example
20:36:29 login to bastion
20:36:41 cd /var/log
20:36:43 ls *log
20:36:47 ls *log
20:36:47 anaconda.log boot.log ha-log ldirectord.log sa-update.log yum.log
20:36:47 anaconda.syslog faillog lastlog maillog tallylog
20:37:14 the files syslog is writing are the only ones we can deal with for a remote logging server
20:37:33 so messages, maillog, spooler, boot.log and cron
20:37:38 that's all we have access to
20:37:41 now
20:37:47 if you look on log1
20:37:51 in
20:37:54 /var/log/hosts/bastion01/2010/01/14
20:37:55 for example
20:38:00 cron.log kernel.log mail.log messages.log secure.log
20:38:04 you have all of those files
20:38:21 do you see how those files do not match the filenames and separation that are normally in /var/log?
20:38:26 ie mail.log vs maillog
20:38:30 kernel.log existing AT ALL
20:38:36 messages.log vs messages
20:38:47 oi
20:38:51 yeah, so you're talking, mostly, about renaming the files?
20:38:52 that's what I mean by the difference
20:39:04 k
20:39:09 mmcgrath: and what content goes into them
20:39:19 ie: kernel.log shouldn
20:39:22 't exist at all really
20:39:28 its content should be in messages
20:39:28 skvidal: crap you were right, i was looking in the wrong place... i didn't quite grasp how epylog's internals worked yet, i suppose...
20:39:32 k, I don't see any problem with that.
20:39:47 that's what I mean about fixing the structure of our remote logs
20:39:51 then once we do that
20:40:01 and we log by 'type of server/service'
20:40:04 wzzrd: k, you want to continue working and learning epylog?
20:40:12 then we can run generic log tools like epylog and generate lovely results
20:40:19 mmcgrath: sure, eager to help out
20:40:33 w/o having to beat our brains out modifying epylog to access logs we don't want
20:40:42 skvidal: so in theory the amount of logs we store is going to about double. Which do you think we should keep longer? the host level logs or the service level logs?
20:40:54 mmcgrath: it's not going to double
20:41:08 right now we've made the mistake of doing *.* from syslog.conf on our logclients
20:41:13 instead of trimming the crap out
20:41:22 no one needs spooler.debug sent remotely
20:41:23 how do you know what to include and what not to?
20:41:31 years of experience?
20:41:32 :)
20:41:44 seriously- you keep warning and above
20:41:51 and drop a lot of the info and debug crap
20:41:59 skvidal: but still, let's say we only sent warning and above right this second.
20:42:01 we can get rid of a lot of crap that's not helpful
20:42:09 when we start storing services too, we're storing all logs twice right?
20:42:25 sure but it's just not that much content
20:42:26 skvidal: so, for example, we wouldn't be sending mail logs?
20:42:37 I think we should send maillogs
20:42:56 unless you want to do maillog analysis ON the mailservers
20:43:06 which seems like a bad use of their cpu time
20:43:09 I've never setup a central logger that didn't store everything, skvidal do you happen to want to take lead on trimming that stuff down?
20:43:15 i should just stop getting email and there would be a lot less log data
20:43:17 I'd prefer to keep log analysis on the logger.
20:43:24 dgilmore: that's very true :)
20:43:28 mmcgrath: sure
20:43:35 mmcgrath: smooge you wanna work w/me on this?
20:43:45 I am happy to. I love log analysis
20:43:51 skvidal: yup yup.
20:43:52 smooge: I'm glad I'm not alone :)
20:44:09 so on this same topic... there's still one thing I'd like to get converted.
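skvidal's rule of thumb above (keep warning and above, drop most info and debug) can be expressed against the standard syslog severity numbers, where a lower number means a more severe message. A minimal sketch, assuming a hypothetical `should_forward` helper and a `warning` threshold; the actual per-facility choices (mail logs, for example, which the group wants to keep sending) were left to be worked out after the meeting.

```python
# Standard syslog severity levels (lower number = more severe).
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def should_forward(severity, threshold="warning"):
    """Hypothetical helper: True if a message at `severity` meets the threshold."""
    return SEVERITIES[severity] <= SEVERITIES[threshold]


for level in ("err", "warning", "info", "debug"):
    print(level, should_forward(level))
```

Keeping the threshold as a parameter leaves room for a looser setting on specific facilities without changing the comparison itself.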
20:44:14 most of our hosts are still not using rsyslog
20:44:22 I'd like to convert them to rsyslog, some have been but not most.
20:44:25 mmcgrath, on my list of things to fix after ntpd
20:44:25 * mmcgrath is just mentioning that.
20:44:29 mmcgrath: for rhel5?
20:44:31 it could be as easy as yum install rsyslog
20:44:35 did rhel5 switch to rsyslog?
20:44:36 skvidal: yeah
20:44:48 okie doke
20:44:48 mmcgrath, skvidal it is basically 5 commands
20:44:49 skvidal: it does for fresh installs, but if you updated, it didn't do a replace.
20:44:58 1) yum install rsyslog
20:45:04 as of 5.3 I think.
20:45:05 smooge: no problem - I just hadn't heard the switch was official in rhel5
20:45:14 its an alternate
20:45:32 syslogd is still preferred because of age
20:45:51 oh wait.. I missed the new install part
20:46:14 smooge: is this something that's going to be possible / easy in puppet or are we going to have to get func involved?
20:46:42 func might be easiest
20:46:47 * mmcgrath thinks perhaps he's been overthinking it.
20:46:54 smooge: what about new installs?
20:47:04 well, we can figure that after the meeting I guess :)
20:47:08 yeah..
20:47:20 wzzrd: any other questions on your side?
20:47:31 does anyone have any thoughts about exactly what we're looking for in these reports?
20:47:47 well, im not sure whether this is ok to ask
20:48:14 you can ask whatever you want. If it's for the root password though we probably won't answer.
20:48:31 i think it would be easy to have some sort of mentor,
20:48:40 you know, to ask some questions to
20:48:41 mmcgrath: for mail- I'm looking for errors and to make sure we don't have too much overrun/disk/cpu issues, for the rest of systems I'm looking to start getting a baseline on what 'normal' looks like and then fixing up problems
20:48:44 wzzrd: ask me
20:49:01 skvidal: great, thanks!
20:49:07 I'm around often and I know a fair bit about the epylog code base
20:49:18 and I know the author of epylog personally and am willing to annoy him
20:49:19 skvidal: you seem to be pretty well informed in this logging business :)
20:49:36 lol
20:49:43 wzzrd: yeah for epylog ask skvidal I have no experience with it. If you have questions about fedora or how we're doing something just ask anyone in #fedora-admin, skvidal smooge and I are almost always in there.
20:49:56 ok appreciate it
20:50:07 wzzrd: skvidal, mmcgrath, myself, ricky, smooge and others will be more than happy to answer questions
20:50:20 Ok, anyone have any other questions on logging?
20:51:01 not me
20:51:08 ok, with that I'll open the floor for anything and everything
20:51:18 #topic Infrastructure -- Open Floor
20:51:21 jds2001: you around?
20:51:27 our newest -main member has been pretty quiet.
20:51:49 mmcgrath: i wore him out on Saturday
20:52:01 hehe
20:52:03 oh! i know one thing.
20:52:04 dgilmore: umm - that sounds
20:52:05 umm
20:52:06 wrong
20:52:26 I'm still in the process of getting our secondary,alt,archive stuff to download.fedora.redhat.com
20:52:27 skvidal: sure.
20:52:32 :)
20:52:40 skvidal: i made him work a lot on saturday while migrating the lists
20:52:41 ;)
20:52:43 After the move we have root squashed for all /pub content (which is good)
20:52:45 that better
20:52:53 but now I can't do anything with my dirs that are there because they're root owned.
20:53:14 Heh, yow :-)
20:53:23 oops
20:53:27 yeah
20:53:33 I do still have several concerns.
20:53:36 same as smooge
20:53:41 will the disks be able to keep up?
20:53:59 will the snap-mirror process work correctly with the new load?
20:54:45 Ok, anyone have anything else?
20:54:51 If not we'll close the meeting in 30
20:54:52 concerns? I have no concerns.. I have reality
20:55:01 cold clear that it probably won't keep up
20:55:07 smooge: yeah, I have a feeling we're in for a ride here.
20:55:13 but who knows, we might get surprised :)
20:55:17 mmcgrath: will this change what I have to do to push the nightly live composes?
20:55:24 yeah.. I am buying lotto tickets just in case
20:55:55 nirik: it'll just change where you write to
20:56:07 nirik: hopefully for the better because it'll be all mirrored and stuff
20:56:11 cool.
20:56:47 ok, with that!
20:56:49 #meetingend
20:56:53 #endmeeting