We recently ran across a problem in production that we could not replicate in lower environments. Since this is not only a high-use application, but an exceptionally “chatty” app, searching the logs was an exercise in futility (*one* of yesterday’s production logs was 6,975,291 lines long, with multiple logfiles per app, multiple apps and multiple servers).
So how do you find a needle in the haystack? Get a smaller haystack. In the quickest window possible, perform the following three steps:
- tail -f log1 log2 log3 log4 >combined.log
- reproduce error
- ctrl-c tail process as quickly as possible
Doing so reduced our 11,480,799 lines (with 780,527 errors) to 1200 lines.
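The three steps above can be sketched as a small helper. This is only a rough sketch, not what we actually ran: `capture_window` is a made-up name, it uses a fixed time window instead of a manual ctrl-c, and it assumes GNU tail (`-n 0` skips what is already in the logs, `-F` keeps following across log rotation).

```shell
#!/usr/bin/env bash
# capture_window OUT SECONDS LOG...
# Hypothetical helper: collect only the lines appended to the given logs
# during a SECONDS-long window, writing them to OUT.
capture_window() {
    local out=$1 secs=$2
    shift 2
    # -n 0: ignore existing content; -F: follow by name across rotation
    tail -n 0 -F "$@" > "$out" &
    local tail_pid=$!
    sleep "$secs"                        # reproduce the error during this window
    kill "$tail_pid"
    wait "$tail_pid" 2>/dev/null || true # reap tail quietly
}
```

Something like `capture_window combined.log 30 log1 log2 log3 log4` would then give you 30 seconds to reproduce the error. With more than one file, tail adds `==> name <==` headers, so you can still tell which log each line came from.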
My employer is currently looking for a sysadmin. If you’re interested, contact me for details.
SR SYSTEMS ENGINEER ROLE IN FARMINGTON HILLS, MI
We are looking for someone who will administer web hosting Linux systems infrastructure, including server hardware, operating system, enabling software, and application software/data for Internet-facing application systems. Direct other departments’ work on dependent systems such as network, firewall, load balancer, and external storage systems. Provide consultative expertise for our businesses to provide technical guidance, standards, knowledge and understanding of business and technology processes, and integration of technologies to deliver Internet-facing learning products and services.
The position will be responsible for systems configuration, implementation, administration, maintenance, and support, along with application integration, and troubleshooting for our eLearning systems.
The role encompasses daily operational systems support in development, QA, and production tiers. It also encompasses project work with business units, developers, test labs, end users and other groups involved in the planning, development, integration, testing, and problem solving for applications, content, and data.
- Ensure maximum uptime of hosted environments, including production, staging, testing, authoring, and development environments. This includes, but is not limited to ensuring the HW is configured properly; is secure; is networked properly; is backed up per company standards; is monitored accordingly; is tested to ensure operability; and is built to company standards.
- Act as a consultative resource for our businesses to provide technical guidance, standards, knowledge and understanding of business and technology processes, and integration of technologies for content management and delivery.
- Assist with integration efforts, including planning and coding where necessary in Apache, Tomcat Java, and MySQL database technologies, and scripting languages.
- Assume lead role in complex problem solving in hosted environments, offering meaningful solutions and implementation strategies. Engage other departments and direct their work on supporting systems such as network, firewall, load balancers, and external storage. Engage application teams with analysis from logs and data on the servers, and provide recommendations for problem resolution.
- Be part of an on-call rotation schedule that includes carrying a pager/email device 24/7. Respond to all alerts immediately and inform management of issues and work being performed to remedy the problem. Direct escalation to engage additional resources if required to troubleshoot and resolve a problem.
- Monitor, analyze, and report performance statistics for web hosting environments. Troubleshoot hosting environment failures and manage / assist in the development of solutions to these problems. This includes not only overall environment / platform problems, but also includes problems affecting individual client accounts (i.e. data integrity, reporting, security, etc.).
- Analyze web hosting environment averages and peak workloads / throughput compared to existing capacities and plan required accommodations to address environmental growth. Take necessary corrective actions (both scheduled and unscheduled) to proactively address potential problems before they become operational / environmental problems. Notify Manager of projected needs and actions taken.
- Ensure security of systems, including standard server build and lock-down procedures, and monitoring security access to systems.
- Review system logs regularly, report and research warnings and errors. Review system logs for backup completions and report any discrepancies.
- Execute implementation/migration of new software and application versions across the development and staging and production environments and prepare back out plans on all platforms to be updated. Ensure adherence to established Change Management and QA procedures. Verify results with appropriate parties.
- Work with peers and other departments to analyze ongoing processes and procedures. Where relevant, propose / design improvements to operational processes.
- Keep up to date with developments in the e-Learning / web-based information technology field through educational and other information resources and make management aware of possible applications for new technologies.
- For new web hosting infrastructure projects, act as technical lead for planning and implementation. Mentor and train junior team members in all areas of IT expertise.
- Bachelor’s degree in Information Systems, Computer Science, Business or Engineering or equivalent job related experience.
- Must have an excellent command of:
  - Red Hat Linux Operating System
  - Apache Web Servers
  - Tomcat application environment running Java
  - MySQL Database Server
  - MarkLogic Content Management Systems
- Must possess experience designing, building, maintaining, migrating, tuning, administering, and supporting three-tiered web/application/database server environments
- Experience with Internet access and security for servers residing within a DMZ
- Must have excellent written and oral communications, including technical documents, and process documents.
- Must possess excellent problem-solving and analytical skills and be able to translate business requirements into information systems solutions.
- Able to translate business requirements into technical recommendations for information systems solutions.
- Must possess excellent problem-solving and analytical skills; ability to assist with network, system, and application troubleshooting required.
- This position demands a well-organized, action-oriented team player with the ability to prioritize daily work, change directions quickly, coordinate geographically dispersed team members and work on multiple projects simultaneously.
- Comprehensive knowledge of problem analysis, structured analysis and design, and programming techniques.
- Coding and scripting skills for a RedHat/Apache/Tomcat/MySQL environment, clustering and other high-availability architectures, TCP/IP, along with various server management and administrative tools.
- Ability to work with minimal supervision, engaging peers and other departments to accomplish assigned goals and effectively manage projects in a cross-functional environment.
Administer web hosting infrastructure, including server hardware, operating system, enabling software, and application software/data for content management systems. Direct other departments’ work on dependent systems such as network, firewall, load balancer, and external storage systems. Provide consultative expertise for our businesses to provide technical guidance, standards, knowledge and understanding of business and technology processes, and integration of technologies to deliver Internet-facing learning products and services.
The position will be responsible for systems configuration, implementation, management and support, along with application integration, and troubleshooting for our MarkLogic-based Content Management Systems. The role includes installation, configuration, administration and maintenance of the content management environment and integrating new systems and products into the platform.
The role encompasses daily operational support of the content management systems and application environment in development, QA, and production tiers. It also encompasses project work with business units, developers, test labs, end users and other groups involved in the planning, development, and testing of products, content, and workflows in the content management systems.
New job, new laptop. Many utilities here are windows only, so it requires a bit of… effort… to get myself up and running efficiently. The solution to the windows problem is VirtualBox. I had set this up on my last laptop with little effort, but this time around required a bit more effort. Hopefully the instructions below will help others get up and running quickly.
Disclaimer– your laptop may catch on fire and explode (or worse) if you attempt this… or something.
We’ll be presuming that you’ve already resized your windows partition and have both a working Windows and Linux partition.
Log into XP, grab MergeIDE.zip from Virtualbox’s site, extract and run it. It should be a quick flash and be done. (Note: I am not 100% sure this step is needed)
Create a new hardware profile and name it virtualbox. Make sure to set it as a choice during boot. Try rebooting into native windows once to ensure that it does offer you profile options.
You’ll need the following packages installed (May differ for non-ubuntu systems):
mbr, virtualbox-ose, virtualbox-ose-qt
Create a stand-alone mbr file to use for booting (yes, you need the force flag):
install-mbr ~/.VirtualBox/WindowsXP.mbr --force
We’re presuming that your windows partition is /dev/sda1. In the below command, we are defining
- a vmdk file (WindowsXP.vmdk)
- which raw disk to read (/dev/sda)
- which partition (1)
- the new MBR file we just created
VBoxManage internalcommands createrawvmdk -filename ~/.VirtualBox/WindowsXP.vmdk -rawdisk /dev/sda -partitions 1 -mbr ~/.VirtualBox/WindowsXP.mbr -relative -register
Note that you’ll need read/write access to that drive as your user, so you may want to figure out a cleaner/securer way to implement this, rather than adding your user to the disk group (which is very dumb and insecure). I would, but it’s working and I have more important things to do at the moment.
Another issue- apparently thinkpads report the drive heads and cylinders oddly (T410 for me and T60p in article), so we have to add some vmdk settings before virtualbox creates them incorrectly. Open ~/.VirtualBox/WindowsXP.vmdk and add the following at the bottom:
The biosHeads appears to be the magic value- it seems to work if it’s set to 240, but the default is 255 (which fails).
Once you add those, start up virtualbox and check the virtual media manager, your new vmdk should be listed there. Once it’s confirmed, create a new virtual machine. Rather than creating a disk, select your vmdk as an existing disk.
After you finish, go to the VM settings->system and make sure the motherboard tab has io-apic enabled (I also had PAE/NX enabled under processor and VT-x enabled under Acceleration).
Start the VM
There are several errors that could pop up. I’m sure there are plenty more that I stumbled across, but these were the two big ones:
- a disk read error occurred, press ctrl+alt+del to restart – Caused by incorrect biosHeads- check and make sure it’s set to 240 (this was the fix for me, results may vary).
- Complaint about kvm/vmx – Virtualbox does not like kvm. Uninstall qemu-kvm.
If things go well, it should flicker mbr in the corner, then go to the hardware profile selection screen. Select the virtualbox profile, and continue, then log in.
What follows is a half-hour of installing generic drivers and dealing with hardware specific auto start apps complaining that they won’t work on this installation. Windows will warn that the new drivers are not blessed, so be forewarned.
Once completed, at the top of the VM windows select Devices-> Install Guest Additions. This will download and mount an ISO, and windows will pop open a folder with the addition executables. Select the one best for you and run the installer. It will prompt you for video and mouse drivers (and trust me, you want them).
The final step is to shut down the windows VM, then reboot into the native windows partition to make sure it still works. I did receive a few blue-screens before logging in at the beginning, but they appeared random and haven’t happened since.
And that’s all there is to it- simple, eh? Your windows partition should now run in native mode and vm mode.
So, I’m home sick again. Fourth time this year I’ve been sick. Cold, flu, cold, Bronchitis. Awesome. Will I get to rest today? No, of course not.
My server (Unicron) has been up and running for 2 years now- I got the parts right after Ian was born. I set up a nice software raid array at the time that’s served me well. I’d never set up a raid array like this before, so I wasn’t really sure how to monitor it. The raid array has been running fine for 2 years, so I just sorta let it slide.
Over this past weekend, I did some work resizing the lvm partitions and had to poke around with the raid stuff. I found not one, but two ways to monitor it: one was to set up a monitor with the mdadm tools and have it email me if there was a problem, and the other led me to create a simple nagios monitor. I set both up sunday night.
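For the curious, a nagios-style check for software raid can be sketched roughly like this. It is not my actual plugin, just a minimal illustration: `check_mdstat` is a made-up name, it reads mdstat-formatted text (by default /proc/mdstat), and it treats a member flagged `(F)` or a `_` in the status brackets (e.g. `[UU_U]`) as a degraded array. The exit codes follow the usual nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/usr/bin/env bash
# check_mdstat [FILE]
# Hypothetical Nagios-style check: report md arrays that have a failed
# member "(F)" or a missing slot "_" in the [UU_U] status field.
check_mdstat() {
    local file=${1:-/proc/mdstat}
    if [ ! -r "$file" ]; then
        echo "UNKNOWN: cannot read $file"
        return 3
    fi
    local degraded
    degraded=$(awk '
        /^md[0-9]+ :/        { dev = $1; if ($0 ~ /\(F\)/) bad[dev] = 1 }
        /\[[U_]*_[U_]*\]/    { if (dev != "") bad[dev] = 1 }
        END                  { for (d in bad) printf "%s ", d }
    ' "$file")
    degraded=${degraded% }   # drop the trailing space
    if [ -n "$degraded" ]; then
        echo "CRITICAL: degraded arrays: $degraded"
        return 2
    fi
    echo "OK: all arrays healthy"
    return 0
}
```

Wired into nagios, a wrapper would call `check_mdstat` and exit with its return code; the one-line message is what shows up in the status display.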
Flash forward to this morning- jackie wakes me up, asks me if I’m going to work (I’d come home sick the day before and was still out of it.) My intention was to wake up long enough to IM my manager and supervisor and let them know I was gonna be sick. I do so, then minimize the im stuff. Staring me in the face was the following:
Wait, wait, wait- my script must be crappy, there’s no way the raid array choked right after setting up the monitoring. I sorta go into denial and check my email:
This is an automatically generated mail message from mdadm
running on unicron
A Fail event had been detected on md device /dev/md2.
It could be related to component device /dev/sdc2.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid5 sda3 sdd2 sdc2(F) sdb3
1461633600 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
md3 : active raid1 sdc1 sdd1
979840 blocks [2/2] [UU]
md1 : active raid1 sda2 sdb2
979840 blocks [2/2] [UU]
md0 : active raid1 sda1 sdb1
192640 blocks [2/2] [UU]
So here I am, praying it was a hiccup while I reboot and rebuild my raid array. It looks like sdc2 went nutty about 45 minutes after I went to bed. I restarted the server and sdc2 reappeared, and I’m rebuilding now to see what happens:
md2 : active raid5 sdc2 sda3 sdd2 sdb3
1461633600 blocks level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
[=================>…] recovery = 85.0% (414436248/487211200) finish=26.8min speed=45182K/sec
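As a sanity check, the finish estimate in that recovery line is just the remaining blocks divided by the current speed. A quick sketch of the arithmetic (the `eta_minutes` name is made up, and I’m assuming the block counts are in the same 1K units as the K/sec speed):

```shell
#!/usr/bin/env bash
# eta_minutes DONE TOTAL SPEED
# Hypothetical helper: minutes left in a rebuild, given blocks done,
# total blocks, and speed in K/sec (block counts assumed to be 1K units).
eta_minutes() {
    awk -v finished="$1" -v total="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - finished) / speed / 60 }'
}

# Numbers copied from the recovery line above
eta_minutes 414436248 487211200 45182
```

That prints 26.8, which lines up with the finish=26.8min that mdstat reported.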
One thing is for sure, I need to get an auxiliary drive in case this one goes kaput for real. I said I’d buy one in june… 2007. Suppose I’d better get on that, huh?
My apologies if this was nonsensical, I’m really tired.
So I’m testing a nifty new wordpress plugin… check this out:
I wonder if it’ll work?
update: no, no it will not.
So I’ve been pretty quiet since I hit 100k words- what’s been going on?
- Round of layoffs at work
- Friend diagnosed with cancer
- Another round of layoffs at work.
- Jackie became a pampered chef consultant
- Finances have been wiped out from christmas and getting her PC stuff off the ground.
- 10% paycut at work
- Guitar lessons are now done because no one can afford them.
- Have been reading Manuscript Makeover for ways to improve my book
- Decided to do an initial cleanup of the first draft of my script, then rewrite the outline before starting draft #2
- started yet another opensource project- this time it’s a collection of Nagios Plugins.
So I’ve been pretty busy. I’ve finished the cleanup of the first two chapters of book 1; hopefully I’ll finish the rest shortly, but it’s very slow going. We’ll see where things head in the next few months- I expect more crappiness.
Anyone know of any good jabber clients for the blackberry? I’ve tried a couple with little luck, and most of them cost more than I can afford for this test. Features required:
- Must run on BlackBerry 8703e v4.1.0x
- Connection server can be configured differently than jid address (i.e. email@example.com for jid, jabber.morgajel.net for connection server.) This rules out Mobber as far as I can tell
- Requires SSL/TLS
- Non-strict cert checking
Let me know if you have any suggestions.
My internet connection.
So here’s the scoop:
5 days until cutover:
I call AT&T, tell them I’m moving and need to transfer my Static IP DSL service on the 30th(Monday). Tech says no problem it’s all set. I am pleasantly surprised at how little of a hassle it was and that it was way smoother than any other interaction I’ve had with them.
Saturday, 2 days until cutover:
We’re planning on doing the actual moving Sunday morning and plan to spend Saturday packing and planning. However at 3am Saturday morning, the internet connection drops, leaving me unable to contact many of the people who may be able to help us move. It sucks, but ok, we can work around it. I still have enough people to get by with and have ways to contact most of them. Since we asked to be connected on Monday, maybe they had to disconnect the old line the day before in order to get their stuff in place. Maybe they cut it on Saturday rather than Sunday because nobody works on Sunday. I get that, I can understand it. While annoying, it’s still better than my previous interactions with them.
Sunday, move day, day before cutover:
We move on Sunday and realize that we never actually checked to see if the house had any phone cables in it. It didn’t. Fortunately my father-in-law knows a bit about phone installation and was able to help me wire up a stub for the AT&T guy to connect to.
Monday, 1 day after move:
AT&T shows up, runs cable, says service will be enabled within X hours. yippie.
Tuesday, 2 days after move:
Connection is there, but my ip address has changed. “crap,” I think, “now I gotta update dns entries for our sites.” But I can understand this, perhaps my old static ip was tied to the network near my old apartment and didn’t reach this area. I can buy that. So I change my DNS entries… and they don’t work. I look again and I apparently mistyped the IP because the new DNS entry doesn’t match the external IP on the router. So I change it again. and 20 minutes later the external IP has changed again.
They had me on a fricking dynamic IP. For the non techies out there, large ISPs only have a limited number of ip addresses, and more often than not don’t have one for every customer. Since few customers stay online 100% of the time, they can take addresses away from people not using them and redistribute them as needed. This is called a Dynamic IP account. For people who run servers from their homes, keeping the same IP is important: when your computer goes to connect to morgajel.com, it needs to be able to find the right IP address. That’s why I pay extra for AT&T to guarantee me the same IP address. That is why I am pissed. There are ways to get around this (dyndns), but they’re a pain in the ass and not an option for me since I run an IRC server as well.
So I tinker around, thinking maybe *I* did something wrong- maybe my router was reset and it cleared the static info. I dig around with Jackie’s help and find the original documentation and try to set up the networking listed manually. No dice. Then I remember that yes, they did manage the info via the PPPoE settings, and that just required a user name and password, which is what I was originally using. I switch it back and get yet another dynamic IP. I should point out that my static IP range was 75.x.x.x, while the dynamic stayed in the 66.x.x.x range- this made it easy to keep track of what was going on.
So I call them up and surprise surprise, they screwed up. See, they don’t really transfer accounts so much as shut off the old one and create a new one. The tech didn’t bother to notice I had a static account and replaced it with a dynamic account. I’m livid at this point, and tell them that it needs to be switched back. “Ok, I’ll put in the order. It’ll be ready in 10 days.” Now, this should NOT take 10 days from a technical point of view, this is all red tape causing the delay. But WTF can I do, so I say hell with it and go along with it.
At some point my father-in-law comes back over to help with the baby gate and notes that the technician illegally ran the line through the neighbor’s yard. While I’m half tempted to yell at them to fix it, I just wanna get a connection up and running again so I can actually write about the house.
Saturday, 5 days after cutover (timeline gets a little fuzzy here)
Connection is still flaky, but generally working. I call to check on the status of the static IP order, and find out it was never placed. They’ll get right on that.
Sunday, 6 days after cutover
Connection goes down at 7:37am. Completely. It does not come back. Jackie calls tech support this time. Flames, brimstone, cries of the undead ensue. Eventually I take the phone and find out there’s still no mention of a static order of any sort for our account. Guess what? They can’t do anything about it because “orders” isn’t open on weekends. They agree to send out a tech to look at the line since they can’t see the modem from their end. He should be out between 8am and noon on monday.
Monday, 7 days after cutover:
Connection begins working again around 7am- I think to myself “great, maybe they just took it down to switch over to the static IP- finally I can get my stuff up.” Nope, still a dynamic IP address. I call AT&T to get the static IP address set up and let them know the connection is up. They say hold off until the technician confirms it’s not an issue. ok. I’ll call back later. I spend my time waiting for the technician looking for any other ISPs in the area on dslreports.com
Technician comes out, nice guy, doesn’t see anything wrong, says he’s seen this behavior before when switching from dynamic to static, but the business won’t fess up to it. Whatever. At least the wiring was good, presuming that the installer and the inspecting tech were both competent. While he was tooling around, I found out that Cyberonic, my ISP from DC, covers this area (they didn’t in grand rapids or rochester hills). They resell business class Covad lines to residential customers. I contemplate switching over to them, but figure it would be too much effort since I’ve gotten this far. I’m not even sure they’d have a decent plan in this area.
So he leaves and I call AT&T back and get the static all set up. She also said the static IP would be in place tomorrow. Just as we’re finishing she informs me that since I don’t have a contract, my payment will go up to $70 a month from $55. “WTF, this isn’t my screwup- you guys said you could transfer service, then you pooch it, then you want to charge me for it??”
“Oh, no,” she says, “When we transfer service, we don’t transfer contracts. If you want the original rate, you’ll have to sign up for another year of service.”
This is where Jesse snaps.
“You know what? Fine, make it the month to month price, because it’ll take me about 3 weeks to get covad in here.” She was a bit shocked by that statement, and the conversation ended awkwardly. I think she was supposed to ask if I was pleased with my experience but she knew the answer.
I then spent 10 minutes looking through DSL reports for ISPs in the area and narrowing down their plans- turns out that Cyberonic offers the same plan I had in DC for $60. Let’s compare the plans side by side:
|IP address||5 static||5 static|
I call up cyberonic, phone is picked up on the 3rd ring. I tell the technician that I’m interested in their plan, I get signed up, cc infos taken, etc. The entire call lasted 22 minutes and 28 seconds. I was never transferred once, my call was never dropped, the technician never once said “I don’t know,” and they were going to do a hotswap on the line and cancel the AT&T DSL for us since we obviously can’t have 2 DSL services on the same line. The transfer should take place in the next 7-14 business days.
I’d like to point out that AT&T still hasn’t got their act together as of this morning (Thursday), and dropped my connection while I was beginning a deployment for work. That was real awesome btw. Thankfully my neighbor is allowing us to use his wireless connection until we get it straightened out. If the issues aren’t resolved by switching to cyberonic, I’ll have the neighbor report the cable crossing his yard and they’ll have to come out and redo it (this is my backup plan).
The good news is we’ve moved our blogs to gopedro.net. I’m still in the process of converting them, but expect to be done by next Monday. The only site that will still point to my static IP is morgajel.com, for my streaming music, IRC server, etc. We’ve also decided to move all of our pictures to flickr, so expect to see broken images for a while.
I really want to thank gopedro.net for their help in all of this. I highly recommend them for any domain name purchases or hosting. They’ve been handling our domain names for years now, and their service is outstanding. I’d also like to thank our new neighbor Bobby for being one hell of a cool guy.
I’ll keep you updated on how things go. Hopefully I’ll start writing about the house soon.
Cyberonic called and told me they’d be sending a technician out tomorrow to verify the lines. Hopefully I should have a working connection soon.
My bad, it was wednesday. Connection is up now and I’m back online with a static IP!