Archive for September, 2019
My current employer has a problem with managing scale. Bad habits and lack of consistency have led to an environment of never-ending one-offs that result in extended downtime, employee burnout, and loss of productivity. To fully grasp the scope of the current situation, We must look at the issues we currently suffer from, and the cost incurred by them.
Two issues: Builds and …Everything Else
Builds have been a sore point for our for our team for some time. Common complaints involve:
- Reliance on a proprietary tool (HP RDP), which is windows based and owned by another team
- Reliance on DNS entries for the build process, which may take days to go through
- Lack of Tribal knowledge of the build process (only 2 team members are fully educated in it)
- Lack of visibility and documentation of the process and details
- Lack of centralized account management ownership
- Slow to resolve issues with build (no default jdk install, ulimit)
- Newly built servers are not up to date (patched)
- Aged distributions (SLES 9, SLES 10) require hardware-specific drivers on newer hardware.
Beyond our build problems, we have further issues:
- Lack of centralized, Tiered, or Channeled patching.
- Unreliable naming conventions.
- Heavy ramp-up time
While we have done our best to address some of these non-build issues, only a full revamp of the build process will address the underlying problems.
Resulting Costs: Time and Money
The repercussions of our build issues have both obvious and indirect costs.
Things that Cost Time
- Builds require DNS Changes: RDP requires DNS entries, which require Change Request windows. This can roadblock a project for up to two days.
- Inconsistency: Tracking down simple production issues require intimate domain knowledge due to the sheer number of one offs.
- Lack of Visibility: Without domain knowledge, the steps to tracking down an issue requires extensive sleuthing to fight the right servers, pools, projects, irules, etc.
- Lack of Auditing: With no mechanism within the team to “circle back” and clean up after ourselves, unresolved issues sit for months, resulting in confusion later.
- Lack of up-to-date Documentation: Much of our documentation is woefully out of date, leading to poor decisions based on bad intel.
- Lack of Instrumentation: Applications consist of multiple layers, but due to firewall, code, authentication and DNS constraints, Applications cannot easily be tested at all layers.
- High Ramp-up time for New Employees: Time is wasted for both the new employee and trainer to learn all of the nuances.
- Context Thrashing: Humans aren’t nearly as good at multitasking as they think. The constant thrash of interruptions reduce efficiency.
Things that Cost Money
- Licensing: Only a small minority of our servers have valid SLES licenses, making update costs somewhat dubious. Updates via OpenSuse/CentOS are a viable option, but places us in a hybrid environment.
- Suse quoted around $260k to fully license and support
- Red Hat quoted significantly more to fully license and support
- Support: Hardware support, software support, offshore support are not cheap.
The suggested solution to this predicament is a ground up redesign of our environment, starting with our baseline installation and building on our recently introduced conventions. Simplification and refactoring are the targets, since they will allow for better management at scale. Whenever a design decision is made, the ops team should be involved to discuss it.
Baseline Build: Commercial/Community Hybrid model
Two things prevent us from going with a completely community-supported build- Business Insecurity and third-party support.
- Business Insecurity is an internal requirement to “call someone if something breaks,” which may or may not be used (or even helpful). Finding a solution is often quicker and easier through community support via online chat, google searches and social networking.
- Third-party support is an external requirement where a company like Oracle will only support their product on a blessed distribution, despite the difference being in name only. As long as you are running on a licensed distribution, you are usually supported, regardless of the individual packages installed, meaning a RHEL-licensed server could pull packages from a CentOS source.
The primary differences between SLES/OpenSuse and RHEL/CentOS is the source of the packages and the trademarks. Regardless of distribution, maintaining our packages via an internal centralized source is possible, with licensing only used when “Vendor support” is required by a third party application.
RHEL/CentOS is suggested for baseline build for a number of reasons:
- Market Penetration: RHEL has a 60-70% market share, meaning third party support will be better and sysadmin skills will be more commonplace (hence cheaper).
- Larger Community Support: based on Support channels and various other sources, RHEL has the larger community.
- Owns JBoss: RHEL could provide support and training at discounted rates.
- Clean Slate: Switching distributions forces a clean-slate re-evaluation of our practices.
Base Package Set and Base Configuration Overlay
Conventions over Configuration
How This Reduces Costs and Man-Hours
Implementation Examples to resolve outstanding issues
This article is from sometime in 2008. I was kicking around the algorithms for combat. While it didn’t go anywhere, it’s interesting to see where my mind was.
Battle mechanics are always fun… but how to calculate battle and/or damage…
|stats||Fighter||Snapper||Snake||Worg||Fighter (lvel 2)||Fighter (level 20)|
lvl 1: main stats(str,atk) +2, +5 points 27
lvl 2: main and std stats(str,atk,def,con) +1, +5 points 38
lvl 3: main A,std A, secondary A(str,def,eva) +1, +5 points 49
lvl 4: main B,std B, secondary B(atk,con,res) +1, +5 points 50
lvl 5: maj,eva +1, +5 points 61
Chance to Hit = (atk + str*.1)/(def + eva*.1)*.5
chance for crit = atk/eva*.1
damage = rand(weapon-dmg) * str/def * ifcrit(1+str/def)
lvl 1 Fighter Vs. Snapper
(5 + 12*.1)/(10 + 9*.1)*.5 = 28% Chance to hit
5/9*.1= 5% Chance for crit
(3 to 4) * 12/10 = 3.6 min
(3 to 4) * 12/10 = 4.8 avg
(3 to 4) * 12/10 = 4.8 max
(3 to 4) * 12/10 * (1+12/10) = 7.92 min crit
(3 to 4) * 12/10 * (1+12/10) = 10.56 avg crit
(3 to 4) * 12/10 * (1+12/10) = 10.56 max crit
(12 + 12*.1)/(12 + 9*.1)*.5 = 51% Chance to hit
12/5*.1= 24% Chance for crit
(1 to 3) * 12/12 = 1 min
(1 to 3) * 12/12 = 2 avg
(1 to 3) * 12/12 = 3 max
(1 to 3) * 12/12 * (1+12/12) = 2 min crit
(1 to 3) * 12/12 * (1+12/12) = 4 avg crit
(1 to 3) * 12/12 * (1+12/12) = 6 max crit
(2 to 6) * 12/12 = 2 min
(2 to 6) * 12/12 = 4 avg
(2 to 6) * 12/12 = 6 max
(2 to 6) * 12/12 * (1+12/12) = 4 min crit
(2 to 6) * 12/12 * (1+12/12) = 8 avg crit
(2 to 6) * 12/12 * (1+12/12) = 12 max crit
(4 to 8 ) * 12/12 = 4 min
(4 to 8 ) * 12/12 = 6 avg
(4 to 8 ) * 12/12 = 8 max
(4 to 8 ) * 12/12 * (1+12/12) = 8 min crit
(4 to 8 ) * 12/12 * (1+12/12) = 10 avg crit
(4 to 8 ) * 12/12 * (1+12/12) = 16 max crit
This article was originally written on July 19th, 2010, but never published.
Documentation is another topic where there appears to be disagreement in the sysadmin world. When to document, what to document, who do document for, and where to store that documentation always seem to be subjects of contention. Everyone likes documentation, but no one has the time to document, and the rules for documentation often feel arbitrary. I’d like to open this up for discussion and figure out some baselines.
Should I Document?
If you have to ask then probably; but it’s much more complex than that. Documentation is time-consuming and rarely of value at first, so few want to invest the effort into it unless it’s needed. There are several questions here that need to be answered:
- Why should I Document? What is the purpose of the documentation? Are you documenting a one-off process that you’ll have to do 10 months from now? Are you providing instructions for non-technical users? Perhaps you’re defining procedures for your team to follow. Whatever the reason, focus on it, and state it up front. There are few things worse than reading pages of documentation only to find out that it’s useless. Documentation for the sake of documentation is a waste of time.
- What should I Document? It’s very easy to ramble when writing documentation (as many of my articles prove). Step back and review what you’ve written, then remove any unneeded content. Find your focus and document only what needs to be explained, leave the rest for footnotes and hyper links.
- When should I Document? As soon as possible. Ideally you’d document as you worked, creating a perfect step-by-step record. Realistically, pressure to move quickly causes procrastination, but the truth of the matter is that the longer you wait, the less detail you’ll remember. Write down copious notes as you go, and massage it into a coherent plan after the fact.
- Who should I Document for? Write for your audience- a non-technical customer requires a much lighter touch compared to a seasoned techie. The boss may need things simplified that a coworker would instinctively understand. Pick your target audience and stick to it. Anything that falls outside of the audience interests should be flagged as “[Group B] should take note that…” Also remember that the person who requests the documentation may not be the target audience.
- Where should I Document? Where you keep documentation is often more important than the quality of your document. You can write the most compelling documentation in the company, but if it’s stored in a powerpoint slide on a shared drive, it’s of no use to someone searching a corporate wiki. Whatever your documentation repository may be, be it Alfresco, Sharepoint, Confluence or even Mediawiki, everyone has to be in agreement on a definitive source. The format should be searchable, track revisions, prevent unwanted access, and be inter-linkable.
Now that we’ve set some boundaries, let’s delve a little bit deeper into the types of documentation.
Types of Documentation
Documentation can take many forms. Over the course of any given day, you’ll see proposals, overviews, tutorials, standards, even in-depth topical arguments.
. Each type of documentation has its own rules and conventions- what’s required for one set may not be needed for another. That said, here are a few general rules to follow.
- Be Concise –
- NO: thoughtfully contemplate the reduction of flowery adjectives and adverbs for clarification;
- YES: remove unneeded words. Over-explaining will confuse the reader.
- Be Clear – Make sure your subject is obvious in each sentence. Ambiguity will destroy reader comprehension.
- Be Accurate – Incorrect documentation is worse that no documentation.
- Keep it Bite-sized – Large chunks of data are hard to process, so keep the content in small, digestible chunks that can be processed one at a time.
- Stay Focused – Keep a TODO list. Whenever you think of an improvement, make a note of it and move on.
- Refactor – The original structure may not make sense after a few revisions, so don’t be afraid to reorganize.
- Edit for Content -Make sure your topics are factually correct and the content flows properly.
- Edit for Grammar – Make sure your punctuation is correct and your structure is technically sound.
- Edit for Language – Make sure the text is actually interesting to read.
- Link to Further Information – If someone else has explained it well, link to it rather than rewrite it.
- Get Feedback – Feedback finds mistakes and adds value. The more trusted sources, the better off you are.
Proposals can be immensely rewarding (or mind-numbingly frustrating), depending on if they’re accepted or not. That’s not to say you shouldn’t write them; even a failed proposal has value. The point of a proposal is to communicate an idea, a way to tell your team or supervisor “this is what I think we should do.” If you’re successful, the idea will be implemented. If you’re unsuccessful, you may find out a better way to do it. The overall goal should be to improve team performance. Here’s what a proposal should include:
- The Problem – What problem are you trying to solve? Why is it a problem?
- The Solution – A simple overview of the solution
- The Benefits – what benefits it will provide?
- The Implementation – How to implement it.
- The Results – Explain the intended results
- The Flaws – What issues are expected, and if there is currently a solution
- The Timeframe – When should this project be started and completed? How long and how much effort will it take?
Lets presume you write a knockout proposal. Everything is perfect, and with 2 days of effort you’ll reduce a 2 hour daily task to a 15 minute weekly task. Regardless of the benefits, the response will be one of these:
- Complete Apathy – the worst response, because it shows how little you are valued. No response, approval, or denial. If this happens, run your idea past an uninvested third party. Perhaps a critical set of eyes may reveal the problem.
- Denied – perhaps the benefit isn’t worth the cost, the risk is to high, there’s not enough resources, or some other issues not addressed. Try to get specific reasoning as to why it won’t work, and rework your proposal taking that into account.
- Feigned Interest, no Support – Be it plausible deniability or lack of interest, the response is weak. Push for a yes or no answer, ask what the concerns are with it.
- Delay – It’s a good idea, but not right now. There might be hesitance due to a minor issue. Find a way to calm their fears, then push for an implementation date, create a checklist of conditions that need to be met.
- Conditional Agreement – It is a good idea, but conditions must be met first. Create a checklist and verify that it’s complete.
- Full Agreement – This should be your end goal. Full agreement means support from the boss and the team on implementation. Without support, your efforts may be wasted.
You can’t account for everything in your proposal, so make sure not to paint yourself into a corner. A method for dealing with problems is more valuable than individual solutions. It doesn’t need to be perfect, but does need to be flexible.
The most important thing a proposal needs is buy-in. If your team and management aren’t behind an idea, implementation will be a struggle. The final thing to keep in mind is that not all proposals are good. If there is universal apathy for your idea, it might just be bad and you’re oblivious to it.
Introductions and Overviews
Introductions are the first exposure someone may have to whatever you’ve been working on, be it a JBoss implementation, Apache configuration, or new software package. A clear understanding of what “it” is can help with acceptance. A bad introduction can taint the experience and prevent adaptation. So, how can you ensure a good introduction to a technology?
- Explain the Purpose – Why is the user reading this introduction? A new Authentication system? Messaging system? Explain why the reader should care.
- Define your Terms – Include a glossary of any new terms that the user needs to understand. Remember, this may be their first exposure to the topic. Don’t overwhelm them, but at the same time don’t leave them in the dark.
- Don’t Drown in Detail – An introduction should not cover everything in perfect detail, but it should give you references to follow up on.
The tone should be conversational- you need to draw the reader in, befriend them, and convince them that this new thing is not scary. This can be a tough task if the subject is replacing something that the reader if
Document a Process (Installation, Upgrade, Tutorials, How-to, Walk Through)
Documenting a process serves three purposes- it trains new employees in proper technique, ensures consistency, and covers your rear should something go wrong. That last point may sound a bit cynical, but you never know when you’ll need it. The process itself should be clear enough that any qualified user can follow it. Process documentation should have the following traits:
- Steps – Well defined tasks that need to be performed.
- Subtasks – any moderately complex task should be divided up.
- Document Common Problems – Surprises can derail a new user. Acknowledgement and fixes for issues can help ease new users into the process.
Dry runs are essential in documenting a process- test the process yourself and have others test it as well. Continual runs will expose flaws and allow you to address deficiencies. Keep testing and refining the process until a sample user can follow it without issue.
Topical guide (Feature-based)
Topical guides are both the most useful and yet the hardest documentation to write. They need to be thorough, both fully covering the material but not burying the user in frivolous details. So what should you cover in a topical guide?
- Be specific on the topic – Document a feature and all related material. If it’s not related, don’t include it.
- Cover Relevant Tangents –
- Be comprehensive – Cover everything a user needs to know, but remember it’s not intended to be a reference book.
Document a Standard (How Something Should be Done)
Inconsistency is the bane of system administration, and consistency can only be had when everyone is in agreement on how things should be done. There must be agreement not only on theory, but also in practice. As such, standards should be documented. What should a standard entail?
- Dynamic – Not the first word when you think of standards, but something you have to face; your standard will become out of date quickly. Document it and give it a revision number. Soon enough you’ll realize
- Audit – It’s not enough to document a standard, you also need to enforce it. Periodic verification can spot issues before they become problems. If configuration files are identical, md5sums can be used to find inconsistencies.
Annotation (Config Commenting)
One of the most common types of documentation is never published, yet often the most crucial in day-to-day operations. Comments within configuration files can explain what steps were taken and why.
- Explain Why – When you make changes, explain why you made the change.
- Keep it Simple – Comments should not overshadow the configuration. Leave over-documentation to sample configs.
- Consider Versioning – The best configuration documentation is a history of changes. Configurations that are both critical and fluid (for example, Bind zone files) are perfect candidates for versioning.
- Sign and Date Changes – When you make a change, leave your name and a datestamp. While versioning comments may be more permanent, inline comments provide instant context This is important when the change is revisited and no one remembers making it.
This is another article that sat in the drafts folder for far too long- Last edited Feb 21st, 2006.
I fear writing about tar, and that is why I’m determined to finish it in this sitting, so it won’t fester and scare me off of this series. Why am I scared of writing about tar? Well, this is their flags list verbatim from the man page:
[ --atime-preserve ] [ -b, --blocking-factor N ] [ -B, --read-full-records ] [ --backup BACKUP-TYPE ] [ --block-com- press ] [ -C, --directory DIR ] [ --check-links ] [ --checkpoint ] [ -f, --file [HOSTNAME:]F ] [ -F, --info-script F --new-volume-script F ] [ --force-local ] [ --format FORMAT ] [ -g, --listed-incremental F ] [ -G, --incremental ] [ --group GROUP ] [ -h, --dereference ] [ --help ] [ -i, --ignore-zeros ] [ --ignore-case ] [ --ignore-failed-read ] [ --index-file FILE ] [ -j, --bzip2 ] [ -k, --keep-old-files ] [ -K, --starting-file F ] [ --keep-newer-files ] [ -l, --one-file-system ] [ -L, --tape-length N ] [ -m, --touch, --modification-time ] [ -M, --multi-volume ] [ --mode PER- MISSIONS ] [ -N, --after-date DATE, --newer DATE ] [ --newer-mtime DATE ] [ --no-anchored ] [ --no-ignore-case ] [ --no-recursion ] [ --no-same-permissions ] [ --no-wildcards ] [ --no-wildcards-match-slash ] [ --null ] [ --numeric-owner ] [ -o, --old-archive, --portability, --no-same-owner ] [ -O, --to-stdout ] [ --occurrence NUM ] [ --overwrite ] [ --overwrite-dir ] [ --owner USER ] [ -p, --same-permissions, --preserve-permissions ] [ -P, --abso- lute-names ] [ --pax-option KEYWORD-LIST ] [ --posix ] [ --preserve ] [ -R, --block-number ] [ --record-size SIZE ] [ --recursion ] [ --recursive-unlink ] [ --remove-files ] [ --rmt-command CMD ] [ --rsh-command CMD ] [ -s, --same- order, --preserve-order ] [ -S, --sparse ] [ --same-owner ] [ --show-defaults ] [ --show-omitted-dirs ] [ --strip-com- ponents NUMBER, --strip-path NUMBER (1) ] [ --suffix SUFFIX ] [ -T, --files-from F ] [ --totals ] [ -U, --unlink- first ] [ --use-compress-program PROG ] [ --utc ] [ -v, --verbose ] [ -V, --label NAME ] [ --version ] [ --volno-file F ] [ -w, --interactive, --confirmation ] [ -W, --verify ] [ --wildcards ] [ --wildcards-match-slash ] [ --exclude PATTERN ] [ -X, --exclude-from FILE ] [ -Z, --compress, --uncompress ] [ -z, --gzip, --gunzip, --ungzip ] [ -[0-7][lmh] ]
So it’s a bit overwhelming. The good news is there are two common uses for tar- creating tarballs and opening tarballs. This will be a majority of your interaction with it. You get all sorts of fun options with tar, such as using different compression libraries, but it’s still pretty straight forward.
Tar produces tarballs, which in its simplest form is a bunch of data files run together into a larger file. in the following instance, -c means create, and -f means “create the following as a file called foo.tar”
tar -cf foo.tar bar/
This takes the bar directory and throws it all into a single file called foo.tar. Apart from some binary mojo to mark the separators between files, it’s almost as it all of the files were pasted end-to-end inside another file. if foo.tar is copied to another machine or place, you could untar the file with the following command:
tar -xf foo.tar
Again you see the -f flag, but the -c flag has been replaced by the extract flag, -x. This will create a directory called bar/ which will contain the contents identical to the original.
You also have the option of compressing tarballs in the process of creating them. There are three types of compression built into the version of tar I’m using: -Z, which uses the compress utility (ancient?); -z which uses gzip (old standard); and -j, which uses b2zip, which is good for compressing binaries (appears to be the new standard).
When creating a tarball that is compressed, it’s generally expected that you label it as such by appending the type to the filename, for example:
tar -cZf foo1.tar.Z bar1/ tar -czf foo2.tar.gz bar2/ tar -cjf foo3.tar.bz2 bar3/
Unless you have a specific reason, you’ll probably want to use bz2. You’ll probably never deal with a tar.Z file, but if you do, you’ll know how to deal with it. To uncompress these puppies, switch out the -c flag for the -x flag like we did in the previous example.
tar -xZf foo1.tar.Z tar -xzf foo2.tar.gz tar -xjf foo3.tar.bz2
One last option you may want to look at is -v. It will show you files as they’re being processed, and can be good for troubleshooting.
As I prepare to switch to Hugo, I’ve decided to go back through my drafts and publish unfinished works that have some value. This article was last edited Jan 22nd, 2013.
The Moose is a special prize within the programming and IT communities. It is claimed, not awarded. The way it works is that you will catch yourself doing something stupid (by your standards), and you will then “claim The Moose.” When you do so you must announce that you are in custody of The Moose, so the next person that takes it knows where to go to find it. The Moose should be displayed in an area of high visibility on or near your workstation.
Notice that the Moose is claimed, it is not awarded. If you catch something that is so stupid as to be spectacular, and it affects the whole team (for example, somebody breaks the build AND then commits the broken code into the repository) then the person is AWARDED a different prize: The Albatross. The moose hunts you. You try and try to evade it but the moose stalks you like fog in the night.
“Listen, and understand. That Moose is out there. It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear. And it absolutely will not stop, ever, until you are exposed.”
This article was originally written back on Feb 21st, 2006. While never completed, I thought it was worth sharing.
Cat is a very simple utility- so simple I debated added it to this list. There are however three really useful flags. I’ll try to write as much as I can about it so you don’t feel ripped off by this article. hrm… did that last sentence sound like filler? I swear it wasn’t meant to- that’s completely on accident.
So what is cat? Cat is a utility for printing the contents of a file or files to the screen. for example:
morgajel@FCOH1W-8TJRW31 ~/docs $ cat path.txt paths database admin system admin network admin management morgajel@FCOH1W-8TJRW31 ~/docs $
You can also specify several files as well if you want to train them all together and pipe them to another utility.
morgajel@FCOH1W-8TJRW31 ~/docs $ cat foo.log bar.log baz.log |grep "Invalid user"> invalid_users.txt
So there are three useful flags for cat. the first one is -n, which adds a linenumber to the output, like so:
morgajel@FCOH1W-8TJRW31 ~/docs $ cat path.txt -n 1 paths 2 3 4 database admin 5 6 system admin 7 8 network admin management morgajel@FCOH1W-8TJRW31 ~/docs $
This can be useful when debugging source files. The next option is somewhat related; the -b option adds a line number, but only to non-blank lines. if you’re wanting to figure out for some reason what the 5th item is, not including blank lines, this is the way to go. Here’s an example of what it would look like:
morgajel@FCOH1W-8TJRW31 ~/docs $ cat path.txt -b 1 paths 2 database admin 3 system admin 4 network admin management morgajel@FCOH1W-8TJRW31 ~/docs $
Notice how it only counted to 4? There were only 4 text lines. The final option that may or may not be of use is the -s flag, which smushes (that’s a technical term) blank lines together- it leaves single blank lines alone, but if there’s more than one blank line next to each other, it removes all except one. using our file above, watch what happens between “paths” and “database admin” in our example:
morgajel@FCOH1W-8TJRW31 ~/docs $ cat path.txt -s paths database admin system admin network admin management morgajel@FCOH1W-8TJRW31 ~/docs $
Notice how there is only one blank line? That’s what -s does. if you’ve ever had a file where you’ve systematically removed text but not newlines and end up with a 500 line file with 20 lines of text, this can be useful for making it readable on a single page.
Well, that’s all I can really say about cat. If you have anything else to add, do so in the comments.