My current employer has a problem with managing scale. Bad habits and lack of consistency have led to an environment of never-ending one-offs that result in extended downtime, employee burnout, and loss of productivity. To fully grasp the scope of the current situation, we must look at the issues we currently suffer from and the costs they incur.
Two issues: Builds and …Everything Else
Builds have been a sore point for our team for some time. Common complaints involve:
- Reliance on a proprietary tool (HP RDP), which is Windows-based and owned by another team
- Reliance on DNS entries for the build process, which may take days to go through
- Reliance on tribal knowledge for the build process (only 2 team members are fully educated in it)
- Lack of visibility and documentation of the process and details
- Lack of centralized account management ownership
- Build issues are slow to resolve (e.g., no default JDK install, ulimit settings)
- Newly built servers are not up to date (patched)
- Aged distributions (SLES 9, SLES 10) require hardware-specific drivers on newer hardware.
Beyond our build problems, we have further issues:
- Lack of centralized, tiered, or channeled patching.
- Unreliable naming conventions.
- Heavy ramp-up time
While we have done our best to address some of these non-build issues, only a full revamp of the build process will address the underlying problems.
Resulting Costs: Time and Money
The repercussions of our build issues have both obvious and indirect costs.
Things that Cost Time
- Builds require DNS Changes: RDP requires DNS entries, which require Change Request windows. This can roadblock a project for up to two days.
- Inconsistency: Tracking down simple production issues requires intimate domain knowledge due to the sheer number of one-offs.
- Lack of Visibility: Without domain knowledge, tracking down an issue requires extensive sleuthing to find the right servers, pools, projects, iRules, etc.
- Lack of Auditing: With no mechanism within the team to “circle back” and clean up after ourselves, unresolved issues sit for months, resulting in confusion later.
- Lack of up-to-date Documentation: Much of our documentation is woefully out of date, leading to poor decisions based on bad intel.
- Lack of Instrumentation: Applications consist of multiple layers, but due to firewall, code, authentication, and DNS constraints, applications cannot easily be tested at all layers.
- High Ramp-up time for New Employees: Time is wasted for both the new employee and trainer to learn all of the nuances.
- Context Thrashing: Humans aren’t nearly as good at multitasking as they think. The constant thrash of interruptions reduces efficiency.
Things that Cost Money
- Licensing: Only a small minority of our servers have valid SLES licenses, making update costs somewhat dubious. Updates via OpenSuse/CentOS are a viable option, but they would place us in a hybrid environment.
- Suse quoted around $260k to fully license and support
- Red Hat quoted significantly more to fully license and support
- Support: Hardware support, software support, and offshore support are not cheap.
The suggested solution to this predicament is a ground up redesign of our environment, starting with our baseline installation and building on our recently introduced conventions. Simplification and refactoring are the targets, since they will allow for better management at scale. Whenever a design decision is made, the ops team should be involved to discuss it.
Baseline Build: Commercial/Community Hybrid model
Two things prevent us from going with a completely community-supported build: Business Insecurity and third-party support.
- Business Insecurity is an internal requirement to “call someone if something breaks,” which may or may not be used (or even helpful). Finding a solution is often quicker and easier through community support via online chat, Google searches, and social networking.
- Third-party support is an external requirement where a company like Oracle will only support their product on a blessed distribution, despite the difference being in name only. As long as you are running on a licensed distribution, you are usually supported, regardless of the individual packages installed, meaning a RHEL-licensed server could pull packages from a CentOS source.
The primary differences between SLES/OpenSuse and RHEL/CentOS are the source of the packages and the trademarks. Regardless of distribution, maintaining our packages via an internal centralized source is possible, with licensing only used when “Vendor support” is required by a third-party application.
RHEL/CentOS is suggested for baseline build for a number of reasons:
- Market Penetration: RHEL has a 60-70% market share, meaning third party support will be better and sysadmin skills will be more commonplace (hence cheaper).
- Larger Community Support: Based on support channels and various other sources, RHEL has the larger community.
- Owns JBoss: Red Hat owns JBoss, so it could provide JBoss support and training at discounted rates.
- Clean Slate: Switching distributions forces a clean-slate re-evaluation of our practices.
Base Package Set and Base Configuration Overlay
Conventions over Configuration
How This Reduces Costs and Man-Hours
Implementation Examples to resolve outstanding issues
This article is from sometime in 2008. I was kicking around the algorithms for combat. While it didn’t go anywhere, it’s interesting to see where my mind was.
Battle mechanics are always fun… but how to calculate battle and/or damage…
| stats | Fighter | Snapper | Snake | Worg | Fighter (level 2) | Fighter (level 20) |
lvl 1: main stats(str,atk) +2, +5 points 27
lvl 2: main and std stats(str,atk,def,con) +1, +5 points 38
lvl 3: main A,std A, secondary A(str,def,eva) +1, +5 points 49
lvl 4: main B,std B, secondary B(atk,con,res) +1, +5 points 50
lvl 5: maj,eva +1, +5 points 61
Chance to Hit = (atk + str*.1)/(def + eva*.1)*.5
chance for crit = atk/eva*.1
damage = rand(weapon-dmg) * str/def * ifcrit(1+str/def)
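The three formulas above can be sketched in Python. This is a minimal sketch, not the original 2008 code, and it rests on several assumptions. The function names are hypothetical. Weapon damage is assumed to be an integer range. The "avg" value is taken as the midpoint of that range. The hit chance is left unclamped, exactly as written in the formulas.

```python
import random

def chance_to_hit(atk, strength, defense, evasion):
    # Chance to Hit = (atk + str*.1) / (def + eva*.1) * .5  (not clamped to 1.0)
    return (atk + strength * 0.1) / (defense + evasion * 0.1) * 0.5

def chance_to_crit(atk, evasion):
    # chance for crit = atk / eva * .1
    return atk / evasion * 0.1

def damage_multiplier(strength, defense, crit=False):
    # str/def, scaled by a further (1 + str/def) on a critical hit
    ratio = strength / defense
    return ratio * (1 + ratio) if crit else ratio

def damage_range(weapon_min, weapon_max, strength, defense, crit=False):
    # (min, avg, max) damage; avg uses the midpoint of the weapon range
    m = damage_multiplier(strength, defense, crit)
    return weapon_min * m, (weapon_min + weapon_max) / 2 * m, weapon_max * m

def attack(atk, strength, weapon_min, weapon_max, defense, evasion, rng=random):
    # One simulated swing: returns 0.0 on a miss, otherwise the rolled damage.
    # Assumes integer weapon damage (hypothetical; the notes don't say).
    if rng.random() >= chance_to_hit(atk, strength, defense, evasion):
        return 0.0
    crit = rng.random() < chance_to_crit(atk, evasion)
    roll = rng.randint(weapon_min, weapon_max)
    return roll * damage_multiplier(strength, defense, crit)
```

With the level 1 Fighter vs. Snapper stats, `chance_to_hit(5, 12, 10, 9)` comes out at about 0.284. The average damage is the midpoint of the roll times the multiplier, e.g. (3 + 4) / 2 * 12/10 = 4.2.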
lvl 1 Fighter Vs. Snapper
(5 + 12*.1)/(10 + 9*.1)*.5 = 28% Chance to hit
5/9*.1= 5% Chance for crit
(3 to 4) * 12/10 = 3.6 min
(3 to 4) * 12/10 = 4.2 avg
(3 to 4) * 12/10 = 4.8 max
(3 to 4) * 12/10 * (1+12/10) = 7.92 min crit
(3 to 4) * 12/10 * (1+12/10) = 9.24 avg crit
(3 to 4) * 12/10 * (1+12/10) = 10.56 max crit
(12 + 12*.1)/(12 + 9*.1)*.5 = 51% Chance to hit
12/5*.1= 24% Chance for crit
(1 to 3) * 12/12 = 1 min
(1 to 3) * 12/12 = 2 avg
(1 to 3) * 12/12 = 3 max
(1 to 3) * 12/12 * (1+12/12) = 2 min crit
(1 to 3) * 12/12 * (1+12/12) = 4 avg crit
(1 to 3) * 12/12 * (1+12/12) = 6 max crit
(2 to 6) * 12/12 = 2 min
(2 to 6) * 12/12 = 4 avg
(2 to 6) * 12/12 = 6 max
(2 to 6) * 12/12 * (1+12/12) = 4 min crit
(2 to 6) * 12/12 * (1+12/12) = 8 avg crit
(2 to 6) * 12/12 * (1+12/12) = 12 max crit
(4 to 8) * 12/12 = 4 min
(4 to 8) * 12/12 = 6 avg
(4 to 8) * 12/12 = 8 max
(4 to 8) * 12/12 * (1+12/12) = 8 min crit
(4 to 8) * 12/12 * (1+12/12) = 12 avg crit
(4 to 8) * 12/12 * (1+12/12) = 16 max crit
As I prepare to switch to Hugo, I’ve decided to go back through my drafts and publish unfinished works that have some value. This article was last edited Jan 22nd, 2013.
The Moose is a special prize within the programming and IT communities. It is claimed, not awarded. The way it works is that you will catch yourself doing something stupid (by your standards), and you will then “claim The Moose.” When you do so you must announce that you are in custody of The Moose, so the next person that takes it knows where to go to find it. The Moose should be displayed in an area of high visibility on or near your workstation.
Notice that the Moose is claimed, not awarded. If you catch something so stupid as to be spectacular, and it affects the whole team (for example, somebody breaks the build AND then commits the broken code into the repository), then the person is AWARDED a different prize: The Albatross. The moose hunts you. You try and try to evade it, but the moose stalks you like fog in the night.
“Listen, and understand. That Moose is out there. It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear. And it absolutely will not stop, ever, until you are exposed.”
I set up my first Proxmox implementation on my rebuilt gaming PC. The goal was to run Proxmox on bare metal, then run a Windows VM with hardware passthrough so I could play Elite Dangerous in Windows with only a 1-3% performance loss. This would also give me a platform to work on automation tools and containerization.
So how did I go about doing it? Well, I started by reading this article: https://techblog.jeppson.org/2018/03/windows-vm-gtx-1070-gpu-passthrough-proxmox-5/
That did most of the heavy lifting, but it was specific to Intel processors. Here’s what my final changes looked like:
I needed to enable 3 main things:
- WHQL for Windows 10
- UEFI BIOS
- enable virtualization under the Overclocking-> CPU Features panel
/etc/default/grub needs to have the following DEFAULT line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt video=efifb:off"
/etc/modprobe.d/blacklist.conf needs the following entry:
options vfio-pci ids=10de:1b81,10de:10f0
QEMU Host config
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 8
cpu: host,hidden=1
hostpci0: 1c:00.0,x-vga=on,pcie=1
hostpci1: 1c:00.1
hostpci2: 1d:00.3
hostpci3: 1e:00.3,pcie=1
ide2: local:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
machine: q35
memory: 12000
name: gamey
net0: e1000=DE:F7:85:97:FF:22,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: local-lvm:vm-101-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=d0e62ae5-0939-4544-aa2e-7e92f872cc39
sockets: 1
usb0: host=1-2
usb1: host=0c45:7605
usb2: host=046d:c332
virtio2: /dev/disk/by-id/ata-CT500MX500SSD1_1817E1395213-part1,size=476937M
vmgenid: fa74f2e1-46d1-444b-963a-1f0417d18fd0
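You need two sets of values for this setup: the PCI addresses for the `hostpci` lines and the vendor:device IDs for the `vfio-pci` line. One way to find them is to walk the IOMMU groups in sysfs after rebooting with the GRUB change. This is a hedged Python sketch, not part of the original setup. It assumes the standard Linux `/sys/kernel/iommu_groups/<group>/devices/<address>` layout, with `vendor` and `device` files in each device directory. The helper names are hypothetical. The validator rejects anything that isn't a pair of four-digit hex IDs, such as a three-digit vendor ID.

```python
import os
import re

# vendor:device pairs as used by vfio-pci, e.g. "10de:1b81"
PCI_ID = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{4}$")

def _read_id(path):
    # sysfs stores IDs as text like "0x10de\n"
    with open(path) as f:
        value = f.read().strip().lower()
    return value[2:] if value.startswith("0x") else value

def list_iommu_groups(sysfs_root="/sys"):
    """Return (group, pci_address, "vendor:device") tuples sorted by group."""
    groups_dir = os.path.join(sysfs_root, "kernel", "iommu_groups")
    if not os.path.isdir(groups_dir):
        return []  # missing (or empty) usually means the IOMMU isn't enabled
    result = []
    for group in sorted(os.listdir(groups_dir), key=int):
        devices_dir = os.path.join(groups_dir, group, "devices")
        for addr in sorted(os.listdir(devices_dir)):
            dev = os.path.join(devices_dir, addr)
            pci_id = _read_id(os.path.join(dev, "vendor")) + ":" + _read_id(os.path.join(dev, "device"))
            result.append((int(group), addr, pci_id))
    return result

def vfio_options_line(pci_ids):
    """Build the modprobe options line, rejecting malformed vendor:device pairs."""
    for pci_id in pci_ids:
        if not PCI_ID.match(pci_id):
            raise ValueError(f"not a vendor:device pair: {pci_id!r}")
    return "options vfio-pci ids=" + ",".join(pci_ids)
```

Run `list_iommu_groups()` on the Proxmox host. Devices in the same IOMMU group generally have to be passed through together, like a GPU and its HDMI audio function. Note that sysfs reports addresses with a `0000:` domain prefix, which the Proxmox config above omits.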
I apologize that this is super rough and poorly formatted, but I figured that was better than nothing.
Found this here, which fortunately fixed my issue with 3 lines:
sudo mv /opt/Citrix/ICAClient/keystore/cacerts /opt/Citrix/ICAClient/keystore/cacerts_old
sudo cp /opt/Citrix/ICAClient/keystore/cacerts_old/* /usr/share/ca-certificates/mozilla/
sudo ln -s /usr/share/ca-certificates/mozilla /opt/Citrix/ICAClient/keystore/cacerts
I bought Xenoblade Chronicles 2 on a whim: I’d heard the first one was good, and before it came out there was an article suggesting it was the game to play after Breath of the Wild. Well, I’ve put a week or so into it so far, and here are the takeaways.
- The battle system is an over-complicated mess where you don’t actually battle, you just wait for permission to press buttons. It’s completely chaotic and near impossible to follow and you feel like a spectator rather than a participant.
- Once a battle is done, all damage is healed. There are no consequences other than dying and having to “try again.”
- Oh, and each of these battles takes an eternity to finish. Walk from point A to point B, and have 30 battles. But if you die halfway through, you get to go back to the beginning and do it all over again.
- The map system sucks, as does the fast travel. You can’t scroll the overlay map to figure out where you need to go; you just follow the stupid compass arrow and hope it’s leading you the right way (it’s already led me to solid walls, so I gave up on that side quest). The fast travel screen is unintuitive, and the map it shows doesn’t correlate with the overlay map in any meaningful way.
- The voice acting. My god. I was embarrassed when the first mustache-twirling governor guy showed up, because he sounded like… I don’t know, like a horrible person doing a Scotty from Star Trek impression.
I’m on chapter 3, and at this point getting through the game feels like a trudge. I keep hoping it’ll get better, but it hasn’t, and to top it off, I bought the digital download like a fool, so I can’t even resell it. I just spent 3 hours grinding my way to the next section only to die and start over.
What a disappointment.
Because I don’t know when to stop, I’m going to start working on upgrades for my printer.
1. Filament Guide
Apparently one of the common problems is that slack in the filament can cause tangles- the best way to work around this is a filament guide. The first filament guide I printed was loose- too loose to use by itself. The second style just didn’t print properly, even trying to print it 2 different ways. I ended up using a command strip to stick the first one in place, and that seems to be working for the time being. Perhaps later I can modify the model and make it a little better fit.
Another common problem is that the all-metal thumbwheels will jiggle free over time, causing the bed to go out of level. The solution is to use nylon locking nuts (nylock nuts), but they’re so tiny you wouldn’t be able to adjust them by hand; that’s where the 3D-printed thumbwheels come in. The nylocks go on the underside of the printed thumbwheel, allowing better control and a coarser texture than the metal thumbwheels. So far they’re working well.
While it’s not a direct mod, I printed a 3D case for a Raspberry Pi and loaded the Pi with OctoPi, a custom OS that runs OctoPrint. OctoPrint controls the printer over USB, so you’re not constantly inserting and removing SD cards. It also gives you a nice web interface where you can upload your gcode files, track print progress, and tweak configurations. It can even drive a Pi camera for time-lapses and status checks, so you can verify things haven’t gone off the rails.
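OctoPrint also exposes print progress through its REST API, which makes status checks scriptable. The following is a minimal Python sketch, assuming an API key copied from OctoPrint's settings and a hypothetical base URL. It reads `GET /api/job` and formats the `state`, file name, `completion` percentage, and `printTimeLeft` (seconds) fields from the response.

```python
import json
import urllib.request

def fetch_job(base_url, api_key):
    # GET /api/job, authenticated with OctoPrint's X-Api-Key header
    req = urllib.request.Request(base_url.rstrip("/") + "/api/job",
                                 headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def job_summary(job):
    # Fields are null when no job is loaded, so fall back to placeholders
    state = job.get("state", "Unknown")
    file_info = (job.get("job") or {}).get("file") or {}
    name = file_info.get("name") or "(no file)"
    progress = job.get("progress") or {}
    completion = progress.get("completion")    # percent, 0-100
    left = progress.get("printTimeLeft")       # seconds
    pct = "n/a" if completion is None else f"{completion:.1f}%"
    eta = "n/a" if left is None else f"{int(left) // 60} min left"
    return f"{state}: {name} ({pct}, {eta})"
```

Usage would look like `print(job_summary(fetch_job("http://octopi.local", "YOUR_API_KEY")))`, where the host name and key are placeholders. Keeping the formatting separate from the HTTP call makes it easy to test without a printer attached.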
4. Allen Wrench and Scraper Hook Support
This is more of a utility modification than anything: with 3D prints, you usually need to scrape the print off the bed when it’s complete, which means you always have a standard scraper lying around. This gives you a hook to store the scraper on, as well as slots for the Allen wrenches.
5. Fun Fan cooler
My original intention was to go with the Dii cooler, but after some investigation I came across the fun fan cooler, which looks like an earwig’s behind. It has a few print flaws, which I’m going to attempt to fix before re-releasing it on Thingiverse. So far it’s greatly improved the quality of my prints. Update: My attempt to fix the model failed miserably. I still have a lot to learn about organic modelling.
6. Pi Cam Arm
I’ve found a decent arm/camera holster for my Raspberry Pi camera, which should allow me to create timelapse videos. I still don’t have a great base due to the short cable I’m working with, but that should be remedied tomorrow. In the meantime, here’s a video: https://goo.gl/photos/AiX6PCX5Z45nR1Bu8 This was my second print of the Earwig vent/Fun Fan Cooler.
7. Glass Bed
My glass bed has arrived, but the thermal pad won’t be here until Saturday. Between now and then I’ll have to print clips.
Right now I’m planning on the following upgrades:
- Z braces. I saw the tower shake a surprising amount during quick y axis movements- Z braces basically add a hypotenuse to the intersecting structure of the printer. The ones I’m looking at will have levelling feet. Update: Unfortunately, these are for the maker select, not the select plus, so they won’t fit. I’ll need to design my own.
- Metal Hotend with slotted block. Microswiss makes a nice hotend that supposedly works much better.
- Hardened steel nozzle. Another Microswiss upgrade that’ll let me work with a wider array of materials and temperatures.
- Machined lever and extruder plate. The existing lever that holds the filament in place will warp over time; this one won’t.
Overall this has been an interesting diversion so far.
After finally getting my 3d printer, I thought I should start keeping track of what I’m doing.
Printer: Monoprice Maker Select Plus
Standard Filament: MP Select PLA Plus+ Premium 3D Filament (white)
After unboxing it and getting everything aligned, I printed 1.gcode and 2.gcode from the included SD card using the yellow sample PLA filament that came with the printer. The first was a small elephant, the second a swan.
I had played a bit with FreeCAD while waiting for the print and had followed a tutorial for creating a “lego.”
As you may or may not know, there are 2 steps in designing a 3D part:
- designing the regular 3d object in 3d modeling software like 3DSM, Maya, Blender, FreeCAD, etc to create an STL file.
- converting the STL with a slicer program like Cura into a gcode file.
The Gcode is basically a set of assembly-like instructions for controlling the printer- move 2mm, extrude, move 3mm, retract, travel 10mm, etc. What’s important to note is that Cura needs to be configured for your specific printer model.
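To make the "assembly-like" idea concrete, here is a small Python sketch that turns moves like those above into G-code lines. It assumes Marlin-style commands: G91 for relative positioning, M83 for relative extrusion, and G0/G1 for travel and extruding moves. The feed rates, extrusion amounts, and retraction length are made up for illustration. A real slicer like Cura emits far more setup and typically works in absolute coordinates.

```python
def fmt(value):
    # G-code coordinates with three decimal places, e.g. 2 -> "2.000"
    return f"{value:.3f}"

def gcode_program(steps, print_feed=1200, travel_feed=6000, retract_feed=2400):
    """Turn (action, ...) tuples into G-code lines. Feed rates are in mm/min."""
    lines = ["G91 ; relative positioning", "M83 ; relative extrusion"]
    for step in steps:
        action = step[0]
        if action == "extrude":      # ("extrude", dx, dy, filament_mm)
            _, dx, dy, e = step
            lines.append(f"G1 X{fmt(dx)} Y{fmt(dy)} E{fmt(e)} F{print_feed}")
        elif action == "travel":     # ("travel", dx, dy): move without extruding
            _, dx, dy = step
            lines.append(f"G0 X{fmt(dx)} Y{fmt(dy)} F{travel_feed}")
        elif action == "retract":    # ("retract", filament_mm): pull filament back
            _, e = step
            lines.append(f"G1 E{fmt(-e)} F{retract_feed}")
        else:
            raise ValueError(f"unknown action: {action!r}")
    return lines
```

For example, "move 2mm while extruding, move 3mm, retract, travel 10mm" becomes `gcode_program([("extrude", 2, 0, 0.1), ("extrude", 0, 3, 0.15), ("retract", 1), ("travel", 10, 0)])`. The result is a short list of G0/G1 lines, which is essentially what the printer reads from the SD card.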
- The good news is that Monoprice ships with a free copy of Cura
- The bad news is that they only include the exe version
- The good news is you can run it with wine
- The bad news is that it’s not only in Chinese(?), but it also fails to install with an error (that is also not in English).
This makes it really hard to configure Cura properly. My first attempts did not go great, but after doing a bit of research, I found that the “Prusa i3 Mk2” model was “close enough” with some minor modifications:
Back to the Real Story
After some tinkering and trial and error, I was able to print my self-designed lego, sliced with my own configured copy of Cura; however, somewhere along the way it became supersized. It fits roughly 3 regular lego pegs to every 2 on my block. I’m not sure where things fell apart, but I need to re-examine the FreeCAD file and get the calipers out to figure out whether the instructions were wrong or I did something incorrectly.
Anyways, the Lego used up almost the last of my sample yellow, so I opened my new standard filament, the white PLA from monoprice.
The Drow Wizard
The first thing I printed was a Drow Wizard from Shapeways. It was fairly complex, and so far the printer is completely untuned, so it would give me a good idea of what I’m working with.
It was pretty rough. There were a lot of strings between the staff and the figure, and the face had no detail. After a bit of cleanup, it’d be passable for kids, but it was still lower quality than I was hoping for.
The Filament Guide
The next thing I printed was the filament guide upgrade for the printer itself. This was my first time using supports, and man, did they waste a lot of filament. After some cleanup, it came out decent, but it still had some print flaws: namely, a hole in the top of the guide arm where the top layer wasn’t thick enough, and a spot inside the “C” at the top where the edges pulled away from the rest of the print. It’s probably still usable, but I’ll eventually print a better one.
The first “real use” part was a Raspberry Pi 3 case I found on Thingiverse. The Top came out rather nice (but still has some flaws), and I’m waiting for the bottom to finish as I type this.
While waiting, I’ve done a bit of research on some of the flaws I’ve noticed and am coming up with a list of things to try. Before I make any further adjustments, I’m going to print a 3DBenchy, the boat model commonly used for calibration tests. Once I do that, I’ll probably print 3 or 4 more, trying different configurations and tweaks.
I’ve finally gotten the go-ahead to get a 3D printer. It’s something I’ve wanted for a long time, but I’m just now at the point where I can get into it. As I wait for my tax return, I’ve started learning how to use FreeCAD.
So far I’ve finished the following tutorials:
- https://www.freecadweb.org/wiki/Sketcher_tutorial (2017-03-26)
It’s taken a bit of time, but I’m slowly getting there. With any luck I’ll be fabricating parts with relative ease, then can move on to sculpting with blender.
I’m currently investigating the best Ansible role to manage Redis on my server. The good news is that Ansible Galaxy has plenty of options; the bad news is that most of them are terrible. This is my first attempt to find the best of the bunch.
For the sake of simplicity, I’m limiting my search to roles that support Enterprise_Linux (e.g., Red Hat, CentOS). In addition, I’m going to be examining the GitHub repos rather than the Galaxy entries.
It’s important to note that I’m not judging the authors, only their usefulness to me.
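Collecting repo stats like the ones below by hand is tedious. A rough Python sketch using the public GitHub REST API can gather most of them. Several assumptions apply. The owner/repo names are placeholders. Only the first page (up to 100 items) of contributors, branches, and releases is counted. The total commit count is left out because the API doesn't return it as a single field. Unauthenticated requests are also rate-limited to 60 per hour.

```python
import json
import urllib.request

API = "https://api.github.com/repos/"

def fetch_json(url):
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def role_stats(owner, repo, fetch=fetch_json):
    """Summarize a role's repo; `fetch` is injectable so it can be tested offline."""
    base = API + f"{owner}/{repo}"
    info = fetch(base)
    # One page of up to 100 items each; bigger repos would need pagination
    contributors = fetch(base + "/contributors?per_page=100")
    branches = fetch(base + "/branches?per_page=100")
    releases = fetch(base + "/releases?per_page=100")
    return {
        "repo": info["full_name"],
        "last_push": info["pushed_at"][:10],   # YYYY-MM-DD
        "contributors": len(contributors),
        "branches": len(branches),
        "releases": len(releases),
        "stars": info["stargazers_count"],
    }
```

Calling `role_stats("some-owner", "ansible-redis")` for each candidate (placeholder names) gives a comparable summary. Note that `pushed_at` tracks the latest push, which is close to, but not the same as, the last commit date.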
Last Commit: Sept 15th, 2015
Commits: 2 Contributors: 1
Branches: 1 Releases: 0
- Default values used
- Remi repo used
- config templatized
- vars used
- Installs its own Remi repo config
- docker stuff included
- template hardcodes extensive content
- README example is limited.
Last Commit: May 25th, 2016
Commits: 15 Contributors: 3
Branches: 1 Releases: 0
Redis versions supported explicitly: 2.4, 2.6, 2.8
- Extensive defaults
- simple tasks and template
- Extensive README
- overly simplistic module, complex variables
- uses default redis package
Last Commit: September 8th, 2016
Commits: 5 Contributors: 1
Branches: 1 Releases: 0
- includes spec file
- enables remi and epel repos
- includes docker for tests
- doesn’t include repos as requirements
Last Commit: September 27th, 2016
Commits: 7 Contributors: 1
Branches: 1 Releases: 3
- Good Defaults
- Excellent README
- multilayer vars configuration
- includes test playbook and inventory
- Supports multiple distributions
- complex vars configuration
- default packages only, no repo support
Last Commit: June 20th, 2016
Commits: 5 Contributors: 3
Branches: 1 Releases: 3
- includes good repo dependencies
- Poor defaults
- Bad formatting with redirects
- Bad README
Last Commit: June 7th, 2016
Commits: 18 Contributors: 1
Branches: 1 Releases: 0
- includes performance tweaks
- includes docker file
- bad defaults
- mentions epel, no include or dependencies
- no repo dependencies
- Poor vars
Last Commit: March 10th, 2016
Commits: 36 Contributors: 1
Branches: 2 Releases: 6
- includes build status
- No repo dependencies
- Weird tasks layout
- Configuration not really EL specific (more debian than Redhat)
Wow… that was, uh, painful. The good news is that a lot of them are still active, though the number of commits is relatively low across the board. The low commit numbers could mean one of two things:
- Ansible roles are easy to get right the first time, or
- they’re slapped together and not really polished.
There are a few we can rule out straight away: mrlesmithjr, dgnest, and AerisCloud. There just wasn’t a lot of useful content.
That leaves hostclick, jtyr, officel, and sbaerlocher with useful content. I think the right answer will be to roll my own taking parts from each. I’ll give it a closer look tomorrow.
Update: AAAND I feel dumb. I didn’t notice during my first search that those were the first 10 results- 3 rows of 3 and one row of 1 made it look like that was the end of the list.
I’ll have to re-evaluate, probably based on “most downloaded.”