Stemming the flow of evincible Ignorance. We must try to understand for the sake of understanding.
Archive for March, 2012
The Limoncelli Test
11 years
by Jesse Morgan
For grins, I went through Tom Limoncelli’s sysadmin questionnaire to see how “a team I have worked with previously” fares:
- A. Public facing practices:
- *1. Are user requests tracked via a ticket system? No. At a high estimate, a third of their requests are tracked.
- *2. Are “the 3 empowering policies” defined and published? No.
- 3. Does the team record monthly metrics? No. Outages are tracked by management, but that’s it. No alert stats, system-usage stats, etc.
- *4. Do you have a “policy and procedure” wiki? Yes, although admittedly it is missing quite a bit.
- 5. Do you have a password safe? No.
- 6. Is your team’s code kept in a source code control system? Most if not all, yes.
- 7. Does your team use a bug-tracking system for their own code? No.
- 8. In your bugs/tickets, does stability have a higher priority than new features? N/A
- 9. Does your team write “design docs”? No. We have a few, but it’s not S.O.P.
- 10. Do you have a “post-mortem” process? Yes, each week we do one for the previous week’s oncall.
- *11. Does each service have an OpsDoc? No.
- *12. Does each service have appropriate monitoring? No, probably only 60-70% coverage.
- 13. Do you have a pager rotation schedule? Yes. We are each on call one week out of every nine.
- 14. Do you have separate development, QA, and production systems? Yes, we have dev, qa, stage, and prod.
- 15. Do roll-outs to many machines have a “canary process”? No.
- 16. Do you use configuration management tools like cfengine/puppet/chef? No, but I am working on implementing Puppet for our new builds.
- 17. Do automated administration tasks run under role accounts? No.
- 18. Do automated processes that generate email only do so when they have something to say? No, but this has greatly improved.
- *19. Is there a database of all machines? Yes, an LDAP inventory.
- 20. Is OS installation automated? Yes, and we are improving it.
- *21. Can you automatically patch software across your entire fleet? No.
- 22. Do you have a PC refresh policy? No, presuming we’re talking about Servers.
- *23. Can your servers keep operating even if 1 disk dies? Yes (as far as I know).
- 24. Is the network core N+1? Unknown.
- *25. Are your backups automated? Unknown.
- *26. Are your disaster recovery plans tested periodically? Never to my knowledge.
- 27. Do machines in your data center have remote power / console access? Yes, HP ILO.
- *28. Do desktops/laptops/servers run self-updating, silent, anti-malware software? No.
- *29. Do you have a written security policy? No.
- 30. Do you submit to periodic security audits? Yes, but they are very rudimentary.
- 31. Can a user’s account be disabled on all systems in 1 hour? No, too many one-off systems.
- 32. Can you change all privileged (root) passwords in 1 hour? No. We can change 90%, but not a handful of one-offs, which are difficult to identify.
Wow… that was… depressing. 10/32 = 31% that I could answer yes to with some degree of confidence.
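For anyone who wants to redo the tally for their own team, here is a minimal sketch of the arithmetic in shell. The function name score_percent is made up for illustration. The question numbers are the ones answered yes in the list above, counting 6 ("most if not all") as a yes.

```shell
#!/bin/sh
# score_percent is a hypothetical helper, not part of any tool.
# $1 = total number of questions; remaining args = question numbers answered "yes".
score_percent() {
    total=$1
    shift
    yes_count=$#
    # Integer percentage, rounded to the nearest whole number.
    echo "$yes_count/$total = $(( (yes_count * 100 + total / 2) / total ))%"
}

# Yes answers from the list above: 4, 6, 10, 13, 14, 19, 20, 23, 27, 30.
score_percent 32 4 6 10 13 14 19 20 23 27 30
```

Passing the question numbers rather than a bare count makes it easy to see which answers moved when you re-score later.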
NTP Querying
11 years
by Jesse Morgan
Setting up a new NTP client and can’t tell whether it’s syncing properly? Run
ntpq -c lpeers
and check the peer list: a peer marked with a leading * is the one the client is currently synchronized to.
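If you want to script the check rather than eyeball it, something like the following might do. It is a rough sketch, not a complete health check: it relies only on ntpq flagging the selected system peer with a leading * in the peer listing. The function name ntp_has_sys_peer is my own invention, and the sketch says nothing about offset or jitter.

```shell
#!/bin/sh
# ntp_has_sys_peer is a hypothetical helper name.
# Reads an ntpq peer listing on stdin and succeeds if any line starts
# with "*", the tally code ntpq uses for the selected system peer.
ntp_has_sys_peer() {
    grep -q '^\*'
}

# Only query the local daemon if ntpq is actually installed.
if command -v ntpq >/dev/null 2>&1; then
    if ntpq -c lpeers 2>/dev/null | ntp_has_sys_peer; then
        echo "NTP: a system peer is selected"
    else
        echo "NTP: no system peer selected (yet)"
    fi
fi
```

A freshly started client usually needs a few polling intervals before it selects a peer, so a "no system peer" result right after setup is not necessarily a problem.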