Archive for March, 2012

The Limoncelli Test

For grins, I went through Tom Limoncelli’s sysadmin questionnaire to see how “a team I have worked with previously” fares:

  • A. Public-facing practices:
    • *1. Are user requests tracked via a ticket system? No. Even by a high estimate, only about a third of user requests are tracked.
    • *2. Are “the 3 empowering policies” defined and published? No.
    • 3. Does the team record monthly metrics? No. Management tracks outages, but that’s it: no alert stats, no system-usage stats, etc.
  • B. Modern team practices:
    • *4. Do you have a “policy and procedure” wiki? Yes, although admittedly it is missing quite a bit.
    • 5. Do you have a password safe? No.
    • 6. Is your team’s code kept in a source code control system? Most if not all, yes.
    • 7. Does your team use a bug-tracking system for their own code? No.
    • 8. In your bugs/tickets, does stability have a higher priority than new features? N/A
    • 9. Does your team write “design docs”? No. We have a few, but it’s not S.O.P.
    • 10. Do you have a “post-mortem” process? Yes, each week we do one for the previous week’s oncall.
  • C. Operational practices:
    • *11. Does each service have an OpsDoc? No.
    • *12. Does each service have appropriate monitoring? No, probably only 60-70% coverage.
    • 13. Do you have a pager rotation schedule? Yes. We are on call one week out of every nine.
    • 14. Do you have separate development, QA, and production systems? Yes, we have dev, QA, stage, and prod.
    • 15. Do roll-outs to many machines have a “canary process”? No.
  • D. Automation practices:
    • 16. Do you use configuration management tools like cfengine/puppet/chef? No, but I am working on implementing Puppet for our new builds.
    • 17. Do automated administration tasks run under role accounts? No.
    • 18. Do automated processes that generate email only do so when they have something to say? No, but this has greatly improved.
  • E. Fleet management practices:
    • *19. Is there a database of all machines? Yes, LDAP Inventory
    • 20. Is OS installation automated? Yes, and we are improving it.
    • *21. Can you automatically patch software across your entire fleet? No.
    • 22. Do you have a PC refresh policy? No, presuming we’re talking about servers.
  • F. “We acknowledge that hardware breaks” practices:
    • *23. Can your servers keep operating even if 1 disk dies? Yes (as far as I know).
    • 24. Is the network core N+1? Unknown.
    • *25. Are your backups automated? Unknown.
    • *26. Are your disaster recovery plans tested periodically? Never to my knowledge.
    • 27. Do machines in your data center have remote power / console access? Yes, HP iLO.
  • G. Security practices:
    • *28. Do desktops/laptops/servers run self-updating, silent, anti-malware software? No.
    • *29. Do you have a written security policy? No.
    • 30. Do you submit to periodic security audits? Yes, but they are very rudimentary.
    • 31. Can a user’s account be disabled on all systems in 1 hour? No, too many one-off systems.
    • 32. Can you change all privileged (root) passwords in 1 hour? No. We can change 90%, but not on a handful of one-offs, which are difficult to identify.

Wow… that was… depressing. I could answer yes, with some degree of confidence, to only 10 of the 32 questions (31%).
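If you want to score your own team the same way, a small tally is enough. This is a minimal sketch, not part of Limoncelli’s test: the answer string is a hypothetical transcription of the list above (Y = yes, N = no, U = unknown or N/A), the function name limoncelli_score is made up for illustration, and it counts only Y answers, as the 10/32 figure does.

```shell
#!/bin/sh
# Hypothetical answer string: one token per question, in order 1-32,
# transcribed from the list above. Y = yes, N = no, U = unknown or N/A.
answers="N N N Y N Y N U N Y N N Y Y N N N N Y Y N N Y U U N Y N N Y N N"

# limoncelli_score is a hypothetical helper name.
limoncelli_score() {
    # Put one token per line, count non-empty lines and the Y tokens,
    # then print "yes/total (percent%)" with the percentage truncated.
    echo "$1" | tr -s ' ' '\n' | awk '
        NF { total++; if ($1 == "Y") yes++ }
        END { printf "%d/%d (%d%%)\n", yes, total, yes * 100 / total }'
}

limoncelli_score "$answers"
```

Edit the answer string for your own team and rerun it; with the answers above it prints 10/32 (31%).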

NTP Querying

Setting up a new NTP client and can’t tell whether it’s syncing properly? Run

    ntpq -c lpeers

to list the client’s peers and see whether it is actually syncing with them.
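To script that check, a minimal sketch follows, assuming ntpd’s usual ntpq peer listing: the first character of each row is a tally code, where “*” marks the peer the daemon has selected as its system peer and “o” marks a system peer with PPS in use. The function name ntp_synced is hypothetical, and it only tests whether a peer has been selected; it does not judge offset, jitter, or reach.

```shell
#!/bin/sh
# ntp_synced (hypothetical name): reads `ntpq -c lpeers` output on stdin.
# A row starting with '*' (system peer) or 'o' (system peer using PPS)
# means ntpd has picked a source to synchronize with. Header rows start
# with a space or '=', so they never match.
ntp_synced() {
    if grep -q '^[*o]'; then
        echo "synced"
        return 0
    fi
    echo "not synced"
    return 1
}

# On a live host:
#   ntpq -c lpeers | ntp_synced
```

Because the function also returns 0 or 1, the same pipeline can serve as a simple check in a cron job or monitoring script.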
