Many people who think about systems people (systems administrators or IT staff, for example) wonder what we spend our time doing when we’re not sitting with our customers and working to fix what’s broken.
In larger organizations, some people have job descriptions that read “IT Support,” but many of us have significant other responsibilities. What are we doing when the alarm bell (or telephone, or ticket system) isn’t ringing?
Often, the answer is that we’re building things. Part of that time goes to projects we’re assigned, such as installing new hardware or software or upgrading the network. Recently, though, I’ve had a welcome opportunity to consider what happens when there are no tickets, phone calls, or support tasks at all.
I’ve concluded that I spend that time thinking about two things.
First, there’s what’s new to learn. Arthur C. Clarke once wrote that “the well-stocked mind is safe from boredom,” and a corollary might be that someone who is hungry to learn can never be sated. It amazes me how much of the important technology in the IT world is available for free, either because it’s open source or because formal developer and support programs offer access to it free or at significantly reduced cost. I’m thinking of much of Oracle’s technology (free) and Microsoft’s systems (available at a nominal cost), offered for evaluation, testing, and similar purposes (effectively “non-production use,” depending on the specifics of each license).
In addition, with the commoditization of the vast majority of computer hardware (courtesy of Intel) and the availability of virtualization platforms, it’s easy for a curious technologist to get exposure to a wide variety of applications and operating systems. There’s also “the cloud,” where one can rent a virtual server for $0.08/hour (Amazon) or $19.95/month (Linode). There are other providers, of course, but these are the ones I personally use.
The second thing I think about is how to understand the ways in which things break; put another way, what lessons can be learned from outages, errors, breakdowns, and failures of hardware, software, networks, applications, security, or human organizations. In practical terms, this means I spend much of my time and effort thinking about how to monitor systems, traffic, the environment, and myself. What can I instrument? What can I learn about a complex collection of systems and applications by collecting data? What patterns can I see, and what can I predict?
“Paying attention” is essential to understanding what we’re doing. Having the time to improve how I pay attention has given me a better perspective on how to do my work well.