Wednesday, March 21, 2007

Jerry Landsbaum's Metrics - Measuring and Motivating Maintenance Programmers

Jerry Landsbaum wrote the book of his life in 1991/2. It reported on about 6 years of work at Monsanto - where they took a 'backwater' - the CIS maintenance programming area where new work was 'frozen' - and not only kept it viable, but got their management to appreciate them and their efforts.

Quote (p58, Section 3.3: What we Accomplished)

If we had no improvement in productivity, and had simply experienced the increases in salaries over that period, the cost to support a module in the fifth year would have been $1,232.

If we multiply this by the number of modules supported in Year 5, the budget would have been $11,400,000. This amount less the actual budget of $3,200,000 is a saving to Monsanto of $8,200,000. This alone proved our worth to our management, but we went further.

As a result of giving the staff greatly increased accountability, they generated over $600,000 in savings due to various projects they initiated.
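
The arithmetic in the quote is easy to check. A rough sketch: the Year 5 module count (9,250) is taken from the CIS statistics later in this post, and the variable names are mine, not Landsbaum's.

```python
# Check of the savings arithmetic quoted from p58.
# Dollar figures come from the quote; the module count is assumed to be
# the Year 5 "modules supported" figure from the CIS statistics.
cost_per_module_no_improvement = 1_232   # $ per module in the fifth year
modules_supported_year_5 = 9_250         # assumed from the statistics table
actual_budget = 3_200_000                # $ actual budget in the quote

projected_budget = cost_per_module_no_improvement * modules_supported_year_5
saving = projected_budget - actual_budget

print(f"Projected budget: ${projected_budget:,}")
print(f"Saving:           ${saving:,}")
```

The exact products are $11,396,000 and $8,196,000, which round to the $11.4 million and $8.2 million quoted.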

CIS evaporated when Monsanto was taken over.
Jerry retired then and pursued his love of Jazz Music before passing away in 1999.

Jerry's work life started as a Mechanical Engineer for a railroad company - he knew what it took to deliver safe, sound, reliable and cost-efficient systems. He understood that Quality is a mindset, that it pervades every activity and comes before anything else. There's no point in building something if it's not going to stand up! There's no point in writing programs if they don't produce correct results. There's no point in offering business systems if they're not available when they're needed...

Jerry defined a set of metrics, in business terms, that his group provided to their management in an annual report. Excerpts follow.

Section 3.3 "What we Accomplished"

PATHVU Software Complexity Scores

                  Average Score
                  Complex  Architecture  Maintain  No Pgms  Average Lines of Code
Year 5                104            93       157     3521                   1372
Year 6                 96            92       145     4482                   1465
Percent change      -7.6%           -1%     -7.6%   27.00%                  6.80%
MSA adds               76           109       145      456                   1190
New programs           76            50       100      515                   1500
AOB Year 5            190           127       235      174                  3,418
AOB Year 6            158            97       191      198                   3.25
Percent change    -17.00%       -24.00%   -19.00%   14.00%                 -5.00%

CIS Statistics

Basic Statistics

                                       Year 4    Year 5    Year 6
Programmer-Analysts (P-A)                  35        31        31
Total staff size                           44        39        38
Expenses, w/o O/H(2), $MM/year(3)       $3.80     $3.50     $3.20
Overhead, $MM/year                         NA     $1.80     $0.80
Total expenses                          $3.80     $5.30     $4.00
Number of modules supported          8,950(4)     9,250     9,775
Number of programs supported            4,300     4,950     4,950
MM lines of code                          5.4       5.4      5.75
$MM of inventory @ $25                   $135      $135      $144
Average lines per module(5)               604       584       590
Programs moved to production            4,400     5,525     6,760
Compiles/year, batch                   40,000    30,163    25,342
Compiles/program/year, batch              9.3       6.1       5.1
Batch processings/year                 45,000    43,763    59,864
Abends/year, batch(6)                   2,400     1,715     1,971
Systems supported                         106       100        90
Jobs supported                          2,100     2,250     1,825
On-line programs                        1,000     1,200     1,255
Transactions/year (MM)                   39.6        45      46.5
Repairs (emergency)                     1,400     1,000       977
Staff utilization percent              64.90%    73.80%    69.70%

2 w/o O/H = without Overhead (the cost of upper management, flowed to all appropriate departments below).
3 MM = Million.
4 1 stack of paper 67 feet high, stretches out to 25 miles.
5 Modules = programs + .JCL files + data files.
6 Abend = abnormal program termination (did not run right)
7 Professionals = Programmer-Analysts + Managers.
8 P/A = Programmer-Analyst.

Quality Indicators
                                  Year 4  Year 5  Year 6
Abends/batch production run        0.053   0.039   0.033
Abends/month/production module     0.023   0.015   0.017
PATHVU maintainability score          NA     157     145

Productivity Indicators
                                    Year 4  Year 5  Year 6
Modules supported/staff member         203     237     257
Batch processings/P-A(8)              1286    1412    1931
Transactions/year (MM)/P-A            1.13    1.45     1.5
Repairs (emergency)/P-A                 40      32      32
Production moves/module                0.5     0.6     0.7
Programs to production/month/P-A      10.5    14.9    18.2
Compiles per P-A, batch              1,150     973     817
Compiles/program moved, batch          9.1     5.5     3.7
Maintenance cost/year/module          $425    $378    $327
$MM inventory/professional            $3.3    $3.6    $4.0
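
Several of the productivity indicators are simple ratios of the Basic Statistics. A minimal sketch that recomputes two of them, assuming "staff" means total staff size and "cost" means expenses without overhead (the dictionary keys and function names are my own):

```python
# Recompute two productivity indicators from the Basic Statistics.
# Figures are copied from the tables; the field names are illustrative.
basic = {
    "Year 4": {"staff": 44, "modules": 8_950, "expenses_mm": 3.80},
    "Year 5": {"staff": 39, "modules": 9_250, "expenses_mm": 3.50},
    "Year 6": {"staff": 38, "modules": 9_775, "expenses_mm": 3.20},
}

def modules_per_staff(row):
    """Modules supported per staff member, rounded to a whole number."""
    return round(row["modules"] / row["staff"])

def cost_per_module(row):
    """Maintenance cost per module in dollars; expenses are in $MM w/o O/H."""
    return round(row["expenses_mm"] * 1_000_000 / row["modules"])

for year, row in basic.items():
    print(year, modules_per_staff(row), f"${cost_per_module(row)}")
```

Both series match the published rows (203, 237, 257 modules per staff member and $425, $378, $327 per module), which suggests that is how those two figures were derived.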

Basics - Percentage Changes
                                    Year 4 to 5 Year 5 to 6 Year 4 to 6
Programmer-Analysts (P-A)               -11.40%       0.00%       11.4%
Total staff size                         -11.40        -2.6       -13.6
Expenses, w/o O/H, $MM/year                -7.9        -8.6       -15.8
Overhead, $MM/year                           NA      ~-44.4          NA
Total expenses                               NA       -24.5          NA
Number of modules supported                 3.4         5.7         9.2
Number of programs supported               15.1           0        15.1
MM lines of code                              0         6.5         6.5
$MM of inventory @ $25                        0         6.7         6.7
Average lines per module                   -3.3        -1.0        -2.3
Programs moved to production               25.6        22.4        53.6
Compiles/year, batch                      -24.6       -16.0       -36.6
Compiles/program/year, batch              -34.4       -16.4       -45.2
Batch processings/year                     -2.7        36.8          33
Abends/year, batch                        -28.5        14.9       -17.9
Systems supported                          -5.7        -9.0       -14.2
Jobs supported                                7       -18.9       -13.1
On-line programs                             20         4.6        25.5
Transactions/year (MM)                     13.6         3.3        17.4
Repairs (emergency)                       -28.6        -2.3       -30.2
Staff utilization percent                  13.7        -5.6         7.4
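
The three columns appear to be the changes from Year 4 to 5, Year 5 to 6 and Year 4 to 6. A small sketch that reproduces the "Total staff size" row from the Basic Statistics (the helper name is mine):

```python
def pct_change(old, new):
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)

# Total staff size, from the Basic Statistics.
staff = {"Year 4": 44, "Year 5": 39, "Year 6": 38}

print(pct_change(staff["Year 4"], staff["Year 5"]))  # Year 4 to 5
print(pct_change(staff["Year 5"], staff["Year 6"]))  # Year 5 to 6
print(pct_change(staff["Year 4"], staff["Year 6"]))  # Year 4 to 6
```

This gives -11.4, -2.6 and -13.6, matching the row. Running the same helper over other rows is a quick way to spot values whose sign or size looks off.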

Quality Indicators - Percentage Changes
                                    Year 4 to 5 Year 5 to 6 Year 4 to 6
Abends/batch production run               -26.4       -15.4        37.7
Abends/month/production module            -34.8        13.3       -26.1
PATHVU maintainability score                 NA        -7.6          NA

Productivity Indicators - Percentage Changes
                                    Year 4 to 5 Year 5 to 6 Year 4 to 6
Modules supported/staff member             16.7         8.4        26.6
Batch processings/P-A                       9.8        36.8        50.2
Transactions/year (MM)/P-A                 28.3         3.4        32.7
Repairs (emergency)/P-A                   -20.0           0       -20.0
Production moves/module                      20        16.7          40
Programs to production/month/P-A           41.9        22.1        73.3
Compiles per P/A, batch                   -15.4       -16.0       -29.0
Compiles/program moved, batch             -40.0        32.7        59.3
Maintenance cost/year/module              -11.1        13.5        23.1
$MM inventory/professional                  9.1        11.1        21.2

CIS Annual Report


A. Organization Profile and Metrics

Staff               Monthly average
Per Diem                       1.00

Staff movement         for the year
Promotions, In                    4
Promotions, Out                   1
New Hires                         5
Lateral, Out                     10
Lateral, In                       4
Co-op Terms                      10
Charge-out Rate              96.60%

B. Business Profile and Metrics

Major Business Groups Supported               65
All Modules                                9,250
Lines of Code                          5,400,000
Replacement Cost                    $135,000,000
On-line Transactions                  45,000,000
Asset Dollar Responsibility      $26,000,000,000

C. Business Results and Metrics

                                Number of Occurrences  Man-days   Dollars
Emergency Calls                                 1,000       600  $270,000
Miscellaneous items                               200     1,050  $560,000
CISS-initiated Cost Savings                        33            $570,000


Saturday, March 17, 2007

When companies fail - The Stench of Corporate Death.

[First posted on Who Killed Howard Johnson by Jerry Gregoire.]

I've worked as a contract SysAdmin in a bunch of places in decline - and found I had a knack for pulling the I.T. systems out of crisis. Only as much as one admin can, and only for the short time of my contract.

After the third or fourth company, I realised they are easily recognised... Literally in under 5 minutes!
Just how, later on.

Why do I claim I.T., especially IT Admin/Operations, is an accurate 'mine budgie' or early warning system of a company's 'death spiral'?

I.T. reaches through every part of a business and its processes. In 2007, after 55 years of commercial I.T., it is still the single biggest point of leverage for most, if not all, businesses. Returns are 3-4 times higher than any other investment according to a recent NYTimes survey/article. Full ITIF Report - 69pp PDF.

Not only is I.T. everywhere, embodying most of the business processes, but it radically improves staff productivity over the whole business. And large companies were once successful, with slick (enough) processes and highly productive/effective staff. As well, the management has been at least "good enough" and probably much better.

I.T. Ops and Admin is where the rubber hits the road - where all those glorious I.T. benefits and 'force multiplier' effects are delivered. It is also the poor cousin of everything. Every year savings can be made by not fixing stuff, not replacing old equipment, not upgrading networks, power and cooling. And by reducing 'unnecessary' staff - a few each year. Budget death by a thousand slashes. At TNT, the transport company, they had reduced I.T. staff by 75% after two massive sackings on top of incremental layoffs - and were then building the numbers back up. Of course, the staff that are most needed will be gone early. The best staff are never the last left.

First the staff fall behind a bit on system updates, then doco, then they are busy fixing faults, and then they are flat-out firefighting - and all the time the systems are degrading and efficiency and effectiveness across the whole business is slowly eroded. More large scale outages, each a 'one of a kind', occur. Capacity may fall behind demand, but if the business is shrinking maybe not :-) The 'meta-functions', like Problem/Change/Config management, that prevent or fix problems before they occur and increase I.T. staff productivity, improve system stability and improve Service Quality are long gone or completely subverted.

It takes a while, and it is never anyone's job to detect and correct these global/systemic effects. Nobody measures the 'business benefits' of IT systems - so how could the decline in them be noticed?

And then they hire me - one in a long line of contractor admins - because their Ops staff turnover rate has gone through the roof and they either have young/inexperienced staff or "retired-in-place" barely competents or malapprops. The extra cost of contractors forces budgets elsewhere, like maintenance, to be shaved - and the spiral intensifies.

The irony here is that all those companies have big, expensive I.T. projects rolling out systems "that will solve all problems" when they deliver. Of course they rarely finish, under-deliver and make no difference... That is, if they could even run on the infrastructure.

Be clear here - the decline of I.T. Operations is a symptom, not a cause, of the Corporate Death Spiral.

What are the symptoms, this 'stench of Corporate Death', in the I.T. Admin and Operations area?

- frantic, busy staff - often literally running
- phones ringing off the hook
- fragile, antiquated systems
- messy cabling, machine rooms, lunch rooms
- unlabelled equipment, cabling, media
- backlogs of urgent jobs
- no progress on non-urgent tasks
- one or two 'technical despots'
- disrespectful inter-staff communication
- distant bosses. Don't talk down the hierarchy, only up.
- Only reprimands and upbraiding about mistakes and poor performance.
- No budget for process improvement. Includes refusing any staff initiative, no matter how well justified.
- Poor, incomplete record keeping. Especially software licences.
- Absence of teams and team meetings.
- Trivial, irrelevant incidents escalated to highest levels. Major incidents ignored.
- no staff 'think time', poor designs, ...
- no doco, no systems guides/maps, ...
- no audit or review processes, even informal
- no revision control, config mgt.
- absent or poor backups
- no induction or new starter processes
- lax security and widely shared administrator passwords
- and lots of 'perks' for the anointed few.

Why is this so?
What's my take on the specifics of 'poor management' that produce these results?

Obviously complacency and smugness/self-satisfaction can contribute.

What's essential is an upper management 'team' who don't work as a team and where the magic triplet, "ignorance, arrogance and self-delusion/incompetence" reigns.

Ordinary people are very pragmatic - they need to feed their families and pay the bills. If the Corporate Culture says "don't rock the boat", "Bring No Bad News" or "Do Nothing New" - then they won't. And there's nothing in their job description that says they must or should.

Robert E. Kelley identified "The 10 strategies of Star Performers" - the first of which is initiative. Kelley notes 'the stars' are 10-20 times more effective than the average - these are people you really don't want to lose.
A culture that censures change & improvement is anathema to initiative - these people leave - forcibly or voluntarily.

And all these organisations are "highly political", especially the management.

What's political?
When individuals take decisions where there is a conflict between the company interests and their personal interests, whose interests prevail? 'Political' is taking decisions that result in personal benefit at the expense of the Company.
This can be as simple and innocuous as maximising your Frequent Flyer benefits.
In extreme cases, personal benefits prevail even in highly visible/public decisions - like buying expensive, unnecessary luxuries or extravagant displays of wealth/conspicuous consumption.

All the worker drones know the place is on the skids, sigh and just plod along dispiritedly. A few may even emulate the 'leaders' and take whatever they can get...

Not only is worker morale low, this will be a Blame Culture.
Human Beings are brilliant at Game Playing - everyone works out very quickly that it's a far, far better thing to Do Nothing, than to have ever tried and 'failed', even minutely. Failing brings the wrath of the Blame Daemons raining down on your head, and an indefinite posting to Alaska.

This is expressed as:
- meetings, bloody meetings. "The practical alternative to work and decision making".
- Decision by Committee - no-one's to Blame. Everything is compromised.
- Excessive/needless Bureaucracy.
- Micro-Management and extensive management 'correction/editing' of all work.
- Focus on Irrelevant and Trivial issues, even pursuit of hobbies/play at work.
- Extensive use of 'blame hound' consultants, with no follow-up/follow-through on recommendations/reports.
- Rampant Fiefdoms, bullying, harassment and abuse.
- Re-organisations, massive layoffs, cost-cutting drives. Especially irrational ones.
- Cessation of research and marketing "to reduce costs"

And when did you ever see a manager's primary output, decisions, reviewed and assessed?
It's just "Not done"!
And if it was, there would have to be a system to create 'consequences' - and that requires strength of character and resolve. Which, if it was common in the culture, would not have allowed the situation to develop in the first place.

Lou Gerstner, who turned IBM around after 1991, talks about 'Execution' as critical.
Death Spiral companies can't 'Execute'. Neither do they have the will to execute.

Poor Management is the unwillingness to take decisions, execute plans, hold others accountable, seek and listen to feedback, communicate clearly and fully, foster and encourage teams, focus time/energy on business, not personal, issues and to consistently place company interests above personal benefits.

All this shows up quickly in I.T. Operations, 'the dispensable cost centre'.

And the Death Spiral builds, feeding on itself until it's impossible to pull out. Every pilot knows about this!

When disaster is inevitable, the brave rational thing to do is salvage whatever is possible, take a nice severance cheque and pass what's left to new people. What often happens is "Crash and Burn" - nothing left, awful losses and many injured innocents.


Friday, March 16, 2007

CMDB - Not a DB, not 'definitive master'

Ok - this was a real insight to me. Hope it's not Old News :-)
[25-Jan/15-Feb-2007. This PDF from the "CMDB Federation" [BMC, CA, Fujitsu, HP, IBM and Microsoft] supports a similar view - "Federated CMDB's".]

Was talking to a friend today who does config, change & release mgt for a large, secure Government Department.
He's struggling with Data Quality issues - surprise, surprise.

They import equipment details from a logistics system; they do a *lot* of shipping to/from 100+ overseas offices.
The logistics system doesn't enforce strict product id's - there could be 20 different versions of a single PC product name.

But talking to him further, it's axiomatic that CMDB's will not and cannot ever be the master or definitive source of all information.

CMDB's are, by definition, a join of existing, disparate databases mastered by other products/applications.
The only thing they can be is an 'intelligent collective repository' [just made that up].
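
To make the "join, not master" idea concrete, here is a minimal sketch. Everything in it is hypothetical (the source systems, asset id and field names are invented for illustration); the point is only that the CMDB view holds no data of its own and records which system masters each attribute.

```python
# A toy "federated" CMDB view: nothing is mastered here. Each attribute
# is read from the system that owns it. All names below are hypothetical.

logistics = {  # mastered by the logistics/shipping system
    "PC-0042": {"product": "Desktop PC", "location": "Overseas office 17"},
}
licences = {   # mastered by a software licence register
    "PC-0042": {"os_licence": "LIC-9001"},
}

SOURCES = {"logistics": logistics, "licences": licences}

def config_item(asset_id):
    """Join the per-source records for one asset, noting each field's master."""
    view = {}
    for source_name, records in SOURCES.items():
        for field, value in records.get(asset_id, {}).items():
            view[field] = {"value": value, "master": source_name}
    return view

print(config_item("PC-0042"))
```

A data quality problem such as inconsistent product names then has to be fixed in the owning system (here, logistics), because the joined view only reflects whatever the masters say.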

It reminds me of comments by Jerry Gregoire

When he was CIO of Dell he said :
"you don't want to use an ERP" - takes away your distinctive business process, costs to get in and out of, and ties you into one monster database. And slows you down - you can't change your business quickly if the ERP vendor doesn't support your new function/process.

Dell used an object broker and connected into existing apps & databases.
They implemented a brand-new Inventory system with *zero* new databases. The place has gone downhill since he left - I sometimes wonder if that's linked.

Jerry also pens pretty straightforward advice which I heartily agree with.


Thursday, March 15, 2007

Why is learning ITIL so hard?

Back from the first week of ITIL Service Managers' Training... Took me 4 days to recover - part of which could be the driving [only 7-8 hours each way].

Realised I needed email access whilst away - and my Palm PDA with 802.11 wireless doesn't cut it for email. Have acquired a laptop, and am creating 'dual boot' setup. Don't trust MS-Windows - especially those in Internet Cafes. Need 'ssh' to access mail.

So why was I exhausted?

Six of us doing the course - and all of us suffered the same. Lack of sleep, 'exam nerves' each day and extreme psychological reaction. At least a number of us seriously thought & discussed ditching the course - a seriously expensive move.

I don't have a good reason...

Everyone [all men] found the experience "intense". We are all used to change, acquiring new information, reading long tracts, writing, solving problems, creating/giving presentations and attending talks/lectures... And doing the odd test.

It's not like the ITIL material is 'deep' or 'difficult' like Queueing Theory [thanks Neil!]
It is broad - there is a lot to cover. Not that many Powerpoint slides [50 a day?]

Still don't know why I came back so wrung out. Not sure if that's a universal experience.

At this point, just have to take a note of the effect and look for other stories/experiences - and keep pondering over it.


Friday, March 2, 2007

The end of the "Silicon Revolution"

This seems to be one of the biggest IT stories not making news and not being actively addressed by Professionals and the Industry.

Neil Gunther writes about the change in the Moore's Law CPU speed constant. Which is why we have "multi-cores"... In 2000 and 2001, Intel released articles flagging that thermal effects could be the next barrier - and predicted that by 2010 a single CPU would be consuming 18 kilowatts. More than ten times the average household power consumption!

Herb Sutter, in "The Free Lunch Is Over", makes an aside, illustrated with a graph, on the inflection in the CPU speed growth curve - in January 2003. Herb was talking about the insidious problems that true, ubiquitous concurrent programming incurs.

Commercial IT has been going more than 55 years. It is becoming quite mature. The next big event horizon is the end of the Silicon Revolution - when all the physical limits are met for CPU's (speed), memory size, disk size/speed/transfer rate and network bandwidth.

What will our IT Services look like then? IT groups will no longer be able to ride on the back of rampant technology improvement. They will have to work, hard, to keep improving their figures.

Design will matter. Real talent, skill and understanding will become important. When large companies are spending 12-15% of their Operating Expenses on IT, the ones that can maintain service levels and business effectiveness and shave 1-3% off costs will have a substantial competitive advantage. The savings go straight to the bottom line, adding directly to Nett Profits.

Nett Profits usually lie between 1% - 10% of turnover. The IT savings above will boost whole company profits by 5%-30%.
Which will impress the market analysts.
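
The post doesn't spell out what the 1-3% saving is measured against, so this is only a sketch of the leverage effect. It assumes both the saving and the nett profit are expressed as fractions of turnover; the function name and sample inputs are mine.

```python
def profit_boost(saving_share_of_turnover, net_margin):
    """Relative rise in nett profit when a cost saving goes straight to the
    bottom line. Both arguments are fractions of turnover (an assumption)."""
    return saving_share_of_turnover / net_margin

# A saving worth 3% of turnover, for a company with a 10% nett margin.
print(f"{profit_boost(0.03, 0.10):.0%}")
```

Under that assumption a saving worth 3% of turnover on a 10% nett margin lifts profit by 30%. The thinner the margin, the larger the relative boost from the same saving.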

This definition of Engineering gives the reason: "An Engineer does for a dollar what any fool can do for 10."


How does the "2% Rule" apply to IT and ITIL?

ITIL is about aligning IT with the business needs. IT acts as an internal business supplying customers - who may be captive.

Sometimes IT falls under the "2% Rule". Price can be (nearly) no object.

People may need a business result and price is not the main criterion. Recognise these situations and react accordingly.

The "utilisation" and "efficiency" of particular IT assets is not of prime importance - the Total Business Result is.

That's why we don't fret over CPU under-utilisation on desktop computers. And why we do fret over 'poor response' on those same machines.

That 'captive audience' of yours can fire you as a supplier: it's called "outsourcing".

It's all about 'perspective'.


The 2% Rule, or 98:2 Solution

You know the '80:20 Rule' - the Pareto Principle - that 80% of faults are caused by 20% of problems. Here's a similar rule: Items under 2% of budget get different rules.

Every organisation has its "core business" and will/should actively monitor and control those inputs/supplies/services to remain profitable. They will be very sensitive to their major selection criteria on those.

But for the 'incidentals', different economics and criteria apply.

As a supplier or consumer, recognising which situation applies will help you greatly: you can tailor your services to the consumer's needs and maximise your profits.

If you're a builder and you can buy the same building products for 20% less with no other penalties, then there has to be a very strong reason to not change. Inertia isn't a strong business reason. "Family connections" probably are.

But what about those inputs/supplies/services that you don't use every day? The one and two percenters? The necessary 'noise'.

My plumbers charge $160/hour. If they need some printing done occasionally, how much time can be spent looking for 'the best deal'? The savings have to beat the $160/hr spent. Taking a half-day to save 20% on a $1,000 printing job is a nett loss of $440! (gain: $200, cost: 4*$160 = $640)

For the "1-2%" inputs, cost price is the least important determinant. Total Cost, including opportunity losses/forgone revenue etc, has to be used for a realistic economic comparison.
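
The plumber example can be written as a one-line Total Cost comparison. A minimal sketch using the figures above (the function and parameter names are mine):

```python
def net_gain(job_price, discount, hours_spent, hourly_rate):
    """Saving on the purchase minus the value of the time spent finding it."""
    saving = job_price * discount
    time_cost = hours_spent * hourly_rate
    return saving - time_cost

# Half a day (4 hours at $160/hour) spent to save 20% on a $1,000 job.
print(net_gain(job_price=1_000, discount=0.20, hours_spent=4, hourly_rate=160))

# Break-even: the most search time the $200 saving can pay for.
print(1_000 * 0.20 / 160, "hours")
```

By this arithmetic the half-day search leaves the plumber $440 behind, and the search only pays if it takes less than an hour and a quarter.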

For the little things, the incidentals, most people put first one of:

  • quick or available
  • good or high-quality
  • close
  • reliable

For some people, it is always about the price. People on fixed-incomes are "time rich, money poor". They usually fall into this category. Others may be wealthy and always be "careful" or "tight" - there are no set rules for behaviour.

When the hotwater system has failed or the roof has flooded, you need someone Now! Someone who's going to do a Good Enough job, Real Soon. Even if they charge double you're probably happy to pay the money. The downside is not trading for one or more days - way more expensive.

If it's a service, like an accountant, that you are going to be using over an extended period and it's critical to your business, you may take some time over the decision and be very particular in your criteria and trade-offs. It's worth an hour's travel and 30% more to get the best advice and save a lot more!

The "2% Solution" has two business impacts:
  • In your marketing analysis, decide how much of your business is not decided on price alone - and build a package of service and price accordingly.
  • Relationships are why customers do/don't return for repeat business. It's also why people ask for and give 'recommendations'.

The "2% Solution" can also inform what business segment you decide to be in.
If you are an Engineering and Construction firm and specialise in (steel) pipelines, you can choose to enter the "high volume/low margin" end of the business - supplying and laying the long straight bits, or enter the "low volume/high margin" end - building the complex valve/joiner units.

You can make good money at both ends.
Low Margins arise because of fierce competition - everyone can do the technical side.
High Margins are allowed when there is little competition - because the technicalities of the job are demanding/exacting.
Both ends need good management and tight fiscal control to remain profitable.

Technology has the horrible habit of quickly making the esoteric into the ordinary - being a technology leader as a point of differentiation means you can't stay still.

In 1991, it cost $10,000 for a CD-Writer and about as much for a disk that would store those 600MB.
15 years later, writers were under $100 and disks over 20 times bigger for $150.