SSI Embedded Systems Trust Facts

Fact: Over 50% of SSI Staff has 8 or More Years Experience

"You can count on the work being done right and on time." ~ SSI Client

Fact: 30% of SSI staff has twelve or more years experience

"The people blended in and became part of the design team."

Fact: Experienced Team Lead manages all projects

"I was impressed with the quality of SSI's work throughout this contract. I appreciated their ability to work with minimal guidance, and minimal design details. And I greatly appreciated their suggestions and alternate design proposals. Their willingness to work overtime when necessary to meet deadlines was greatly appreciated." ~ SSI Client

Fact: All SSI staff is trained on the Quality Development Process

"Work is spectacular. More exceptional than I thought it would be. Sensors work great. They can detect the slightest pressure accurately. It is WAY beyond my expectations." ~ SSI Client

Fact: Long term staff retention rate at SSI is > 95%

Fact: SSI's Internal Software process operates as CMM Level 3

"The experience was a good one" ~ SSI Client

Fact: SSI has been in business 16+ years

"The best thing about working with SSI is that the people genuinely care about the success of the overall project. Typical contractors stop when they have merely met the letter of the contract that they are bound to, whereas SSI is willing to go above and beyond to drive a project to completion and ultimately success." ~ SSI Client

Fact: Over half of SSI Business is from returning clients

"I would use SSI again" ~ SSI Client

Fact: SSI client relationships last several years

Fact: SSI's Customers report a consistent 4.5 out of 5 rating for satisfaction

"Keep finding people who can get to the root of the issue and resolve it as specified. [Your Engineer] is golden!" ~ SSI Client

Fact: 100% of clients surveyed said they would recommend SSI to others

"SSI will deliver what is promised on a timely basis." ~ SSI Client

Fact: SSI Supports Continual Employee Training

"Consistent high quality engineers who perform very well. SSI is the only company -- contracting or consulting -- who consistently provides the best talent who are not only technically accomplished but have excellent verbal and personal skills. I have not seen anything like it before." ~ SSI Client

Real-World Firmware Disasters!
By Jack Ganssle

Consider the Mars Polar Lander, a 1999 triple failure. The MPL’s goal was to deliver a lander on Mars for half the cost of the spectacularly successful Pathfinder mission launched two years earlier. At $265 million, Pathfinder itself was much cheaper than earlier planetary spacecraft.

Shortly before it began its descent, the spacecraft released twin Deep Space 2 probes which were supposed to impact the planet’s surface at some 400 MPH and return sub-strata data.

MPL crashed catastrophically. Neither DS2 probe transmitted even a squeak.

The investigation board made the not-terribly-earth-shaking observation that tired people make mistakes. The contractor used excessive overtime to meet an ambitious schedule. Mars is tough on schedules. Slip by just one day past the end of the launch window and the mission must idle for two years. In some businesses we can dicker with the boss over the due date, but you just can’t negotiate with planetary geometries.

MPL workers averaged 60 to 80 hours per week for extended periods of time.

The board cited poor testing. Analysis and modeling substituted for test and validation. There’s nothing wrong with analysis, but testing is like double-entry bookkeeping – it finds modeling errors and other strange behavior never anticipated when the product exists only as ethereal bits.

NASA’s mantra is to test like you fly, fly what you tested. Yet no impact test of a running, powered DS2 system ever occurred. Though planned, such tests were deleted midway through the project due to schedule considerations. Two possible reasons were found for Deep Space 2’s twin flops: electronics failure in the high-g impact, and ionization around the antenna after the impacts. Strangely, the antenna was never tested in a simulation of Mars’s 6 torr atmosphere.

While the DS2 probes were slamming into the Red Planet things weren’t going much better on MPL. The investigation board believes the landing legs deployed when the spacecraft was 1500 meters high, as designed. Three sensors, one per leg, signal a successful touchdown, causing the code to turn the descent engine off. Engineers knew that when the legs deployed these sensors could experience a transient, giving a false “down” reading… but somehow forgot to inform the firmware people. The glitch was latched; at 40 meters altitude the code started looking at the data, saw the false readings, and faithfully switched off the engine.

A pre-launch system test failed to detect the problem because the sensors were miswired. After correcting the wiring error the test was never repeated.
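
To make the failure concrete, here is a minimal C sketch of one defensive pattern: ignore touchdown indications until sensing is armed, then insist on consecutive "down" readings before shutting off the engine. Apart from the 40 meter arming altitude and the three legs, every name and number is a made-up assumption. This is not the MPL flight code, and the two-sample persistence rule is my illustration, not a finding of the board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the MPL flight software. */
#define ENABLE_ALTITUDE_M 40.0  /* touchdown sensing is armed at or below this altitude */
#define REQUIRED_SAMPLES  2u    /* consecutive "down" readings needed on a leg */
#define NUM_LEGS          3u

typedef struct {
    bool    armed;                 /* true once the vehicle is low enough */
    uint8_t down_count[NUM_LEGS];  /* consecutive "down" samples per leg */
} touchdown_filter_t;

void touchdown_filter_init(touchdown_filter_t *f)
{
    f->armed = false;
    for (unsigned i = 0; i < NUM_LEGS; i++) {
        f->down_count[i] = 0;
    }
}

/*
 * Call once per control cycle with the raw, unlatched sensor states.
 * Returns true when the descent engine should be shut down.
 */
bool touchdown_filter_update(touchdown_filter_t *f,
                             double altitude_m,
                             const bool leg_down[NUM_LEGS])
{
    bool shutdown = false;

    if (!f->armed) {
        if (altitude_m > ENABLE_ALTITUDE_M) {
            /* Too high: readings (such as leg-deploy transients) are
             * discarded and never stored. */
            return false;
        }
        f->armed = true;
        /* Start from a clean slate so nothing seen earlier carries over. */
        for (unsigned i = 0; i < NUM_LEGS; i++) {
            f->down_count[i] = 0;
        }
    }

    for (unsigned i = 0; i < NUM_LEGS; i++) {
        if (leg_down[i]) {
            if (f->down_count[i] < REQUIRED_SAMPLES) {
                f->down_count[i]++;
            }
        } else {
            f->down_count[i] = 0;  /* a single glitch does not persist */
        }
        if (f->down_count[i] >= REQUIRED_SAMPLES) {
            shutdown = true;
        }
    }
    return shutdown;
}
```

The point of the design is that no state from before arming can influence the shutdown decision, so a transient at leg deployment has nowhere to hide.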

Then there are the twin Mars Exploration Rovers, Spirit and Opportunity, which at this writing have surpassed all mission goals and continue to function. We all heard about Spirit’s dispiriting shutdown when it tried to grind a rock. Most of us know that the flash file system directory structure was full. VxWorks tossed an exception, exactly as it should have, and tried to reboot. But that required more directory space, causing another exception, another reboot, repeating forever.

Just as in unlamented DOS, deleted files still consumed directory space. A lot of old files accumulated during the coast phase to Mars still devoured memory.

Originally planned as a 90-day mission, the spacecraft were never tested for more than 9 days. In-flight operation of motors and actuators generated far more files than ever seen during the ground tests. The investigators wrote: “Although there was limited long duration testing whose purpose was to identify system memory consumption of this type, no problems were detected because the system was not exercised in the same way that it would later be used in flight.”

Test like you fly, fly what you tested.

Exception handlers were poorly implemented. They suspended critical tasks after a memory allocation failure instead of placing the system in a low-functionality safe mode.
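
One common way to break a reboot loop of this kind is a boot counter kept in memory that survives a reset. The C sketch below is an illustration under assumptions, not the rovers' actual fix: `persistent_state_t`, `boot_select_mode()`, `boot_mark_healthy()` and the threshold of three are all hypothetical names and values.

```c
#include <stdint.h>

#define MAX_FAILED_BOOTS 3u  /* hypothetical threshold */

typedef enum { BOOT_NORMAL, BOOT_SAFE } boot_mode_t;

/* Stands in for a counter held in storage that survives a reset (assumed). */
typedef struct {
    uint32_t failed_boots;
} persistent_state_t;

/*
 * Call early in startup, before anything that can fail
 * (for example, mounting a file system).
 */
boot_mode_t boot_select_mode(persistent_state_t *ps)
{
    if (ps->failed_boots >= MAX_FAILED_BOOTS) {
        /* Too many boots never reached a healthy state: come up in a
         * low-functionality mode that skips the suspect subsystems. */
        return BOOT_SAFE;
    }
    ps->failed_boots++;  /* assume failure until proven otherwise */
    return BOOT_NORMAL;
}

/* Call once the system has initialized fully and is running normally. */
void boot_mark_healthy(persistent_state_t *ps)
{
    ps->failed_boots = 0;
}
```

In this sketch the counter stays at its limit while in safe mode, so the system keeps coming up in safe mode until something, such as a ground command, calls `boot_mark_healthy()`.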

A source at NASA tells me the same VxWorks memory allocation failure has caused software crashes on at least 6 other missions. The OS isn’t at fault, but it is a big and complex chunk of code. In all cases the engineers used VxWorks incorrectly. We seem unable to learn from other people’s disasters. We’re allowed to make a mistake – once. Repeating the same mistake over and over is a form of insanity.

It’s easy to blame the engineers, but they diagnosed this difficult problem using a debugger 100 million miles away from the target system, found the problem, and uploaded a fix. Those folk rock.

Launch Failures

In 1999 a Titan IVB (this is a really big rocket) blasted off the pad, bound for geosynchronous orbit with a military communications satellite aboard. Nine minutes into the flight the first stage shut down and separated properly. The Centaur second stage ignited and experienced instabilities about the roll axis. That coupled into both yaw and pitch deviations until the vehicle tumbled. Computers compensated by firing the reaction control system thrusters… till they ran out of fuel. The Milstar spacecraft wound up in a useless low elliptical orbit.

A number of crucial constants specified the launcher’s flight behavior. That file wasn’t managed by a version control system… and was lost. An engineer modified a similar file to recreate the data but entered one parameter as -0.1992476 instead of the correct -1.992476. That was it – that one little slipup cost taxpayers a billion dollars. At least there’s plenty more money where that came from.
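
Version control, reviews and testbed runs are the real cure, but a cheap automated sanity check can also catch a slipped decimal point. The C sketch below compares a newly entered constant against a trusted reference value, assuming one is available (say, from a prior flight or an independent derivation). The function name and the tolerance are my assumptions, not anything from the Centaur program.

```c
#include <stdbool.h>

/* Absolute value without pulling in the math library. */
static double abs_double(double x)
{
    return (x < 0.0) ? -x : x;
}

/*
 * Returns true if value lies within rel_tol of reference, where rel_tol
 * is a fraction (0.10 means 10%). When the reference is zero a relative
 * tolerance is meaningless, so rel_tol is treated as an absolute bound.
 */
bool constant_is_plausible(double value, double reference, double rel_tol)
{
    if (reference == 0.0) {
        return abs_double(value) <= rel_tol;
    }
    return abs_double(value - reference) <= rel_tol * abs_double(reference);
}
```

With a 10% tolerance, `constant_is_plausible(-0.1992476, -1.992476, 0.10)` is false, so the mistyped value would be flagged for a human to look at. A check like this only helps if the reference is itself trustworthy, which is one more argument for keeping such files under version control.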

We all know to protect important files with a VCS – right? Astonishingly, in 1999 a disgruntled programmer left the FAA, deleting all of the software needed for en route control of planes between Chicago O’Hare and the regional airports. He encrypted it on his home computer. The feds busted him, of course, but FBI forensics took 6 months to decrypt the key.

Everyone makes mistakes, but no one on the Centaur program checked the engineer’s work. For nearly 30 years we’ve known that inspections and design reviews are the most powerful techniques known to prevent errors.

The constant file was never exercised in the inertial navigation system testbed, which had been specifically designed for tests using real flight data.

Test like you fly, fly what you tested.

A year later Sea Launch (check out the cool pictures of their ship-borne launch pad at www.sea-launch.com) lost the $100 million ICO F-1 spacecraft when the second stage shut down prematurely.

The ground control software had been modified to accommodate a slight change in requirements. One line of code, a conditional meant to close a valve just prior to launch, was somehow deleted. As a result all of the helium used to pressurize the second stage’s fuel tanks leaked out. Pre-flight tests missed the error.

Test like you fly, fly what you tested.

This failure illustrates the intractability of software. During countdown, ground software monitored some 10,000 sensors, issuing over a million commands to the vehicle. Only one was incorrect, a 99.9999% success rate. In school a 90 is an A. Motorola’s famed six sigma quality program eliminates all but 3.4 defects per million. Yet even 99.9999% isn’t good enough for computer programs.
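
The arithmetic is easy to check. A small C helper, with a hypothetical name, turns a failure count into a success percentage. I assume exactly one million commands; the count was "over a million", so the real figure is slightly higher.

```c
/* Percentage of operations that succeeded, given failures out of total. */
double success_percent(double failures, double total)
{
    return 100.0 * (1.0 - failures / total);
}
```

One failure in a million gives `success_percent(1.0, 1000000.0)`, or about 99.9999%. Six sigma's 3.4 defects per million gives `success_percent(3.4, 1000000.0)`, or about 99.99966%. By that yardstick the countdown software beat six sigma and still lost the spacecraft.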

Software isn’t like a bridge, where margins can be added by using a thicker beam. One bit wrong out of hundreds of millions can be enough to cause total system collapse. Margin comes from changing the structure in sometimes difficult ways, like using redundant computers with different code. In Sea Launch’s case, perhaps a line or two of C that monitored the position of the valve would have made sense.
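
As a sketch of what that "line or two of C" might look like, assume the valve has a position feedback sensor. The type and function names below are hypothetical; nothing here comes from the Sea Launch ground software.

```c
typedef enum { VALVE_OPEN, VALVE_CLOSED } valve_state_t;

typedef enum { COUNTDOWN_PROCEED, COUNTDOWN_HOLD } countdown_action_t;

/*
 * Compare the state the launch sequence requires at this step with what
 * the position sensor actually reports. This check knows nothing about
 * how (or whether) the close command was issued.
 */
countdown_action_t check_valve(valve_state_t required, valve_state_t measured)
{
    return (measured == required) ? COUNTDOWN_PROCEED : COUNTDOWN_HOLD;
}
```

Because the monitor is independent of the code that commands the valve, a deleted command line shows up as `check_valve(VALVE_CLOSED, VALVE_OPEN)` returning `COUNTDOWN_HOLD` instead of as a silent helium leak.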

Robert Glass in his Facts and Fallacies of Software Engineering (Addison-Wesley, 2002, ISBN 0321117425) estimates that for each 25% increase in requirements the code’s complexity explodes by 100%. The number of required tests probably increases at about the same rate. Yet testing is nearly always left till the end of the project, when the schedule is at max stress. The boss is shrieking “Ship it! Ship it!” while the spouse is wondering if you’ll ever come home again.

The tests get shortchanged. Disaster follows.

The higher levels of the FAA’s DO-178B safety critical standard require code and branch coverage tests on every single line of code and each conditional. The expense is staggering, but even those ruthless procedures aren’t enough to guarantee perfection.

A Certified Women's Business Enterprise and Member of the Illinois Technology Association