Wednesday, September 2, 2009

Junk Code

The human genome contains three billion base pairs, the DNA letters in which the code of life is written. Yet only a tiny proportion of these letters, no more than 2 per cent, are actually used to write our 21,500 or so genes. The remainder, which makes none of the proteins that drive the chemical reactions of life, has long been something of a mystery. Its apparent lack of function has led it to be dubbed 'junk DNA'.

Much of our junk DNA has origins that have been relatively simple to establish. A very large part of it belonged originally to viruses, which have incorporated their own genetic codes into our genome in order to reproduce.

The legacy of our viral ancestors can also be seen in so-called retrotransposons. These repetitive chunks of DNA, which were originally deposited by viruses, have the ability to copy themselves into the human genome again and again, using an enzyme called reverse transcriptase.

In some ways, the continued presence of this junk DNA is not surprising: DNA is 'selfish', and will replicate itself regardless of utility to its host organism. But for it to withstand natural selection, some of it must surely be functional.

Software made by humans is like synthetic DNA. Remarkably, junk code also exists in the software systems we build. In fact, in evolutionary design methodologies such as Scrum, imperfect code seems even more likely to be copied into later generations of software.

Scrum sprints are short iterations, typically one to four weeks long. Because the focus in Scrum is on delivery, junk code may have a better chance of being copied than it would under quality-focused methodologies.

I don't think, however, that junk code makes evolutionary methodologies less successful. The advantages of being agile have more survival value than those of less adaptive, quality-driven methodologies.

Consider this real-life example:

Sprint 1 of project P ends, and the architect A approaches the developer D.

A- Hey.. Something drew my attention: you guys hard-coded the magic-token in your module, whereas there is a common function in our library which you should have used to retrieve it. How did this happen?

D- Oh. We've just cut and pasted the code from project Q. They also used the hard coded magic-token.

A- Ahh. I see. Damn. They should have used the library too. I wonder how we may clean up this mess.

D- I can fix this easily now in project P.

A- No, wait.. Fix it in the next sprint. I'll talk to the other teams and see if I can get them to fix their code too.

As you probably guessed, the hard-coded magic-token in this example is junk code. Like junk DNA, it possesses a powerful intrinsic ability to get itself copied into other projects and sprints, i.e. other generations of code (perhaps because people find it more reassuring to use magic-tokens directly than to call functions that return them). Despite A's intention to clean up the code once and for all, in practice it may be logistically unviable and quite expensive to remove this viral but relatively harmless junk code from dozens of other projects across all their versions. So it is likely that the fix will only be made in project P (provided that it is remembered in sprint 2), and the viral code will continue to survive.
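To make the difference concrete, here is a minimal sketch in Python of what the two variants might look like. The actual code from projects P and Q is not shown in this post, so every name and value below (`get_magic_token`, `rotate_magic_token`, the `build_request` functions, the token strings) is hypothetical and only illustrates the pattern.

```python
# All names and values here are hypothetical stand-ins; the real
# library and project code are not shown in the post.

# --- shared library -------------------------------------------------
_TOKEN_STORE = {"magic-token": "abc123"}


def get_magic_token() -> str:
    """The common library function: the single place that knows the token."""
    return _TOKEN_STORE["magic-token"]


def rotate_magic_token(new_value: str) -> None:
    """Simulates the token changing at some later point."""
    _TOKEN_STORE["magic-token"] = new_value


# --- project Q, later cut and pasted into project P -----------------
def build_request_hardcoded(payload: str) -> dict:
    # The junk code: the literal is frozen at the moment it was copied.
    return {"token": "abc123", "payload": payload}


# --- what the architect asked for -----------------------------------
def build_request(payload: str) -> dict:
    # Asks the library every time, so it follows any later change.
    return {"token": get_magic_token(), "payload": payload}
```

As long as the token never changes, both versions behave identically, which is exactly why the copied one is relatively harmless and survives. Only when something like `rotate_magic_token` is called does the hard-coded copy go stale, and by then it may live in dozens of projects.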

Evolutionary design in software systems is remarkably similar to evolution by natural selection. More importantly, in software projects, aiming at perfect design would almost certainly limit the software's adaptive power and consequently diminish its competitive value.

50 Genetic Ideas, Mark Henderson
Parasitism, Wikipedia
Agile Software Development, Wikipedia
Agile/Scrum Development, Wikipedia

1 comment:

Bob MacNeal said...

I like your use of biological systems as a lens to understand what we do.

Junk code and junk DNA is a nice analogy.

I suppose junk code sticks around because of the laws of optimal foraging. That is, optimal foraging would say that a developer wouldn't bring a net gain of energy to the team by removing it.