Strong typing in programming languages

Strong typing can reduce defects and make code more readable

Background

In the previous post I removed all syntax and language content to focus on just:

  • Structure, in the form of an aggregate
  • Sequence, in the form of an ordered series of steps
  • Classification, in the form of a type

Most programming languages have the attributes above (they are pretty much essential). But it is in the last attribute, type, that languages vary most in their use.

In this post I'll articulate why I think strong typing is important. Strong typing is the combination of Structure and Classification. For example, an Address type could be defined as:

  • House Number
  • Street Name
  • City
  • Region
  • Postcode (Zipcode)

The above defines an aggregate called Address. Most people would not expect that type to fundamentally change in structure once declared. The values might not be set, but most people would not expect 'City' to no longer exist in some contexts and reappear in others. Some programming languages do allow this (in Javascript, for example, a property can be removed from an object with 'delete' and added back later).

We are used to the idea that once something is classified it will, in general, remain of that classification unless some significant event occurs to transform it. Following that event it is normally reclassified in some way.

So, in line with creating a programming language that most people will be able to adopt and understand, I think it is quite important to base the language on that principle. Moreover, we seek to create order and classification almost by nature and without detailed thought. Humans are almost always looking to group and classify the things they observe; it is in our nature.

Why Types

Even languages that are dynamic in nature, like Javascript, do have types. It's just that those languages tend to be a little 'free-and-easy' with type conversions.

For example, Javascript will allow the following (snippet from @codingyuri's original post):

let num1 = 5;
let num2 = 7;
num1 = num1.toString();
let total = Number(num1 + num2); // total is 57

So you may ask, why is this strange? Although it is not stated explicitly, num1 and num2 have been classified as Number. When I say classified, I mean that most people now create a mental mapping that something called 'num1' (a variable) is of type Number. The Javascript runtime has also classified them as Number (typeof num1 evaluates to "number").

But the 'toString()' operation returns a value of type String, and this is assigned to 'num1'. This is the equivalent of writing:

let num1 = "5";

Implicit type changing

So the type of 'num1' has now been altered from Number to String. You could argue that calling 'toString()' was the significant event that allows this. But other than the name 'toString()', there was nothing to inform the developer that the type would be changed.

Variable reassignment

There are various schools of thought around limiting variable reassignment; some argue that it leads to defects, or to code that is hard to read and understand.

But you will probably find that when you return to review code like the above, it takes time to 'grok'. Not only are variables being reused through the processing steps, they are also changing type. That's a lot of change, and quite a bit of it is implicit. You have to comprehend those implicit changes and mentally apply them through the processing steps.

Operators and implicit type coercion (promotion)

The part of the example above that deals with creating the 'total' shows implicit type coercion.

let num2 = 7;
let total = Number(num1 + num2);

The + operator is being used with 'num1', which has been changed from a Number to a String, so + now means 'append' (string concatenation) rather than mathematical 'addition'. Enabling polymorphic use of operators is normal and natural; again, it fits with our human tendency to look for symmetry and consistency.

But 'num2' is still of type Number, so it will be silently coerced to String. The result, "57", is passed to the Number function, which converts it to the Number 57.

An aside

If we were to change the + operator to *, we would get mathematical multiplication, which requires 'num1' to be silently coerced back to a Number. The variable 'total' would then have the value 35. A small change, swapping one simple operator for another, has transformed the expression in a non-obvious manner.

Refactoring

If the code were now refactored as shown below:

let num1 = 5;
let num2 = 7;
let total = Number(num1 + num2);
num1 = num1.toString();

The value of 'total' would be 12 rather than 57: simply moving one line has changed the result significantly.

Why is this behaviour not desirable?

It is not desirable because it forces the developer to keep an unwritten mental model of what type each variable actually is (as it may change). By allowing a variable to be reassigned with a value of a different type, the language makes the reader keep track of where those changes take place. Moreover, the implicit type conversions change the meaning of an expression depending on both the types involved and the operator in question.

In short, you can't just look at the code and expect it to be simple and self-documenting.

The intention here is not to denigrate other languages, but to highlight why strong typing is more natural for humans. Classifying is what we do naturally, and most of the time without conscious thought.

What is the alternative?

The simple alternative is to ensure that once a variable has an established type, it cannot be changed. This is much easier to understand: once you know the type of a variable, that's it.

So reusing 'num1' which is a Number to hold a String would not be allowed.

num1 = num1.toString();

This would result in an error: the developer would be informed that 'num1' is of type Number and therefore the result of 'toString()' (which is a String) cannot be assigned to it.

The coercion of a Number to a String as shown below should not be implicit.

let total = Number(num1 + num2);

This too should result in an error, i.e. it is not possible to add a Number value to a String value. If you wished to do this, the conversion should be explicit (perhaps through the use of 'toString', or an operator such as '$').

Too Verbose

The argument is not to introduce more 'verbosity' when creating structures/types or sequences of instructions, but to make statements more explicit in nature. For example, there is little wrong with this:

let num1 = 5;

It is obvious that 'num1' is a Number, and by using 'let' it is clear that it is being declared. It is just that 'num1' must now always be a Number while in scope. Alternatives might be:

Number num1 = 5;
num1: Number = 5;
num1 as Number = 5;
num1 <- 5

All of which declare a variable called 'num1' as a Number type and initialise that variable to have the value of 5.

So the argument is not to make declarations or syntax more verbose or 'bulky', but to ensure types don't change. When types can be safely inferred, they can and should be. But there are times when it is necessary or desirable to be explicit about what type a variable should be; typically this is when working with polymorphic types.

Polymorphism

There has been much discussion of polymorphism in academic material and blogs. Many see it as the real value of Object Oriented programming (irrespective of language). But it can be argued that the real value is:

  • The ability to call functions without depending on access to the source code of that function
  • The ability to have or present multiple different perspectives of some software entity

What do those statements mean in practical terms and why are types important?

Expression of relationships

Types give us a mechanism to express polymorphic relationships. This can take the form of inheritance, or of sharing the same or similar traits. Some languages, such as Python, use 'duck typing', which is really implicit typing based on an object having the expected methods (matching signatures). This does not really enable relationships to be expressed explicitly.

An example in C

While C is not an Object Oriented language it still has the capability to provide:

  • Structure, in the form of a 'struct'
  • Sequence, in the form of a 'function'
  • Classification, in the form of both 'struct' and 'typedef'

Moreover, when used in a thoughtful and progressive manner, it can provide various forms of polymorphism. Importantly, this is done without C being an Object Oriented programming language.

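#include <stdio.h>

/* 'operation' names a signature: a function taking one int and returning nothing */
typedef void (*operation)(int);

void fun1(int a) {
    printf("Value of a is %d\n", a);
}

void fun2(int a) {
    printf("a has the value of %d\n", a);
}

/* Callers only know they receive an 'operation', not which function it is */
operation getFunction(void) {
    return &fun2;
}

int main(void) {
    getFunction()(10);   /* prints "a has the value of 10" */
    return 0;
}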

As can be seen in the example above, strong typing aids the safe use of pointers to functions in C. The 'main' function can call 'getFunction' and be assured that the result will be compatible with 'operation': a function that accepts a single int as a parameter and returns nothing.

Through the use of typedef, the code above has created an abstraction for a function signature. In many ways this can be viewed as polymorphism. Through 'getFunction', the 'main' function has become decoupled from the actual function it will be calling. It is now possible to use 'operation' as a type and pass around variables of that type.

operation op = getFunction();

In the above example, the variable 'op' has the type 'operation'. I am not advocating using C as an Object Oriented programming language; there is a reason C++, C#, Java and other OO languages were created, after all.

The same example in Javascript

getFunction()(10); // works before the definitions below because function declarations are hoisted

function fun1(a) {
    console.log('The value of a is ' + a);
}

function fun2(a) {
    console.log('a has the value of ' + a);
}

function getFunction() {
    return fun2;
}

Discussion

The Javascript language has much going for it, and removing the need for types everywhere means that coding can be much simpler and more dynamic. But it comes at a cost: safety. Refactoring code needs much more care and attention; altering the number of parameters of 'fun1', for example, may cause errors at runtime.

In the C example, the compiler checks this at compilation time. But suppose we were to alter the Javascript code above as follows:

function fun1() {
    console.log('The value of a is ' + a);
}

Then run the example: all would still be fine, because 'fun1' is never called. Suppose we then alter our code again:

function getFunction() {
    return fun1;
}

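Only now would we get an error, and only at runtime: calling 'fun1' throws a ReferenceError because 'a' is no longer declared. Javascript itself does not complain that 'fun1' is called with an argument it no longer declares.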

Conclusion

There are significant disadvantages to having to use strong types everywhere: types require definition, variable declarations can be more verbose, and generic (parametric) programming is much more verbose and complex.

But they can provide compile-time safety and facilitate the expression of relationships. They are also much more natural for humans, as we tend to classify and categorise most things.

So, putting the additional definition requirements and slightly more verbose code to one side, the natural and logical structure that strong types bring, together with increased safety, is worth the price.

Needs already catered for

This conclusion won't suit everyone and there are languages such as Python and Javascript that are much freer with respect to types.

But then there are numerous languages that have strong types already, such as C, C++, C#, Java, Scala, Kotlin, Dart, etc. So why create another language that is 'just like those'?

The next few blog posts will make this clearer. But so far I've not really added anything new, other than to fall into the 'compiled', 'strong typing' camp of languages.