What's involved in creating a programming language
The main driving forces for a programming language
Introduction
Having used various programming languages over the last 30 or so years, I have found that some languages have features that really help productivity. I have also found inconsistencies and syntax that cause defects in programs, and those defects are not always obvious.
Application of Programming Languages
There was a time when getting a computer to complete any task was a major achievement. In those days, when resources such as CPU power, disk and memory were severely constrained, code had to be written so that it translated easily into machine instructions. Today, however, it is possible to create small applications with relative ease that deliver significant functionality. So the move from:
printf("Hello, World\n");
to dynamic web pages has encouraged more people to get involved in computer programming. But many programmers struggle with the historic baggage that the descendants of those early programming languages still carry forward.
Concepts and Syntax
Some language concepts have proven useful, or even essential, over time, and they are common to most programming languages. Other parts of a language are just different ways of expressing the same or similar concepts.
Most newcomers to computer programming are thrust straight into the syntax of a specific language (be it C, C++, Java, C#, JavaScript, Python, Lisp, Prolog, etc.). Before long, most developers simply accept the shortcomings or foibles of their chosen language.
So the first task in creating a new programming language is to strip away some or all of the existing baggage, distilling the bare minimum set of concepts and syntax needed to produce machine instructions.
The Building Blocks
The following are the two main ingredients of a programming language:
- Data Structures (or types)
- Processing Instructions (computation/flow control)
The question is then how to express and arrange these building blocks in a coherent and logical way. It is also important to build on how people already understand concepts, notation and instructions outside of computer programming.
While the original computer programmers were mainly mathematicians and scientists, today the people entering 'IT' come from a much wider range of backgrounds, and are much younger with fewer academic qualifications.
Building on what many people have already been taught in general terms, and on what comes most naturally, leads to the conclusion that most people understand:
- Structure, in the form of a postal address, for example
- Sequence, in the form of a recipe or directions to an address
- Classification, in the form of plant/animal or cat/dog
It is these aspects that have led to imperative languages becoming more dominant than declarative languages (such as Lisp and Prolog). Imperative ideas are simply easier for most people to understand.
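As a rough illustration, assuming Python as the host language, the 'sequence' and 'classification' ideas above map directly onto ordinary imperative constructs: an ordered run of statements, and a class hierarchy. The names `Animal`, `Cat`, `Dog` and `make_tea` are hypothetical, chosen only to mirror the examples in the list; this is a sketch, not a proposal for the new language's syntax.

```python
from abc import ABC, abstractmethod


class Animal(ABC):
    """Classification: the general category (hypothetical example)."""

    @abstractmethod
    def sound(self) -> str:
        """Each kind of animal must say what sound it makes."""


class Cat(Animal):
    def sound(self) -> str:
        return "meow"


class Dog(Animal):
    def sound(self) -> str:
        return "woof"


def make_tea() -> list[str]:
    """Sequence: each step runs in order, like a recipe (hypothetical example)."""
    steps: list[str] = []
    steps.append("boil water")
    steps.append("add tea bag")
    steps.append("pour water")
    steps.append("steep for 3 minutes")
    return steps


# A Cat and a Dog are both classified as an Animal.
for animal in (Cat(), Dog()):
    print(type(animal).__name__, "is an Animal:", isinstance(animal, Animal), "-", animal.sound())

# The recipe steps come back in the order they were written.
for number, step in enumerate(make_tea(), start=1):
    print(number, step)
```

Both ideas read much like their everyday counterparts, which is part of the argument for an imperative style.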
Data Structures
A data structure could be as simple as a whole number, or an aggregate of data items such as a postal address. It could also be a collection of data items, for example a List of Address. Almost all programming languages support such concepts, and this basic building block is something most people readily accept and understand.
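A minimal sketch of the three kinds of data structure just described, written in Python purely for illustration. The `Address` type and its field names are hypothetical; a real postal address would need more fields.

```python
from dataclasses import dataclass

# A simple data item: a whole number.
house_number: int = 42


# An aggregate of data items: a postal address (fields are hypothetical).
@dataclass
class Address:
    line1: str
    town: str
    postcode: str


# A collection of data items: a list of addresses.
addresses: list[Address] = [
    Address("1 High Street", "Exampletown", "AB1 2CD"),
    Address("2 Low Road", "Sampleville", "EF3 4GH"),
]

for address in addresses:
    print(address.line1, "-", address.town)
```

The aggregate is declared once and then reused as the element type of the collection, which is the same relationship as "a List of Address".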
Processing Instructions
It is processing instructions that can cause confusion. For example:
x = 2
y = x + 4
Both statements seem simple enough, but this is where the first major disagreement arises. To a declarative programmer, the next statement makes no sense:
x = 3
How can 'x' now be set to something different? Does this now mean 'y' should be reevaluated to be '7'?
The imperative programmer accepts the above as just an ordered sequence of statements: once 'y' has been calculated while 'x' was 2, that is done and dusted. Now 'x' is set to 3 and can be used for something else.
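The two readings can be contrasted in a few lines of Python, which is itself imperative. The first half is the imperative reading. The second half is only a rough model of the alternative, in which 'y' is a rule that is re-evaluated whenever it is asked for, much like a spreadsheet formula; it is not how Lisp or Prolog actually work, and the name `y_rule` is hypothetical.

```python
# Imperative reading: statements run once, in order.
x = 2
y = x + 4      # y is computed now, using the current value of x (2)
x = 3          # rebinding x does not change y
print(y)       # 6


# Alternative reading (a rough sketch): y is a rule, re-evaluated on demand.
def y_rule() -> int:
    return x + 4   # reads whatever x is bound to at call time


print(y_rule())  # 7, because x is now 3
```

In the first half 'y' holds a value; in the second it holds a calculation, and that difference is exactly where the confusion over `x = 3` comes from.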
Most people can and do accept that 'x' can be set to something different. After all, they are used to working through sets of mathematical questions at school, albeit with each new value in a new and separate question!
Arrangement
The final part of this first post relates to how the data structures and processing instructions should be used and arranged in relation to each other.
There are really two schools of thought on this.
- Data structures should be separate from processing (functional programming)
- Data and processing should be combined (object-oriented programming)
When you look in detail at most modern languages, you can see an acceptance that both paradigms have pros and cons. Even Java has gone some way towards accepting functional programming, with the lambda expressions and streams added in Java 8, though not in a particularly graceful manner.
There are times it is appropriate to link state and behaviour (object-oriented); there are also times when it is more appropriate to have multiple functions operate on a data structure.
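A short sketch of both approaches in Python, which supports both styles. The `Circle` and `Counter` types are hypothetical examples: `Circle` is immutable data operated on by separate functions, while `Counter` bundles its internal state with the behaviour that changes it.

```python
import math
from dataclasses import dataclass


# Functional style: an immutable data structure plus separate functions.
@dataclass(frozen=True)
class Circle:
    radius: float


def area(circle: Circle) -> float:
    return math.pi * circle.radius ** 2


def scale(circle: Circle, factor: float) -> Circle:
    # Returns a new value instead of modifying the original.
    return Circle(circle.radius * factor)


# Object-oriented style: state and behaviour live together.
class Counter:
    def __init__(self) -> None:
        self._count = 0  # internal state, changed only through methods

    def increment(self) -> int:
        self._count += 1
        return self._count


circle = Circle(1.0)
print(area(scale(circle, 2.0)))  # area of a circle with radius 2.0
print(circle.radius)             # the original is unchanged: 1.0

counter = Counter()
counter.increment()
print(counter.increment())       # 2
```

Any number of further functions can be written against `Circle` without touching its definition, whereas `Counter` guarantees its state only changes through its own methods; each suits a different situation.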
Conclusion
The first part of 'What's involved in creating a programming language' has now been discussed, and the conclusion is:
- Imperative programming language
- Strongly typed
- Both object-oriented and functional programming models
That conclusion can be drawn because a new programming language has to target the very widest range of people.
So it sounds like it is possible to use Python, C++, TypeScript or Scala without developing a new language? Well, there are other aspects of a language that are still to be addressed:
- Syntax (symbols, punctuation, layout)
- Ease of getting started, while still supporting more advanced concepts
- Flow control constraints, error handling (known gotchas)
- Type safety and type inference (casting, type checking)
- Internal state (global variables, coupling and cohesion)
- Simplicity (nil/null, NaN or 'is set')
- Reuse (packaging, modules, dependencies)
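Several of these issues can already be seen in an existing language. As an illustration only, in Python: NaN never equals itself, `None` on its own cannot distinguish a missing key from a key explicitly set to nothing, and converting a float to an int silently truncates. The `settings` dictionary is a hypothetical example.

```python
import math

# NaN is not equal to anything, including itself.
nan = float("nan")
print(nan == nan)        # False
print(math.isnan(nan))   # True

# "Not set" can mean different things (hypothetical settings).
settings = {"timeout": None}
print(settings.get("timeout") is None)  # True: key present, value is None
print(settings.get("retries") is None)  # True: key absent altogether
print("timeout" in settings, "retries" in settings)  # True False

# Casting can silently lose information.
print(int(3.9))    # 3 (truncates towards zero)
print(int("42"))   # 42
```

Each of these is a "known gotcha" that a programmer has to learn rather than one the language prevents.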