Progress
Well, it's been just over a year since my last blog post. I've been working on the Symbol Table (and also changing jobs, so I've been a bit distracted).
Review of Grammar
I've had the standard ANTLR grammar reviewed and all looks good, so I progressed with the main symbol table work. All of this code is up on GitHub.
What is a Symbol Table?
What do I intend to use the Symbols and SymbolTable for exactly? Well, as source code is broken up into Tokens, those Tokens are used by the Parser to match against the grammar. In effect ANTLR4 will build an 'abstract syntax tree' (AST) for me.
The ANTLR4 API then allows me to plug in 'visitors' and/or 'listeners'.
So when the AST is created from EK9 code like:
...
someIntegerValue <- 42
...
Here is what the AST looks like:
This is where it starts to get interesting for me. There are 'built-in' concepts, such as String and Integer, that the EK9 language just must 'know about'. But how does it know about them? Also, how does the compiler 'know' 42 is an Integer?
If you look at the AST above, you can see that the great work done in ANTLR4 by Terence Parr, together with the EK9 grammar, enables me to 'see' that 42 is an 'integerLit', i.e. a literal of type Integer.
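To make that step concrete, here is a minimal sketch of going from the grammar rule that matched a literal to a built-in type name. It is written in Python purely for brevity and is not the EK9 compiler's code: `ParseNode`, `LITERAL_TYPES` and `type_of_literal` are hypothetical names, and of the rule names only 'integerLit' comes from the grammar mentioned above ('stringLit' is assumed for illustration).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseNode:
    """Hypothetical stand-in for a tree node: the rule that matched plus the source text."""
    rule: str
    text: str


# Hypothetical mapping from literal rules to built-in type names.
# Only 'integerLit' is taken from the post; 'stringLit' is an assumption.
LITERAL_TYPES = {
    "integerLit": "Integer",
    "stringLit": "String",
}


def type_of_literal(node: ParseNode) -> str:
    """Return the built-in type name implied by the rule that matched the literal."""
    try:
        return LITERAL_TYPES[node.rule]
    except KeyError:
        raise ValueError(f"'{node.rule}' is not a known literal rule") from None


literal = ParseNode(rule="integerLit", text="42")
print(literal.text, "is a literal of type", type_of_literal(literal))
```

In a real compiler this lookup would sit inside a visitor or listener, and the returned name would then be resolved to a type Symbol rather than left as a string.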
More importantly, when a developer creates a new 'class' or 'function', the compiler must know about those new types as well.
This is where a symbol table comes in. Those Symbols are actually types and so need to be recorded somewhere as types, so that when we declare a variable of a particular type, we can resolve it.
So for the compiler to be able to deal with a statement as simple as someIntegerValue <- 42 I need to have defined:
- The grammar
- The Lexer
- The Parser
- The ANTLR4 Visitor
- A Symbol for type Integer
- A SymbolTable for EK9 where that Integer type can be recorded
- Also a SymbolTable where the variable 'someIntegerValue' can be recorded and linked to its type
But as soon as I want to do something with the variable someIntegerValue, like add another integer value to it, I'll need some operators on the Integer type. For that I'll need the type Integer to be an Aggregate (with methods/operators).
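The last three items in that list, plus the Aggregate idea, can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual EK9 implementation: the class names `Symbol`, `AggregateSymbol` and `SymbolTable`, their methods, and the scope names are all hypothetical, and operators are simplified to "operator name maps to return type" with no parameter types.

```python
class Symbol:
    """A named entry; `type` is None when the symbol is itself a type."""

    def __init__(self, name, type=None):
        self.name = name
        self.type = type


class AggregateSymbol(Symbol):
    """A type that also carries operators/methods, e.g. Integer with '+'."""

    def __init__(self, name):
        super().__init__(name)
        self.operators = {}  # operator name -> Symbol of the return type

    def add_operator(self, operator, returns):
        self.operators[operator] = returns


class SymbolTable:
    """Records symbols by name within one named scope."""

    def __init__(self, scope_name):
        self.scope_name = scope_name
        self._symbols = {}

    def define(self, symbol):
        if symbol.name in self._symbols:
            raise ValueError(f"'{symbol.name}' already defined in {self.scope_name}")
        self._symbols[symbol.name] = symbol
        return symbol

    def resolve(self, name):
        return self._symbols.get(name)


# The table holding the built-in types (scope name is made up).
built_ins = SymbolTable("ek9 built-ins")
integer = built_ins.define(AggregateSymbol("Integer"))
integer.add_operator("+", returns=integer)

# A separate table for the developer's code, where the variable is recorded
# and linked to its type.
module = SymbolTable("example.module")
value = module.define(Symbol("someIntegerValue", type=built_ins.resolve("Integer")))

print(value.name, "has type", value.type.name)
print("Integer + Integer gives", value.type.operators["+"].name)
```

The point of linking the variable to the type Symbol itself, rather than to the string "Integer", is that any later check (such as "does this type have a '+' operator?") can be answered directly from the variable's type.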
Bootstrapping the SymbolTable
There is more than one SymbolTable; in fact there will be thousands. But there is only one main global SymbolTable that is part of the EK9 language. So the first job I have to do is define the concept of the SymbolTable and the idea of a Symbol.
The SymbolTable has a Scope, which is sort of like a 'prefix' or, in programming terms, a 'namespace'. So it has a 'name', and these names have to be unique, like a module name for example.
It is within this Scope that we need to define a Symbol. Now we can just keep a list of these Symbols. Whether two Symbols clash may depend on the type of Symbol being defined.
So for example, I've described String and Integer; clearly, as types, these must be unique. But let's consider Methods on a Class, where we allow method overloading.
In that case we would have several Methods (which will also be Symbols) with the same name, but with different parameters, in the same SymbolTable.
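One way to express "whether Symbols clash depends on the kind of Symbol" is to let each kind of Symbol supply its own uniqueness key. The Python sketch below shows that idea only; it is an assumption about how this could be done, not a description of the EK9 code, and `clash_key`, `TypeSymbol`, `MethodSymbol`, `named` and the `Calculator`/`add` example are all made-up names.

```python
class Symbol:
    """Base symbol: by default the name alone decides whether two symbols clash."""

    def __init__(self, name):
        self.name = name

    def clash_key(self):
        return (self.name,)


class TypeSymbol(Symbol):
    """A type such as String or Integer; unique by name within a table."""


class MethodSymbol(Symbol):
    """A method; overloads share a name but differ in parameter types."""

    def __init__(self, name, param_types):
        super().__init__(name)
        self.param_types = tuple(param_types)

    def clash_key(self):
        return (self.name, self.param_types)


class SymbolTable:
    def __init__(self, scope_name):
        self.scope_name = scope_name
        self._symbols = {}  # clash key -> symbol

    def define(self, symbol):
        key = symbol.clash_key()
        if key in self._symbols:
            raise ValueError(f"Duplicate symbol in {self.scope_name}: {key}")
        self._symbols[key] = symbol
        return symbol

    def named(self, name):
        """All symbols with this name, e.g. every overload of a method."""
        return [s for s in self._symbols.values() if s.name == name]


# Two overloads of the same method name are accepted in one table.
calculator = SymbolTable("Calculator")
calculator.define(MethodSymbol("add", ["Integer"]))
calculator.define(MethodSymbol("add", ["String"]))
print(len(calculator.named("add")), "overloads of add")

# A second type with the same name is rejected.
built_in_types = SymbolTable("built-ins")
built_in_types.define(TypeSymbol("Integer"))
try:
    built_in_types.define(TypeSymbol("Integer"))
except ValueError as error:
    print(error)
```

Resolving a call to an overloaded method would then mean picking from `named("add")` by argument types, which this sketch leaves out.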
So that's the job I'm working on at the moment: bootstrapping the main EK9 Symbol Table with all the standard built-in types and then some of the main built-in Functions and Classes.
In my prototype compilers I actually used Java and reflection to do this, but now that I'm moving to the first reference compiler, I'll define the Symbols in terms of Aggregates in a more abstract way. Only if the final compiled output targets Java will the appropriate Java class be employed. This will allow me to target different runtimes; initially it will be Java, but I want to be able to target LLVM as well (at least).
Resolving Symbols
Once you have SymbolTables, the next thing you need to be able to do after defining a Symbol is resolve a Symbol. But you have to bear in mind that programs tend to have nested structures. For example:
- Global (built-in EK9)
  - module (the developer's application)
    - class
      - method
        - block
So when you want to resolve a symbol from within the block above, it is necessary to resolve Symbols back up that nested structure, right up to the Global (built-in EK9 types).
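Walking back up the nested structure can be sketched as a chain of scopes, each holding a reference to the scope that encloses it. This is a minimal Python illustration of that general technique, not the EK9 implementation: the `Scope` class, its `enclosing` link, and the idea of storing a plain type name per symbol are assumptions made to keep the example short.

```python
class Scope:
    """A named scope with a link to the scope that encloses it (None for Global)."""

    def __init__(self, name, enclosing=None):
        self.name = name
        self.enclosing = enclosing
        self._symbols = {}  # symbol name -> type name (simplified)

    def define(self, name, type_name):
        self._symbols[name] = type_name

    def resolve(self, name):
        """Search this scope, then each enclosing scope in turn, up to Global."""
        scope = self
        while scope is not None:
            if name in scope._symbols:
                return scope.name, scope._symbols[name]
            scope = scope.enclosing
        return None  # not found anywhere in the chain


# Build the nesting from the list above.
global_scope = Scope("Global")
module = Scope("module", enclosing=global_scope)
clazz = Scope("class", enclosing=module)
method = Scope("method", enclosing=clazz)
block = Scope("block", enclosing=method)

global_scope.define("Integer", "type")
method.define("someIntegerValue", "Integer")

# Both lookups start in the innermost block and work outwards.
print(block.resolve("someIntegerValue"))
print(block.resolve("Integer"))
print(block.resolve("notDefinedAnywhere"))
```

Because the search starts at the innermost scope, a name defined in a block would be found before one of the same name further out; whether such hiding should be allowed at all is a language design decision this sketch does not take a position on.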
Summary
So that's where I am at the moment: writing lots of Symbols, Scopes and many tests. I hope to get the bulk of the SymbolTable done this year, but it'll depend on how draining my new job is!