The Symbol Table

Progress

Well, it's been just over a year since my last blog post. I've been working on the Symbol Table (and also changing jobs, so I've been a bit distracted).

Review of Grammar

I've had the standard ANTLR grammar reviewed and all looks good, so I progressed with the main symbol table work. All of this code is up on GitHub.

What is a Symbol Table?

What do I intend to use the Symbols and SymbolTable for exactly? Well, as source code is broken up into Tokens, those Tokens are used by the Parser to match against the grammar. In effect ANTLR4 will build an 'abstract syntax tree' (AST) for me (strictly speaking, ANTLR4 itself calls this a parse tree).

The ANTLR4 API then allows me to plug in 'visitors' and/or 'listeners'.

So when the AST is created from a piece of EK9 code like:

...
  someIntegerValue <- 42
...

Here is what the AST looks like:

[Figure: AST2.png, the AST for the statement above]

This is where it starts to get interesting for me. There are 'built-in' concepts, such as String and Integer, that the EK9 language simply must 'know about'. But how does it know about them? Also, how does the compiler 'know' 42 is an Integer?

If you look at the AST above, you can see that the great work done in ANTLR4 by Terence Parr, together with the EK9 grammar, enables me to 'see' that 42 is an 'integerLit', i.e. a literal of type Integer.

More importantly, when a developer creates a new 'class' or 'function', the compiler must know about those new types as well.

This is where a symbol table comes in. Those Symbols are actually types, and so need to be recorded somewhere as types, so that when we declare a variable of a particular type we can resolve it.

So for the compiler to be able to deal with a statement as simple as someIntegerValue <- 42 I need to have defined:

  • The grammar
  • The Lexer
  • The Parser
  • The ANTLR4 Visitor
  • A Symbol for type Integer
  • A SymbolTable for EK9 where that Integer type can be recorded
  • Also a SymbolTable where the variable 'someIntegerValue' can be recorded and linked to its type
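The last three items in that list can be sketched in a few lines of Java. This is only a minimal illustration, not the EK9 compiler's code: the class names `Symbol` and `SymbolTable`, the `define`/`resolve` methods and the scope names "builtin" and "example.module" are all assumptions made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class SymbolTableSketch {

  /** A named thing the compiler knows about (hypothetical shape). */
  static final class Symbol {
    final String name;
    final Symbol type; // null when this symbol is itself a type

    Symbol(String name, Symbol type) {
      this.name = name;
      this.type = type;
    }
  }

  /** A named scope holding symbols keyed by their name. */
  static final class SymbolTable {
    final String scopeName;
    private final Map<String, Symbol> symbols = new LinkedHashMap<>();

    SymbolTable(String scopeName) {
      this.scopeName = scopeName;
    }

    /** Records the symbol, rejecting a second symbol with the same name. */
    Symbol define(Symbol symbol) {
      if (symbols.putIfAbsent(symbol.name, symbol) != null) {
        throw new IllegalStateException(
            "Duplicate symbol '" + symbol.name + "' in " + scopeName);
      }
      return symbol;
    }

    /** Looks the name up in this table only. */
    Optional<Symbol> resolve(String name) {
      return Optional.ofNullable(symbols.get(name));
    }
  }

  public static void main(String[] args) {
    // The table where the built-in Integer type is recorded.
    SymbolTable builtIn = new SymbolTable("builtin");
    Symbol integerType = builtIn.define(new Symbol("Integer", null));

    // The table where the variable is recorded and linked to its type.
    SymbolTable module = new SymbolTable("example.module");
    module.define(new Symbol("someIntegerValue", integerType));

    Symbol variable = module.resolve("someIntegerValue")
        .orElseThrow(IllegalStateException::new);
    System.out.println(variable.name + " : " + variable.type.name);
  }
}
```

Running `main` prints `someIntegerValue : Integer`. The variable holds a reference to the type's Symbol rather than just its name, which is what lets later stages ask questions about the type itself.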

But as soon as I want to do something with the variable someIntegerValue, like add another integer value to it, I'll need some operators on the Integer type. For that I'll need the type Integer to be an Aggregate (with methods/operators).
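To make the idea of a type with operators concrete, here is a rough Java sketch. The `Aggregate` class, its `defineOperator`/`resolveOperator` methods, and the particular operators registered on Integer (including the use of a Boolean type) are assumptions for illustration only; they are not taken from the EK9 code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class AggregateSketch {

  /** A type that also carries operators (hypothetical shape). */
  static final class Aggregate {
    final String name;
    // Key such as "+(Integer)" mapped to the type that operator returns.
    private final Map<String, Aggregate> operators = new HashMap<>();

    Aggregate(String name) {
      this.name = name;
    }

    void defineOperator(String operator, Aggregate argument, Aggregate result) {
      operators.put(key(operator, argument), result);
    }

    /** Returns the result type if this type supports the operator for that argument. */
    Optional<Aggregate> resolveOperator(String operator, Aggregate argument) {
      return Optional.ofNullable(operators.get(key(operator, argument)));
    }

    private static String key(String operator, Aggregate argument) {
      return operator + "(" + argument.name + ")";
    }
  }

  public static void main(String[] args) {
    Aggregate integerType = new Aggregate("Integer");
    Aggregate booleanType = new Aggregate("Boolean");
    Aggregate stringType = new Aggregate("String");

    // Assumed operators, chosen only for the example.
    integerType.defineOperator("+", integerType, integerType);
    integerType.defineOperator("<", integerType, booleanType);

    // Integer + Integer is known; Integer + String is not.
    System.out.println(integerType.resolveOperator("+", integerType)
        .map(a -> a.name).orElse("unresolved"));
    System.out.println(integerType.resolveOperator("+", stringType)
        .map(a -> a.name).orElse("unresolved"));
  }
}
```

With something like this in place, checking an expression such as adding another integer to someIntegerValue becomes a lookup on the Integer Aggregate: if the operator resolves, the result type is known; if not, the compiler can report an error.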

Bootstrapping the SymbolTable

There is more than one SymbolTable; in fact there will be thousands. But there is only one main global SymbolTable that is part of the EK9 language. So the first job I have to do is define the concept of the SymbolTable and the idea of a Symbol.

The SymbolTable has a Scope. This is rather like a 'prefix' or, in programming terms, a 'namespace'. So it has a 'name', and these names have to be unique, like a module name for example.

It is within this Scope that we need to define a Symbol. We could just keep a list of these Symbols, but whether two Symbols clash depends on the kind of Symbol being defined.

For example, I've described String and Integer; as types, these must clearly be unique. But consider Methods on a Class, where we allow method overloading. In that case we would have several Methods (which are also Symbols) with the same name, but with different parameters, in the same SymbolTable.
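One way to picture that clash rule is a table that keeps a list of Symbols per name and asks each existing Symbol whether the newcomer clashes with it. The following Java sketch assumes a simple `Kind` enum and a `clashesWith` method; both are invented for this example, and parameter types are represented as plain strings to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OverloadSketch {

  enum Kind { TYPE, METHOD }

  static final class Symbol {
    final Kind kind;
    final String name;
    final List<String> parameterTypes; // empty for types

    Symbol(Kind kind, String name, String... parameterTypes) {
      this.kind = kind;
      this.name = name;
      this.parameterTypes = Arrays.asList(parameterTypes);
    }

    /** Methods clash only on identical parameter lists; anything else clashes on name alone. */
    boolean clashesWith(Symbol other) {
      if (!name.equals(other.name)) {
        return false;
      }
      if (kind == Kind.METHOD && other.kind == Kind.METHOD) {
        return parameterTypes.equals(other.parameterTypes);
      }
      return true;
    }
  }

  static final class SymbolTable {
    private final Map<String, List<Symbol>> symbols = new HashMap<>();

    /** Returns false (and records nothing) when the candidate clashes. */
    boolean define(Symbol candidate) {
      List<Symbol> sameName =
          symbols.computeIfAbsent(candidate.name, k -> new ArrayList<>());
      for (Symbol existing : sameName) {
        if (existing.clashesWith(candidate)) {
          return false;
        }
      }
      sameName.add(candidate);
      return true;
    }

    List<Symbol> resolveAll(String name) {
      return symbols.getOrDefault(name, Collections.emptyList());
    }
  }

  public static void main(String[] args) {
    SymbolTable table = new SymbolTable();
    System.out.println(table.define(new Symbol(Kind.TYPE, "Integer")));          // true
    System.out.println(table.define(new Symbol(Kind.TYPE, "Integer")));          // false
    System.out.println(table.define(new Symbol(Kind.METHOD, "add", "Integer"))); // true
    System.out.println(table.define(new Symbol(Kind.METHOD, "add", "String")));  // true
    System.out.println(table.define(new Symbol(Kind.METHOD, "add", "Integer"))); // false
    System.out.println(table.resolveAll("add").size());                          // 2
  }
}
```

Keeping the clash decision on the Symbol rather than in the table means new kinds of Symbol can bring their own uniqueness rules without the table needing to change.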

So that's the job I'm working on at the moment: bootstrapping the main EK9 Symbol Table with all the standard built-in types and then some of the main built-in Functions and Classes.

In my prototype compilers I actually used Java and reflection to do this, but now that I'm moving to the first reference compiler I'll define the Symbols in terms of Aggregates in a more abstract way. Only if the final compiled output targets Java will the appropriate Java class be employed. This will allow me to target different runtimes: initially Java, but I want to be able to target LLVM as well (at least).

Resolving Symbols

Once you have SymbolTables, the next thing you need to be able to do after defining a Symbol is resolve it. But you have to bear in mind that programs tend to have nested structures. For example:

  • Global (built-in EK9)
  • module (the developer's application)
  • class
  • method
  • block

So when you want to resolve a symbol from within the block above, it is necessary to resolve Symbols back up that nested structure, right up to the Global scope (the built-in EK9 types).
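That walk back up the nesting can be sketched as a chain of scopes, each holding a reference to the scope that encloses it. This is a simplified Java illustration: the `Scope` class and its `enclosing` field are assumed names, and symbols are reduced to bare names so the lookup itself stays in focus.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class ScopeChainSketch {

  static final class Scope {
    final String name;
    final Scope enclosing; // null for the outermost (global) scope
    private final Set<String> symbolNames = new HashSet<>();

    Scope(String name, Scope enclosing) {
      this.name = name;
      this.enclosing = enclosing;
    }

    void define(String symbolName) {
      symbolNames.add(symbolName);
    }

    /** Looks in this scope first, then in each enclosing scope in turn. */
    Optional<Scope> resolve(String symbolName) {
      for (Scope scope = this; scope != null; scope = scope.enclosing) {
        if (scope.symbolNames.contains(symbolName)) {
          return Optional.of(scope);
        }
      }
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // Mirror the nesting listed above: global > module > class > method > block.
    Scope global = new Scope("global", null);
    Scope module = new Scope("module", global);
    Scope clazz = new Scope("class", module);
    Scope method = new Scope("method", clazz);
    Scope block = new Scope("block", method);

    global.define("Integer");
    block.define("someIntegerValue");

    // Found locally in the block.
    System.out.println(block.resolve("someIntegerValue").map(s -> s.name).orElse("unresolved"));
    // Not in block, method, class or module, so found in global.
    System.out.println(block.resolve("Integer").map(s -> s.name).orElse("unresolved"));
    // Outer scopes cannot see into inner ones.
    System.out.println(method.resolve("someIntegerValue").map(s -> s.name).orElse("unresolved"));
  }
}
```

This prints `block`, `global` and `unresolved`. Because the search starts at the innermost scope, a name defined in an inner scope is found before one of the same name further out.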

Summary

So that's where I am at the moment, writing lots of Symbols, Scopes and many tests. I hope to get the bulk of the SymbolTable done this year, but it'll depend on how draining my new job is!