Scrift — Part 1: Getting Started

Dane Walker

Mar 7, 2022

Welcome back to the next part of the Scrift series. This is part 1, where we will attempt to lex a basic document.

The Journey Begins

Before we begin building our lexer, we first need something to lex.

Here’s a basic document written in version 1 of Scrift:

Okay. So let’s see what we’ve got.

First off, we can see there are three main token types in this example.

document and section are both keywords;
title, author and date are identifiers; and
anything between double quotes is a string.

What you might notice is different from many other languages is the lack of curly braces. Scrift, like Python and Imba, relies on whitespace and newlines to identify parents and their children.

Finally Some Code

Let us first begin by defining the different token types that we need.

Here we create our TokenKind of type int, which we then use to define the different tokens.

Then we create a map so that when we print our token kind out, we get a string rather than the integer assigned to the token kind.
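A minimal sketch of what this could look like. The constant names and the kindNames map are assumptions based on the tokens described in this post, not necessarily the names used in the Scrift source.

```go
package main

import "fmt"

// TokenKind identifies the category of a lexed token.
type TokenKind int

// Assumed kind names, one per token type mentioned in this post.
const (
	BadToken TokenKind = iota
	EndOfFileToken
	NewlineToken
	WhitespaceToken
	KeywordToken
	IdentifierToken
	StringToken
)

// kindNames maps each kind to a readable name for printing.
var kindNames = map[TokenKind]string{
	BadToken:        "BadToken",
	EndOfFileToken:  "EndOfFileToken",
	NewlineToken:    "NewlineToken",
	WhitespaceToken: "WhitespaceToken",
	KeywordToken:    "KeywordToken",
	IdentifierToken: "IdentifierToken",
	StringToken:     "StringToken",
}

// String implements fmt.Stringer, so printing a TokenKind shows its name
// instead of the underlying integer.
func (k TokenKind) String() string {
	if name, ok := kindNames[k]; ok {
		return name
	}
	return fmt.Sprintf("TokenKind(%d)", int(k))
}

func main() {
	fmt.Println(KeywordToken, StringToken) // prints "KeywordToken StringToken"
}
```

Because TokenKind has a String method, the fmt package picks up the readable name automatically wherever a kind is printed.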

Let us also define the two keywords for our example file.
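One simple way to hold the keywords is a set-like map. The names keywords and isKeyword are assumptions for this sketch.

```go
package main

import "fmt"

// keywords holds the two reserved words used in the example document.
var keywords = map[string]bool{
	"document": true,
	"section":  true,
}

// isKeyword reports whether a lexeme is a reserved word.
// A missing map key yields false, so no explicit check is needed.
func isKeyword(lexeme string) bool {
	return keywords[lexeme]
}

func main() {
	fmt.Println(isKeyword("document"), isKeyword("title")) // prints "true false"
}
```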

We don't care too much about lexing each newline or whitespace character as a separate token. Instead, we will combine each run into a single token and set its value to the number of characters we found.
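As a rough illustration, a collapsed run could be represented like this. The Token struct, its field names and the collapse helper are all assumptions for this sketch, and the kind is kept as a plain string for brevity.

```go
package main

import "fmt"

// Token is one lexed unit.
type Token struct {
	Kind  string // for example "Whitespace" or "Newline"
	Value string // for collapsed runs, the count written as text
}

// collapse turns a run of n identical characters into a single token
// whose value is the length of the run.
func collapse(kind string, n int) Token {
	return Token{Kind: kind, Value: fmt.Sprint(n)}
}

func main() {
	// Four spaces of indentation become one token.
	fmt.Printf("%+v\n", collapse("Whitespace", 4)) // prints "{Kind:Whitespace Value:4}"
}
```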

Now that we have the basic building blocks for our document, we can begin lexing.

We expect our output to look like the following.

First Steps

First of all, we need to read in the source file. To do this we will use a bufio reader, because os.ReadFile() reads the entire source into memory, which limits the size of the source file we can read. Let us wrap the bufio.Reader in a new Source struct.

Next we will create the lexer struct.

We will then create a function to create a new lexer for us.

Now we will define some functions to help with logs.
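One plausible shape for such a helper is a function that prefixes each message with the current position. The position type and its format method are hypothetical stand-ins, not the actual Scrift helpers.

```go
package main

import (
	"fmt"
	"os"
)

// position is a stand-in for the lexer's line and column fields.
type position struct {
	line, column int
}

// format prefixes a message with the source position, "line:column: message".
func (p position) format(msg string) string {
	return fmt.Sprintf("%d:%d: %s", p.line, p.column, msg)
}

func main() {
	p := position{line: 3, column: 7}
	// Diagnostics go to stderr so they do not mix with the token output.
	fmt.Fprintln(os.Stderr, p.format("unexpected character"))
}
```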

Okay. Now we will create functions to help us navigate between characters within the source file.

We will also create a function that resets the line and column count for when we encounter a new line.
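A small sketch, assuming that moving to a new line means advancing the line count and setting the column back to zero. The resetPosition name is hypothetical.

```go
package main

import "fmt"

type Lexer struct {
	line   int
	column int
}

// resetPosition moves the position to the start of the next line.
func (l *Lexer) resetPosition() {
	l.line++
	l.column = 0
}

func main() {
	lx := &Lexer{line: 1, column: 8}
	lx.resetPosition()
	fmt.Println(lx.line, lx.column) // prints "2 0"
}
```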

Now we will also create some functions that will help us identify whether the rune is a digit, letter or a character we specify.
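These checks can lean on the standard unicode package. The function names are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// isDigit reports whether r is a decimal digit.
func isDigit(r rune) bool { return unicode.IsDigit(r) }

// isLetter reports whether r is a letter.
func isLetter(r rune) bool { return unicode.IsLetter(r) }

// isChar reports whether r is exactly the character we specify.
func isChar(r, want rune) bool { return r == want }

func main() {
	fmt.Println(isDigit('7'), isLetter('x'), isChar('"', '"')) // prints "true true true"
	fmt.Println(isDigit('x'), isLetter('_'), isChar('a', 'b')) // prints "false false false"
}
```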

Now we will define the next token function. This function will go to the next character in the source and take specific actions depending on what the character is.

Here we count all consecutive newlines and return a single newline token if the next characters are newlines; at the end of the source we return an end-of-file token.

Here we return a whitespace token when we encounter whitespace.

When we come across a " we return a string token with the contents between the opening and closing ".

Identifiers can start with letters or underscores and can contain letters, underscores and digits. We collect these characters into the lexme variable and if it is a keyword, we return a keyword token, otherwise we return an identifier token.

Finally, any other character that we have not yet defined returns the bad token. Here is the entire nextToken function.