Scrift — Part 0: A New Markup Language

Hi everyone! Welcome to the introduction of an ongoing series that documents my journey writing a compiler from scratch.

The Problem

LaTeX is my go-to for all things writing, whether it's notes, assignments or even resumes. I love it. However, the one thing I haven't entirely fallen in love with is its syntax. Believe me, I've tried, but there's something tedious about having to write subsection or subsubsection every time you want to change the level of a section. Or the fact that you have to signal the end of a center, an itemize or even the document, but not the end of a section. This inconsistent structure makes LaTeX source code hard to read. And don't get me started on the overfull/underfull hbox warnings.

So why not use a different document markup language? Languages with a simpler, cleaner syntax, such as Markdown, are excellent, but they often cannot produce complex documents. This is the problem with today's document markup languages: either they're easy to use but unsuitable for complex documents, or well suited for complex documents but difficult to use.

The Solution

A couple of days ago, I decided to start work on a new open source project, called Scrift, to fix these issues with today's markup languages. Scrift is a document markup language that can handle complex document structures while maintaining a clean, easy-to-use syntax. Or at least that's what it's going to be. Join me on this journey, where I'll document the process of creating a new document markup language from zero to hero.

Scrift

Let me make one thing clear: Scrift is NOT a toy compiler. Many compiler tutorials build a "toy compiler", one that implements a small subset of a language to perform a specific task and not much else. The compiler we will be writing is intended to be the backbone of the Scrift ecosystem.

To begin our journey, let's map out the basic concepts that will make up Scrift.

Scrift Ecosystem

The Scrift ecosystem comprises all of the related components that make up a Scrift compiler. These include:

  • The Core;
  • Transformers; and
  • Interfaces.

The Core

The core is the basic building block of all Scrift compilers. It reads in the source code and outputs intermediate JSON code.
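To make the idea of "intermediate JSON code" a little more concrete, here is a minimal sketch in Go of what the core's output step might look like. The Node type, its field names and the EmitJSON function are all assumptions made up for illustration; the actual Scrift schema hasn't been defined yet.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a hypothetical shape for one element of the intermediate
// representation. The real Scrift schema is not defined yet.
type Node struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	Children []Node `json:"children,omitempty"`
}

// EmitJSON serializes a document tree into the intermediate JSON
// that a later stage could consume.
func EmitJSON(doc Node) (string, error) {
	b, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// A hand-built tree standing in for what a parser would produce.
	doc := Node{
		Type: "document",
		Children: []Node{
			{Type: "section", Text: "Introduction"},
			{Type: "paragraph", Text: "Hello, Scrift."},
		},
	}
	out, err := EmitJSON(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of going through a plain data format like JSON is that anything downstream only needs to understand the tree, not the Scrift syntax itself.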

Transformers

Transformers are the parts of the core that translate the JSON output into target code. The target can be LaTeX, HTML, JavaScript, etc. The possibilities are endless.
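As a rough sketch of how a transformer could look in Go, the example below decodes some intermediate JSON and renders it as HTML. Everything here is an assumption for illustration: the Transformer interface, the HTMLTransformer type, the TransformJSON helper and the node types ("document", "section", "paragraph") are hypothetical names, not the final Scrift design.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"strings"
)

// Node is a hypothetical element of the intermediate JSON.
type Node struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	Children []Node `json:"children,omitempty"`
}

// Transformer is a hypothetical interface: one implementation per target.
type Transformer interface {
	Transform(doc Node) (string, error)
}

// HTMLTransformer renders the tree as a small HTML fragment.
type HTMLTransformer struct{}

func (HTMLTransformer) Transform(doc Node) (string, error) {
	var sb strings.Builder
	if err := writeHTML(&sb, doc); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// writeHTML walks the tree and writes one HTML element per node.
func writeHTML(sb *strings.Builder, n Node) error {
	switch n.Type {
	case "document":
		for _, c := range n.Children {
			if err := writeHTML(sb, c); err != nil {
				return err
			}
		}
	case "section":
		fmt.Fprintf(sb, "<h1>%s</h1>\n", html.EscapeString(n.Text))
	case "paragraph":
		fmt.Fprintf(sb, "<p>%s</p>\n", html.EscapeString(n.Text))
	default:
		return fmt.Errorf("unknown node type %q", n.Type)
	}
	return nil
}

// TransformJSON decodes intermediate JSON and hands it to a transformer.
func TransformJSON(t Transformer, src string) (string, error) {
	var doc Node
	if err := json.Unmarshal([]byte(src), &doc); err != nil {
		return "", err
	}
	return t.Transform(doc)
}

func main() {
	src := `{"type":"document","children":[{"type":"section","text":"Intro"},{"type":"paragraph","text":"Hello, Scrift."}]}`
	out, err := TransformJSON(HTMLTransformer{}, src)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With an interface like this, supporting a new target such as LaTeX would mean writing one more type with a Transform method, without touching the code that produces the JSON.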

Interfaces

Interfaces can be written in various languages to make it easier to work with the core. They provide an API for controlling the core by mapping its different C bindings to callable functions in the interface language.

Okay. So how does this all come together?

The core will be written in Go, along with numerous transformers. Each transformer will process the JSON from the core into its respective target language. The core is also compiled into a C shared object file, which exposes C bindings. These bindings are used to create interfaces for various languages, enabling cross-language use of the Scrift compiler.

Confused? Here's an example of how this modular structure works.

Where To From Here

In Part 1 of this series, we will define the Scrift language and begin writing our lexer to tokenize it. If Scrift sounds like something you're interested in, consider following the journey and becoming a contributor here. If you want to learn how to implement a compiler from scratch, stick around and code along.
