#+SETUPFILE: ../../config.org
#+TAGS: Serene Languages
#+CATEGORY: Engineering
#+DATE: 2021-04-13
#+TITLE: Serene on the LLVM
#+DESC: The rationale behind Serene
As you may know, I'm trying to build [[./my-new-programming-language.org][my new programming language]]. After a ton of study and many
experiments, I've finally decided which platform *Serene* will target. Here is the
history and the rationale behind this decision.
* A little bit of history :Languages:Serene:
After my initial effort on [[./choosing-the-target-platform.org][choosing the right platform]], I studied the *GraalVM*
a bit and experimented with it. While it's a nice tool and I see a bright future for
it, I wasn't happy with some aspects of it. The most important one is the fact that
Oracle is behind it (why? Well, let's not open that door :D), along with some technical
reasons that I'll get to later. So I looked around again, re-evaluated my choices and
came across [[https://llvm.org][LLVM]]. Previously, I didn't pay much attention to LLVM because I was
blinded by the *GraalVM* and the fact that both work the same way in theory: with
either of them, we need to create the compiler frontend and they take care of the
backend for us (more or less). Initially, one of the reasons I picked the *GraalVM*
over *LLVM* was its support for *LLVM* itself; it seemed obvious that later on we
could bridge the LLVM world to *Serene*'s world via the *GraalVM*. But it turned out
to be quite the opposite.

This time, I looked into *LLVM* more thoroughly and boy, was I impressed (I still am):
well-designed tools and libraries for building a compiler. Compared to the *GraalVM*,
it is very mature, well documented and quite modular. Aaand by using *LLVM*, I can
still use the *GraalVM* via its support for LLVM IR. Long story short, the more I read
about *LLVM*, the more obsessed with it I got. So I decided to move away from the
*GraalVM* and start playing with *LLVM*.
* The challenge of the language again
Moving away from the *GraalVM* meant I had to choose a host language again. Although
the official language of *LLVM* is *C++*, I tried to avoid it, since I'm not skilled
enough in *C++*. So I ran a series of experiments (all of which are available in
dedicated branches of the repo) with *Rust*, *C*, *C++* (a first attempt) and *Golang*.
I wrote a parser and an interpreter, both as an experiment and to evaluate the
facilities of each candidate language when it comes to working with the *LLVM API*.
After many iterations, I ended up using *Golang* to create an interpreter with an
*FFI*, so that we could write the compiler in *Serene* itself.

At the same time, I started a journey into mathematics to learn more about the theory
of type systems and the different options we might have for *Serene* (I'll write about
that separately in the future). Most of my days went to my studies and I felt really
good. But there was always a voice in my head that kept bugging me about
[[https://mlir.llvm.org][MLIR]]. I had watched a few introductory talks on it before, so I had a rough idea
of what it is and what it does. To shut that voice up, I decided to read more about it
while I was blocked on my math studies, and to my surprise, it totally blew me away.
MLIR is a brilliant tool, built on the experience gained from making several
languages and compilers, and it follows some conventional, well-designed principles
for building intermediate representation languages.

Even as I read more and more about *MLIR* (which, by the way, is a subproject of
*LLVM*), I still firmly believed that I should use *Golang* to create an interpreter
for bootstrapping, and then expose an FFI through that interpreter so that we could
interact with *MLIR* via its *C API*. How naive I was.

During my study of *MLIR*, I came across a beautiful thing called
[[https://llvm.org/docs/TableGen/][TableGen]]. It's part of LLVM and, in general, is designed to generate *C++* code
from a declarative description. It's a generic tool that developers write backends
for in order to generate code for specific purposes; in the case of *MLIR*, to
generate IR [[https://mlir.llvm.org/docs/Dialects/LLVM/][dialects]]. The way MLIR uses TableGen to generate dialects, along with
most of their operations and types, is truly amazing. It makes the cumbersome task of
building a multi-layer IR quite straightforward. *MLIR* single-handedly changed my mind
about the approach I wanted to take to build the compiler. All of a sudden, *C++*
seemed like a reliable option, so I decided to give it a go. I revived the old C++
branch, forked it into a new branch called =mlir= and started working on it a bit. I
made a prototype and then enhanced it.

After a lot of consideration, I finally decided to merge the =mlir= branch into
=master= and move the *Golang* implementation into its own branch, =golang-impl=.
At the moment, I'm cleaning up the C++ implementation. Next, I'll add a semantic
analysis phase to the compiler and aim for a minimal lambda calculus implementation
that wires everything up in its most minimal state, as a foundation to build upon.

I'll also write another essay that goes into more detail on the technical reasons
why LLVM and MLIR are a great fit for our use cases.