How to build a compiler with LLVM and MLIR

DONE Episode 1 - Introduction

What is it all about?

  • Create a programming language
  • Guide for contributors
  • An LLVM/MLIR guide

The Plan

  • Git branches
  • No live coding
  • Feel free to contribute

Serene and a bit of history

DONE Episode 2 - Basic Setup

CLOSED: [2021-07-10 Sat 09:04]

Installing Requirements

LLVM and Clang

  • mlir-tblgen

ccache (optional)

Building Serene and the builder

  • git hooks

Source tree structure

dev.org resources and TODOs

DONE Episode 3 - Overview

CLOSED: [2021-07-19 Mon 09:41]

Generic Compiler

Common Steps

  • Frontend

    • Lexical analyzer (Lexer)
    • Syntax analyzer (Parser)
    • Semantic analyzer
  • Middleend

    • Intermediate code generation
    • Code optimizer
  • Backend

    • Target code generation
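
The steps above can be sketched as a chain of small functions over a toy language that only knows integers, variables and `+`. This is an illustration of the generic pipeline, not Serene code: every name here (`lex`, `parse`, `check`, `gen_ir`, `fold_constants`, `emit`, `compile_source`) and the stack-machine IR are assumptions made up for the example.

```python
import re

def lex(src):
    # Frontend: lexical analysis, characters -> tokens.
    return re.findall(r"\d+|[A-Za-z_]\w*|\+", src)

def parse(tokens):
    # Frontend: syntax analysis, tokens -> left-nested tree for "a + b + c".
    operands, operators = tokens[0::2], tokens[1::2]
    if (len(operands) != len(operators) + 1 or "+" in operands
            or any(op != "+" for op in operators)):
        raise SyntaxError("expected: operand (+ operand)*")

    def leaf(tok):
        return ("num", int(tok)) if tok.isdigit() else ("var", tok)

    tree = leaf(operands[0])
    for tok in operands[1:]:
        tree = ("add", tree, leaf(tok))
    return tree

def check(tree, known_vars):
    # Frontend: semantic analysis, reject references to unknown variables.
    if tree[0] == "var" and tree[1] not in known_vars:
        raise NameError(f"undefined variable: {tree[1]}")
    if tree[0] == "add":
        check(tree[1], known_vars)
        check(tree[2], known_vars)
    return tree

def gen_ir(tree):
    # Middle end: intermediate code generation for a tiny stack machine.
    if tree[0] == "num":
        return [("push", tree[1])]
    if tree[0] == "var":
        return [("load", tree[1])]
    return gen_ir(tree[1]) + gen_ir(tree[2]) + [("add",)]

def fold_constants(ir):
    # Middle end: optimizer, "push a; push b; add" becomes "push a+b".
    out = []
    for ins in ir:
        if (ins == ("add",) and len(out) >= 2
                and out[-1][0] == "push" and out[-2][0] == "push"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", a + b))
        else:
            out.append(ins)
    return out

def emit(ir):
    # Backend: target code generation, here just textual pseudo-assembly.
    return "\n".join(" ".join(str(part) for part in ins) for ins in ir)

def compile_source(src, known_vars=("x",)):
    # The whole pipeline: frontend -> middle end -> backend.
    return emit(fold_constants(gen_ir(check(parse(lex(src)), known_vars))))
```

For example, `compile_source("1 + 2 + x")` folds `1 + 2` in the middle end and emits three lines: `push 3`, `load x`, `add`.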

LLVM

/Serene/serene/src/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/llvm.org

Quick overview

Adapted from https://www.aosabook.org/en/llvm.html

/Serene/serene/media/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/imgs/llvm_dia.svg

  • It's a set of libraries to create a compiler.
  • Well engineered.
  • We can focus on the frontend of the compiler, on what actually matters to us, and leave the tricky parts to LLVM.
  • LLVM IR enables us to use multiple languages together.
  • It supports many targets.
  • We can benefit from ready-made IR-level optimizers.
  • ….

MLIR

/Serene/serene/src/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/mlir.llvm.org

/Serene/serene/media/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/imgs/mlir_dia.svg

  • MLIR dialects provide higher-level semantics than LLVM IR.
  • It's easier to reason about a higher-level IR that is modeled after the AST than about a low-level IR.
  • We can use the pass infrastructure to efficiently process and transform the IR.
  • With many ready-to-use dialects, we can really focus on our language and use the other dialects whenever necessary.

Serene

A Compiler frontend

Flow

  • serenec parses the command-line arguments.
  • The reader reads the input file and generates an AST.
  • The semantic analyzer walks the AST and generates a new AST, rewriting nodes where necessary.
  • The slir generator generates slir dialect code from the AST.
  • We lower slir to other MLIR dialects and call the result mlir.
  • Then we lower everything to the LLVMIR dialect and call it lir (lowered IR).
  • Next, we fully lower lir to LLVM IR and pass it to the object generator to generate object files.
  • Finally, we call the default C compiler to link the object files and generate the final binary.

DONE Episode 4 - The reader

CLOSED: [2021-07-27 Tue 22:50]

What is a Parser?

To put it simply, a parser converts the source code to an AST.

Our Parser

  • We have a hand-written, LL(1.5)-like parser/lexer, since Lisp already has a structure.
  ;; pseudo code
  (def some-fn (fn (x y)
                   (+ x y)))
  (defn main ()
    (println "Result: " (some-fn 3 8)))
  • LL(1.5)?
  • O(n)
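
To make the idea of a hand-written, single-pass reader concrete, here is a minimal sketch in Python. It is not Serene's reader: the `Reader` class, its method names, the tuple encoding of symbols and strings, and the choice of exactly one character of lookahead are all assumptions for illustration. It handles only lists, integers, symbols and strings without escapes.

```python
import re

class Reader:
    """Hypothetical single-pass reader for a tiny Lisp-like syntax."""

    def __init__(self, src):
        self.src = src
        self.pos = 0

    def peek(self):
        # One character of lookahead; None marks the end of the input.
        return self.src[self.pos] if self.pos < len(self.src) else None

    def skip_blanks(self):
        while self.peek() is not None and self.peek().isspace():
            self.pos += 1

    def read_expr(self):
        # Decide what to read by looking at the next character only.
        self.skip_blanks()
        ch = self.peek()
        if ch is None:
            raise SyntaxError("unexpected end of input")
        if ch == "(":
            return self.read_list()
        if ch == ")":
            raise SyntaxError("unexpected ')'")
        if ch == '"':
            return self.read_string()
        return self.read_atom()

    def read_list(self):
        self.pos += 1  # consume '('
        items = []
        while True:
            self.skip_blanks()
            if self.peek() is None:
                raise SyntaxError("missing ')'")
            if self.peek() == ")":
                self.pos += 1  # consume ')'
                return items
            items.append(self.read_expr())

    def read_string(self):
        # No escape sequences in this sketch.
        end = self.src.find('"', self.pos + 1)
        if end == -1:
            raise SyntaxError("unterminated string")
        text = self.src[self.pos + 1:end]
        self.pos = end + 1
        return ("str", text)

    def read_atom(self):
        start = self.pos
        while (self.peek() is not None and not self.peek().isspace()
               and self.peek() not in '()"'):
            self.pos += 1
        text = self.src[start:self.pos]
        if re.fullmatch(r"-?\d+", text):
            return int(text)
        return ("sym", text)

    def read_all(self):
        # Read every top-level form until the input is exhausted.
        forms = []
        self.skip_blanks()
        while self.peek() is not None:
            forms.append(self.read_expr())
            self.skip_blanks()
        return forms
```

Calling `Reader(source).read_all()` on the pseudo code above yields two nested Python lists, one per top-level form. The position only ever moves forward, so the work grows linearly with the length of the input.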

DONE Episode 5 - The Abstract Syntax Tree

CLOSED: [2021-07-30 Fri 14:01]

What is an AST?

An AST is a tree representation of the abstract syntactic structure of source code. It is simply a tree of nodes, where each node is a data structure describing a piece of the syntax.

  ;; pseudo code
  (def main (fn () 4))
  (prn (main))

/Serene/serene/media/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/imgs/ast.svg
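
As a rough sketch of "a tree of nodes, where each node is a data structure", the pseudo code above could be modeled like this in Python. The class names (`Expression`, `Number`, `Symbol`, `ListExpr`) and the `to_source` method are hypothetical and chosen for this example only; they do not describe Serene's actual C++ classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

class Expression(ABC):
    """Hypothetical base class: every AST node is an expression."""

    @abstractmethod
    def to_source(self) -> str:
        """Render the node back to source text."""

@dataclass
class Number(Expression):
    value: int

    def to_source(self) -> str:
        return str(self.value)

@dataclass
class Symbol(Expression):
    name: str

    def to_source(self) -> str:
        return self.name

@dataclass
class ListExpr(Expression):
    elements: List[Expression]

    def to_source(self) -> str:
        return "(" + " ".join(e.to_source() for e in self.elements) + ")"

# The AST is just the sequence of top-level nodes:
#   (def main (fn () 4))
#   (prn (main))
ast = [
    ListExpr([Symbol("def"), Symbol("main"),
              ListExpr([Symbol("fn"), ListExpr([]), Number(4)])]),
    ListExpr([Symbol("prn"), ListExpr([Symbol("main")])]),
]
```

Rendering each node with `to_source()` gives back `(def main (fn () 4))` and `(prn (main))`, which is a quick way to check that the tree carries the same structure as the source.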

The Expression abstract class

Expressions

  • Expressions vs Statements
  • Serene(Lisp) and expressions

Node & AST

DONE Episode 6 - The Semantic Analyzer

CLOSED: [2021-08-21 Sat 18:44]

Qs

  • Why didn't we implement a linked list?
  • Why are we using std::vector instead of the LLVM collections?

What is Semantic Analysis?

  • Semantic Analysis makes sure that the given program is semantically correct.
  • The type checker runs as part of this step as well.
  ;; pseudo code
  (4 main)

/Serene/serene/media/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/imgs/incorrct_semantic.svg
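
The form `(4 main)` parses without trouble, so the error has to be caught after parsing. A minimal sketch of such a check follows, assuming the AST is represented as nested Python lists with `int` for numbers and `str` for symbols, and assuming the rule being violated is "a number cannot be called". `SemanticError` and `check_calls` are hypothetical names, not part of Serene.

```python
class SemanticError(Exception):
    """Raised for programs that parse correctly but are not meaningful."""

def check_calls(form):
    # Atoms (numbers, symbols) and empty lists need no checking here.
    if not isinstance(form, list) or not form:
        return
    head = form[0]
    if isinstance(head, int):
        # A list with a number in call position, such as (4 main).
        raise SemanticError(f"cannot call the number {head}")
    for item in form:
        check_calls(item)
```

With this sketch, `check_calls(["def", "main", ["fn", [], 4]])` returns without complaint, while `check_calls([4, "main"])` raises `SemanticError`.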

Semantic Analysis and rewrites

We need to rewrite the AST so that it closely reflects the semantics of Serene.

  ;; pseudo code
  (def main (fn () 4))
  (prn (main))

/Serene/serene/media/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/imgs/ast.svg

/Serene/serene/media/commit/860cb81a269c03d2157e037a5f93a0472283db24/docs/imgs/semantic.svg
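
The rewrite step can be sketched as a function that turns generic lists into more specific nodes. The sketch below assumes the reader's output is nested Python lists (`str` for symbols, `int` for numbers); the node names `Def`, `Fn` and `Call`, the `analyze` function and `SemanticError` are hypothetical and only meant to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, List

class SemanticError(Exception):
    """Raised when a special form is malformed."""

@dataclass
class Def:
    name: str
    value: Any

@dataclass
class Fn:
    params: List[str]
    body: List[Any]

@dataclass
class Call:
    target: Any
    args: List[Any]

def analyze(form):
    # Atoms and empty lists are kept as they are.
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if head == "def":
        if len(form) != 3 or not isinstance(form[1], str):
            raise SemanticError("def expects a name and a value")
        return Def(form[1], analyze(form[2]))
    if head == "fn":
        if len(form) < 2 or not isinstance(form[1], list):
            raise SemanticError("fn expects a parameter list")
        return Fn(list(form[1]), [analyze(f) for f in form[2:]])
    # Any other non-empty list is treated as a function call.
    return Call(analyze(head), [analyze(arg) for arg in form[1:]])
```

Under these assumptions, `analyze(["def", "main", ["fn", [], 4]])` returns `Def("main", Fn([], [4]))`, and `analyze(["prn", ["main"]])` returns `Call("prn", [Call("main", [])])`.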

Let's run the compiler to see the semantic analysis in action.

Let's check out the code

Episode 7 - The Context and Namespace

Namespaces

Unit of compilation

Usually maps to a file

Keeps the state and environment

SereneContext vs LLVM Context vs MLIR Context

The compiler's global state

The owner of LLVM/MLIR contexts

Holds the namespace table

Probably will contain the primitive types as well
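
The relationship between the two can be sketched as follows. This is a loose Python model of the notes above, not Serene's C++ API: the `Namespace` and `Context` classes, their fields and methods are hypothetical, and the LLVM and MLIR contexts are stood in for by plain placeholder objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Namespace:
    """Unit of compilation; usually maps to one source file."""
    name: str
    filename: Optional[str] = None
    # The environment: symbol name -> value defined in this namespace.
    env: Dict[str, Any] = field(default_factory=dict)

    def define(self, symbol, value):
        self.env[symbol] = value

    def lookup(self, symbol):
        return self.env.get(symbol)

@dataclass
class Context:
    """The compiler's global state."""
    # Placeholders standing in for the LLVM and MLIR contexts it owns.
    llvm_context: Any = field(default_factory=object)
    mlir_context: Any = field(default_factory=object)
    # The namespace table: namespace name -> Namespace.
    namespaces: Dict[str, Namespace] = field(default_factory=dict)

    def make_namespace(self, name, filename=None):
        ns = Namespace(name, filename)
        self.namespaces[name] = ns
        return ns

    def get_namespace(self, name):
        return self.namespaces.get(name)
```

A compiler run would create one `Context`, register a `Namespace` for each file it reads, and resolve symbols through the namespace table, for example `ctx.get_namespace("user").lookup("main")` for a hypothetical namespace called `user`.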