How to build a compiler with LLVM and MLIR
- Episode 1 - Introduction
- Episode 2 - Basic Setup
- Episode 3 - Overview
- Episode 4 - The reader
- Episode 5 - The Abstract Syntax Tree
- Episode 6 - The Semantic Analyzer
- Episode 7 - The Context and Namespace
- Episode 8 - MLIR Basics
- Episode 9 - IR (SLIR) generation
- Episode 10 - Pass Infrastructure
- Episode 11 - Lowering SLIR
- Episode 12 - Target code generation
- Episode 13 - Source Managers
- Episode 14 - JIT Basics
- Episode 15 - LLVM ORC JIT
- Episode 16 - ORC Layers
- Episode 17 - Custom ORC Layers
- Episode 18 - JIT Engine Part 1
- Episode 19 - JIT Engine Part 2
- Episode 20 - Future Roadmap
DONE Episode 1 - Introduction
What is it all about?
- Create a programming lang
- Guide for contributors
- An LLVM/MLIR guide
The Plan
- Git branches
- No live coding
- Feel free to contribute
Serene and a bit of history
- Other Implementations
Requirements
- C++14
- CMake
- Repository: https://devheroes.codes/Serene
- Website: lxsameer.com Email: lxsameer@gnu.org
DONE Episode 2 - Basic Setup
CLOSED: [2021-07-10 Sat 09:04]
Installing Requirements
LLVM and Clang
- mlir-tblgen
ccache (optional)
Building Serene and the builder
- git hooks
Source tree structure
dev.org: resources and TODOs
DONE Episode 3 - Overview
CLOSED: [2021-07-19 Mon 09:41]
Generic Compiler
- Modern Compiler Implementation in ML: Basic Techniques
- Compilers: Principles, Techniques, and Tools (The Dragon Book)
Common Steps
Frontend
- Lexical analyzer (Lexer)
- Syntax analyzer (Parser)
- Semantic analyzer
Middleend
- Intermediate code generation
- Code optimizer
Backend
- Target code generation
LLVM
/Serene/serene/src/branch/master/docs/llvm.org
Watch Introduction to LLVM
Quick overview
Adapted from https://www.aosabook.org/en/llvm.html
- It's a set of libraries to create a compiler.
- Well engineered.
- We can focus only on the frontend of the compiler and what actually matters to us, and leave the tricky stuff to LLVM.
- LLVM IR enables us to use multiple languages together.
- It supports many targets.
- We can benefit from already made IR level optimizers.
- ….
MLIR
/Serene/serene/src/branch/master/docs/mlir.llvm.org
- With MLIR, dialects provide higher-level semantics than LLVM IR.
- It's easier to reason about a higher-level IR that is modeled after the AST than a low-level IR.
- We can use the pass infrastructure to efficiently process and transform the IR.
- With many ready-to-use dialects, we can really focus on our language and use the other dialects whenever necessary.
- …
Serene
A Compiler frontend
Flow
- serenec parses the command line args.
- The reader reads the input file and generates an AST.
- The semantic analyzer walks the AST, generates a new AST, and rewrites the necessary nodes.
- The slir generator generates slir dialect code from the AST.
- We lower slir to other dialects of MLIR, and call the result mlir.
- Then, we lower everything to the LLVMIR dialect and call it lir (lowered IR).
- Finally, we fully lower lir to LLVM IR and pass it to the object generator to generate object files.
- We call the default C compiler to link the object files and generate the machine code.
DONE Episode 4 - The reader
CLOSED: [2021-07-27 Tue 22:50]
What is a Parser?
To put it simply, a parser converts the source code to an AST.
Algorithms
- LL(k)
- LR
- LALR
- PEG
- …..
Read More:
Our Parser
- We have a hand-written LL(1.5)-like parser/lexer, since Lisp already has a structure.
;; pseudo code
(def some-fn (fn (x y)
(+ x y)))
(defn main ()
(println "Result: " (some-fn 3 8)))
- LL(1.5)?
- O(n)
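A minimal sketch of such a reader (illustrative C++ only, not Serene's actual code; Node, skipSpace, and readExpr are hypothetical names):

// A minimal Lisp reader sketch; one character of lookahead picks list vs atom.
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Node {
  std::string atom;                        // set when the node is an atom
  std::vector<std::unique_ptr<Node>> list; // set when the node is a list
};

static void skipSpace(const std::string &src, std::size_t &pos) {
  while (pos < src.size() && std::isspace((unsigned char)src[pos]))
    pos++;
}

static std::unique_ptr<Node> readExpr(const std::string &src, std::size_t &pos) {
  skipSpace(src, pos);
  auto node = std::make_unique<Node>();
  if (pos < src.size() && src[pos] == '(') {
    pos++; // consume '('
    skipSpace(src, pos);
    while (pos < src.size() && src[pos] != ')') {
      node->list.push_back(readExpr(src, pos)); // recurse for each child form
      skipSpace(src, pos);
    }
    pos++; // consume ')'
  } else {
    while (pos < src.size() && !std::isspace((unsigned char)src[pos]) &&
           src[pos] != '(' && src[pos] != ')')
      node->atom += src[pos++]; // accumulate the atom's characters
  }
  return node;
}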
DONE Episode 5 - The Abstract Syntax Tree
CLOSED: [2021-07-30 Fri 14:01]
What is an AST?
An AST is a tree representation of the abstract syntactic structure of source code. It's just a tree made of nodes, where each node is a data structure describing a piece of syntax.
;; pseudo code
(def main (fn () 4))
(prn (main))
The Expression abstract class
Expressions
- Expressions vs Statements
- Serene(Lisp) and expressions
Node & AST
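A simplified sketch of this design (illustrative names, not Serene's exact classes):

// A simplified Expression/Node/Ast sketch; not Serene's exact code.
#include <memory>
#include <vector>

enum class ExprType { Symbol, Number, List, Def, Fn, Call };

class Expression {
public:
  virtual ~Expression() = default;
  virtual ExprType getType() const = 0; // which kind of node this is
};

// Every node of the tree is an expression, and the AST itself is just a
// vector of top-level nodes.
using Node = std::shared_ptr<Expression>;
using Ast = std::vector<Node>;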
DONE Episode 6 - The Semantic Analyzer
CLOSED: [2021-08-21 Sat 18:44]
Qs
- Why didn't we implement a linked list?
- Why are we using std::vector instead of LLVM collections?
What is Semantic Analysis?
- Semantic Analysis makes sure that the given program is semantically correct.
- The type checker works as part of this step as well.
;; pseudo code
(4 main)
Semantic Analysis and rewrites
We need to reform the AST to closely reflect the semantics of Serene.
;; pseudo code
(def main (fn () 4))
(prn (main))
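Building on the Expression sketch from the previous episode, a hedged outline of a tree-walking analyzer that rewrites nodes (the rewrite logic is only hinted at in comments):

// A hedged semantic-analyzer outline; assumes the Node/Ast/ExprType sketch above.
Node analyze(const Node &node) {
  if (node->getType() == ExprType::List) {
    // Inspect the head of the list: if it is `def` or `fn`, rewrite the raw
    // list into the matching typed node, then recurse into the children.
  }
  return node; // atoms pass through unchanged
}

Ast analyze(const Ast &ast) {
  Ast result;
  for (const auto &node : ast)
    result.push_back(analyze(node)); // analyze every top-level form
  return result;
}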
Let's run the compiler to see the semantic analysis in action.
Let's check out the code
DONE Episode 7 - The Context and Namespace
CLOSED: [2021-09-04 Sat 10:53]
Namespaces
Unit of compilation
Usually maps to a file
Keeps the state and environment
SereneContext vs LLVM Context vs MLIR Context
Compiler's global state
The owner of LLVM/MLIR contexts
Holds the namespace table
Probably will contain the primitive types as well
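A hedged sketch of the shape of such a context (member names are hypothetical):

// A hedged sketch of the compiler's global state; member names are hypothetical.
#include <memory>
#include "llvm/ADT/StringMap.h"
#include "llvm/IR/LLVMContext.h"
#include "mlir/IR/MLIRContext.h"

class Namespace; // unit of compilation; keeps its own state and environment

class SereneContext {
public:
  llvm::LLVMContext llvmContext; // owned LLVM context
  mlir::MLIRContext mlirContext; // owned MLIR context

  // The namespace table: maps a namespace name to the namespace itself.
  llvm::StringMap<std::shared_ptr<Namespace>> namespaces;

  // The primitive types would probably live here as well.
};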
DONE Episode 8 - MLIR Basics
CLOSED: [2021-09-17 Fri 10:18]
Serene Changes
- Introducing a SourceManager
- Reader changes
- The serenec CLI interface is changing
Disclaimer
I'm not an expert in MLIR
Why?
- A bit of history
- LLVM IR is too low level
- We need an IR to implement high-level concepts and flows. MLIR is a framework to build a compiler with your own IR, kinda :P
- Reusability
- …
Language
Overview
- SSA Based (https://en.wikipedia.org/wiki/Static_single_assignment_form)
- Typed
- Context free (for lack of a better term)
Dialects
- A collection of operations
- Custom types
- Metadata
- We can use a mixture of different dialects
builtin dialects:
- std
- llvm
- math
- async
- …
Operations
- Higher level of abstraction
- Not instructions
- SSA forms
- Tablegen backend
- Verifiers and printers
Attributes
Blocks & Regions
Types
- Extensible
Pass Infrastructure
Analysis and transformation infrastructure
- We will implement most of our semantic analysis logic and type checker as passes
Pattern Rewriting
- Tablegen backed
Operation Definition Specification
Examples
Note: You need mlir-mode and llvm-mode available for the code highlighting of the following code blocks. Both of them are distributed with LLVM.
General syntax
%result:2 = "somedialect.blah"(%x#2) { some.attribute = true, other_attribute = 3 }
: (!somedialect<"example_type">) -> (!somedialect<"foo_s">, i8)
loc(callsite("main" at "main.srn":10:8))
Blocks and Regions
func @simple(i64, i1) -> i64 {
^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
cond_br %cond, ^bb1, ^bb2
^bb1:
br ^bb3(%a: i64) // Branch passes %a as the argument
^bb2:
%b = addi %a, %a : i64
br ^bb3(%b: i64) // Branch passes %b as the argument
// ^bb3 receives an argument, named %c, from predecessors
// and passes it on to bb4 along with %a. %a is referenced
// directly from its defining operation and is not passed through
// an argument of ^bb3.
^bb3(%c: i64):
//br ^bb4(%c, %a : i64, i64)
"serene.ifop"(%c) ({ // if %a is in-scope in the containing region...
// then %a is in-scope here too.
%new_value = "another_op"(%c) : (i64) -> (i64)
^someblock(%new_value):
%x = "some_other_op"() {value = 4 : i64} : () -> i64
}) : (i64) -> (i64)
^bb4(%d : i64, %e : i64):
%0 = addi %d, %e : i64
return %0 : i64 // Return is also a terminator.
}
SLIR example
Command line arguments to emit slir
./builder run --build-dir ./build -emit slir `pwd`/docs/examples/hello_world.srn
Output:
module @user {
%0 = "serene.fn"() ( {
%2 = "serene.value"() {value = 0 : i64} : () -> i64
return %2 : i64
}) {args = {}, name = "main", sym_visibility = "public"} : () -> i64
%1 = "serene.fn"() ( {
%2 = "serene.value"() {value = 0 : i64} : () -> i64
return %2 : i64
}) {args = {n = i64, v = i64, y = i64}, name = "main1", sym_visibility = "public"} : () -> i64
}
Serene's MLIR (maybe we need a better name)
Command line arguments to emit mlir
./builder run --build-dir ./build -emit mlir `pwd`/docs/examples/hello_world.srn
Output:
module @user {
func @main() -> i64 {
%c3_i64 = constant 3 : i64
return %c3_i64 : i64
}
func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
%c3_i64 = constant 3 : i64
return %c3_i64 : i64
}
}
LIR
Command line arguments to emit lir
./builder run --build-dir ./build -emit lir `pwd`/docs/examples/hello_world.srn
Output:
module @user {
llvm.func @main() -> i64 {
%0 = llvm.mlir.constant(3 : i64) : i64
llvm.return %0 : i64
}
llvm.func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
%0 = llvm.mlir.constant(3 : i64) : i64
llvm.return %0 : i64
}
}
LLVMIR
Command line arguments to emit LLVM IR
./builder run --build-dir ./build -emit ir `pwd`/docs/examples/hello_world.srn
Output:
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
declare i8* @malloc(i64 %0)
declare void @free(i8* %0)
define i64 @main() !dbg !3 {
ret i64 3, !dbg !7
}
define i64 @main1(i64 %0, i64 %1, i64 %2) !dbg !9 {
ret i64 3, !dbg !10
}
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}
!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, type: !5, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "REPL", directory: "/home/lxsameer/src/serene/serene/build")
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocation(line: 0, column: 10, scope: !8)
!8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0)
!9 = distinct !DISubprogram(name: "main1", linkageName: "main1", scope: null, file: !4, line: 1, type: !5, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!10 = !DILocation(line: 1, column: 11, scope: !11)
!11 = !DILexicalBlockFile(scope: !9, file: !4, discriminator: 0)
DONE Episode 9 - IR (SLIR) generation
CLOSED: [2021-10-01 Fri 18:56]
Updates:
- Source manager
- Diagnostic Engine
- JIT
There will be an episode dedicated to each of these
How does IR generation work?
- Pass around MLIR context
- Create Builder objects that creates operations in specific locations
- ModuleOp
- Namespace
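A hedged sketch of those points together (not Serene's exact generator code):

// A hedged IR-generation sketch: a builder creating ops inside a ModuleOp.
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/MLIRContext.h"

mlir::ModuleOp generateModule(mlir::MLIRContext &ctx) {
  mlir::OpBuilder builder(&ctx);
  auto module = mlir::ModuleOp::create(builder.getUnknownLoc(),
                                       llvm::StringRef("user"));
  // Operations are created at the builder's insertion point.
  builder.setInsertionPointToStart(module.getBody());
  // ... walk the AST here, creating one operation per node.
  return module;
}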
How to define a new dialect
- Pure C++
- Tablegen
SLIR
The SLIR goal
- An IR that follows the AST
- Rename?
Steps
- Define the new dialect
- Setup the tablegen
- Define the operations
- Walk the AST and generate the operations
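A hedged sketch of the first step above in pure C++; the TableGen route generates roughly equivalent boilerplate (the SLIRDialect class here is illustrative):

// A hedged pure-C++ dialect definition; op registration is only hinted at.
#include "mlir/IR/Dialect.h"
#include "mlir/IR/MLIRContext.h"

class SLIRDialect : public mlir::Dialect {
public:
  explicit SLIRDialect(mlir::MLIRContext *ctx)
      : mlir::Dialect(getDialectNamespace(), ctx,
                      mlir::TypeID::get<SLIRDialect>()) {
    // addOperations<FnOp, ValueOp, ...>(); // register the dialect's operations
  }
  static llvm::StringRef getDialectNamespace() { return "slir"; }
};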
DONE Episode 10 - Pass Infrastructure
CLOSED: [2021-10-15 Fri 14:17]
The next Step
Updates:
CMake changes
What is a Pass
Passes are the unit of abstraction for optimization and transformation in LLVM/MLIR
Compilation is all about transforming the input data and producing an output
Source code -> IR X -> IR Y -> IR Z -> … -> Target Code
Almost like a function composition
The big picture
Pass Managers (Pipelines) are made out of a collection of passes and can be nested
Most of the interesting parts of the compiler reside in passes.
We will probably spend most of our time working with passes
Pass Infrastructure
ODS or C++
Operation is the main abstract unit of transformation
OperationPass is the base class for all the passes.
We need to override runOnOperation
There are some rules you need to follow when defining your pass
Must not maintain any global mutable state
Must not modify the state of another operation not nested within the current operation being operated on
…
Passes are either OpSpecific or OpAgnostic
OpSpecific
struct MyFunctionPass : public PassWrapper<MyFunctionPass,
OperationPass<FuncOp>> {
void runOnOperation() override {
// Get the current FuncOp operation being operated on.
FuncOp f = getOperation();
// Walk the operations within the function.
f.walk([](Operation *inst) {
// ....
});
}
};
/// Register this pass so that it can be built via a textual pass pipeline.
/// (Pass registration is discussed more below)
void registerMyPass() {
PassRegistration<MyFunctionPass>();
}
OpAgnostic
struct MyOperationPass : public PassWrapper<MyOperationPass, OperationPass<>> {
void runOnOperation() override {
// Get the current operation being operated on.
Operation *op = getOperation();
// ...
}
};
How does transformation work?
Analyses and Passes
Pass management and nested pass managers
// Create a top-level `PassManager` class. If an operation type is not
// explicitly specified, the default is the builtin `module` operation.
PassManager pm(ctx);
// Note: We could also create the above `PassManager` this way.
PassManager pm(ctx, /*operationName=*/"builtin.module");
// Add a pass on the top-level module operation.
pm.addPass(std::make_unique<MyModulePass>());
// Nest a pass manager that operates on `spirv.module` operations nested
// directly under the top-level module.
OpPassManager &nestedModulePM = pm.nest<spirv::ModuleOp>();
nestedModulePM.addPass(std::make_unique<MySPIRVModulePass>());
// Nest a pass manager that operates on functions within the nested SPIRV
// module.
OpPassManager &nestedFunctionPM = nestedModulePM.nest<FuncOp>();
nestedFunctionPM.addPass(std::make_unique<MyFunctionPass>());
// Run the pass manager on the top-level module.
ModuleOp m = ...;
if (failed(pm.run(m))) {
// Handle the failure
}
DONE Episode 11 - Lowering SLIR
CLOSED: [2021-11-01 Mon 15:14]
Overview
- What is a Pass?
- Pass Manager
Dialect lowering
Why?
Transforming a dialect to another dialect or LLVM IR
The goal is to lower SLIR to LLVM IR directly or indirectly.
Dialect Conversions
This framework allows for transforming a set of illegal operations to a set of legal ones.
Conversion Target
Rewrite Patterns
Type Converter
Full vs Partial Conversion
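Putting those pieces together, a hedged sketch of a partial conversion from SLIR to the LLVM dialect (the pattern names are hypothetical):

// A hedged partial-conversion sketch; SlirFnLowering etc. are hypothetical.
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Transforms/DialectConversion.h"

void lowerToLLVMDialect(mlir::ModuleOp module) {
  mlir::MLIRContext *ctx = module.getContext();

  // Everything in the LLVM dialect is legal; anything else must be converted.
  mlir::ConversionTarget target(*ctx);
  target.addLegalDialect<mlir::LLVM::LLVMDialect>();

  mlir::RewritePatternSet patterns(ctx);
  // patterns.add<SlirFnLowering, SlirValueLowering>(ctx);

  if (mlir::failed(
          mlir::applyPartialConversion(module, target, std::move(patterns))))
    module.emitError("SLIR lowering failed");
}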
DONE Episode 12 - Target code generation
CLOSED: [2021-11-04 Thu 00:57]
Updates:
JIT work
Emacs dev mode
So far….
Next Step
Compile to object files
Link object files to create an executable
End of wiring for static compilers
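For reference, a hedged sketch of the standard LLVM recipe for emitting an object file (not Serene's exact code; in LLVM 13 and earlier TargetRegistry.h lives under llvm/Support instead of llvm/MC):

// A hedged object-emission sketch following LLVM's standard TargetMachine recipe.
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Host.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

bool emitObjectFile(llvm::Module &module, const std::string &filename) {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  std::string error;
  auto triple = llvm::sys::getDefaultTargetTriple();
  const auto *target = llvm::TargetRegistry::lookupTarget(triple, error);
  if (!target)
    return false;

  auto *tm = target->createTargetMachine(triple, "generic", "",
                                         llvm::TargetOptions(), llvm::None);
  module.setDataLayout(tm->createDataLayout());

  std::error_code ec;
  llvm::raw_fd_ostream dest(filename, ec, llvm::sys::fs::OF_None);
  if (ec)
    return false;

  // The target machine adds the passes that lower the module to machine code.
  llvm::legacy::PassManager pm;
  if (tm->addPassesToEmitFile(pm, dest, nullptr, llvm::CGFT_ObjectFile))
    return false;
  pm.run(module);
  dest.flush();
  return true;
}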
What is an object file?
Symbols
- A pair of a name and a value
- The value of a defined symbol is an offset into the Contents
- Undefined symbols
Relocations
Are computations to perform on the Contents. For example, "set this location in the contents to the value of this symbol plus this addend".
The linker applies all the relocations in an object file at link time; if it cannot resolve an undefined symbol, most of the time it raises an error (depending on the relocation and the symbol).
Contents
- Are what memory should look like during the execution
- Have a size
- Have a type
- Have an array of bytes
- Have sections like:
- .text: The target code generated by the compiler
- .data: The values of initialized variables
- .rdata: Static unnamed data like literal strings, protocol tables and ….
- .bss: Uninitialized variables (the content can be omitted or stripped and assumed to contain only zeros)
Linking process
During the linking process, the linker assigns an address to each defined symbol and tries to resolve undefined symbols.
The linker will:
- read the object files
- read the contents
  - as raw data
  - figure out the length
- read the symbols and create a symbol table
- link undefined symbols to their definitions (possibly from other object files or libraries)
- decide where all the content should go in memory
  - sort them based on their type
  - concat them together
- apply relocations
- write the result to a file as an executable
AOT vs JIT
Let's look at some code
Resources:
DONE Episode 13 - Source Managers
CLOSED: [2021-12-18 Sat 11:17]
FAQ:
- What tools are you using?
Updates:
- Still JIT
- We're going to start the JIT discussion in the next episode
Forgot to showcase the code generation
I didn't show it in action
What is a source manager
- It owns and manages all the source buffers
- All of our interactions with source files will happen through the source manager
  - Including reading files
  - Loading namespaces
  - Including namespaces
  - …
- LLVM provides a SourceMgr class, but we're not using it
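A hedged sketch of the kind of interface this implies (method names are illustrative, not Serene's actual API):

// A hedged source-manager sketch; it owns every buffer it loads.
#include <memory>
#include <vector>
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"

class SourceManager {
public:
  // Read a file from disk, take ownership of its buffer, and return an ID
  // that the rest of the compiler uses to refer to it (0 means failure).
  unsigned loadFile(llvm::StringRef path) {
    auto bufOrErr = llvm::MemoryBuffer::getFile(path);
    if (!bufOrErr)
      return 0;
    buffers.push_back(std::move(*bufOrErr));
    return buffers.size(); // IDs are 1-based
  }

  llvm::StringRef getBuffer(unsigned id) const {
    return buffers[id - 1]->getBuffer();
  }

private:
  std::vector<std::unique_ptr<llvm::MemoryBuffer>> buffers; // owned buffers
};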
DONE Episode 14 - JIT Basics
CLOSED: [2022-01-05 Wed 17:37]
Updates:
- Lost my data :_(
- Fixed some compatibility issues
- New video series on How to build an editor with Emacs Lisp
What is Just In Time Compilation?
- Compiling at "runtime" (air quote) Or it might be better to say, "on demand compilation"
- Usually in interpreters and Runtimes
digraph {
graph [bgcolor=transparent]
node [color=gray80 shape="box"]
edge [color=gray80]
rankdir = "LR"
a[label="Some kind of input code"]
b[label="JIT"]
c[label="Some sort of target code"]
d[label="Execute the result"]
a -> b -> c -> d
}
digraph G {
graph [bgcolor=transparent]
node [color=gray80 shape="box"]
edge [color=gray80]
rankdir = "LR"
a[label="Source code"]
b[label="Parser"]
c[label="Semantic Analyzer"]
d[label="IR Generator"]
e[label="Pass Manager"]
f[label="Object Layer"]
g[label="Native Code"]
z[label="Preload Core libs"]
a -> b
b -> c [label="AST"]
c -> d
z -> f
subgraph cluster0 {
color=lightgrey;
d -> e -> f
label = "JIT Engine";
}
f -> g
g -> Store
g -> Execute
}
- Trade-off: compilation speed vs execution speed
JIT vs Typical interpreters
Why use a JIT?
- Make the interpreter run "faster" (air quotes again)
- Speed up the compilation: avoid generating the target code, generate some byte-code instead, and then use a JIT at runtime to execute the byte-code.
- Use runtime data to find optimization opportunities.
- Support more archs
- And many other reasons
How are we going to use a JIT?
- We need a JIT engine to implement Lisp Macros
- Compile time vs Runtime
- Abstraction
- A JIT engine to just compile Serene code
- Our compiler will be a fancy JIT engine
digraph {
graph [bgcolor=transparent]
node [color=gray80 shape="box"]
edge [color=gray80]
rankdir = "LR"
a[label="Serene AST"]
b[label="For every node"]
c[label="is it a macro call?" shapp="diamond"]
d[label="Add it to JIT"]
e[label="Expand it (call it)"]
f[label="Generate target code"]
a -> b
subgraph cluster0 {
color=lightgrey;
b -> c
c -> d [label="NO"]
c -> e [label="YES"]
e -> b
d -> f
label = "JIT Engine";
}
f -> Store
f -> Execute
}
LLVM/MLIR and JIT
- 3 different approaches:
  - MLIR's JIT
  - LLVM JITs:
    - MCJIT (Deprecated)
    - LLJIT (Based on ORCv2)
    - LazyLLJIT (Based on LLJIT)
  - Use LLVM's ORCv2 directly to create an engine
DONE Episode 15 - LLVM ORC JIT
CLOSED: [2022-01-28 Fri 12:15]
Updates:
- Created a bare minimum JIT that:
  - Eagerly compiles namespaces
  - Reloads namespaces
- I guess this is the time to start Serene's spec
What is ORCv2?
- On Request Compilation
- Replaces MCJIT
- ORCv2 docs and examples
- Kaleidoscope tutorial (not complete)
Before we can move to Serene's code, we need to understand ORC first.
Terminology
Execution Session
A running JIT program. It contains the JITDylibs and error reporting mechanisms, and it dispatches the materializers.
JITAddress
It's just the address of some JIT'd code
JITDylib
Represents a JIT'd dynamic library.
This class aims to mimic the behavior of a shared object, but without requiring the contained program representations to be compiled up-front. The JITDylib's content is defined by adding MaterializationUnits, and contained MaterializationUnits will typically rely on the JITDylib's links-against order to resolve external references.
JITDylibs cannot be moved or copied. Their address is stable, and useful as a key in some JIT data structures.
MaterializationUnit
A MaterializationUnit represents a set of symbol definitions that can be materialized as a group, or individually discarded (when overriding definitions are encountered).
MaterializationUnits are used when providing lazy definitions of symbols to JITDylibs. The JITDylib will call materialize when the address of a symbol is requested via the lookup method. The JITDylib will call discard if a stronger definition is added or already present.
MaterializationUnits are stored in JITDylibs.
MaterializationResponsibility
Represents and tracks responsibility for materialization, and mediates interactions between MaterializationUnits and JITDylibs. It provides a way for the JITDylib to find out about the outcome of the materialization.
Memory Manager
A class that manages how the JIT engine should use memory, like allocations and deallocations. SectionMemoryManager is a simple memory manager that LLVM provides.
Layers
ORC-based JIT engines are constructed from several layers. Each layer has a specific responsibility and passes the result of its operation to the next layer, e.g. the compile layer, the link layer, and so on.
Resource Tracker
The API to remove or transfer the ownership of JIT resources. Usually, a resource is a module.
ThreadSafeModule
A thread-safe container for an LLVM module.
ORC high-level API
- ORC provides a layer-based design that lets us create our own JIT engine.
- It comes with two ready-to-use engines (we will look at their implementation later):
  - LLJIT
  - LLLazyJIT
Two major solutions to build a JIT
- Wrap LLJIT or LLLazyJIT
- Create your own JIT engine and the wrapper
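A hedged sketch of the first option, based on LLVM's HowToUseLLJIT example (the symbol-address casting varies slightly across LLVM versions):

// A hedged LLJIT usage sketch: build the engine, add a module, run `main`.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/TargetSelect.h"

llvm::Error runMain(llvm::orc::ThreadSafeModule tsm) {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  // Build an LLJIT instance with the default configuration.
  auto jit = llvm::orc::LLJITBuilder().create();
  if (!jit)
    return jit.takeError();

  // Add the module to the main JITDylib; it is compiled on first lookup.
  if (auto err = (*jit)->addIRModule(std::move(tsm)))
    return err;

  auto sym = (*jit)->lookup("main");
  if (!sym)
    return sym.takeError();

  auto *mainFn = reinterpret_cast<int (*)()>(sym->getAddress());
  mainFn(); // execute the JIT'd function
  return llvm::Error::success();
}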
Resources
Examples
- https://github.com/llvm/llvm-project/tree/main/llvm/examples/HowToUseLLJIT
- https://github.com/llvm/llvm-project/tree/main/llvm/examples/OrcV2Examples/LLJITDumpObjects
- https://github.com/llvm/llvm-project/tree/main/llvm/examples/OrcV2Examples/LLJITWithInitializers
- https://github.com/llvm/llvm-project/tree/main/llvm/examples/OrcV2Examples/LLJITWithLazyReexports
DONE Episode 16 - ORC Layers
CLOSED: [2022-02-26 Sat 12:50]
Updates:
Support for adding AST directly to the JIT
Minor change to SLIR (big changes are coming)
Started to unify the llvm::Errors with Serene errors
Tablegen backend for error classes
The plan for today
- We had a brief look at LLJIT/LLLazyJIT
- To understand them better we need to understand other components first, starting from layers.
- We'll have a look at how to define our own layers in the future episodes.
What are Layers?
- Layers are the basic building blocks of an engine
- They are composable (kinda)
- Each layer has its own requirements and details
- Each layer holds a reference to its downstream layer
digraph {
graph [bgcolor=transparent]
node [color=gray80 shape="box"]
edge [color=gray80]
rankdir = "LR"
a[label="Input Type A"]
b[label="Input Type B"]
c[label="Layer A"]
d[label="Layer B"]
e[label="Layer C"]
f[label="Layer D"]
g[label="Layer E"]
h[label="Target Code"]
a -> c
b -> d
subgraph cluster0 {
color=lightgrey;
c -> e
d -> e
e -> f
f -> g
label = "JIT Engine";
}
g -> h
}
Kaleidoscope JIT
- Chapter 1
digraph {
graph [bgcolor=transparent]
node [color=gray80 shape="box"]
edge [color=gray80]
rankdir = "LR"
a[label="LLVM IR Module"]
b[label="Compiler Layer"]
c[label="Object Layer"]
d[label="Target Code"]
a -> b
subgraph cluster0 {
color=lightgrey;
b -> c
label = "Kaleidoscope JIT";
}
c -> d
}
- Chapter 2
digraph {
graph [bgcolor=transparent]
node [color=gray80 shape="box"]
edge [color=gray80]
rankdir = "LR"
a[label="LLVM IR Module"]
b[label="Compiler Layer"]
c[label="Object Layer"]
e[label="Target Code"]
d[label="Optimize layer"]
a -> d
subgraph cluster0 {
color=lightgrey;
d -> b
b -> c
label = "Kaleidoscope JIT";
}
c -> e
}
DONE Episode 17 - Custom ORC Layers
CLOSED: [2022-03-28 Mon 14:00]
Updates:
- Finished the basic compiler wiring
- Restructured the source tree
- Tweaked the build system mostly for install targets
- Refactoring, cleaning up the code and writing tests
Quick overview of an ORC-based JIT engine
- JIT engines are made out of layers
- Engines have a hierarchy of layers
- Layers don't know about each other
- Layers wrap the program representation in a MaterializationUnit, which is then stored in the JITDylib. MaterializationUnits are responsible for describing the definitions they provide, and for unwrapping the program representation and passing it back to the layer when compilation is required.
- When a MaterializationUnit hands a program representation back to the layer, it comes with an associated MaterializationResponsibility object. This object tracks the definitions that must be materialized and provides a way to notify the JITDylib once they are either successfully materialized or a failure occurs.
In order to build a custom layer we need:
A custom materialization unit
Let's have a look at the MaterializationUnit class.
And the layer class itself
The layer classes are not special, but conventionally they come with a few functions like add, emit, and getInterface.
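A hedged skeleton of such a materialization unit (constructor and override signatures vary across LLVM versions; the AST and layer types are hypothetical):

// A hedged custom MaterializationUnit skeleton; the emit call is only hinted at.
#include "llvm/ExecutionEngine/Orc/Core.h"

class AstMaterializationUnit : public llvm::orc::MaterializationUnit {
public:
  explicit AstMaterializationUnit(llvm::orc::SymbolFlagsMap symbols
                                  /*, AstLayer &layer, Ast ast */)
      : MaterializationUnit(std::move(symbols),
                            /*InitSymbol=*/llvm::orc::SymbolStringPtr()) {}

  llvm::StringRef getName() const override { return "AstMaterializationUnit"; }

  // Called when one of our symbols is looked up: unwrap the program
  // representation and hand it back to the layer for compilation.
  void materialize(
      std::unique_ptr<llvm::orc::MaterializationResponsibility> r) override {
    // layer.emit(std::move(r), std::move(ast));
  }

private:
  // Called when a stronger definition is added or already present.
  void discard(const llvm::orc::JITDylib &jd,
               const llvm::orc::SymbolStringPtr &sym) override {}
};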
DONE Episode 18 - JIT Engine Part 1
CLOSED: [2022-03-29 Tue 19:56]
Halley
JIT Engine
- It's not the final implementation
- Wraps LLJIT and LLLazyJIT
- Uses object cache layer
- Supports ASTs and Namespaces
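For context, LLVM's ObjectCache interface is what an object cache layer plugs into; a hedged sketch of an in-memory implementation (not Halley's actual cache):

// A hedged in-memory ObjectCache sketch keyed by module identifier.
#include "llvm/ADT/StringMap.h"
#include "llvm/ExecutionEngine/ObjectCache.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/MemoryBuffer.h"

class InMemoryCache : public llvm::ObjectCache {
public:
  // Called after a module is compiled: keep a copy of the object code.
  void notifyObjectCompiled(const llvm::Module *m,
                            llvm::MemoryBufferRef obj) override {
    cache[m->getModuleIdentifier()] =
        llvm::MemoryBuffer::getMemBufferCopy(obj.getBuffer());
  }

  // Called before compiling a module: return the cached object, if any.
  std::unique_ptr<llvm::MemoryBuffer> getObject(const llvm::Module *m) override {
    auto it = cache.find(m->getModuleIdentifier());
    if (it == cache.end())
      return nullptr;
    return llvm::MemoryBuffer::getMemBufferCopy(it->second->getBuffer());
  }

private:
  llvm::StringMap<std::unique_ptr<llvm::MemoryBuffer>> cache;
};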
DONE Episode 19 - JIT Engine Part 2
CLOSED: [2022-05-04 Wed 21:30]
How is Serene different from other programming languages?
- Serene is just a JIT engine
- Compile time vs runtime
  - The borderline is not clear in the case of Serene
The Big picture
digraph {
fontcolor="gray80"
graph [bgcolor=transparent]
node [color=gray80 shape="box", fontcolor="gray80"]
edge [color=gray80, fontcolor="gray80"]
input_ns[label="Input Namespace"]
ast[label="AST"]
vast[label="Valid AST"]
ir[label="LLVM IR"]
subgraph cluster_2 {
label="REPL"
color="gray80"
graph [bgcolor=transparent, fontcolor="gray80"]
node [color=gray80 shape="box", fontcolor="gray80"]
input_form[label="Input Form"]
result[label="Evaluation result"]
result -> input_form[label="Loop"]
}
subgraph cluster_0 {
label="JIT"
color="gray80"
graph [bgcolor=transparent, fontcolor="gray80"]
node [color=gray80 shape="box", fontcolor="gray80"]
execute[label="Execute native code"]
binary[label="Binary File"]
subgraph cluster_1 {
label="AddAst/Ns"
color="gray80"
graph [bgcolor=transparent, fontcolor="gray80"]
node [color=gray80 shape="box", fontcolor="gray80"]
vast -> ir [label="Compile (No optimization)"]
subgraph cluster_4 {
label="Macro Expansion"
color="gray80"
graph [bgcolor=transparent, fontcolor="gray80"]
node [color=gray80 shape="box", fontcolor="gray80"]
vast -> macros [label=" Find the macros"]
macros -> JITDylib [label=" look up the required\n Symbols in compiled code"]
JITDylib -> symbols [label=" lookup"]
}
wrapped_code[label="Wrapped IR"]
ir -> wrapped_code[label= " Wrap top level Forms in \nfunctions/calls"]
wrapped_code -> native [label=" Compile (No optimization)"]
}
symbols -> execute [label="Execute the functions mapped to the symbols"]
execute -> vast
execute -> result [label=" Print"]
execute -> binary [label=" Dump"]
native -> execute [label="invoke"]
native -> JITDylib [label="Add"]
JITDylib -> Context [label="Store as part the namespace"]
}
subgraph cluster_3 {
label="CLI interface"
color="gray80"
graph [bgcolor=transparent, fontcolor="gray80"]
node [color=gray80 shape="box", fontcolor="gray80"]
input_ns -> file [label=" resolve to file"]
}
input_form -> ast [label=" read"]
file -> ast [label=" read"]
ast -> vast [label=" Semantic Analysis"]
}
Let's look at some code
Episode 20 - Future Roadmap
So Far
- We created a bare-bones, minimal compiler
  - That is capable of just-in-time and ahead-of-time compilation
- We had an overview of MLIR and pass management
- We didn't spend time on fundamentals
Design change
- The current implementation is suitable for a static compiler
- We want to move toward a more dynamic compiler
What's next?
- Part 2
- We're going to focus on some of the compiler fundamentals
- We will create simple utilities to help us in our journey
- Hopefully we will talk about type systems
- We're going to sharpen our skills on LLVM/MLIR
- I'm going to work on the new design