serene/docs/videos.org

How to build a compiler with LLVM and MLIR

DONE Episode 1 - Introduction

What is it all about?

  • Create a programming lang
  • Guide for contributors
  • An LLVM/MLIR guide

The Plan

  • Git branches
  • No live coding
  • Feel free to contribute

Serene and a bit of history

DONE Episode 2 - Basic Setup

CLOSED: [2021-07-10 Sat 09:04]

Installing Requirements

LLVM and Clang

  • mlir-tblgen

ccache (optional)

Building Serene and the builder

  • git hooks

Source tree structure

dev.org resources and TODOs

DONE Episode 3 - Overview

CLOSED: [2021-07-19 Mon 09:41]

Generic Compiler

Common Steps

  • Frontend

    • Lexical analyzer (Lexer)
    • Syntax analyzer (Parser)
    • Semantic analyzer
  • Middleend

    • Intermediate code generation
    • Code optimizer
  • Backend

    • Target code generation
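
To make the steps above concrete, here is a minimal sketch of the whole pipeline for a toy language that only has integers, names, `+` and `*`. Everything in it (the function names, the tuple-based AST, the made-up stack machine) is an assumption for illustration only; it has nothing to do with how Serene or LLVM implement these stages.

```python
import re

def lex(source):
    """Frontend, step 1: split the source text into tokens."""
    return re.findall(r"\d+|[a-z]+|[+*]", source)

def parse(tokens):
    """Frontend, step 2: build a nested-tuple AST (a sum of products)."""
    def atom():
        tok = tokens.pop(0)
        return int(tok) if tok.isdigit() else tok

    def product():
        node = atom()
        while tokens and tokens[0] == "*":
            tokens.pop(0)
            node = ("*", node, atom())
        return node

    node = product()
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, product())
    return node

def check(ast, known_names):
    """Frontend, step 3: semantic analysis, here only 'is every name defined?'."""
    if isinstance(ast, str) and ast not in known_names:
        raise NameError(f"undefined name: {ast}")
    if isinstance(ast, tuple):
        check(ast[1], known_names)
        check(ast[2], known_names)
    return ast

def optimize(ast):
    """Middle end: fold operations whose operands are both constants."""
    if not isinstance(ast, tuple):
        return ast
    op, lhs, rhs = ast[0], optimize(ast[1]), optimize(ast[2])
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs if op == "+" else lhs * rhs
    return (op, lhs, rhs)

def codegen(ast):
    """Backend: emit instructions for a made-up stack machine."""
    if isinstance(ast, int):
        return [f"push {ast}"]
    if isinstance(ast, str):
        return [f"load {ast}"]
    op = "add" if ast[0] == "+" else "mul"
    return codegen(ast[1]) + codegen(ast[2]) + [op]

def compile_source(source, known_names=("x",)):
    """Run every stage in order: lexer, parser, checker, optimizer, backend."""
    return codegen(optimize(check(parse(lex(source)), known_names)))
```

For example, `compile_source("x + 2 * 3")` folds `2 * 3` in the middle end before the backend ever sees it.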

LLVM

llvm.org

Quick overview

Adapted from https://www.aosabook.org/en/llvm.html

/Serene/serene/media/commit/f980da8e4e5464ca96003d6a64ddfba995de43cb/docs/imgs/llvm_dia.svg

  • It's a set of libraries to create a compiler.
  • Well engineered.
  • We can focus only on the frontend of the compiler and what is actually important to us, and leave the tricky stuff to LLVM.
  • LLVM IR enables us to use multiple languages together.
  • It supports many targets.
  • We can benefit from already made IR level optimizers.
  • ….

MLIR

mlir.llvm.org

/Serene/serene/media/commit/f980da8e4e5464ca96003d6a64ddfba995de43cb/docs/imgs/mlir_dia.svg

  • MLIR dialects provide higher-level semantics than LLVM IR.
  • It's easier to reason about a higher-level IR that is modeled after the AST than about a low-level IR.
  • We can use the pass infrastructure to efficiently process and transform the IR.
  • With many ready-to-use dialects, we can really focus on our language and use the other dialects whenever necessary.

Serene

A Compiler frontend

Flow

  • serenec parses the command line args.
  • reader reads the input file and generates an AST.
  • semantic analyzer walks the AST, rewrites the necessary nodes, and generates a new AST.
  • slir generator generates slir dialect code from the AST.
  • We lower slir to other MLIR dialects; we call the result mlir.
  • Then we lower everything to the LLVMIR dialect and call it lir (lowered IR).
  • Finally, we fully lower lir to LLVM IR and pass it to the object generator to generate object files.
  • Call the default C compiler to link the object files and generate the final binary.
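
The lowering steps in this flow can be pictured with a deliberately naive sketch. It treats a program as a flat list of `(op name, attributes)` pairs and "lowers" by renaming ops through a table. The op names are borrowed from the example output in the Episode 8 section of this document; the tables and the `lower` helper are hypothetical, and real lowering in MLIR rewrites operations, types and regions, not just names.

```python
# Hypothetical lowering tables. The op names come from the example output
# in this document, not from Serene's source code.
SLIR_TO_MLIR = {"serene.value": "constant"}
MLIR_TO_LIR = {"constant": "llvm.mlir.constant", "return": "llvm.return"}

def lower(ops, table):
    """Rename every op that has an entry in `table`; keep the rest as-is."""
    return [(table.get(name, name), attrs) for name, attrs in ops]

# slir -> mlir -> lir, one step at a time
slir = [("serene.value", {"value": 3}), ("return", {})]
mlir = lower(slir, SLIR_TO_MLIR)
lir = lower(mlir, MLIR_TO_LIR)
```

The point is only the shape of the flow: each step gets a little closer to LLVM IR, and ops that are already low enough pass through untouched.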

DONE Episode 4 - The reader

CLOSED: [2021-07-27 Tue 22:50]

What is a Parser?

To put it simply, a parser converts the source code to an AST.

Our Parser

  • We have a hand-written, LL(1.5)-like parser/lexer, since Lisp already has a structure.
  ;; pseudo code
  (def some-fn (fn (x y)
                   (+ x y)))
  (defn main ()
    (println "Result: " (some-fn 3 8)))
  • LL(1.5)?
  • O(n)
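
As a rough illustration of why a hand-written reader is enough for a Lisp, here is a minimal single-pass sketch. It looks at one character at a time, never backtracks, and so runs in O(n) over the input. This is not Serene's reader: the `read` function, the `("symbol", ...)` and `("string", ...)` tuples, and the use of plain Python lists are all assumptions made for the example.

```python
def read(source):
    """Parse every top-level form in `source` into nested Python lists."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(source) and source[pos].isspace():
            pos += 1

    def read_form():
        nonlocal pos
        skip_ws()
        ch = source[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= len(source):
                    raise SyntaxError("unexpected end of input, expected ')'")
                if source[pos] == ")":
                    pos += 1
                    return items
                items.append(read_form())
        if ch == ")":
            raise SyntaxError(f"unexpected ')' at offset {pos}")
        if ch == '"':
            # No escape handling; an unterminated string raises ValueError.
            end = source.index('"', pos + 1)
            text = source[pos + 1:end]
            pos = end + 1
            return ("string", text)
        # Anything else is an atom: read until whitespace or a paren.
        start = pos
        while pos < len(source) and not source[pos].isspace() and source[pos] not in "()":
            pos += 1
        token = source[start:pos]
        try:
            return int(token)
        except ValueError:
            return ("symbol", token)

    forms = []
    skip_ws()
    while pos < len(source):
        forms.append(read_form())
        skip_ws()
    return forms
```

Calling `read` on the pseudo code above yields two top-level lists, one for the `def` form and one for the `defn` form.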

DONE Episode 5 - The Abstract Syntax Tree

CLOSED: [2021-07-30 Fri 14:01]

What is an AST?

An AST is a tree representation of the abstract syntactic structure of source code. It's just a tree of nodes, where each node is a data structure describing a piece of the syntax.

  ;; pseudo code
  (def main (fn () 4))
  (prn (main))

/Serene/serene/media/commit/f980da8e4e5464ca96003d6a64ddfba995de43cb/docs/imgs/ast.svg
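
A minimal sketch of what "a tree of nodes" means in code, using the `(def main (fn () 4))` form from the pseudo code. The node classes here (`Number`, `Symbol`, `List`) and the `dump` helper are hypothetical names for the example; Serene's real AST is built from C++ expression classes and carries more information, such as source locations.

```python
from dataclasses import dataclass

@dataclass
class Number:
    value: int

@dataclass
class Symbol:
    name: str

@dataclass
class List:
    elements: list

def dump(node, depth=0):
    """Render the tree one node per line, indented by depth."""
    pad = "  " * depth
    if isinstance(node, List):
        lines = [pad + "List"]
        for child in node.elements:
            lines.extend(dump(child, depth + 1))
        return lines
    if isinstance(node, Symbol):
        return [f"{pad}Symbol {node.name}"]
    return [f"{pad}Number {node.value}"]

# (def main (fn () 4))
ast = List([Symbol("def"), Symbol("main"),
            List([Symbol("fn"), List([]), Number(4)])])
```

`"\n".join(dump(ast))` gives an indented view of the tree, which is handy when debugging a reader.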

The Expression abstract class

Expressions

  • Expressions vs Statements
  • Serene(Lisp) and expressions

Node & AST

DONE Episode 6 - The Semantic Analyzer

CLOSED: [2021-08-21 Sat 18:44]

Qs

  • Why didn't we implement a linked list?
  • Why are we using std::vector instead of the LLVM collections?

What is Semantic Analysis?

  • Semantic Analysis makes sure that the given program is semantically correct.
  • The type checker works as part of this step as well.
  ;; pseudo code
  (4 main)

/Serene/serene/media/commit/f980da8e4e5464ca96003d6a64ddfba995de43cb/docs/imgs/incorrct_semantic.svg
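
The `(4 main)` form parses fine, but it makes no sense: a number is in the position of the function being called. Here is a minimal sketch of such a check, assuming forms are nested Python lists with numbers as `int`/`float` and symbols as `str`. `SemanticError` and `check_call` are hypothetical names, not part of Serene.

```python
class SemanticError(Exception):
    """Raised when a form is syntactically valid but meaningless."""

def check_call(form):
    """Reject any call form whose head can never be a function."""
    if not isinstance(form, list) or not form:
        return form  # atoms and the empty list are not calls
    head = form[0]
    if isinstance(head, (int, float)):
        raise SemanticError(f"{head!r} is not callable")
    # Check nested forms as well, including a head that is itself a form.
    for item in form:
        check_call(item)
    return form
```

So `check_call(["main", 4])` passes, while `check_call([4, "main"])` raises a `SemanticError`.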

Semantic Analysis and rewrites

We need to reform the AST to reflect the semantics of Serene closely.

  ;; pseudo code
  (def main (fn () 4))
  (prn (main))

/Serene/serene/media/commit/f980da8e4e5464ca96003d6a64ddfba995de43cb/docs/imgs/ast.svg

/Serene/serene/media/commit/f980da8e4e5464ca96003d6a64ddfba995de43cb/docs/imgs/semantic.svg
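
A rough sketch of what such a rewrite can look like: generic list forms go in, and nodes that say what the form means come out. It assumes forms are nested Python lists with symbols as `str`; the `Def`, `Fn` and `Call` classes and the `analyze` function are hypothetical stand-ins for this example, and the sketch does no error reporting for malformed `def` or `fn` forms.

```python
from dataclasses import dataclass

@dataclass
class Def:
    name: str
    value: object

@dataclass
class Fn:
    params: list
    body: list

@dataclass
class Call:
    target: object
    args: list

def analyze(form):
    """Rewrite a raw list form into a node that names what it means."""
    if not isinstance(form, list) or not form:
        return form  # numbers, symbols and () stay as they are
    head, *rest = form
    if head == "def":
        name, value = rest
        return Def(name, analyze(value))
    if head == "fn":
        params, *body = rest
        return Fn(params, [analyze(expr) for expr in body])
    # Any other non-empty list is treated as a function call.
    return Call(analyze(head), [analyze(arg) for arg in rest])
```

With this, `(def main (fn () 4))` becomes a `Def` holding an `Fn`, and `(prn (main))` becomes a `Call` whose only argument is another `Call`.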

Let's run the compiler to see the semantic analysis in action.

Let's check out the code

DONE Episode 7 - The Context and Namespace

CLOSED: [2021-09-04 Sat 10:53]

Namespaces

Unit of compilation

Usually maps to a file

Keeps the state and environment

SereneContext vs LLVM Context vs MLIR Context

Compiler's global state

The owner of LLVM/MLIR contexts

Holds the namespace table

Probably will contain the primitive types as well
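
The relationship between the two can be sketched like this: a namespace holds the bindings of one unit of compilation, and the context owns the table of all namespaces. The class and method names below are assumptions for the example only. The LLVM and MLIR contexts that the real SereneContext owns are left out entirely.

```python
class Namespace:
    """One unit of compilation: a name plus the bindings defined in it."""

    def __init__(self, name, filename=None):
        self.name = name
        self.filename = filename  # a namespace usually maps to a file
        self.env = {}             # symbol name -> value

    def define(self, symbol, value):
        self.env[symbol] = value

    def lookup(self, symbol):
        return self.env.get(symbol)


class Context:
    """Compiler-wide state: owns the namespace table."""

    def __init__(self):
        self.namespaces = {}
        self.current = None

    def make_namespace(self, name, filename=None):
        """Return the namespace called `name`, creating it on first use."""
        if name not in self.namespaces:
            self.namespaces[name] = Namespace(name, filename)
        self.current = self.namespaces[name]
        return self.current
```

A short usage example: `ctx = Context()`, then `ctx.make_namespace("user").define("main", ...)` stores the binding where later compilation stages can find it through the same context.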

Episode 8 - MLIR Basics

Serene Changes

  • Introducing a SourceManager
  • Reader changes
  • The serenec CLI is changing

Disclaimer

I'm not an expert in MLIR

Why?

  • A bit of history
  • LLVM IR is too low level
  • We need an IR to implement high level concepts and flows
  • MLIR is a framework to build a compiler with your own IR. Kinda :P
  • Reusability

Language

Overview

Dialects

  • A collection of operations
  • Custom types
  • Metadata
  • We can use a mixture of different dialects
builtin dialects:
  • std
  • llvm
  • math
  • async

Operations

  • Higher level of abstraction
  • Not instructions
  • SSA form
  • Tablegen backend
  • Verifiers and printers

Attributes

Blocks & Regions

Types

  • Extensible

Pass Infrastructure

Analysis and transformation infrastructure

  • We will implement most of our semantic analysis logic and type checker as passes

Pattern Rewriting

  • Tablegen backed
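
To give a feel for how passes and rewrite patterns fit together, here is a toy sketch. Expressions are nested tuples such as `("add", lhs, rhs)` and `("const", 3)`, and plain strings stand for values we know nothing about. A pattern either returns a replacement or `None`, and a tiny pass manager runs passes in order. None of this is the MLIR API: `fold_add`, `apply_patterns` and `PassManager` are made-up names, and real MLIR patterns work on SSA operations rather than expression trees.

```python
def fold_add(op):
    """Pattern: an add of two constants becomes a single constant."""
    if op[0] == "add" and all(
        isinstance(arg, tuple) and arg[0] == "const" for arg in op[1:]
    ):
        return ("const", op[1][1] + op[2][1])
    return None  # the pattern does not match


def apply_patterns(op, patterns):
    """Rewrite operands first, then try each pattern on the op itself."""
    if not isinstance(op, tuple) or op[0] == "const":
        return op  # leaves: unknown values and constants
    op = (op[0],) + tuple(apply_patterns(arg, patterns) for arg in op[1:])
    for pattern in patterns:
        replacement = pattern(op)
        if replacement is not None:
            return apply_patterns(replacement, patterns)
    return op


class PassManager:
    """Runs each registered pass over every op of a module, in order."""

    def __init__(self):
        self.passes = []

    def add_pass(self, run_pass):
        self.passes.append(run_pass)

    def run(self, module):
        for run_pass in self.passes:
            module = [run_pass(op) for op in module]
        return module
```

Usage: register `lambda op: apply_patterns(op, [fold_add])` as a pass and run it over a list of expressions; anything made only of constants collapses into a single `("const", n)`.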

Operation Definition Specification

Examples

Note: You need mlir-mode and llvm-mode available to you for the code highlighting of the following code blocks. Both of them are distributed with LLVM.

General syntax

   %result:2 = "somedialect.blah"(%x#2) { some.attribute = true, other_attribute = 3 }
               : (!somedialect<"example_type">) -> (!somedialect<"foo_s">, i8)
                  loc(callsite("main" at "main.srn":10:8))
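
To see how the pieces of the generic syntax line up (results, op name, operands, attributes, and the function-like type at the end), here is a small sketch that models an operation as plain data and prints it in that shape. The `Operation` class and its `render` method are hypothetical and only cover the parts shown above: no regions, no successors, no `loc(...)`. Attribute values are kept as already-printed strings.

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    name: str
    operands: list = field(default_factory=list)      # [(ssa_name, type), ...]
    attributes: dict = field(default_factory=dict)    # {name: printed value}
    result_types: list = field(default_factory=list)  # [type, ...]

    def render(self, result="%0"):
        """Print the op in a generic-syntax-like form."""
        args = ", ".join(name for name, _ in self.operands)
        attrs = ", ".join(f"{key} = {val}" for key, val in self.attributes.items())
        attrs = f" {{{attrs}}}" if attrs else ""
        in_types = ", ".join(ty for _, ty in self.operands)
        out_types = ", ".join(self.result_types)
        if len(self.result_types) != 1:
            out_types = f"({out_types})"  # zero or many results are parenthesized
        lhs = ""
        if self.result_types:
            count = len(self.result_types)
            lhs = result + (f":{count}" if count > 1 else "") + " = "
        return f'{lhs}"{self.name}"({args}){attrs} : ({in_types}) -> {out_types}'
```

For instance, an op named `serene.value` with a `value` attribute of `0 : i64` and one `i64` result renders as `%0 = "serene.value"() {value = 0 : i64} : () -> i64`.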

Blocks and Regions

  func @simple(i64, i1) -> i64 {
  ^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
    cond_br %cond, ^bb1, ^bb2

  ^bb1:
    br ^bb3(%a: i64)    // Branch passes %a as the argument

  ^bb2:
    %b = addi %a, %a : i64
    br ^bb3(%b: i64)    // Branch passes %b as the argument

  // ^bb3 receives an argument, named %c, from predecessors
  // and passes it on to bb4 along with %a. %a is referenced
  // directly from its defining operation and is not passed through
  // an argument of ^bb3.
  ^bb3(%c: i64):
    //br ^bb4(%c, %a : i64, i64)
    "serene.ifop"(%c) ({ // if %a is in-scope in the containing region...
         // then %a is in-scope here too.
          %new_value = "another_op"(%c) : (i64) -> (i64)

          ^someblock(%new_value):
            %x = "some_other_op"() {value = 4 : i64} : () -> i64

    }) : (i64) -> (i64)
  ^bb4(%d : i64, %e : i64):
    %0 = addi %d, %e : i64
    return %0 : i64   // Return is also a terminator.
  }

SLIR example

Command line arguments to emit slir

  ./builder run --build-dir ./build -emit slir `pwd`/docs/examples/hello_world.srn

Output:

  module @user  {
    %0 = "serene.fn"() ( {
      %2 = "serene.value"() {value = 0 : i64} : () -> i64
      return %2 : i64
    }) {args = {}, name = "main", sym_visibility = "public"} : () -> i64

    %1 = "serene.fn"() ( {
      %2 = "serene.value"() {value = 0 : i64} : () -> i64
      return %2 : i64
    }) {args = {n = i64, v = i64, y = i64}, name = "main1", sym_visibility = "public"} : () -> i64
  }

Serene's MLIR (maybe we need a better name)

Command line arguments to emit mlir

  ./builder run --build-dir ./build -emit mlir `pwd`/docs/examples/hello_world.srn

Output:

module @user  {
  func @main() -> i64 {
    %c3_i64 = constant 3 : i64
    return %c3_i64 : i64
  }
  func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
    %c3_i64 = constant 3 : i64
    return %c3_i64 : i64
  }
}

LIR

Command line arguments to emit lir

  ./builder run --build-dir ./build -emit lir `pwd`/docs/examples/hello_world.srn

Output:

module @user  {
  llvm.func @main() -> i64 {
    %0 = llvm.mlir.constant(3 : i64) : i64
    llvm.return %0 : i64
  }
  llvm.func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
    %0 = llvm.mlir.constant(3 : i64) : i64
    llvm.return %0 : i64
  }
}

LLVMIR

Command line arguments to emit llvmir

  ./builder run --build-dir ./build -emit ir `pwd`/docs/examples/hello_world.srn

Output:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare i8* @malloc(i64 %0)

declare void @free(i8* %0)

define i64 @main() !dbg !3 {
  ret i64 3, !dbg !7
}

define i64 @main1(i64 %0, i64 %1, i64 %2) !dbg !9 {
  ret i64 3, !dbg !10
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, type: !5, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "REPL", directory: "/home/lxsameer/src/serene/serene/build")
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocation(line: 0, column: 10, scope: !8)
!8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0)
!9 = distinct !DISubprogram(name: "main1", linkageName: "main1", scope: null, file: !4, line: 1, type: !5, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!10 = !DILocation(line: 1, column: 11, scope: !11)
!11 = !DILexicalBlockFile(scope: !9, file: !4, discriminator: 0)