serene/docs/videos.org


How to build a compiler with LLVM and MLIR

DONE Episode 1 - Introduction

What is it all about?

  • Create a programming language
  • Guide for contributors
  • An LLVM/MLIR guide

The Plan

  • Git branches
  • No live coding
  • Feel free to contribute

Serene and a bit of history

DONE Episode 2 - Basic Setup

CLOSED: [2021-07-10 Sat 09:04]

Installing Requirements

LLVM and Clang

  • mlir-tblgen

ccache (optional)

Building Serene and the builder

  • git hooks

Source tree structure

dev.org resources and TODOs

DONE Episode 3 - Overview

CLOSED: [2021-07-19 Mon 09:41]

Generic Compiler

Common Steps

  • Frontend

    • Lexical analyzer (Lexer)
    • Syntax analyzer (Parser)
    • Semantic analyzer
  • Middleend

    • Intermediate code generation
    • Code optimizer
  • Backend

    • Target code generation

LLVM

See docs/llvm.org

Quick overview

Adapted from https://www.aosabook.org/en/llvm.html (diagram: docs/imgs/llvm_dia.svg)

  • It's a set of libraries to create a compiler.
  • Well engineered.
  • We can focus only on the frontend of the compiler and what actually matters to us, and leave the tricky stuff to LLVM.
  • LLVM IR enables us to use multiple languages together.
  • It supports many targets.
  • We can benefit from already made IR level optimizers.
  • ….

MLIR

See docs/mlir.llvm.org (diagram: docs/imgs/mlir_dia.svg)

  • MLIR dialects provide higher-level semantics than LLVM IR.
  • It's easier to reason about a higher-level IR that is modeled after the AST than about a low-level IR.
  • We can use the pass infrastructure to efficiently process and transform the IR.
  • With many ready-to-use dialects, we can focus on our language and use the other dialects whenever necessary.

Serene

A Compiler frontend

Flow

  • serenec parses the command-line args
  • The reader reads the input file and generates an AST
  • The semantic analyzer walks the AST, rewrites the necessary nodes, and produces a new AST.
  • The slir generator generates slir dialect code from the AST.
  • We lower slir to other MLIR dialects and call the result mlir.
  • Then we lower everything to the LLVMIR dialect and call it lir (lowered IR).
  • Finally, we fully lower lir to LLVM IR and pass it to the object generator to generate object files.
  • We call the default C compiler to link the object files and produce the final executable.

DONE Episode 4 - The reader

CLOSED: [2021-07-27 Tue 22:50]

What is a Parser?

To put it simply, a parser converts the source code into an AST.

Our Parser

  • We have a hand-written, LL(1.5)-like parser/lexer, since Lisp source already has a clear structure (a small reader sketch follows this list)
  ;; pseudo code
  (def some-fn (fn (x y)
                   (+ x y)))
  (defn main ()
    (println "Result: " (some-fn 3 8)))
  • LL(1.5)?
  • O(n)
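
To make those points concrete, here is a minimal, hypothetical reader sketch in C++. The SExpr/Symbol/List types and the Reader class are made up for illustration and are not Serene's actual reader; the sketch only shows why one character of lookahead and a single O(n) pass are enough for a Lisp-like syntax.

  #include <cctype>
  #include <memory>
  #include <string>
  #include <vector>

  // Hypothetical node types just for this sketch; Serene's real AST is the
  // topic of the next episode.
  struct SExpr { virtual ~SExpr() = default; };
  struct Symbol : SExpr { std::string name; };
  struct List   : SExpr { std::vector<std::unique_ptr<SExpr>> elements; };

  struct Reader {
    std::string src;
    size_t pos = 0;

    // One character of lookahead is all we need for a Lisp-like syntax.
    char peek() const { return pos < src.size() ? src[pos] : '\0'; }
    char advance() { return pos < src.size() ? src[pos++] : '\0'; }
    void skipSpaces() {
      while (std::isspace(static_cast<unsigned char>(peek()))) advance();
    }

    // Read one expression: either a '(...)' list or an atom. Every character
    // is consumed exactly once, so reading is O(n).
    std::unique_ptr<SExpr> readExpr() {
      skipSpaces();
      if (peek() == '(') {
        advance(); // consume '('
        auto list = std::make_unique<List>();
        skipSpaces();
        while (peek() != ')' && peek() != '\0') {
          list->elements.push_back(readExpr());
          skipSpaces();
        }
        advance(); // consume ')'
        return list;
      }
      auto atom = std::make_unique<Symbol>();
      while (peek() != '\0' && peek() != '(' && peek() != ')' &&
             !std::isspace(static_cast<unsigned char>(peek())))
        atom->name += advance();
      return atom;
    }
  };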

DONE Episode 5 - The Abstract Syntax Tree

CLOSED: [2021-07-30 Fri 14:01]

What is an AST?

An AST is a tree representation of the abstract syntactic structure of source code. It's just a tree in which each node is a data structure describing a syntactic construct.

  ;; pseudo code
  (def main (fn () 4))
  (prn (main))

(diagram: docs/imgs/ast.svg)

The Expression abstract class

Expressions

  • Expressions vs Statements
  • Serene(Lisp) and expressions

Node & AST
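
A minimal sketch of what such a hierarchy can look like, assuming a shared_ptr-based design; the class names and members here are illustrative, not Serene's exact code.

  #include <memory>
  #include <string>
  #include <vector>

  // Hypothetical base class: every AST node is an expression.
  class Expression {
  public:
    virtual ~Expression() = default;
    // Each node can describe itself, e.g. for debugging and error messages.
    virtual std::string toString() const = 0;
  };

  // A node is a shared pointer to an expression, and an AST is a list of nodes.
  using Node = std::shared_ptr<Expression>;
  using Ast  = std::vector<Node>;

  // Example concrete node: a number literal such as the `4` above.
  class Number : public Expression {
    long value;

  public:
    explicit Number(long v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
  };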

DONE Episode 6 - The Semantic Analyzer

CLOSED: [2021-08-21 Sat 18:44]

Qs

  • Why didn't we implement a linked list?
  • Why are we using std::vector instead of the LLVM collections?

What is Semantic Analysis?

  • Semantic Analysis makes sure that the given program is semantically correct.
  • The type checker works as part of this step as well.
  ;; pseudo code
  (4 main)

(diagram: docs/imgs/incorrct_semantic.svg)

Semantic Analysis and rewrites

We need to rewrite the AST to reflect the semantics of Serene more closely.

  ;; pseudo code
  (def main (fn () 4))
  (prn (main))

(diagrams: docs/imgs/ast.svg and docs/imgs/semantic.svg)
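
As a rough illustration of the rewrite idea, the analyzer walks the tree and replaces generic list nodes with more specific ones, e.g. a `(def name value)` list becomes a dedicated Def node. The types below are stripped-down copies of the hypothetical AST sketch from the previous episode, not Serene's real implementation.

  #include <memory>
  #include <string>
  #include <vector>

  // Minimal hypothetical AST types so this snippet stands on its own.
  struct Expression { virtual ~Expression() = default; };
  using Node = std::shared_ptr<Expression>;
  using Ast  = std::vector<Node>;

  struct Symbol : Expression { std::string name; };
  struct List   : Expression { Ast elements; };

  // Node produced by the analyzer: a `(def name value)` form rewritten into
  // a dedicated node instead of a generic list.
  struct Def : Expression { std::string name; Node value; };

  // Rewrite a single node; forms we don't recognize are returned unchanged.
  Node analyzeNode(const Node &node) {
    auto list = std::dynamic_pointer_cast<List>(node);
    if (!list || list->elements.size() != 3)
      return node;

    auto head = std::dynamic_pointer_cast<Symbol>(list->elements[0]);
    auto name = std::dynamic_pointer_cast<Symbol>(list->elements[1]);
    if (!head || head->name != "def" || !name)
      return node;

    auto def = std::make_shared<Def>();
    def->name  = name->name;
    def->value = list->elements[2]; // real analysis would recurse into this too
    return def;
  }

  // Walk the whole AST and produce the rewritten AST.
  Ast analyze(const Ast &ast) {
    Ast result;
    for (const Node &node : ast)
      result.push_back(analyzeNode(node));
    return result;
  }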

Let's run the compiler to see the semantic analysis in action.

Let's check out the code

DONE Episode 7 - The Context and Namespace

CLOSED: [2021-09-04 Sat 10:53]

Namespaces

Unit of compilation

Usually maps to a file

Keeps the state and environment

SereneContext vs LLVM Context vs MLIR Context

The compiler's global state

The owner of LLVM/MLIR contexts

Holds the namespace table

Probably will contain the primitive types as well
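
A rough, hypothetical sketch of how these pieces can relate to each other; the member names are assumptions for illustration, not Serene's actual API.

  #include <map>
  #include <memory>
  #include <string>

  #include "llvm/IR/LLVMContext.h"
  #include "mlir/IR/MLIRContext.h"

  // Hypothetical namespace: the unit of compilation, usually mapping to a file.
  class SereneNamespace {
  public:
    std::string name;
    // ... keeps the AST, the environment (symbol tables), generated IR, ...
  };

  // Hypothetical compiler-wide state: owns the LLVM and MLIR contexts and the
  // table of loaded namespaces.
  class SereneContext {
  public:
    llvm::LLVMContext llvmContext;
    mlir::MLIRContext mlirContext;

    // Namespace table, keyed by the namespace name.
    std::map<std::string, std::shared_ptr<SereneNamespace>> namespaces;

    // Probably also the primitive types, target info, etc.
  };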

DONE Episode 8 - MLIR Basics

CLOSED: [2021-09-17 Fri 10:18]

Serene Changes

  • Introducing a SourceManager
  • Reader changes
  • The serenec CLI interface is changing

Disclaimer

I'm not an expert in MLIR

Why?

  • A bit of history
  • LLVM IR is too low level
  • We need an IR to implement high-level concepts and flows. MLIR is a framework for building a compiler around your own IR. Kinda :P
  • Reusability

Language

Overview

Dialects

  • A collection of operations
  • Custom types
  • Metadata
  • We can use a mixture of different dialects
builtin dialects:
  • std
  • llvm
  • math
  • async

Operations

  • Higher level of abstraction
  • Not instructions
  • SSA form
  • Tablegen backend
  • Verifiers and printers

Attributes

Blocks & Regions

Types

  • Extensible

Pass Infrastructure

Analysis and transformation infrastructure

  • We will implement most of our semantic analysis logic and type checker as passes

Pattern Rewriting

  • TableGen-backed (DRR) or handwritten C++ patterns (see the sketch below)
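
A bare-bones sketch of a handwritten pattern, assuming a "serene.value" op name as a placeholder (exact constructor signatures vary a bit between MLIR versions):

  #include "mlir/IR/PatternMatch.h"

  // A handwritten pattern matched by operation name. Once the op has a C++
  // class, the typed OpRewritePattern<OpTy> variant is usually nicer.
  struct SimplifyValueOp : public mlir::RewritePattern {
    SimplifyValueOp(mlir::MLIRContext *ctx)
        : mlir::RewritePattern("serene.value", /*benefit=*/1, ctx) {}

    mlir::LogicalResult
    matchAndRewrite(mlir::Operation *op,
                    mlir::PatternRewriter &rewriter) const override {
      // Inspect `op`; return failure() when it doesn't match so that other
      // patterns get a chance to run. When it matches, build the replacement
      // through `rewriter`, e.g. rewriter.replaceOp(op, newValues).
      return mlir::failure();
    }
  };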

Operation Definition Specification

Examples

Note: You need mlir-mode and llvm-mode available for the syntax highlighting of the following code blocks. Both are distributed with LLVM.

General syntax

   %result:2 = "somedialect.blah"(%x#2) { some.attribute = true, other_attribute = 3 }
               : (!somedialect<"example_type">) -> (!somedialect<"foo_s">, i8)
                  loc(callsite("main" at "main.srn":10:8))

Blocks and Regions

  func @simple(i64, i1) -> i64 {
  ^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
    cond_br %cond, ^bb1, ^bb2

  ^bb1:
    br ^bb3(%a: i64)    // Branch passes %a as the argument

  ^bb2:
    %b = addi %a, %a : i64
    br ^bb3(%b: i64)    // Branch passes %b as the argument

  // ^bb3 receives an argument, named %c, from predecessors
  // and passes it on to bb4 along with %a. %a is referenced
  // directly from its defining operation and is not passed through
  // an argument of ^bb3.
  ^bb3(%c: i64):
    //br ^bb4(%c, %a : i64, i64)
    "serene.ifop"(%c) ({ // if %a is in-scope in the containing region...
         // then %a is in-scope here too.
          %new_value = "another_op"(%c) : (i64) -> (i64)

          ^someblock(%new_value):
            %x = "some_other_op"() {value = 4 : i64} : () -> i64

    }) : (i64) -> (i64)
  ^bb4(%d : i64, %e : i64):
    %0 = addi %d, %e : i64
    return %0 : i64   // Return is also a terminator.
  }

SLIR example

Command line arguments to emit slir

  ./builder run --build-dir ./build -emit slir `pwd`/docs/examples/hello_world.srn

Output:

  module @user  {
    %0 = "serene.fn"() ( {
      %2 = "serene.value"() {value = 0 : i64} : () -> i64
      return %2 : i64
    }) {args = {}, name = "main", sym_visibility = "public"} : () -> i64

    %1 = "serene.fn"() ( {
      %2 = "serene.value"() {value = 0 : i64} : () -> i64
      return %2 : i64
    }) {args = {n = i64, v = i64, y = i64}, name = "main1", sym_visibility = "public"} : () -> i64
  }

Serene's MLIR (maybe we need a better name)

Command line arguments to emit mlir

  ./builder run --build-dir ./build -emit mlir `pwd`/docs/examples/hello_world.srn

Output:

module @user  {
  func @main() -> i64 {
    %c3_i64 = constant 3 : i64
    return %c3_i64 : i64
  }
  func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
    %c3_i64 = constant 3 : i64
    return %c3_i64 : i64
  }
}

LIR

Command line arguments to emit lir

  ./builder run --build-dir ./build -emit lir `pwd`/docs/examples/hello_world.srn

Output:

module @user  {
  llvm.func @main() -> i64 {
    %0 = llvm.mlir.constant(3 : i64) : i64
    llvm.return %0 : i64
  }
  llvm.func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
    %0 = llvm.mlir.constant(3 : i64) : i64
    llvm.return %0 : i64
  }
}

LLVMIR

Command line arguments to emit llvmir

  ./builder run --build-dir ./build -emit ir `pwd`/docs/examples/hello_world.srn

Output:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare i8* @malloc(i64 %0)

declare void @free(i8* %0)

define i64 @main() !dbg !3 {
  ret i64 3, !dbg !7
}

define i64 @main1(i64 %0, i64 %1, i64 %2) !dbg !9 {
  ret i64 3, !dbg !10
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, type: !5, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "REPL", directory: "/home/lxsameer/src/serene/serene/build")
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocation(line: 0, column: 10, scope: !8)
!8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0)
!9 = distinct !DISubprogram(name: "main1", linkageName: "main1", scope: null, file: !4, line: 1, type: !5, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!10 = !DILocation(line: 1, column: 11, scope: !11)
!11 = !DILexicalBlockFile(scope: !9, file: !4, discriminator: 0)

DONE Episode 9 - IR (SLIR) generation

CLOSED: [2021-10-01 Fri 18:56]

Updates:

  • Source manager
  • Diagnostic Engine
  • JIT

There will be an episode dedicated to each of these

How does IR generation work?

  • Pass around MLIR context
  • Create builder objects that create operations at specific locations (see the sketch after this list)
  • ModuleOp
  • Namespace
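
A rough sketch of how those pieces fit together when generating IR. This is not Serene's actual generator; the serene.fn/serene.value creation is only hinted at in comments, and exact APIs differ a bit between MLIR versions.

  #include "llvm/ADT/StringRef.h"
  #include "mlir/IR/Builders.h"
  #include "mlir/IR/BuiltinOps.h"
  #include "mlir/IR/MLIRContext.h"

  // Generate an MLIR module for one namespace. The MLIRContext is passed
  // around (it owns types, attributes, and loaded dialects), and an OpBuilder
  // creates operations at a specific insertion point.
  mlir::ModuleOp generateModule(mlir::MLIRContext &ctx) {
    mlir::OpBuilder builder(&ctx);

    // Every operation needs a location; real code would translate the
    // reader's source locations instead of using an unknown location.
    mlir::Location loc = builder.getUnknownLoc();

    // The ModuleOp is the top-level container for the namespace's functions.
    mlir::ModuleOp module = mlir::ModuleOp::create(loc, llvm::StringRef("user"));

    // Point the builder at the module's body, then walk the AST and create
    // the SLIR operations for each node:
    builder.setInsertionPointToStart(module.getBody());
    // builder.create<serene::FnOp>(loc, ...);    // hypothetical op classes
    // builder.create<serene::ValueOp>(loc, ...);

    return module;
  }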

How to define a new dialect

  • Pure C++
  • Tablegen
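
A rough sketch of the pure C++ route (the TableGen route generates roughly equivalent code). The SereneDialect members below are illustrative, and the Dialect constructor signature varies slightly between MLIR versions.

  #include "mlir/IR/Dialect.h"
  #include "mlir/IR/MLIRContext.h"

  // A dialect is mostly a namespace prefix plus the registration of its
  // operations and types.
  class SereneDialect : public mlir::Dialect {
  public:
    explicit SereneDialect(mlir::MLIRContext *ctx)
        : mlir::Dialect(getDialectNamespace(), ctx,
                        mlir::TypeID::get<SereneDialect>()) {
      // addOperations<FnOp, ValueOp, ...>();   // hypothetical op classes
      // addTypes<...>();
    }

    // The prefix used in the textual IR, e.g. "serene.fn".
    static llvm::StringRef getDialectNamespace() { return "serene"; }
  };

  // Somewhere during context setup, the dialect gets loaded:
  //   context.getOrLoadDialect<SereneDialect>();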

SLIR

The SLIR goal

  • An IR that follows the AST
  • Rename?

Steps

  • Define the new dialect
  • Setup the tablegen
  • Define the operations
  • Walk the AST and generate the operations

DONE Episode 10 - Pass Infrastructure

CLOSED: [2021-10-15 Fri 14:17]

The next Step

Updates:

CMake changes

What is a Pass

Passes are the unit of abstraction for optimization and transformation in LLVM/MLIR

Compilation is all about transforming the input data and producing an output

Source code -> IR X -> IR Y -> IR Z -> … -> Target Code

Almost like function composition

The big picture

Pass Managers (Pipelines) are made out of a collection of passes and can be nested

Most of the interesting parts of the compiler reside in passes.

We will probably spend most of our time working with passes

Pass Infrastructure

ODS or C++

Operation is the main abstract unit of transformation

OperationPass is the base class for all the passes.

We need to override runOnOperation

There are some rules you need to follow when defining your pass:

  • Must not maintain any global mutable state
  • Must not modify the state of another operation not nested within the current operation being operated on

Passes are either OpSpecific or OpAgnostic

OpSpecific
  struct MyFunctionPass : public PassWrapper<MyFunctionPass,
                                             OperationPass<FuncOp>> {
    void runOnOperation() override {
      // Get the current FuncOp operation being operated on.
      FuncOp f = getOperation();

      // Walk the operations within the function.
      f.walk([](Operation *inst) {
        // ....
      });
    }
  };

  /// Register this pass so that it can be built from a textual pass pipeline.
  /// (Pass registration is discussed more below)
  void registerMyPass() {
    PassRegistration<MyFunctionPass>();
  }
OpAgnostic
  struct MyOperationPass : public PassWrapper<MyOperationPass, OperationPass<>> {
    void runOnOperation() override {
      // Get the current operation being operated on.
      Operation *op = getOperation();
      // ...
    }
  };

How does transformation work?

Analyses and Passes
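
A tiny sketch of how a pass can query an analysis. The names are hypothetical; the relevant point is that MLIR constructs analyses lazily from the current operation and caches them until a pass invalidates them.

  #include "mlir/IR/Operation.h"
  #include "mlir/Pass/Pass.h"

  // Hypothetical analysis: constructed from the operation it analyzes.
  struct MyAnalysis {
    MyAnalysis(mlir::Operation *op) {
      // ... walk `op` and collect whatever information the passes need ...
    }
  };

  struct MyAnalysisUserPass
      : public mlir::PassWrapper<MyAnalysisUserPass, mlir::OperationPass<>> {
    void runOnOperation() override {
      // Computed on first use for the current operation, then cached.
      MyAnalysis &analysis = getAnalysis<MyAnalysis>();
      // ... use `analysis` while transforming the IR ...
      (void)analysis;
    }
  };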

Pass management and nested pass managers

  // Create a top-level `PassManager` class. If an operation type is not
  // explicitly specified, the default is the builtin `module` operation.
  PassManager pm(ctx);

  // Note: We could also create the above `PassManager` this way.
  PassManager pm(ctx, /*operationName=*/"builtin.module");

  // Add a pass on the top-level module operation.
  pm.addPass(std::make_unique<MyModulePass>());

  // Nest a pass manager that operates on `spirv.module` operations nested
  // directly under the top-level module.
  OpPassManager &nestedModulePM = pm.nest<spirv::ModuleOp>();
  nestedModulePM.addPass(std::make_unique<MySPIRVModulePass>());

  // Nest a pass manager that operates on functions within the nested SPIRV
  // module.
  OpPassManager &nestedFunctionPM = nestedModulePM.nest<FuncOp>();
  nestedFunctionPM.addPass(std::make_unique<MyFunctionPass>());

  // Run the pass manager on the top-level module.
  ModuleOp m = ...;
  if (failed(pm.run(m))) {
    // Handle the failure
   }

Episode 11 - Lowering SLIR

Overview

  • What is a Pass?
  • Pass Manager

Dialect lowering

Why?

Transforming a dialect to another dialect or LLVM IR

The goal is to lower SLIR to LLVM IR directly or indirectly.

Dialect Conversions

This framework allows for transforming a set of illegal operations to a set of legal ones.

Target Conversion

Rewrite Patterns

Type Converter

Full vs Partial Conversion
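
A skeletal sketch of how the pieces above combine into a lowering pass. The SereneDialect class and the commented-out patterns are hypothetical, the pattern list is left empty, and exact headers/signatures vary a bit between MLIR versions; the sketch only shows the shape of target + patterns + partial conversion.

  #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
  #include "mlir/IR/BuiltinOps.h"
  #include "mlir/IR/PatternMatch.h"
  #include "mlir/Pass/Pass.h"
  #include "mlir/Transforms/DialectConversion.h"

  struct SLIRToLLVMPass
      : public mlir::PassWrapper<SLIRToLLVMPass,
                                 mlir::OperationPass<mlir::ModuleOp>> {
    void runOnOperation() override {
      mlir::MLIRContext *ctx = &getContext();

      // Target conversion: LLVM dialect ops are legal, our ops are not.
      mlir::ConversionTarget target(*ctx);
      target.addLegalDialect<mlir::LLVM::LLVMDialect>();
      target.addIllegalDialect<SereneDialect>(); // hypothetical dialect class

      // Rewrite patterns that turn each illegal op into legal ones (plus a
      // TypeConverter for the types, omitted here).
      mlir::RewritePatternSet patterns(ctx);
      // patterns.add<FnOpLowering, ValueOpLowering>(ctx); // hypothetical

      // Partial conversion: ops with no matching pattern may remain as long
      // as they are not explicitly illegal; a full conversion would require
      // every remaining op to be legal.
      if (mlir::failed(mlir::applyPartialConversion(getOperation(), target,
                                                    std::move(patterns))))
        signalPassFailure();
    }
  };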