#+TITLE: How to build a compiler with LLVM and MLIR #+SEQ_TODO: TODO(t/!) NEXT(n/!) BLOCKED(b@/!) | DONE(d%) CANCELLED(c@/!) FAILED(f@/!) #+TAGS: READER(r) MISC(m) #+STARTUP: logdrawer logdone logreschedule indent content align constSI entitiespretty overview * DONE Episode 1 - Introduction ** What is it all about? - Create a programming lang - Guide for contributors - A LLVM/MLIR guide ** The Plan - Git branches - No live coding - Feel free to contribute ** Serene and a bit of history - Other Implementations - Requirements - C++ 14 - CMake - Repository: https://devheroes.codes/Serene - Website: lxsameer.com Email: lxsameer@gnu.org * DONE Episode 2 - Basic Setup CLOSED: [2021-07-10 Sat 09:04] ** Installing Requirements *** LLVM and Clang - mlir-tblgen *** ccache (optional) ** Building Serene and the =builder= - git hooks ** Source tree structure ** =dev.org= resources and TODOs * DONE Episode 3 - Overview CLOSED: [2021-07-19 Mon 09:41] ** Generic Compiler - [[https://www.cs.princeton.edu/~appel/modern/ml/whichver.html][Modern Compiler Implementation in ML: Basic Techniques]] - [[https://suif.stanford.edu/dragonbook/][Compilers: Principles, Techniques, and Tools (The Dragon Book)]] *** Common Steps - Frontend - Lexical analyzer (Lexer) - Syntax analyzer (Parser) - Semantic analyzer - Middleend - Intermediate code generation - Code optimizer - Backend - Target code generation ** LLVM [[llvm.org]] *** Watch [[https://www.youtube.com/watch?v=J5xExRGaIIY][Introdution to LLVM]] *** Quick overview Deducted from https://www.aosabook.org/en/llvm.html [[./imgs/llvm_dia.svg]] - It's a set of libraries to create a compiler. - Well engineered. - we can focus only on the fronted of the compiler and what is actually important to us and leave the tricky stuff to LLVM. - LLVM IR enables us to use multiple languages together. - It supports many targets. - We can benefit from already made IR level optimizers. - .... ** MLIR [[mlir.llvm.org]] [[./imgs/mlir_dia.svg]] - With MLIR dialects provide higher level semantics than LLVM IR. - It's easier to reason about higher level IR that is modeled after the AST rather than a low level IR. - We can use the pass infrastructure to efficiently process and transform the IR. - With many ready to use dialects we can really focus on our language and us the other dialect when ever necessary. - ... ** Serene *** A Compiler frontend *** Flow - =serenec= in parses the command lines args - =reader= reads the input file and generates an =AST= - =semantic analyzer= walks the =AST= and generates a new =AST= and rewrites the necessary nodes. - =slir= generator generates =slir= dialect code from =AST=. - We lower =slir= to other dialects of the *MLIR* which we call the result =mlir=. - Then, We lower everything to the =LLVMIR dialect= and call it =lir= (lowered IR). - Finally we fully lower =lir= to =LLVM IR= and pass it to the object generator to generate object files. - Call the default =c compiler= to link the object files and generate the machine code. * DONE Episode 4 - The reader CLOSED: [2021-07-27 Tue 22:50] ** What is a Parser ? To put it simply, Parser converts the source code to an [[https://en.wikipedia.org/wiki/Abstract_syntax_tree][AST]] *** Algorithms - LL(k) - LR - LALR - PEG - ..... Read More: - https://stereobooster.com/posts/an-overview-of-parsing-algorithms/ - https://tomassetti.me/guide-parsing-algorithms-terminology/ *** Libraries - https://en.wikipedia.org/wiki/Comparison_of_parser_generators *** Our Parser - We have a hand written LL(1.5) like parser/lexer since lisp already has a structure. #+BEGIN_SRC lisp ;; pseudo code (def some-fn (fn (x y) (+ x y))) (defn main () (println "Result: " (some-fn 3 8))) #+END_SRC - LL(1.5)? - O(n) * DONE Episode 5 - The Abstract Syntax Tree CLOSED: [2021-07-30 Fri 14:01] ** What is an AST? Ast is a tree representation of the abstract syntactic structure of source code. It's just a tree made of nodes that each node is a data structure describing the syntax. #+BEGIN_SRC lisp ;; pseudo code (def main (fn () 4)) (prn (main)) #+END_SRC [[./imgs/ast.svg]] ** The =Expression= abstract class *** Expressions - Expressions vs Statements - Serene(Lisp) and expressions ** Node & AST * DONE Episode 6 - The Semantic Analyzer CLOSED: [2021-08-21 Sat 18:44] ** Qs - Why didn't we implement a linked list? - Why we are using the =std::vector= instead of llvm collections? ** What is Semantic Analysis? - Semantic Analysis makes sure that the given program is semantically correct. - Type checkr works as part of this step as well. #+BEGIN_SRC lisp ;; pseudo code (4 main) #+END_SRC [[./imgs/incorrct_semantic.svg]] ** Semantic Analysis and rewrites We need to reform the AST to reflect the semantics of Serene closly. #+BEGIN_SRC lisp ;; pseudo code (def main (fn () 4)) (prn (main)) #+END_SRC [[./imgs/ast.svg]] [[./imgs/semantic.svg]] Let's run the compiler to see the semantic analysis in action. ** Let's check out the code * DONE Episode 7 - The Context and Namespace CLOSED: [2021-09-04 Sat 10:53] ** Namespaces *** Unit of compilation *** Usually maps to a file *** keeps the state and evironment ** SereneContext vs LLVM Context vs MLIR Context *** Compilers global state *** The owner of LLVM/MLIR contexts *** Holds the namespace table *** Probably will contain the primitive types as well * DONE Episode 8 - MLIR Basics CLOSED: [2021-09-17 Fri 10:18] ** Serene Changes - Introducing a SourceManager - Reader changes - *serenec* cli interface in changing ** Disclaimer *I'm not an expert in MLIR* ** Why? - A bit of history - LLVM IR is to low level - We need an IR to implement high level concepts and flows *MLIR* is a framework to build a compiler with your own IR. kinda :P - Reusability - ... ** Language *** Overview - SSA Based (https://en.wikipedia.org/wiki/Static_single_assignment_form) - Typed - Context free(for lack of better words) *** Dialects - A collection of operations - Custom types - Meta data - We can use a mixture of different dialects **** builtin dialects: - std - llvm - math - async - ... *** Opetations - Higher level of abstraction - Not instructions - SSA forms - Tablegen backend - Verifiers and printers *** Attributes *** Blocks & Regions *** Types - Extesible ** Pass Infrastructure Analysis and transformation infrastructure - We will implement most of our semantic analysis logic and type checker as passes ** Pattern Rewriting - Tablegen backed ** Operation Definition Specification ** Examples *Not*: You need =mlir-mode= and =llvm-mode= available to you for the code highlighting of the following code blocks. Both of those are distributed with the LLVM. *** General syntax #+BEGIN_SRC mlir %result:2 = "somedialect.blah"(%x#2) { some.attribute = true, other_attribute = 3 } : (!somedialect<"example_type">) -> (!somedialect<"foo_s">, i8) loc(callsite("main" at "main.srn":10:8)) #+END_SRC *** Blocks and Regions #+BEGIN_SRC mlir func @simple(i64, i1) -> i64 { ^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a cond_br %cond, ^bb1, ^bb2 ^bb1: br ^bb3(%a: i64) // Branch passes %a as the argument ^bb2: %b = addi %a, %a : i64 br ^bb3(%b: i64) // Branch passes %b as the argument // ^bb3 receives an argument, named %c, from predecessors // and passes it on to bb4 along with %a. %a is referenced // directly from its defining operation and is not passed through // an argument of ^bb3. ^bb3(%c: i64): //br ^bb4(%c, %a : i64, i64) "serene.ifop"(%c) ({ // if %a is in-scope in the containing region... // then %a is in-scope here too. %new_value = "another_op"(%c) : (i64) -> (i64) ^someblock(%new_value): %x = "some_other_op"() {value = 4 : i64} : () -> i64 }) : (i64) -> (i64) ^bb4(%d : i64, %e : i64): %0 = addi %d, %e : i64 return %0 : i64 // Return is also a terminator. } #+END_SRC *** SLIR example Command line arguments to emir =slir= #+BEGIN_SRC sh ./builder run --build-dir ./build -emit slir `pwd`/docs/examples/hello_world.srn #+END_SRC Output: #+BEGIN_SRC mlir module @user { %0 = "serene.fn"() ( { %2 = "serene.value"() {value = 0 : i64} : () -> i64 return %2 : i64 }) {args = {}, name = "main", sym_visibility = "public"} : () -> i64 %1 = "serene.fn"() ( { %2 = "serene.value"() {value = 0 : i64} : () -> i64 return %2 : i64 }) {args = {n = i64, v = i64, y = i64}, name = "main1", sym_visibility = "public"} : () -> i64 } #+END_SRC *** Serene's MLIR (maybe we need a better name) Command line arguments to emir =mlir= #+BEGIN_SRC sh ./builder run --build-dir ./build -emit mlir `pwd`/docs/examples/hello_world.srn #+END_SRC Output: #+BEGIN_SRC mlir module @user { func @main() -> i64 { %c3_i64 = constant 3 : i64 return %c3_i64 : i64 } func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 { %c3_i64 = constant 3 : i64 return %c3_i64 : i64 } } #+END_SRC *** LIR Command line arguments to emir =lir= #+BEGIN_SRC sh ./builder run --build-dir ./build -emit lir `pwd`/docs/examples/hello_world.srn #+END_SRC Output: #+BEGIN_SRC mlir module @user { llvm.func @main() -> i64 { %0 = llvm.mlir.constant(3 : i64) : i64 llvm.return %0 : i64 } llvm.func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 { %0 = llvm.mlir.constant(3 : i64) : i64 llvm.return %0 : i64 } } #+END_SRC *** LLVMIR Command line arguments to emir =llvmir= #+BEGIN_SRC sh ./builder run --build-dir ./build -emit ir `pwd`/docs/examples/hello_world.srn #+END_SRC Output: #+BEGIN_SRC llvm target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" declare i8* @malloc(i64 %0) declare void @free(i8* %0) define i64 @main() !dbg !3 { ret i64 3, !dbg !7 } define i64 @main1(i64 %0, i64 %1, i64 %2) !dbg !9 { ret i64 3, !dbg !10 } !llvm.dbg.cu = !{!0} !llvm.module.flags = !{!2} !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug) !1 = !DIFile(filename: "LLVMDialectModule", directory: "/") !2 = !{i32 2, !"Debug Info Version", i32 3} !3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, type: !5, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6) !4 = !DIFile(filename: "REPL", directory: "/home/lxsameer/src/serene/serene/build") !5 = !DISubroutineType(types: !6) !6 = !{} !7 = !DILocation(line: 0, column: 10, scope: !8) !8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0) !9 = distinct !DISubprogram(name: "main1", linkageName: "main1", scope: null, file: !4, line: 1, type: !5, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6) !10 = !DILocation(line: 1, column: 11, scope: !11) !11 = !DILexicalBlockFile(scope: !9, file: !4, discriminator: 0) #+END_SRC ** Resources - [[https://www.youtube.com/watch?v=Y4SvqTtOIDk][2020 LLVM Developers’ Meeting: M. Amini & R. Riddle “MLIR Tutorial”]] - [[https://www.youtube.com/watch?v=qzljG6DKgic][2019 EuroLLVM Developers’ Meeting: T. Shpeisman & C. Lattner “MLIR: Multi-Level Intermediate Repr..”]] - https://mlir.llvm.org/docs - https://mlir.llvm.org/docs/LangRef - https://en.wikipedia.org/wiki/Basic_block * DONE Episode 9 - IR (SLIR) generation CLOSED: [2021-10-01 Fri 18:56] ** Updates: - Source manager - Diagnostic Engine - JIT There will be an episode dedicated to eache of these ** How does IR generation works - Pass around MLIR context - Create Builder objects that creates operations in specific locations - ModuleOp - Namespace ** How to define a new dialect - Pure C++ - Tablegen ** SLIR *** The SLIR goal - An IR that follows the AST - Rename? *** Steps - [X] Define the new dialect - [X] Setup the tablegen - [X] Define the operations - [X] Walk the AST and generate the operations * DONE Episode 10 - Pass Infrastructure CLOSED: [2021-10-15 Fri 14:17] ** The next Step ** Updates: *** CMake changes ** What is a Pass *** Passes are the unit of abstraction for optimization and transformation in LLVM/MLIR *** Compilation is all about transforming the input data and produce an output Source code -> IR X -> IR Y -> IR Z -> ... -> Target Code *** Almost like a function composition *** The big picture *** Pass Managers (Pipelines) are made out of a collection of passes and can be nested *** The most of the interesting parts of the compiler reside in Passes. *** We will probably spend most of our time working with passes ** Pass Infrastructure *** ODS or C++ *** Operation is the main abstract unit of transformation *** OperationPass is the base class for all the passes. *** We need to override =runOnOperation= *** There's some rules you need to follow when defining your Pass **** Must not maintain any global mutable state **** Must not modify the state of another operation not nested within the current operation being operated on **** ... *** Passes are either OpSpecific or OpAgnostic **** OpSpecific #+BEGIN_SRC C++ struct MyFunctionPass : public PassWrapper> { void runOnOperation() override { // Get the current FuncOp operation being operated on. FuncOp f = getOperation(); // Walk the operations within the function. f.walk([](Operation *inst) { // .... }); } }; /// Register this pass so that it can be built via from a textual pass pipeline. /// (Pass registration is discussed more below) void registerMyPass() { PassRegistration(); } #+END_SRC **** OpAgnostic #+BEGIN_SRC C++ struct MyOperationPass : public PassWrapper> { void runOnOperation() override { // Get the current operation being operated on. Operation *op = getOperation(); // ... } }; #+END_SRC *** How transformation works? *** Analyses and Passes *** Pass management and nested pass managers #+BEGIN_SRC C++ // Create a top-level `PassManager` class. If an operation type is not // explicitly specific, the default is the builtin `module` operation. PassManager pm(ctx); // Note: We could also create the above `PassManager` this way. PassManager pm(ctx, /*operationName=*/"builtin.module"); // Add a pass on the top-level module operation. pm.addPass(std::make_unique()); // Nest a pass manager that operates on `spirv.module` operations nested // directly under the top-level module. OpPassManager &nestedModulePM = pm.nest(); nestedModulePM.addPass(std::make_unique()); // Nest a pass manager that operates on functions within the nested SPIRV // module. OpPassManager &nestedFunctionPM = nestedModulePM.nest(); nestedFunctionPM.addPass(std::make_unique()); // Run the pass manager on the top-level module. ModuleOp m = ...; if (failed(pm.run(m))) { // Handle the failure } #+END_SRC * DONE Episode 11 - Lowering SLIR CLOSED: [2021-11-01 Mon 15:14] ** Overview - What is a Pass? - Pass Manager ** Dialect lowering *** Why? *** Transforming a dialect to another dialect or LLVM IR *** The goal is to lower SLIR to LLVM IR directly or indirectly. ** Dialect Conversions This framework allows for transforming a set of illegal operations to a set of legal ones. *** Target Conversion *** Rewrite Patterns *** Type Converter ** Full vs Partial Conversion * DONE Episode 12 - Target code generation CLOSED: [2021-11-04 Thu 00:57] ** Updates: *** JIT work *** Emacs dev mode ** So far.... ** Next Step *** Compile to object files *** Link object files to create an executable ** End of wiring for static compilers ** What is an object file? *** Symbols - A pair of a name and a value - Value of a *defined symbol* is an offset in the =Content= - *undefined symbols* *** Relocations Are computation to perform on the =Content=. For example, "set this location in the contents to the value of this symbol plus this addend". Linker will apply all the *relocations* in an object file on link time and if it can not resolve an undefined symbol most of the time it will raise an error (depending on the relocation and the symbol). *** Contents - Are what memory should look like during the execution - Have a size - Have a type - Have an array of bytes - Has sections like: + .text: The target code generated by the compiler + .data: The values of initialized variables + .rdata: Static unnamed data like literal strings, protocol tables and .... + .bss: Uninitialized variables (the content can be omitted or striped and assume to contain only zeros) ** Linking process During the linking process, linker assigns an address to each *defined* symbol and tries to =resolve= *undefined* symbols *** Linker will - reads the object files - reads the contents + as raw data + figures out the length + reads the symbols and create a symbol table + link undefined symbols to their definitions (possibly from other obj fils or libs) + decide where all the content should go in the memory + sort them based on the *type* + concat them together + apply relocations + write the result to a file as an executable ** AOT vs JIT ** Let's look at some code ** Resources: - [[https://lwn.net/Articles/276782/][20 part linker essay]] * DONE Episode 13 - Source Managers CLOSED: [2021-12-18 Sat 11:17] ** FAQ: - What tools are you using? ** Updates: - Still JIT - We're going to start the JIT discussion from next EP ** Forgot to show case the code generation I didn't show it in action ** What is a source manager - It owns and manages are the source buffers - All of our interactions with source files will happen though Source manager - Including reading files - Loading namespaces - Including namespaces - ... - LLVM provides a =SourceMgr= class that we're not using it * DONE Episode 14 - JIT Basics CLOSED: [2022-01-05 Wed 17:37] ** Updates: - Lost my data :_( - Fixed some compatibility issues - New video series on *How to build an editor with Emacs Lisp* ** What is Just In Time Compilation? - Compiling at "runtime" (air quote) Or it might be better to say, "on demand compilation" - Usually in interpreters and Runtimes #+NAME: ep-14-jit-1 #+BEGIN_SRC graphviz-dot :file /tmp/jit.svg :cmdline -Kdot -Tsvg digraph { graph [bgcolor=transparent] node [color=gray80 shape="box"] edge [color=gray80] rankdir = "LR" a[label="Some kind of input code"] b[label="JIT"] c[label="Some sort of target code"] d[label="Execute the result"] a -> b -> c -> d } #+END_SRC #+RESULTS: ep-14-jit-1 [[file:/tmp/jit.svg]] #+NAME: ep-14-jit-2 #+BEGIN_SRC graphviz-dot :file /tmp/jit-2.svg :cmdline -Kdot -Tsvg digraph G { graph [bgcolor=transparent] node [color=gray80 shape="box"] edge [color=gray80] rankdir = "LR" a[label="Source code"] b[label="Parser"] c[label="Semantic Analyzer"] d[label="IR Generator"] e[label="Pass Manager"] f[label="Object Layer"] g[label="Native Code"] z[label="Preload Core libs"] a -> b b -> c {label="AST"} c -> d z -> f subgraph cluster0 { color=lightgrey; d -> e -> f label = "JIT Engine"; } f -> g g -> Store g -> Execute } #+END_SRC #+RESULTS: ep-14-jit-2 [[file:/tmp/jit-2.svg]] - Trade off Compilation speed vs Execution speed *** JIT vs Typical interpreters ** Why to use JIT? - Make the interpreter to run "faster" (air quote again) - Speed up the compilation Avoid generating the target code and generate some byte-code instead and then use a JIT in runtime to execute the byte-code. - Use runtime data to find optimization opportunities. - Support more archs - And many other reasons ** How we're going to use a JIT? - We need a JIT engine to implement Lisp Macros - Compile time vs Runtime + Abstraction - A JIT engine to just compile Serene code - Our compiler will be a fancy JIT engine #+NAME: ep-14-jit-3 #+BEGIN_SRC graphviz-dot :file /tmp/jit-3.svg :cmdline -Kdot -Tsvg digraph { graph [bgcolor=transparent] node [color=gray80 shape="box"] edge [color=gray80] rankdir = "LR" a[label="Serene AST"] b[label="For every node"] c[label="is it a macro call?" shapp="diamond"] d[label="Add it to JIT"] e[label="Expand it (call it)"] f[label="Generate target code"] a -> b subgraph cluster0 { color=lightgrey; b -> c c -> d [label="NO"] c -> e [label="YES"] e -> b d -> f label = "JIT Engine"; } f -> Store f -> Execute } #+END_SRC #+RESULTS: ep-14-jit-3 [[file:/tmp/jit-3.svg]] ** LLVM/MLIR and JIT - 3 different approaches - MLIR's JIT - LLVM JITs + MCJIT (Deprecated) + LLJIT (Based on ORCv2) + LazyLLJIT (Based on LLJIT) - Use LLVM's ORCv2 directly to create an engine * DONE Episode 15 - LLVM ORC JIT CLOSED: [2022-01-28 Fri 12:15] ** Uptades: - Created a bare min JIT that: - Eagrly compiles namespaces - Reload namespacs - I guess this the time to start the Serene's Spec ** What is ORCv2? - On request compiler - Replaces MCJIT - ORCv2 docs and examples - Kaleidoscope tutorial (not complete) Before We can move to Serene's code we need to understand ORC first ** Terminology *** Execution Session A running JIT program. It contains the JITDylibs, error reporting mechanisms, and dispatches the materializers. *** JITAddress It's just an address of a JITed code *** JITDylib Represents a JIT'd dynamic library. This class aims to mimic the behavior of a shared object, but without requiring the contained program representations to be compiled up-front. The JITDylib's content is defined by adding MaterializationUnits, and contained MaterializationUnits will typically rely on the JITDylib's links-against order to resolve external references. JITDylibs cannot be moved or copied. Their address is stable, and useful as a key in some JIT data structures. *** MaterializationUnit A =MaterializationUnit= represents a set of symbol definitions that can be materialized as a group, or individually discarded (when overriding definitions are encountered). =MaterializationUnits= are used when providing lazy definitions of symbols to JITDylibs. The JITDylib will call materialize when the address of a symbol is requested via the lookup method. The =JITDylib= will call discard if a stronger definition is added or already present. MaterializationUnit stores in JITDylibs. *** MaterializationResponsibility Represents and tracks responsibility for materialization and mediates interactions between =MaterializationUnits= and =JITDylibs=. It provides a way for Dylib to find out about the outcome of the materialization. *** Memory Manager A class that manages how JIT engine should use memory, like allocations and deallocations. =SectionMemoryManager= is a simple memory manager that is provided by ORC. *** Layers ORC based JIT engines are constructed from several layers. Each layer has a specific responsiblity and passes the result of its operation to the next layer. E.g Compile Layer, Link layer and .... *** Resource Tracker The API to remove or transfer the ownership of JIT resources. Usually, a resource is a module. *** ThreadSafeModule A thread safe container for the LLVM module. ** ORC highlevel API - ORC provides a Layer based design to that let us create our own JIT engine. - It comes with two ready to use engines: We will look at their implementaion later + LLJIT + LLLazyJIT ** Two major solutions to build a JIT - Wrap LLJIT or LLLazyJIT - Create your own JIT engine and the wrapper ** Resources *** Docs - https://www.llvm.org/docs/ORCv2.html *** Examples - https://github.com/llvm/llvm-project/tree/main/llvm/examples/HowToUseLLJIT - https://github.com/llvm/llvm-project/tree/main/llvm/examples/OrcV2Examples/LLJITDumpObjects - https://github.com/llvm/llvm-project/tree/main/llvm/examples/OrcV2Examples/LLJITWithInitializers - https://github.com/llvm/llvm-project/tree/main/llvm/examples/OrcV2Examples/LLJITWithLazyReexports *** Talks - [[https://www.youtube.com/watch?v=i-inxFudrgI][ORCv2 -- LLVM JIT APIs Deep Dive]] - [[https://www.youtube.com/watch?v=MOQG5vkh9J8][Updating ORC JIT for Concurrency]] - [[https://www.youtube.com/watch?v=hILdR8XRvdQ][ORC -- LLVM's Next Generation of JIT API]] * DONE Eposide 16 - ORC Layers CLOSED: [2022-02-26 Sat 12:50] ** Updates: *** Support for adding AST directly to the JIT *** Minor change to SLIR (big changes are coming) *** Started to unify the llvm::Errors with Serene errors *** Tablegen backend for error classes ** The plan for today - We had a brief look at LLJIT/LLLazyJIT - Better understanding To understand them better we need to understand other components first. Starting from *layers*. - We'll have a look at how to define our own layers in the future episodes. ** What are Layers? - Layers are the basic blocks of an engine - They are composable (kinda) - Each layer has it's own requirements and details - Each layer holds a reference to it's downstream layer #+NAME: ep-16-jit-1 #+BEGIN_SRC graphviz-dot :file /tmp/ep16-1.svg :cmdline -Kdot -Tsvg digraph { graph [bgcolor=transparent] node [color=gray80 shape="box"] edge [color=gray80] rankdir = "LR" a[label="Input Type A"] b[label="Input Type B"] c[label="Layer A"] d[label="Layer B"] e[label="Layer C"] f[label="Layer D"] g[label="Layer E"] h[label="Target Code"] a -> c b -> d subgraph cluster0 { color=lightgrey; c -> e d -> e e -> f f -> g label = "JIT Engine"; } g -> h } #+END_SRC #+RESULTS: ep-16-jit-1 [[file:/tmp/ep16-1.svg]] ** Kaleidoscope JIT - Chapter 1 #+NAME: ep-16-jit-2 #+BEGIN_SRC graphviz-dot :file /tmp/ep16-2.svg :cmdline -Kdot -Tsvg digraph { graph [bgcolor=transparent] node [color=gray80 shape="box"] edge [color=gray80] rankdir = "LR" a[label="LLVM IR Module"] b[label="Compiler Layer"] c[label="Object Layer"] d[label="Target Code"] a -> b subgraph cluster0 { color=lightgrey; b -> c label = "Kaleidoscope JIT"; } c -> d } #+END_SRC #+RESULTS: ep-16-jit-2 [[file:/tmp/ep16-2.svg]] - Chapter 2 #+NAME: ep-16-jit-3 #+BEGIN_SRC graphviz-dot :file /tmp/ep16-3.svg :cmdline -Kdot -Tsvg digraph { graph [bgcolor=transparent] node [color=gray80 shape="box"] edge [color=gray80] rankdir = "LR" a[label="LLVM IR Module"] b[label="Compiler Layer"] c[label="Object Layer"] e[label="Target Code"] d[label="Optimize layer"] a -> d subgraph cluster0 { color=lightgrey; d -> b b -> c label = "Kaleidoscope JIT"; } c -> e } #+END_SRC #+RESULTS: ep-16-jit-3 [[file:/tmp/ep16-3.svg]] * DONE Episode 17 - Custom ORC Layers CLOSED: [2022-03-28 Mon 14:00] ** Updates: - Finished the basic compiler wiring - Restructured the source tree - Tweaked the build system mostly for install targets - Refactoring, cleaning up the code and writing tests ** Quick overview an ORC based JIT engine - JIT engines are made out of layers - Engines have a hierarchy of layers - Layers don't know about each other - Layers wrap the program representation in a =MaterializationUnit=, which is then stored in the =JITDylib=. - =MaterializationUnits= are responsible for describing the definitions they provide, and for unwrapping the program representation and passing it back to the layer when compilation is required. - When a =MaterializationUnit= hands a program representation back to the layer it comes with an associated =MaterializationResponsibility= object. This object tracks the definitions that must be materialized and provides a way to notify the =JITDylib= once they are either successfully materialized or a failure occurs. ** In order to build a custom layer we need: *** A custom materialization unit Let's have a look at the =MaterializationUnit= class. *** And the layer class itself The layer classes are not special but conventionally the come with few functions like: =add=, =emit= and =getInterface=. * DONE Episode 18 - JIT Engine Part 1 CLOSED: [2022-03-29 Tue 19:56] ** =Halley= JIT Engine - It's not the final implementation - Wraps LLJIT and LLLazyJIT - Uses object cache layer - Supports ASTs and Namespaces * DONE Episode 19 - JIT Engine Part 2 CLOSED: [2022-05-04 Wed 21:30] ** How Serene is different from other programming langs? - Serene is just a JIT engine - Compiletime vs Runtime + The borderline is not clear in case of Serene The Big picture #+NAME: ep-19-jit-1 #+BEGIN_SRC graphviz-dot :file /tmp/ep19-1.svg :cmdline -Kdot -Tsvg digraph { fontcolor="gray80" graph [bgcolor=transparent] node [color=gray80 shape="box", fontcolor="gray80"] edge [color=gray80, fontcolor="gray80"] input_ns[label="Input Namespace"] ast[label="AST"] vast[label="Valid AST"] ir[label="LLVM IR"] subgraph cluster_2 { label="REPL" color="gray80" graph [bgcolor=transparent, fontcolor="gray80"] node [color=gray80 shape="box", fontcolor="gray80"] input_form[label="Input Form"] result[label="Evaluation result"] result -> input_form[label="Loop"] } subgraph cluster_0 { label="JIT" color="gray80" graph [bgcolor=transparent, fontcolor="gray80"] node [color=gray80 shape="box", fontcolor="gray80"] execute[label="Execute native code"] binary[label="Binary File"] subgraph cluster_1 { label="AddAst/Ns" color="gray80" graph [bgcolor=transparent, fontcolor="gray80"] node [color=gray80 shape="box", fontcolor="gray80"] vast -> ir [label="Compile (No optimization)"] subgraph cluster_4 { label="Macro Expansion" color="gray80" graph [bgcolor=transparent, fontcolor="gray80"] node [color=gray80 shape="box", fontcolor="gray80"] vast -> macros [label=" Find the macros"] macros -> JITDylib [label=" look up the required\n Symbols in compiled code"] JITDylib -> symbols [label=" lookup"] } wrapped_code[lable="Wrapped IR"] ir -> wrapped_code[label= " Wrap top level Forms in \nfunctions/calls"] wrapped_code -> native [label=" Compile (No optimization)"] } symbols -> execute [label="Execute the functions mapped to the symbols"] execute -> vast execute -> result [label=" Print"] execute -> binary [label=" Dump"] native -> execute [label="invoke"] native -> JITDylib [label="Add"] JITDylib -> Context [label="Store as part the namespace"] } subgraph cluster_3 { label="CLI interface" color="gray80" graph [bgcolor=transparent, fontcolor="gray80"] node [color=gray80 shape="box", fontcolor="gray80"] input_ns -> file [label=" resolve to file"] } input_form -> ast [label=" read"] file -> ast [label=" read"] ast -> vast [label=" Semantic Analysis"] } #+END_SRC #+RESULTS: ep-19-jit-1 [[file:/tmp/ep19-1.png]] ** Let's look at some code * Episode 20 - Future Roadmap ** So Far - We created a bare bone and minimal compiler + That is capable of just in time and ahead of time compilation - We had an over of MLIR and pass management - We didn't spend time on fundamentals ** Design change - The current implementation is suitable for a static compiler - We want to move toward a more dynamic compiler ** What's next? - Part 2 - We're going to focus on some of the compiler fundamentals - We will create simple utilities to help us in our journey - Hopefully we will talk about type systems - We're going to sharpen our skills on LLVM/MLIR - I'm going to work on the new design