serene/docs/videos.org

511 lines
15 KiB
Org Mode
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#+TITLE: How to build a compiler with LLVM and MLIR
#+SEQ_TODO: TODO(t/!) NEXT(n/!) BLOCKED(b@/!) | DONE(d%) CANCELLED(c@/!) FAILED(f@/!)
#+TAGS: READER(r) MISC(m)
#+STARTUP: logdrawer logdone logreschedule indent content align constSI entitiespretty overview
* DONE Episode 1 - Introduction
** What is it all about?
- Create a programming lang
- Guide for contributors
- A LLVM/MLIR guide
** The Plan
- Git branches
- No live coding
- Feel free to contribute
** Serene and a bit of history
- Other Implementations
- Requirements
- C++ 14
- CMake
- Repository: https://devheroes.codes/Serene
- Website: lxsameer.com
Email: lxsameer@gnu.org
* DONE Episode 2 - Basic Setup
CLOSED: [2021-07-10 Sat 09:04]
** Installing Requirements
*** LLVM and Clang
- mlir-tblgen
*** ccache (optional)
** Building Serene and the =builder=
- git hooks
** Source tree structure
** =dev.org= resources and TODOs
* DONE Episode 3 - Overview
CLOSED: [2021-07-19 Mon 09:41]
** Generic Compiler
- [[https://www.cs.princeton.edu/~appel/modern/ml/whichver.html][Modern Compiler Implementation in ML: Basic Techniques]]
- [[https://suif.stanford.edu/dragonbook/][Compilers: Principles, Techniques, and Tools (The Dragon Book)]]
*** Common Steps
- Frontend
- Lexical analyzer (Lexer)
- Syntax analyzer (Parser)
- Semantic analyzer
- Middleend
- Intermediate code generation
- Code optimizer
- Backend
- Target code generation
** LLVM
[[llvm.org]]
*** Watch [[https://www.youtube.com/watch?v=J5xExRGaIIY][Introdution to LLVM]]
*** Quick overview
Deducted from https://www.aosabook.org/en/llvm.html
[[./imgs/llvm_dia.svg]]
- It's a set of libraries to create a compiler.
- Well engineered.
- we can focus only on the fronted of the compiler and what is
actually important to us and leave the tricky stuff to LLVM.
- LLVM IR enables us to use multiple languages together.
- It supports many targets.
- We can benefit from already made IR level optimizers.
- ....
** MLIR
[[mlir.llvm.org]]
[[./imgs/mlir_dia.svg]]
- With MLIR dialects provide higher level semantics than LLVM IR.
- It's easier to reason about higher level IR that is modeled after
the AST rather than a low level IR.
- We can use the pass infrastructure to efficiently process and transform the IR.
- With many ready to use dialects we can really focus on our language and us the other
dialect when ever necessary.
- ...
** Serene
*** A Compiler frontend
*** Flow
- =serenec= in parses the command lines args
- =reader= reads the input file and generates an =AST=
- =semantic analyzer= walks the =AST= and generates a new =AST= and rewrites
the necessary nodes.
- =slir= generator generates =slir= dialect code from =AST=.
- We lower =slir= to other dialects of the *MLIR* which we call the result =mlir=.
- Then, We lower everything to the =LLVMIR dialect= and call it =lir= (lowered IR).
- Finally we fully lower =lir= to =LLVM IR= and pass it to the object generator
to generate object files.
- Call the default =c compiler= to link the object files and generate the machine code.
* DONE Episode 4 - The reader
CLOSED: [2021-07-27 Tue 22:50]
** What is a Parser ?
To put it simply, Parser converts the source code to an [[https://en.wikipedia.org/wiki/Abstract_syntax_tree][AST]]
*** Algorithms
- LL(k)
- LR
- LALR
- PEG
- .....
Read More:
- https://stereobooster.com/posts/an-overview-of-parsing-algorithms/
- https://tomassetti.me/guide-parsing-algorithms-terminology/
*** Libraries
- https://en.wikipedia.org/wiki/Comparison_of_parser_generators
*** Our Parser
- We have a hand written LL(1.5) like parser/lexer since lisp already has a structure.
#+BEGIN_SRC lisp
;; pseudo code
(def some-fn (fn (x y)
(+ x y)))
(defn main ()
(println "Result: " (some-fn 3 8)))
#+END_SRC
- LL(1.5)?
- O(n)
* DONE Episode 5 - The Abstract Syntax Tree
CLOSED: [2021-07-30 Fri 14:01]
** What is an AST?
Ast is a tree representation of the abstract syntactic structure of source code. It's just a tree made of nodes that each node is
a data structure describing the syntax.
#+BEGIN_SRC lisp
;; pseudo code
(def main (fn () 4))
(prn (main))
#+END_SRC
[[./imgs/ast.svg]]
** The =Expression= abstract class
*** Expressions
- Expressions vs Statements
- Serene(Lisp) and expressions
** Node & AST
* DONE Episode 6 - The Semantic Analyzer
CLOSED: [2021-08-21 Sat 18:44]
** Qs
- Why didn't we implement a linked list?
- Why we are using the =std::vector= instead of llvm collections?
** What is Semantic Analysis?
- Semantic Analysis makes sure that the given program is semantically correct.
- Type checkr works as part of this step as well.
#+BEGIN_SRC lisp
;; pseudo code
(4 main)
#+END_SRC
[[./imgs/incorrct_semantic.svg]]
** Semantic Analysis and rewrites
We need to reform the AST to reflect the semantics of Serene closly.
#+BEGIN_SRC lisp
;; pseudo code
(def main (fn () 4))
(prn (main))
#+END_SRC
[[./imgs/ast.svg]]
[[./imgs/semantic.svg]]
Let's run the compiler to see the semantic analysis in action.
** Let's check out the code
* DONE Episode 7 - The Context and Namespace
CLOSED: [2021-09-04 Sat 10:53]
** Namespaces
*** Unit of compilation
*** Usually maps to a file
*** keeps the state and evironment
** SereneContext vs LLVM Context vs MLIR Context
*** Compilers global state
*** The owner of LLVM/MLIR contexts
*** Holds the namespace table
*** Probably will contain the primitive types as well
* DONE Episode 8 - MLIR Basics
CLOSED: [2021-09-17 Fri 10:18]
** Serene Changes
- Introducing a SourceManager
- Reader changes
- *serenec* cli interface in changing
** Disclaimer
*I'm not an expert in MLIR*
** Why?
- A bit of history
- LLVM IR is to low level
- We need an IR to implement high level concepts and flows *MLIR* is a framework to build a compiler with your own IR. kinda :P
- Reusability
- ...
** Language
*** Overview
- SSA Based (https://en.wikipedia.org/wiki/Static_single_assignment_form)
- Typed
- Context free(for lack of better words)
*** Dialects
- A collection of operations
- Custom types
- Meta data
- We can use a mixture of different dialects
**** builtin dialects:
- std
- llvm
- math
- async
- ...
*** Opetations
- Higher level of abstraction
- Not instructions
- SSA forms
- Tablegen backend
- Verifiers and printers
*** Attributes
*** Blocks & Regions
*** Types
- Extesible
** Pass Infrastructure
Analysis and transformation infrastructure
- We will implement most of our semantic analysis logic and type checker as passes
** Pattern Rewriting
- Tablegen backed
** Operation Definition Specification
** Examples
*Not*: You need =mlir-mode= and =llvm-mode= available to you for the code highlighting of
the following code blocks. Both of those are distributed with the LLVM.
*** General syntax
#+BEGIN_SRC mlir
%result:2 = "somedialect.blah"(%x#2) { some.attribute = true, other_attribute = 3 }
: (!somedialect<"example_type">) -> (!somedialect<"foo_s">, i8)
loc(callsite("main" at "main.srn":10:8))
#+END_SRC
*** Blocks and Regions
#+BEGIN_SRC mlir
func @simple(i64, i1) -> i64 {
^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
cond_br %cond, ^bb1, ^bb2
^bb1:
br ^bb3(%a: i64) // Branch passes %a as the argument
^bb2:
%b = addi %a, %a : i64
br ^bb3(%b: i64) // Branch passes %b as the argument
// ^bb3 receives an argument, named %c, from predecessors
// and passes it on to bb4 along with %a. %a is referenced
// directly from its defining operation and is not passed through
// an argument of ^bb3.
^bb3(%c: i64):
//br ^bb4(%c, %a : i64, i64)
"serene.ifop"(%c) ({ // if %a is in-scope in the containing region...
// then %a is in-scope here too.
%new_value = "another_op"(%c) : (i64) -> (i64)
^someblock(%new_value):
%x = "some_other_op"() {value = 4 : i64} : () -> i64
}) : (i64) -> (i64)
^bb4(%d : i64, %e : i64):
%0 = addi %d, %e : i64
return %0 : i64 // Return is also a terminator.
}
#+END_SRC
*** SLIR example
Command line arguments to emir =slir=
#+BEGIN_SRC sh
./builder run --build-dir ./build -emit slir `pwd`/docs/examples/hello_world.srn
#+END_SRC
Output:
#+BEGIN_SRC mlir
module @user {
%0 = "serene.fn"() ( {
%2 = "serene.value"() {value = 0 : i64} : () -> i64
return %2 : i64
}) {args = {}, name = "main", sym_visibility = "public"} : () -> i64
%1 = "serene.fn"() ( {
%2 = "serene.value"() {value = 0 : i64} : () -> i64
return %2 : i64
}) {args = {n = i64, v = i64, y = i64}, name = "main1", sym_visibility = "public"} : () -> i64
}
#+END_SRC
*** Serene's MLIR (maybe we need a better name)
Command line arguments to emir =mlir=
#+BEGIN_SRC sh
./builder run --build-dir ./build -emit mlir `pwd`/docs/examples/hello_world.srn
#+END_SRC
Output:
#+BEGIN_SRC mlir
module @user {
func @main() -> i64 {
%c3_i64 = constant 3 : i64
return %c3_i64 : i64
}
func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
%c3_i64 = constant 3 : i64
return %c3_i64 : i64
}
}
#+END_SRC
*** LIR
Command line arguments to emir =lir=
#+BEGIN_SRC sh
./builder run --build-dir ./build -emit lir `pwd`/docs/examples/hello_world.srn
#+END_SRC
Output:
#+BEGIN_SRC mlir
module @user {
llvm.func @main() -> i64 {
%0 = llvm.mlir.constant(3 : i64) : i64
llvm.return %0 : i64
}
llvm.func @main1(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 {
%0 = llvm.mlir.constant(3 : i64) : i64
llvm.return %0 : i64
}
}
#+END_SRC
*** LLVMIR
Command line arguments to emir =llvmir=
#+BEGIN_SRC sh
./builder run --build-dir ./build -emit ir `pwd`/docs/examples/hello_world.srn
#+END_SRC
Output:
#+BEGIN_SRC llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
declare i8* @malloc(i64 %0)
declare void @free(i8* %0)
define i64 @main() !dbg !3 {
ret i64 3, !dbg !7
}
define i64 @main1(i64 %0, i64 %1, i64 %2) !dbg !9 {
ret i64 3, !dbg !10
}
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}
!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", linkageName: "main", scope: null, file: !4, type: !5, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "REPL", directory: "/home/lxsameer/src/serene/serene/build")
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocation(line: 0, column: 10, scope: !8)
!8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0)
!9 = distinct !DISubprogram(name: "main1", linkageName: "main1", scope: null, file: !4, line: 1, type: !5, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!10 = !DILocation(line: 1, column: 11, scope: !11)
!11 = !DILexicalBlockFile(scope: !9, file: !4, discriminator: 0)
#+END_SRC
** Resources
- [[https://www.youtube.com/watch?v=Y4SvqTtOIDk][2020 LLVM Developers Meeting: M. Amini & R. Riddle “MLIR Tutorial”]]
- [[https://www.youtube.com/watch?v=qzljG6DKgic][2019 EuroLLVM Developers Meeting: T. Shpeisman & C. Lattner “MLIR: Multi-Level Intermediate Repr..”]]
- https://mlir.llvm.org/docs
- https://mlir.llvm.org/docs/LangRef
- https://en.wikipedia.org/wiki/Basic_block
* DONE Episode 9 - IR (SLIR) generation
CLOSED: [2021-10-01 Fri 18:56]
** Updates:
- Source manager
- Diagnostic Engine
- JIT
There will be an episode dedicated to eache of these
** How does IR generation works
- Pass around MLIR context
- Create Builder objects that creates operations in specific
locations
- ModuleOp
- Namespace
** How to define a new dialect
- Pure C++
- Tablegen
** SLIR
*** The SLIR goal
- An IR that follows the AST
- Rename?
*** Steps
- [X] Define the new dialect
- [X] Setup the tablegen
- [X] Define the operations
- [X] Walk the AST and generate the operations
* DONE Episode 10 - Pass Infrastructure
CLOSED: [2021-10-15 Fri 14:17]
** The next Step
** Updates:
*** CMake changes
** What is a Pass
*** Passes are the unit of abstraction for optimization and transformation in LLVM/MLIR
*** Compilation is all about transforming the input data and produce an output
Source code -> IR X -> IR Y -> IR Z -> ... -> Target Code
*** Almost like a function composition
*** The big picture
*** Pass Managers (Pipelines) are made out of a collection of passes and can be nested
*** The most of the interesting parts of the compiler reside in Passes.
*** We will probably spend most of our time working with passes
** Pass Infrastructure
*** ODS or C++
*** Operation is the main abstract unit of transformation
*** OperationPass is the base class for all the passes.
*** We need to override =runOnOperation=
*** There's some rules you need to follow when defining your Pass
**** Must not maintain any global mutable state
**** Must not modify the state of another operation not nested within the current operation being operated on
**** ...
*** Passes are either OpSpecific or OpAgnostic
**** OpSpecific
#+BEGIN_SRC C++
struct MyFunctionPass : public PassWrapper<MyFunctionPass,
OperationPass<FuncOp>> {
void runOnOperation() override {
// Get the current FuncOp operation being operated on.
FuncOp f = getOperation();
// Walk the operations within the function.
f.walk([](Operation *inst) {
// ....
});
}
};
/// Register this pass so that it can be built via from a textual pass pipeline.
/// (Pass registration is discussed more below)
void registerMyPass() {
PassRegistration<MyFunctionPass>();
}
#+END_SRC
**** OpAgnostic
#+BEGIN_SRC C++
struct MyOperationPass : public PassWrapper<MyOperationPass, OperationPass<>> {
void runOnOperation() override {
// Get the current operation being operated on.
Operation *op = getOperation();
// ...
}
};
#+END_SRC
*** How transformation works?
*** Analyses and Passes
*** Pass management and nested pass managers
#+BEGIN_SRC C++
// Create a top-level `PassManager` class. If an operation type is not
// explicitly specific, the default is the builtin `module` operation.
PassManager pm(ctx);
// Note: We could also create the above `PassManager` this way.
PassManager pm(ctx, /*operationName=*/"builtin.module");
// Add a pass on the top-level module operation.
pm.addPass(std::make_unique<MyModulePass>());
// Nest a pass manager that operates on `spirv.module` operations nested
// directly under the top-level module.
OpPassManager &nestedModulePM = pm.nest<spirv::ModuleOp>();
nestedModulePM.addPass(std::make_unique<MySPIRVModulePass>());
// Nest a pass manager that operates on functions within the nested SPIRV
// module.
OpPassManager &nestedFunctionPM = nestedModulePM.nest<FuncOp>();
nestedFunctionPM.addPass(std::make_unique<MyFunctionPass>());
// Run the pass manager on the top-level module.
ModuleOp m = ...;
if (failed(pm.run(m))) {
// Handle the failure
}
#+END_SRC
* Episode 11 - Lowering SLIR
** Overview
- What is a Pass?
- Pass Manager
** Dialect lowering
*** Why?
*** Transforming a dialect to another dialect or LLVM IR
*** The goal is to lower SLIR to LLVM IR directly or indirectly.
** Dialect Conversions
This framework allows for transforming a set of illegal operations to a set of legal ones.
*** Target Conversion
*** Rewrite Patterns
*** Type Converter
** Full vs Partial Conversion