#+TITLE: How to build a compiler with LLVM and MLIR
#+SEQ_TODO: TODO(t/!) NEXT(n/!) BLOCKED(b@/!) | DONE(d%) CANCELLED(c@/!) FAILED(f@/!)
#+TAGS: READER(r) MISC(m)
#+STARTUP: logdrawer logdone logreschedule indent content align constSI entitiespretty

* DONE Episode 1 - Introduction
** What is it all about?
- Create a programming language
- Guide for contributors
- An LLVM/MLIR guide
** The Plan
- Git branches
- No live coding
- Feel free to contribute
** Serene and a bit of history
- Other Implementations
- Requirements
  - C++ 14
  - CMake
- Repository: https://devheroes.codes/Serene
- Website: lxsameer.com
- Email: lxsameer@gnu.org
* DONE Episode 2 - Basic Setup
CLOSED: [2021-07-10 Sat 09:04]
** Installing Requirements
*** LLVM and Clang
- mlir-tblgen
*** ccache (optional)
** Building Serene and the =builder=
- git hooks
** Source tree structure
** =dev.org= resources and TODOs
* DONE Episode 3 - Overview
CLOSED: [2021-07-19 Mon 09:41]
** Generic Compiler
- [[https://www.cs.princeton.edu/~appel/modern/ml/whichver.html][Modern Compiler Implementation in ML: Basic Techniques]]
- [[https://suif.stanford.edu/dragonbook/][Compilers: Principles, Techniques, and Tools (The Dragon Book)]]
*** Common Steps
- Frontend
  - Lexical analyzer (Lexer)
  - Syntax analyzer (Parser)
  - Semantic analyzer
- Middleend
  - Intermediate code generation
  - Code optimizer
- Backend
  - Target code generation
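The three stages above can be sketched end to end on a toy language. This is only an illustration under loud assumptions: the language (non-negative integers joined by =+=), the flat list used as the IR, and the accumulator-style output are all made up for this note and have nothing to do with Serene or LLVM.

#+BEGIN_SRC cpp
  #include <cctype>
  #include <string>
  #include <vector>

  // Hypothetical toy language: non-negative integers joined by '+'.

  // Frontend: lexer and parser fused together. It produces a flat "IR",
  // a list of operands that are implicitly summed.
  std::vector<long> frontend(const std::string &src) {
    std::vector<long> operands;
    std::size_t i = 0;
    while (i < src.size()) {
      if (std::isdigit(static_cast<unsigned char>(src[i]))) {
        long value = 0;
        while (i < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[i]))) {
          value = value * 10 + (src[i] - '0');
          ++i;
        }
        operands.push_back(value);
      } else {
        ++i; // skip '+' and whitespace; a real frontend would report errors
      }
    }
    return operands;
  }

  // Middle end: constant folding collapses the whole sum into one operand.
  std::vector<long> optimize(const std::vector<long> &ir) {
    long sum = 0;
    for (long v : ir) sum += v;
    return {sum};
  }

  // Backend: emit made-up assembly for an imaginary accumulator machine.
  std::string backend(const std::vector<long> &ir) {
    std::string out;
    for (std::size_t i = 0; i < ir.size(); ++i) {
      out += (i == 0 ? "load " : "add ") + std::to_string(ir[i]) + "\n";
    }
    return out;
  }

  std::string compile(const std::string &src) {
    return backend(optimize(frontend(src)));
  }
#+END_SRC

Calling =backend(frontend("1+2"))= skips the optimizer and emits one instruction per operand, while =compile("1 + 2 + 3")= emits a single load of the folded constant.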
** LLVM
[[https://llvm.org]]
*** Watch [[https://www.youtube.com/watch?v=J5xExRGaIIY][Introduction to LLVM]]
*** Quick overview
Derived from https://www.aosabook.org/en/llvm.html

[[./imgs/llvm_dia.svg]]
- It's a set of libraries to create a compiler.
- Well engineered.
- We can focus only on the frontend of the compiler and what is
  actually important to us and leave the tricky stuff to LLVM.
- LLVM IR enables us to use multiple languages together.
- It supports many targets.
- We can benefit from already made IR level optimizers.
- ...

** MLIR
[[https://mlir.llvm.org]]
[[./imgs/mlir_dia.svg]]

- MLIR dialects provide higher-level semantics than LLVM IR.
- It's easier to reason about a higher-level IR that is modeled after
  the AST than about a low-level IR.
- We can use the pass infrastructure to efficiently process and transform the IR.
- With many ready-to-use dialects we can really focus on our language and use the other
  dialects whenever necessary.
- ...
** Serene
*** A Compiler frontend
*** Flow
- =serenec= parses the command-line args.
- =reader= reads the input file and generates an =AST=.
- =semantic analyzer= walks the =AST= and generates a new =AST=, rewriting
  the necessary nodes.
- =slir= generator generates =slir= dialect code from the =AST=.
- We lower =slir= to other *MLIR* dialects and call the result =mlir=.
- Then we lower everything to the =LLVMIR dialect= and call it =lir= (lowered IR).
- Finally, we fully lower =lir= to =LLVM IR= and pass it to the object generator
  to generate object files.
- Call the default =c compiler= to link the object files and generate the machine code.
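As a compact summary of the flow, the stages can be written down as data. This is a sketch, not Serene code: =Stage=, =sereneFlow= and =describeFlow= are hypothetical names, the component labels for the lowering steps are my own wording, and "linked output" stands in for whatever the final link step produces.

#+BEGIN_SRC cpp
  #include <string>
  #include <vector>

  // One step of the flow: who runs it and what representation it produces.
  struct Stage {
    std::string component;
    std::string produces;
  };

  // The stages in the order listed above. The strings are labels only.
  std::vector<Stage> sereneFlow() {
    return {
        {"reader", "AST"},
        {"semantic analyzer", "rewritten AST"},
        {"slir generator", "slir"},
        {"lowering to other dialects", "mlir"},
        {"lowering to the LLVMIR dialect", "lir"},
        {"lowering to LLVM IR", "LLVM IR"},
        {"object generator", "object files"},
        {"c compiler", "linked output"},
    };
  }

  // Render the chain of representations, starting from the source file.
  std::string describeFlow(const std::vector<Stage> &stages) {
    std::string out = "source";
    for (const Stage &s : stages) {
      out += " -> " + s.produces;
    }
    return out;
  }
#+END_SRC

=describeFlow(sereneFlow())= yields a single line that reads from =source= through each intermediate representation to the linked output.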
* DONE Episode 4 - The reader
CLOSED: [2021-07-27 Tue 22:50]
** What is a Parser?
To put it simply, a parser converts the source code to an [[https://en.wikipedia.org/wiki/Abstract_syntax_tree][AST]].
*** Algorithms
- LL(k)
- LR
- LALR
- PEG
- ...

Read More:
- https://stereobooster.com/posts/an-overview-of-parsing-algorithms/
- https://tomassetti.me/guide-parsing-algorithms-terminology/
*** Libraries
- https://en.wikipedia.org/wiki/Comparison_of_parser_generators
*** Our Parser
- We have a hand-written, LL(1.5)-like parser/lexer, since Lisp already has a structure.
#+BEGIN_SRC lisp
  ;; pseudo code
  (def some-fn (fn (x y)
                 (+ x y)))
  (defn main ()
    (println "Result: " (some-fn 3 8)))
#+END_SRC
- LL(1.5)?
- O(n)
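To make the "hand-written" and "O(n)" points concrete, here is a minimal sketch of a recursive-descent reader for s-expressions. It is not Serene's reader: =Node= and =Reader= are hypothetical, it uses plain one-character lookahead, and it does not handle string literals or comments.

#+BEGIN_SRC cpp
  #include <cctype>
  #include <memory>
  #include <stdexcept>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical node type: either an atom (symbol or number) or a list.
  struct Node {
    bool isList = false;
    std::string atom;                         // used when !isList
    std::vector<std::unique_ptr<Node>> items; // used when isList
  };

  class Reader {
  public:
    explicit Reader(std::string source) : src(std::move(source)) {}

    // Reads one expression, choosing the rule from a single character of
    // lookahead. The cursor only moves forward, so the work is O(n).
    std::unique_ptr<Node> readExpr() {
      skipWhitespace();
      if (pos >= src.size()) throw std::runtime_error("unexpected end of input");
      if (src[pos] == '(') return readList();
      if (src[pos] == ')') throw std::runtime_error("unexpected ')'");
      return readAtom();
    }

  private:
    std::string src;
    std::size_t pos = 0;

    void skipWhitespace() {
      while (pos < src.size() &&
             std::isspace(static_cast<unsigned char>(src[pos]))) {
        ++pos;
      }
    }

    std::unique_ptr<Node> readList() {
      auto list = std::make_unique<Node>();
      list->isList = true;
      ++pos; // consume '('
      for (;;) {
        skipWhitespace();
        if (pos >= src.size()) throw std::runtime_error("missing ')'");
        if (src[pos] == ')') {
          ++pos; // consume ')'
          return list;
        }
        list->items.push_back(readExpr());
      }
    }

    std::unique_ptr<Node> readAtom() {
      auto node = std::make_unique<Node>();
      while (pos < src.size() && src[pos] != '(' && src[pos] != ')' &&
             !std::isspace(static_cast<unsigned char>(src[pos]))) {
        node->atom += src[pos++];
      }
      return node;
    }
  };
#+END_SRC

Reading =(def some-fn (fn (x y) (+ x y)))= with =Reader::readExpr= gives a list of three items whose last item is itself a list starting with =fn=. Because the parenthesized structure is already explicit in the source, no grammar tables or backtracking are needed.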
* DONE Episode 5 - The Abstract Syntax Tree
CLOSED: [2021-07-30 Fri 14:01]
** What is an AST?
An AST is a tree representation of the abstract syntactic structure of source code. It is a tree of nodes, where each node is
a data structure describing the syntax.

#+BEGIN_SRC lisp
  ;; pseudo code
  (def main (fn () 4))
  (prn (main))
#+END_SRC


[[./imgs/ast.svg]]
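The "tree of nodes" idea can be shown with a small data structure. The sketch below builds the tree for =(def main (fn () 4))= by hand and prints it one node per line. =Kind=, =AstNode= and the helper functions are hypothetical names for this note; Serene's actual node types are likely richer.

#+BEGIN_SRC cpp
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical node kinds for a tiny Lisp; not Serene's actual node set.
  enum class Kind { Symbol, Number, List };

  struct AstNode {
    Kind kind;
    std::string text;                           // symbol name or number literal
    std::vector<std::unique_ptr<AstNode>> kids; // children, for lists only
  };

  std::unique_ptr<AstNode> makeLeaf(Kind kind, const std::string &text) {
    auto node = std::make_unique<AstNode>();
    node->kind = kind;
    node->text = text;
    return node;
  }

  std::unique_ptr<AstNode> makeList(std::vector<std::unique_ptr<AstNode>> kids) {
    auto node = std::make_unique<AstNode>();
    node->kind = Kind::List;
    node->kids = std::move(kids);
    return node;
  }

  // Builds the tree for (def main (fn () 4)) by hand.
  std::unique_ptr<AstNode> exampleTree() {
    std::vector<std::unique_ptr<AstNode>> fn;
    fn.push_back(makeLeaf(Kind::Symbol, "fn"));
    fn.push_back(makeList({})); // the empty argument list
    fn.push_back(makeLeaf(Kind::Number, "4"));

    std::vector<std::unique_ptr<AstNode>> def;
    def.push_back(makeLeaf(Kind::Symbol, "def"));
    def.push_back(makeLeaf(Kind::Symbol, "main"));
    def.push_back(makeList(std::move(fn)));
    return makeList(std::move(def));
  }

  // Appends one line per node to `out`, indented by depth.
  void dump(const AstNode &node, int depth, std::string &out) {
    out += std::string(depth * 2, ' ');
    switch (node.kind) {
    case Kind::Symbol: out += "Symbol " + node.text; break;
    case Kind::Number: out += "Number " + node.text; break;
    case Kind::List:   out += "List"; break;
    }
    out += "\n";
    for (const auto &kid : node.kids) dump(*kid, depth + 1, out);
  }
#+END_SRC

Dumping =exampleTree()= shows a top-level list with three children, the last of which is the nested =fn= list. At this stage the tree mirrors the parentheses of the source and nothing more.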
** The =Expression= abstract class
*** Expressions
- Expressions vs Statements
- Serene (Lisp) and expressions
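One way to picture an abstract base class for a language where every form is an expression is sketched below. This is an assumption-heavy illustration: the interface (a single =toString= method) and the =Number=, =Symbol= and =List= subclasses are hypothetical and should not be read as Serene's real =Expression= class.

#+BEGIN_SRC cpp
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical interface; Serene's real Expression class may differ.
  class Expression {
  public:
    virtual ~Expression() = default;
    // Every node can print itself back as source-like text.
    virtual std::string toString() const = 0;
  };

  class Number : public Expression {
  public:
    explicit Number(long v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }

  private:
    long value;
  };

  class Symbol : public Expression {
  public:
    explicit Symbol(std::string n) : name(std::move(n)) {}
    std::string toString() const override { return name; }

  private:
    std::string name;
  };

  // A list owns its elements, and each element is itself an Expression.
  class List : public Expression {
  public:
    void append(std::unique_ptr<Expression> e) {
      elements.push_back(std::move(e));
    }

    std::string toString() const override {
      std::string out = "(";
      for (std::size_t i = 0; i < elements.size(); ++i) {
        if (i != 0) out += " ";
        out += elements[i]->toString();
      }
      return out + ")";
    }

  private:
    std::vector<std::unique_ptr<Expression>> elements;
  };
#+END_SRC

Since there is no separate statement hierarchy, a =List= can hold any mix of numbers, symbols and other lists through the same base pointer.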
** Node & AST
* Episode 6 - The Semantic Analyzer
** Qs
- Why didn't we implement a linked list?
- Why are we using =std::vector= instead of LLVM collections?
** What is Semantic Analysis?
- Semantic Analysis makes sure that the given program is semantically correct.
- The type checker works as part of this step as well.

#+BEGIN_SRC lisp
  ;; pseudo code
  (4 main)
#+END_SRC

[[./imgs/incorrct_semantic.svg]]
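The form =(4 main)= parses fine but means nothing, because a number sits where something callable is expected. A minimal sketch of such a check follows; =Element= and =checkCall= are hypothetical, and the rule shown (a number cannot be the head of a call) is only one example of what a semantic analyzer might verify.

#+BEGIN_SRC cpp
  #include <string>
  #include <vector>

  // Hypothetical, flattened view of a call form: the kind and text of each
  // element, in order.
  enum class Kind { Symbol, Number, List };

  struct Element {
    Kind kind;
    std::string text;
  };

  // Returns an empty string when the form passes the check, otherwise an
  // error message.
  std::string checkCall(const std::vector<Element> &form) {
    if (form.empty()) return ""; // assumption: () is accepted in this sketch
    const Element &head = form.front();
    if (head.kind == Kind::Number) {
      return "'" + head.text + "' is not callable";
    }
    return "";
  }
#+END_SRC

For the elements of =(4 main)= the function reports that ='4'= is not callable, while the elements of =(main)= pass. The parser accepts both; only this later step tells them apart.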
** Semantic Analysis and rewrites
We need to reform the AST to reflect the semantics of Serene closely.

#+BEGIN_SRC lisp
  ;; pseudo code
  (def main (fn () 4))
  (prn (main))
#+END_SRC
[[./imgs/ast.svg]]

[[./imgs/semantic.svg]]

Let's run the compiler to see the semantic analysis in action.
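A rewrite of this kind can be sketched as a tree walk that retags generic lists once their meaning is known. The code below is a guess at the general shape, not Serene's analyzer: =Node=, =make= and =analyze= are hypothetical, and only the =def= form is rewritten while every other list is left untouched.

#+BEGIN_SRC cpp
  #include <memory>
  #include <stdexcept>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical node: the reader produces Symbol, Number and List only;
  // Def is introduced by the analyzer.
  enum class Kind { Symbol, Number, List, Def };

  struct Node {
    Kind kind;
    std::string text;
    std::vector<std::unique_ptr<Node>> kids;
  };

  std::unique_ptr<Node> make(Kind kind, const std::string &text = "") {
    auto node = std::make_unique<Node>();
    node->kind = kind;
    node->text = text;
    return node;
  }

  // Walks the tree bottom-up and rewrites (def name value) lists in place
  // into a Def node that carries the name and owns only the value.
  void analyze(Node &node) {
    for (auto &kid : node.kids) analyze(*kid);
    if (node.kind != Kind::List || node.kids.empty()) return;

    const Node &head = *node.kids.front();
    if (head.kind != Kind::Symbol || head.text != "def") return;

    if (node.kids.size() != 3 || node.kids[1]->kind != Kind::Symbol) {
      throw std::runtime_error("def expects a name and a value");
    }
    std::string name = node.kids[1]->text;
    std::unique_ptr<Node> value = std::move(node.kids[2]);
    node.kids.clear(); // drops the 'def' and name symbols
    node.kids.push_back(std::move(value));
    node.kind = Kind::Def;
    node.text = name;
  }
#+END_SRC

After =analyze= runs on the tree for =(def main (fn () 4))=, the root is a =Def= node named =main= with the =fn= list as its single child, so later stages no longer need to inspect the first symbol of a list to find definitions.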
** Let's check out the code