#+TITLE: How to build a compiler with LLVM and MLIR
#+SEQ_TODO: TODO(t/!) NEXT(n/!) BLOCKED(b@/!) | DONE(d%) CANCELLED(c@/!) FAILED(f@/!)
#+TAGS: READER(r) MISC(m)
#+STARTUP: logdrawer logdone logreschedule indent content align constSI entitiespretty

* DONE Episode 1 - Introduction
** What is it all about?
- Create a programming language
- Guide for contributors
- An LLVM/MLIR guide
** The Plan
- Git branches
- No live coding
- Feel free to contribute
** Serene and a bit of history
- Other Implementations
- Requirements
  - C++ 14
  - CMake
- Repository: https://devheroes.codes/Serene
- Website: lxsameer.com
- Email: lxsameer@gnu.org
* DONE Episode 2 - Basic Setup
CLOSED: [2021-07-10 Sat 09:04]
** Installing Requirements
*** LLVM and Clang
- mlir-tblgen
*** ccache (optional)
** Building Serene and the =builder=
- git hooks
** Source tree structure
** =dev.org= resources and TODOs
* DONE Episode 3 - Overview
CLOSED: [2021-07-19 Mon 09:41]
** Generic Compiler
- [[https://www.cs.princeton.edu/~appel/modern/ml/whichver.html][Modern Compiler Implementation in ML: Basic Techniques]]
- [[https://suif.stanford.edu/dragonbook/][Compilers: Principles, Techniques, and Tools (The Dragon Book)]]
*** Common Steps
- Frontend
  - Lexical analyzer (Lexer)
  - Syntax analyzer (Parser)
  - Semantic analyzer
- Middleend
  - Intermediate code generation
  - Code optimizer
- Backend
  - Target code generation
** LLVM
[[llvm.org]]
*** Watch [[https://www.youtube.com/watch?v=J5xExRGaIIY][Introduction to LLVM]]
*** Quick overview
Adapted from https://www.aosabook.org/en/llvm.html

[[./imgs/llvm_dia.svg]]

- It's a set of libraries to create a compiler.
- Well engineered.
- We can focus only on the frontend of the compiler, which is what actually matters to us, and leave the tricky stuff to LLVM.
- LLVM IR enables us to use multiple languages together.
- It supports many targets.
- We can benefit from ready-made IR level optimizers.
- ....
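
The first frontend step listed under Common Steps, lexical analysis, can be sketched in a few lines. This is only an illustration for a Lisp-like input and not Serene's actual reader: =TokenKind=, =Token= and =tokenize= are hypothetical names, and the sketch assumes that unsigned integers are the only numeric literals.

#+BEGIN_SRC cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a Lisp-like language (not Serene's real types).
enum class TokenKind { LParen, RParen, Number, Symbol };

struct Token {
  TokenKind kind;
  std::string text;
};

// Split the input into tokens in a single left-to-right pass.
std::vector<Token> tokenize(const std::string &input) {
  std::vector<Token> tokens;
  std::size_t i = 0;

  while (i < input.size()) {
    const unsigned char c = static_cast<unsigned char>(input[i]);

    if (std::isspace(c)) {
      ++i; // whitespace only separates tokens
      continue;
    }
    if (c == '(') {
      tokens.push_back({TokenKind::LParen, "("});
      ++i;
      continue;
    }
    if (c == ')') {
      tokens.push_back({TokenKind::RParen, ")"});
      ++i;
      continue;
    }

    // Anything else is an atom: read until whitespace or a parenthesis.
    const std::size_t start = i;
    bool allDigits = true;
    while (i < input.size()) {
      const unsigned char ch = static_cast<unsigned char>(input[i]);
      if (std::isspace(ch) || ch == '(' || ch == ')') {
        break;
      }
      if (!std::isdigit(ch)) {
        allDigits = false;
      }
      ++i;
    }

    // Assumption: an atom made only of digits is a number, the rest are symbols.
    tokens.push_back({allDigits ? TokenKind::Number : TokenKind::Symbol,
                      input.substr(start, i - start)});
  }

  return tokens;
}
#+END_SRC

For example, =tokenize("(+ 3 8)")= yields five tokens: =(=, =+=, =3=, =8= and =)=. A parser would then consume this token stream to build the AST.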
** MLIR
[[mlir.llvm.org]]

[[./imgs/mlir_dia.svg]]

- With MLIR, dialects provide higher level semantics than LLVM IR.
- It's easier to reason about a higher level IR that is modeled after the AST than about a low level IR.
- We can use the pass infrastructure to efficiently process and transform the IR.
- With many ready to use dialects, we can really focus on our language and use the other dialects whenever necessary.
- ...
** Serene
*** A Compiler frontend
*** Flow
- =serenec= parses the command line args.
- =reader= reads the input file and generates an =AST=.
- =semantic analyzer= walks the =AST=, rewrites the necessary nodes and generates a new =AST=.
- =slir= generator generates =slir= dialect code from the =AST=.
- We lower =slir= to other *MLIR* dialects and call the result =mlir=.
- Then, we lower everything to the =LLVMIR dialect= and call it =lir= (lowered IR).
- Finally, we fully lower =lir= to =LLVM IR= and pass it to the object generator to generate object files.
- Call the default =c compiler= to link the object files and generate the machine code.
* DONE Episode 4 - The reader
CLOSED: [2021-07-27 Tue 22:50]
** What is a Parser ?
To put it simply, a parser converts the source code to an [[https://en.wikipedia.org/wiki/Abstract_syntax_tree][AST]].
*** Algorithms
- LL(k)
- LR
- LALR
- PEG
- .....

Read More:
- https://stereobooster.com/posts/an-overview-of-parsing-algorithms/
- https://tomassetti.me/guide-parsing-algorithms-terminology/
*** Libraries
- https://en.wikipedia.org/wiki/Comparison_of_parser_generators
*** Our Parser
- We have a hand written LL(1.5) like parser/lexer since lisp already has a structure.

#+BEGIN_SRC lisp
;; pseudo code
(def some-fn (fn (x y)
               (+ x y)))

(defn main ()
  (println "Result: " (some-fn 3 8)))
#+END_SRC

- LL(1.5)?
- O(n)
* DONE Episode 5 - The Abstract Syntax Tree
CLOSED: [2021-07-30 Fri 14:01]
** What is an AST?
An AST is a tree representation of the abstract syntactic structure of source code.
It's just a tree made of nodes, where each node is a data structure describing the syntax.

#+BEGIN_SRC lisp
;; pseudo code
(def main (fn () 4))
(prn (main))
#+END_SRC

[[./imgs/ast.svg]]
** The =Expression= abstract class
*** Expressions
- Expressions vs Statements
- Serene(Lisp) and expressions
** Node & AST
* Episode 6 - The Semantic Analyzer
** Qs
- Why didn't we implement a linked list?
- Why are we using =std::vector= instead of llvm collections?
** What is Semantic Analysis?
- Semantic analysis makes sure that the given program is semantically correct.
- The type checker works as part of this step as well.

#+BEGIN_SRC lisp
;; pseudo code
(4 main)
#+END_SRC

[[./imgs/incorrct_semantic.svg]]
** Semantic Analysis and rewrites
We need to reshape the AST so that it closely reflects the semantics of Serene.

#+BEGIN_SRC lisp
;; pseudo code
(def main (fn () 4))
(prn (main))
#+END_SRC

[[./imgs/ast.svg]]

[[./imgs/semantic.svg]]

Let's run the compiler to see the semantic analysis in action.
** Let's check out the code
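
Before reading the real sources, here is a minimal sketch of the kind of check a semantic analyzer performs, using the =(4 main)= example from above. It is not Serene's implementation: =Node=, =NodeKind=, the =make*= helpers and =checkCalls= are hypothetical names, and the sketch assumes that every non-empty list is a call whose first element is the callee.

#+BEGIN_SRC cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified AST (not Serene's Expression/Node classes).
enum class NodeKind { Number, Symbol, List };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  NodeKind kind;
  std::string text;              // used by Number and Symbol nodes
  std::vector<NodePtr> children; // used by List nodes
};

NodePtr makeNumber(const std::string &text) {
  return std::make_shared<Node>(Node{NodeKind::Number, text, {}});
}

NodePtr makeSymbol(const std::string &text) {
  return std::make_shared<Node>(Node{NodeKind::Symbol, text, {}});
}

NodePtr makeList(std::vector<NodePtr> items) {
  return std::make_shared<Node>(Node{NodeKind::List, "", std::move(items)});
}

// Walk the tree and report the first list whose head is a number.
// Returns an empty string when no such list is found.
std::string checkCalls(const NodePtr &node) {
  if (node->kind != NodeKind::List || node->children.empty()) {
    return "";
  }

  // Assumption: the first element of a non-empty list is the callee.
  const NodePtr &head = node->children.front();
  if (head->kind == NodeKind::Number) {
    return "cannot call the number '" + head->text + "'";
  }

  // Recurse into every child, so nested lists are checked as well.
  for (const NodePtr &child : node->children) {
    const std::string err = checkCalls(child);
    if (!err.empty()) {
      return err;
    }
  }
  return "";
}
#+END_SRC

With these helpers, =checkCalls(makeList({makeNumber("4"), makeSymbol("main")}))= returns an error message, while the tree for =(prn (main))= passes. The input is syntactically valid in both cases, which is why this check belongs to semantic analysis and not to the parser.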