serene/dev.org

245 lines
14 KiB
Org Mode
Raw Permalink Normal View History

#+TITLE: Serene Development
#+AUTHOR: Sameer Rahmani
#+SEQ_TODO: TODO(t/!) NEXT(n/!) BLOCKED(b@/!) | DONE(d%) WONT_DO(c@/!) FAILED(f@/!)
#+TAGS: DOCS(d) EXAMPLES(e) serenecli(c) reader(r) context(x) Misc(m) JIT(j) GC(g) Tool(t)
#+STARTUP: logdrawer logdone logreschedule indent content align constSI entitiespretty nolatexpreview
#+OPTIONS: tex:t
#+HTML_MATHJAX: align: left indent: 5em tagside: left font: Neo-Eule
#+LATEX_CLASS: article
#+LATEX_CLASS_OPTIONS: [a4paper]
#+LATEX_HEADER: \usepackage{tcolorbox}
#+LATEX_HEADER: \usepackage{mathabx}
#+LATEX_HEADER: \newtcolorbox{infobox}[2][]{colback=cyan!5!white,before skip=14pt,after skip=8pt,colframe=cyan!75!black,sharp corners,title={#2},#1}
* Serene's Development Resources
This document is dedicated to the process of developing *Serene*. It contains a collection of resources
from the early days of the project and resources that need to be studied and A list of tasks and features
that needs to be done. This document is written using [[https://orgmode.org/][org-mode]]. You can use [[https://emacsclub.github.io/html/org_tutorial.html#sec-7][this cheatsheet]] as a quick guide
for the format but you will get more out of it using org-mode.
* Resources
For a generic list of resources on compiler design take a look at
[[https://tomassetti.me/resources-create-programming-languages/][the list of resource to create a programming language]] and [[https://www.reddit.com/r/ProgrammingLanguages/comments/8ggx2n/is_llvm_a_good_backend_for_functional_languages/][this list]]
** Parser
First of all you need to read [[https://tomassetti.me/guide-parsing-algorithms-terminology/][All you need to know about Parser algorithms]].
Then here is the list or parsers that we have considered
2020-06-07 15:22:51 +01:00
- [[https://github.com/Geal/nom/][Rust parser combinator framework]]
- [[https://github.com/lalrpop/lalrpop][LR(1) parser generator for Rust]]
- [[https://github.com/Marwes/combine][A parser combinator library for Rust]]
- [[https://github.com/kevinmehall/rust-peg][Parsing Expression Grammar (PEG) parser generator for Rust]]
- [[https://pest.rs/][General purpose parser]]
2020-10-29 19:34:12 +00:00
** Lisp
- [[https://github.com/kanaka/mal/blob/master/process/guide.md][Make a Lisp]]
*** Quasiquotation
- [[http://www.lispworks.com/documentation/HyperSpec/Body/02_df.htm][Backquote in CL]]
- [[https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node367.html][Backquote spec in Common Lisp the Language, 2nd Edition]]
- [[http://christophe.rhodes.io/notes/blog/posts/2014/backquote_and_pretty_printing/][Backquote and pretty printing]]
*** Compilers
- https://bernsteinbear.com/blog/compiling-a-lisp-0/
** Rust
- [[https://doc.rust-lang.org/book/][The Rust book]] (in [[https://www.reddit.com/r/rust/comments/2s1zj2/the_rust_programming_language_book_as_epub/][EPUB]] format)
** LLVM
- [[https://www.infoworld.com/article/3247799/what-is-llvm-the-power-behind-swift-rust-clang-and-more.html][Brief overview of LLVM]]
- [[https://aosabook.org/en/llvm.html][A bit in depth details on LLVM]]
- [[https://llvm.org/docs/tutorial/][Official LLVM tutorial C++]]
- [[https://blog.llvm.org/posts/2020-11-30-interactive-cpp-with-cling/][Interactive C++ with Cling]]
- [[https://www.wilfred.me.uk/blog/2015/02/21/my-first-llvm-compiler/][My First LLVM Compiler]]
- [[https://mukulrathi.co.uk/create-your-own-programming-language/llvm-ir-cpp-api-tutorial/][A Complete Guide to LLVM for Programming Language Creators]]
** Data structures
- [[https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf][Pure functional datastructures papaer]]
- [[https://reader.elsevier.com/reader/sd/pii/0167642394000042?token=CEFF5C5D1B03FD680762FC4889A14C0CA2BB28FE390EC51099984536E12AC358F3D28A5C25C274296ACBBC32E5AE23CD][Dynamic typing: syntax and proof theory]]
- [[https://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.4394][Representing Type Information in Dynamically Typed Languages]]
- [[https://www.researchgate.net/publication/259634489_An_empirical_study_on_the_impact_of_static_typing_on_software_maintainability][An empirical study on the impact of static typing on software maintainability]]
** Other languages
- [[https://julialang.org/research/julia-fresh-approach-BEKS.pdf][Julia: A Fresh Approach toNumerical Computing]]
** Cranelift
- [[https://github.com/bytecodealliance/wasmtime/tree/master/cranelift][Source tree]]
2022-03-05 14:28:37 +00:00
** Type Systems
- [[https://www.cs.cmu.edu/~rwh/courses/hott/][Homotopy Type Theory]]
2023-05-14 10:48:07 +01:00
- No, dynamic type systems are not inherently more open:
https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-type-systems-are-not-inherently-more-open/
- Type theory resources:
https://github.com/jozefg/learn-tt
** Memory management
- [[https://deepu.tech/memory-management-in-golang/][Visualizing memory management in Golang]]
- [[http://goog-perftools.sourceforge.net/doc/tcmalloc.html][TCMalloc : Thread-Caching Malloc]]
- [[https://medium.com/@ankur_anand/a-visual-guide-to-golang-memory-allocator-from-ground-up-e132258453ed][A visual guide to Go Memory Allocator from scratch (Golang)]]
** Concurrency
- [[https://www.ardanlabs.com/blog/2018/08/scheduling-in-go-part1.html][Scheduling In Go (Series)]]
2020-09-12 17:26:27 +01:00
** Garbage collection :GC:
- [[https://v8.dev/blog/high-performance-cpp-gc][GC on V8]]
- [[https://www.microsoft.com/en-us/research/uploads/prod/2020/11/perceus-tr-v1.pdf][Perceus: Garbage Free Reference Counting with Reuse]]
*** [[https://www.hboehm.info/gc/][Boehm GC]] :Tool:
*** [[https://www.ravenbrook.com/project/mps/][MPS]] :Tool:
*** [[https://www.mmtk.io/code][MMTK]] :Tool:
*** [[https://github.com/JWesleySM/Whiro][Whiro]] :Tool:
This is not GC but a tool to debug GC and memory allocation.
2020-09-12 17:26:27 +01:00
** JIT
- [[https://asmjit.com/][Machine code generator for C++]]
- https://www.youtube.com/watch?v=hILdR8XRvdQ&t=1511s
** Optimizations
- [[https://sunfishcode.github.io/blog/2018/10/22/Canonicalization.html][Canonicalization]]
** Compiler
- [[https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64][Stack frame layout on x86-64]]
*** Branch instructions
It would be cool to have macro to instruct the compiler about the likelyhood
of a branch in a conditional. Something similar to kernel's *likely* and *unlikely*
macros
2023-05-14 10:48:07 +01:00
*** How to learn compilers: LLVM Edition
https://lowlevelbits.org/how-to-learn-compilers-llvm-edition/
*** Pointers Are Complicated III, or: Pointer-integer casts exposed
https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html
*** Execution Instrumentation
The compiler should be able to embed some code in the program to collect data about
the different execution paths or function instrumentation and other useful data the
can help the compiler to optimize the program even further. For example Imagine a
scenario which we compile a program with out any optimization ( in debug mode ) and
using some test cases or real usage of the program in several iteration we collect
data about the compiled application in a file (let's call it the ADF short for Analytic
Data File), and the we can pass that ADF file to the compiler to let it compile and optimize
the program by using the usual passes alonge side with some extra passes that operate
on ADF
** Lang
2020-11-25 21:27:25 +00:00
*** Scheme
- [[https://call-cc.org][Chicken Scheme - Easy-to-use compiler and interpreter, with lots of libraries]]
- [[https://github.com/barak/stalin][Stalin - Brutally optimizing Scheme compiler, with lots of optimization flags]]
** Utilities
- [[https://www.ralfj.de/blog/2020/12/14/provenance.html][Pointers Are Complicated]]
*** Emacs mode
- [[https://www.wilfred.me.uk/blog/2015/03/19/adding-a-new-language-to-emacs/][Adding A New Language to Emacs]]
- [[https://www.wilfred.me.uk/blog/2014/09/27/the-definitive-guide-to-syntax-highlighting/][The Definitive Guide To Syntax Highlighting]]
2023-05-14 10:48:07 +01:00
** Linker
- [[https://lwn.net/Articles/276782/][20 part linker essay]]
2021-11-01 15:09:11 +00:00
- [[https://lld.llvm.org/index.html][LLD Usage]]
** LLVM
- [[https://blog.yossarian.net/2021/07/19/LLVM-internals-part-1-bitcode-format][LLVM Internals]]
*** TableGen
- [[https://llvm.org/docs/TableGen/BackGuide.html#creating-a-new-backend][Create a backend]]
** Toolchain
- https://llvm.org/docs/BuildingADistribution.html
** Cross compilation
- https://blog.gibson.sh/2017/11/26/creating-portable-linux-binaries/#some-general-suggestions
A nice to read article on some of the common problems when linking statically
with none default libc or libc++
2023-05-14 10:48:07 +01:00
** Useful courses and resources
2021-12-19 21:45:49 +00:00
- https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/
2023-05-14 10:48:07 +01:00
- https://cs.lmu.edu/~ray/notes/languagedesignnotes/
* Considerations
** Hashmaps
*** DOS attack
- https://www.anchor.com.au/blog/2012/12/how-to-explain-hash-dos-to-your-parents-by-using-cats/
- https://en.wikipedia.org/wiki/Collision_attack
2021-11-18 10:03:41 +00:00
* Ideas
** Destructure types
Imagine a type that is a subset of a Coll, and when we
pass a Coll to its type constructor in destructs the input and
construct the type base on the data that it needs only and
leave the rest untouched
** Hot function optimization
it would be nice for the JIT to add instrumentation to the compiled
functions and detect hot functions similar to how javascript jits do it
and recompile those functions with more optimization passes
* Conversations
** Solutions to link other ~libc~ rather than the default
From my discassion with ~lhames~
#+BEGIN_QUOTE
I can think of a few approaches with different trade-offs:
- Link your whole JIT (including LLVM) against musl rather than the default -- JIT'd
code uses the desired libc, there's only one libc in the JIT'd process, but the
cost is high (perhaps prohibitive, depending on your constraints)
- JIT out-of-process -- JIT (including LLVM) uses default libc and is compiled only once,
executor links the (alternative) desired libc at compile time and must be compiled each
time that you want to change it -- JIT'd code uses the desired libc, there's only one libc
in the JIT'd process, but the config is involved (requires a cross-process setup)
- JIT in process, link desired libc via JIT -- Easy to set up, but now you've got two
libcs in the process. I've never tested that config. It might just work, it might
fail at link or runtime in weird ways.
#+END_QUOTE
* TODOs
** Strings
*** TODO How to concat to strings in a functional and immutable way?
Should we include an pointer to another string???
2022-03-08 14:20:15 +00:00
** TODO Create =Catch2= generators to be used in tests. Specially for the =reader= tests
** TODO Investigate possible implementanion for Internal Errors
- An option is to use llvm registry functionality like the one used in =clang-doc= instead of
=errorVariants= var.
2022-03-02 18:26:39 +00:00
** TODO In =SereneContext::getLatestJITDylib= function, make sure that the JITDylib is still valid
Make sure that the returning Dylib still exists in the JIT
by calling =jit->engine->getJITDylibByName(dylib_name);=
** TODO Provide the CLI arguments to pass the =createTargetMachine=.
We need a way to tweak the target machine object. It's better to provide cli tools
to do so.
2022-01-03 16:57:31 +00:00
** TODO Walk the module and register the symbols with the engine (lazy and nonlazy) :JIT:
** TODO Change the compilation layer to accept MLIR modules instead of LLVM IR :JIT:
This way we can fine tune MLIR's passes based on the JIT settings as well
2021-11-11 15:37:23 +00:00
** TODO Create a pass to rename functions to include the ns name
** TODO Use =const= where ever it makes sense
** TODO Create different pass pipeline for different compilation phases
So we can use them directly via command line, like -O1 for example
2021-11-11 15:37:23 +00:00
** TODO Investigate the huge size of serenec
So far it seems that the static linking and the lack of tree shaking is the issue
2021-11-11 15:37:23 +00:00
** DONE Add the support for =ns-paths= :serenecli:context:
CLOSED: [2021-09-25 Sat 19:22]
:LOGBOOK:
- State "DONE" from "TODO" [2021-09-25 Sat 19:22]
:END:
2021-07-15 10:27:17 +01:00
We need to add the support for an array of paths to lookup namespaces. The =ns-paths= should
be an array that each entry represents a path which serene has to look into in order to find
a namespace. For instance, when serene wants to load the =foo.bar= namespace, it should walk
the paths in =ns-paths= and look for that ns. Similar to =classpath= in the JVM or =LOAD_PATH=
in python.
- [ ] Add the support to the *Context*.
- [ ] Add the support to *Namespace*.
- [ ] Add the cli argument to the =bin/serene.cpp=
2021-08-15 13:42:22 +01:00
2021-11-11 15:37:23 +00:00
** TODO Error handling
Create proper error handling for the internal infra
2021-11-11 15:37:23 +00:00
** TODO Replace =llvm::outs()= with debug statements
** TODO Move the generatable logic out of its files and remove them
** TODO Add a CLI option to get any extra pass
** TODO Add support for =sourcemgr= for input files
** TODO Language Spec :DOCS:
** TODO A proper List implementation
** TODO Vector implementation
** TODO Hashmap implementation
** TODO Meta data support
** TODO Docstring support :DOCS:
- [ ] For functions and macros
- [ ] For namespaces and projects
- [ ] API to interact with docstrings and helps
2021-11-11 15:37:23 +00:00
** TODO FFI interface
** TODO nREPL
** TODO Emacs mode :Misc:
** TODO Number implementation
** TODO String implementation
** TODO Enum implementation
** TODO Protocol
** TODO Struct implementation
** TODO Multi arity functions
** TODO QuasiQuotation
** TODO Linter :Misc:
** TODO Document generator :DOCS:Misc:
** TODO Spec like functionality
** TODO Laziness implementation
** TODO Investigate the Semantic Error tree and tracking
2021-04-30 16:07:28 +01:00
Basically we should be able to create an error tree on semantic analysis
time and trace semantic errors on different layers and intensively.
Is it a good idea ?
2021-11-11 15:37:23 +00:00
** Standard libraries
*** TODO IO library
*** TODO Test library