Serene Development
Serene's Development Resources
This document is dedicated to the process of developing Serene. It contains a collection of resources from the early days of the project, resources that still need to be studied, and a list of tasks and features that need to be done. This document is written using org-mode. You can use this cheatsheet as a quick guide to the format, but you will get more out of it by using org-mode itself.
Resources
For a generic list of resources on compiler design, take a look at the list of resources to create a programming language and this list
Parser
First of all, you need to read All you need to know about Parser algorithms. Then here is the list of parsers that we have considered
Lisp
Rust
- The Rust book (in EPUB format)
LLVM
Data structures
Other languages
Cranelift
Type Systems
- Homotopy Type Theory
- No, dynamic type systems are not inherently more open: https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-type-systems-are-not-inherently-more-open/
- Type theory resources: https://github.com/jozefg/learn-tt
Memory management
Concurrency
Garbage collection GC
Optimizations
Compiler
Branch instructions
It would be cool to have a macro to instruct the compiler about the likelihood of a branch in a conditional, something similar to the Linux kernel's likely and unlikely macros
How to learn compilers: LLVM Edition
Pointers Are Complicated III, or: Pointer-integer casts exposed
Execution Instrumentation
The compiler should be able to embed some code in the program to collect data about the different execution paths, function instrumentation, and other useful data that can help the compiler optimize the program even further. For example, imagine a scenario in which we compile a program without any optimization (in debug mode) and, using some test cases or real usage of the program over several iterations, we collect data about the compiled application in a file (let's call it the ADF, short for Analytic Data File). We can then pass that ADF file to the compiler to let it compile and optimize the program using the usual passes alongside some extra passes that operate on the ADF.
Lang
Utilities
Linker
LLVM
TableGen
Cross compilation
- https://blog.gibson.sh/2017/11/26/creating-portable-linux-binaries/#some-general-suggestions A nice article to read on some of the common problems when linking statically with a non-default libc or libc++
Useful courses and resources
Considerations
Ideas
Destructure types
Imagine a type that is a subset of a Coll. When we pass a Coll to its type constructor, it destructures the input, constructs the type based only on the data it needs, and leaves the rest untouched.
Hot function optimization
It would be nice for the JIT to add instrumentation to the compiled functions, detect hot functions (similar to how JavaScript JITs do it), and recompile those functions with more optimization passes.
Conversations
Solutions to link a libc other than the default
From my discussion with lhames:
I can think of a few approaches with different trade-offs:
- Link your whole JIT (including LLVM) against musl rather than the default – JIT'd code uses the desired libc, there's only one libc in the JIT'd process, but the cost is high (perhaps prohibitive, depending on your constraints)
- JIT out-of-process – JIT (including LLVM) uses default libc and is compiled only once, executor links the (alternative) desired libc at compile time and must be compiled each time that you want to change it – JIT'd code uses the desired libc, there's only one libc in the JIT'd process, but the config is involved (requires a cross-process setup)
- JIT in process, link desired libc via JIT – Easy to set up, but now you've got two libcs in the process. I've never tested that config. It might just work, it might fail at link or runtime in weird ways.
TODOs
Strings
TODO How to concat two strings in a functional and immutable way?
Should we include a pointer to another string?
TODO Create Catch2 generators to be used in tests, especially for the reader tests
TODO Investigate possible implementations for Internal Errors
- One option is to use LLVM's registry functionality, like the one used in clang-doc, instead of the errorVariants var.
TODO In the SereneContext::getLatestJITDylib function, make sure that the JITDylib is still valid
Make sure that the returned Dylib still exists in the JIT by calling jit->engine->getJITDylibByName(dylib_name);
TODO Provide the CLI arguments to pass to createTargetMachine
We need a way to tweak the target machine object. It's better to provide CLI options to do so.
TODO Walk the module and register the symbols with the engine (lazy and nonlazy) JIT
TODO Change the compilation layer to accept MLIR modules instead of LLVM IR JIT
This way we can fine tune MLIR's passes based on the JIT settings as well
TODO Create a pass to rename functions to include the ns name
TODO Use const wherever it makes sense
TODO Create different pass pipeline for different compilation phases
So we can use them directly via the command line, like -O1 for example
TODO Investigate the huge size of serenec
So far it seems that static linking and the lack of tree shaking are the issue
DONE Add the support for ns-paths serenecli context
CLOSED: [2021-09-25 Sat 19:22]
- State "DONE" from "TODO" [2021-09-25 Sat 19:22]
We need to add the support for an array of paths to look up namespaces. The ns-paths should be an array in which each entry represents a path that Serene has to look into in order to find a namespace. For instance, when Serene wants to load the foo.bar namespace, it should walk the paths in ns-paths and look for that ns, similar to classpath in the JVM or LOAD_PATH in Python.
- Add the support to the Context.
- Add the support to Namespace.
- Add the CLI argument to bin/serene.cpp
TODO Error handling
Create proper error handling for the internal infra
TODO Replace llvm::outs() with debug statements
TODO Move the generatable logic out of its files and remove them
TODO Add a CLI option to get any extra pass
TODO Add support for sourcemgr for input files
TODO Language Spec DOCS
TODO A proper List implementation
TODO Vector implementation
TODO Hashmap implementation
TODO Meta data support
TODO Docstring support DOCS
- For functions and macros
- For namespaces and projects
- API to interact with docstrings and helps
TODO FFI interface
TODO nREPL
TODO Emacs mode Misc
TODO Number implementation
TODO String implementation
TODO Enum implementation
TODO Protocol
TODO Struct implementation
TODO Multi arity functions
TODO QuasiQuotation
TODO Linter Misc
TODO Document generator DOCS Misc
TODO Spec like functionality
TODO Laziness implementation
TODO Investigate the Semantic Error tree and tracking
Basically, we should be able to create an error tree at semantic analysis time and trace semantic errors across the different layers. Is it a good idea?