Serene Development

Serene's Development Resources

This document is dedicated to the process of developing Serene. It contains a collection of resources from the early days of the project, resources that still need to be studied, and a list of tasks and features that need to be done. This document is written using org-mode. You can use this cheatsheet as a quick guide to the format, but you will get more out of it using org-mode.

Resources

For a generic list of resources on compiler design, take a look at the list of resources to create a programming language and this list.

Rust

Cranelift

Garbage collection   GC

Boehm GC   Tool

MPS   Tool

MMTK   Tool

Whiro   Tool

This is not a GC but a tool to debug GC and memory allocation.

Optimizations

Compiler

Branch instructions

It would be cool to have a macro to instruct the compiler about the likelihood of a branch in a conditional, something similar to the kernel's likely and unlikely macros.
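For reference, the kernel's macros are thin wrappers over the GCC/Clang builtin `__builtin_expect`. The following is a minimal C++ sketch of that pattern; the names `SERENE_LIKELY`, `SERENE_UNLIKELY` and `checked_div` are hypothetical and only here for illustration. How such a hint would be exposed in Serene itself is still open.

```cpp
#include <cassert>

// Hypothetical macro names that mirror the Linux kernel's likely()/unlikely().
// __builtin_expect is a GCC/Clang builtin, so fall back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SERENE_LIKELY(x) (__builtin_expect(!!(x), 1))
#define SERENE_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define SERENE_LIKELY(x) (!!(x))
#define SERENE_UNLIKELY(x) (!!(x))
#endif

// The hint only influences code layout and branch weights, never the result.
inline int checked_div(int a, int b, int fallback) {
  if (SERENE_UNLIKELY(b == 0)) {
    return fallback; // cold path: division by zero is expected to be rare
  }
  return a / b; // hot path
}
```

Since the hint never changes semantics, a wrong hint costs performance but not correctness.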

Pointers Are Complicated III, or: Pointer-integer casts exposed

Execution Instrumentation

The compiler should be able to embed some code in the program to collect data about the different execution paths, function instrumentation, and other useful data that can help the compiler optimize the program even further. For example, imagine a scenario in which we compile a program without any optimization (in debug mode) and, using some test cases or real usage of the program over several iterations, collect data about the compiled application in a file (let's call it the ADF, short for Analytic Data File). Then we can pass that ADF to the compiler to let it compile and optimize the program using the usual passes alongside some extra passes that operate on the ADF.
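To make the ADF idea a bit more concrete, here is a rough C++ sketch of the data side only: probes bump counters, a run is dumped to a stream, and several runs are merged back into one profile. Everything in it is an assumption for illustration, including the names (`Profile`, `record`, `writeADF`, `mergeADF`) and the one-line-per-probe text format. Nothing here is decided for Serene.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical in-memory form of an ADF: execution counts keyed by probe name.
using Profile = std::map<std::string, std::uint64_t>;

// What an instrumented debug build would call at each probe site.
inline void record(Profile &profile, const std::string &probe) {
  ++profile[probe];
}

// Serialize as one "<probe> <count>" pair per line (assumed format; probe
// names must not contain whitespace).
inline void writeADF(const Profile &profile, std::ostream &out) {
  for (const auto &entry : profile) {
    out << entry.first << ' ' << entry.second << '\n';
  }
}

// Merge an ADF stream into a profile so that several runs accumulate.
inline void mergeADF(Profile &profile, std::istream &in) {
  std::string probe;
  std::uint64_t count = 0;
  while (in >> probe >> count) {
    profile[probe] += count;
  }
}
```

An ADF-aware pass could then read the merged counts to decide, for instance, which branch of a conditional to lay out as the fall-through path.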

Cross compilation

Ideas

Destructure types

Imagine a type that is a subset of a Coll. When we pass a Coll to its type constructor, it destructures the input, constructs the type based only on the data that it needs, and leaves the rest untouched.
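As a loose analogy in C++ (not a proposal for Serene's syntax), the idea looks roughly like the sketch below. `Coll` is stood in for by a `std::map`, and `Point` is a made-up type that only needs two of the entries.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a Serene collection (assumption: string keys, int values).
using Coll = std::map<std::string, int>;

// Hypothetical "subset" type: it only needs the "x" and "y" entries.
struct Point {
  int x;
  int y;

  // Reads the keys it needs, ignores everything else, and takes the input by
  // const reference so the Coll is left untouched. std::map::at throws
  // std::out_of_range if a required key is missing.
  explicit Point(const Coll &coll) : x(coll.at("x")), y(coll.at("y")) {}
};
```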

Hot function optimization

It would be nice for the JIT to add instrumentation to the compiled functions, detect hot functions similar to how JavaScript JITs do it, and recompile those functions with more optimization passes.
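One simple way to detect hot functions is a per-function call counter with a threshold, which is the basic shape of counter-based tiering. The sketch below assumes that approach; `HotnessTracker` and its methods are hypothetical names, and the actual recompilation step is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper the JIT's instrumentation prologue could call into.
class HotnessTracker {
public:
  // threshold should be at least 1; with 0 no function is ever reported.
  explicit HotnessTracker(std::uint64_t threshold) : threshold_(threshold) {}

  // Counts one call of `fn`. Returns true exactly once per function: on the
  // call that makes its counter reach the threshold. That is the moment the
  // JIT would queue the function for recompilation with more passes.
  bool recordCall(const std::string &fn) {
    const std::uint64_t count = ++calls_[fn];
    return count == threshold_;
  }

  // Number of calls recorded so far (0 for unknown functions).
  std::uint64_t callCount(const std::string &fn) const {
    const auto it = calls_.find(fn);
    return it == calls_.end() ? 0 : it->second;
  }

private:
  std::uint64_t threshold_;
  std::unordered_map<std::string, std::uint64_t> calls_;
};
```

Returning true only once keeps the caller from scheduling the same function for recompilation on every later call.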

Conversations

Solutions to link other libc rather than the default

From my discussion with lhames

I can think of a few approaches with different trade-offs:

  • Link your whole JIT (including LLVM) against musl rather than the default: JIT'd code uses the desired libc, there's only one libc in the JIT'd process, but the cost is high (perhaps prohibitive, depending on your constraints).
  • JIT out-of-process: JIT (including LLVM) uses default libc and is compiled only once, executor links the (alternative) desired libc at compile time and must be compiled each time that you want to change it. JIT'd code uses the desired libc, there's only one libc in the JIT'd process, but the config is involved (requires a cross-process setup).
  • JIT in process, link desired libc via JIT: Easy to set up, but now you've got two libcs in the process. I've never tested that config. It might just work, it might fail at link or runtime in weird ways.

TODOs

Strings

TODO How to concatenate two strings in a functional and immutable way?

Should we include a pointer to another string?
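One known answer to the "pointer to another string" question is a rope: concatenation creates a small node that points at both operands instead of copying them. The sketch below is only an illustration of that technique in C++, not a decision for Serene's string type; `Rope` and `concat` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Immutable rope: a leaf holds text, an inner node points at two sub-ropes.
class Rope {
  struct Node {
    std::string leaf;                        // used only by leaf nodes
    std::shared_ptr<const Node> left, right; // used only by inner nodes
    std::size_t size;                        // total length below this node

    explicit Node(std::string text)
        : leaf(std::move(text)), size(leaf.size()) {}
    Node(std::shared_ptr<const Node> l, std::shared_ptr<const Node> r)
        : left(std::move(l)), right(std::move(r)),
          size(left->size + right->size) {}
  };

  std::shared_ptr<const Node> node_;

  explicit Rope(std::shared_ptr<const Node> node) : node_(std::move(node)) {}

  static void flatten(const Node &node, std::string &out) {
    if (!node.left) { // leaf
      out += node.leaf;
      return;
    }
    flatten(*node.left, out);
    flatten(*node.right, out);
  }

public:
  explicit Rope(std::string text)
      : node_(std::make_shared<Node>(std::move(text))) {}

  // O(1): shares both operands, copies no characters, mutates nothing.
  friend Rope concat(const Rope &a, const Rope &b) {
    return Rope(std::make_shared<Node>(a.node_, b.node_));
  }

  std::size_t size() const { return node_->size; }

  // Materialize the characters only when a flat string is really needed.
  std::string str() const {
    std::string out;
    out.reserve(size());
    flatten(*node_, out);
    return out;
  }
};
```

The trade-off is that indexing and flattening get more expensive, and `flatten` recurses, so a real implementation would want to rebalance or flatten deep ropes.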

TODO Create Catch2 generators to be used in tests, especially for the reader tests

TODO Investigate possible implementations for Internal Errors

  • An option is to use LLVM's registry functionality, like the one used in clang-doc, instead of the errorVariants var.

TODO In SereneContext::getLatestJITDylib function, make sure that the JITDylib is still valid

Make sure that the returned JITDylib still exists in the JIT by calling jit->engine->getJITDylibByName(dylib_name);

TODO Provide CLI arguments to pass to createTargetMachine.

We need a way to tweak the target machine object. It's better to provide CLI options to do so.

TODO Walk the module and register the symbols with the engine (lazy and nonlazy)   JIT

TODO Change the compilation layer to accept MLIR modules instead of LLVM IR   JIT

This way we can fine tune MLIR's passes based on the JIT settings as well

TODO Create a pass to rename functions to include the ns name

TODO Use const wherever it makes sense

TODO Create different pass pipelines for different compilation phases

So we can use them directly via command line, like -O1 for example

TODO Investigate the huge size of serenec

So far it seems that static linking and the lack of tree shaking are the issue.

DONE Add the support for ns-paths   serenecli context

CLOSED: [2021-09-25 Sat 19:22]

  • State "DONE" from "TODO" [2021-09-25 Sat 19:22]

We need to add support for an array of paths in which to look up namespaces. ns-paths should be an array in which each entry is a path that Serene has to search in order to find a namespace. For instance, when Serene wants to load the foo.bar namespace, it should walk the paths in ns-paths and look for that ns. This is similar to the classpath in the JVM or sys.path in Python.

  • Add the support to the Context.
  • Add the support to Namespace.
  • Add the CLI argument to bin/serene.cpp
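The lookup itself can be sketched as below. This is not the code that landed in the Context; it is a minimal illustration under a few assumptions: dots in a namespace name map to directories, source files end in `.srn`, and the file-existence check is passed in as a callable so the logic can be exercised without touching the disk. The names `nsToRelativePath` and `findNamespace` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Turn "foo.bar" into "foo/bar.srn". Both the dots-to-directories mapping and
// the ".srn" extension are assumptions made for this sketch.
inline std::string nsToRelativePath(const std::string &ns) {
  std::string path = ns;
  for (char &c : path) {
    if (c == '.') {
      c = '/';
    }
  }
  return path + ".srn";
}

// Walk nsPaths in order and return the first candidate file that `exists`
// accepts. Returns an empty string if no entry contains the namespace.
inline std::string
findNamespace(const std::string &ns, const std::vector<std::string> &nsPaths,
              const std::function<bool(const std::string &)> &exists) {
  const std::string relative = nsToRelativePath(ns);
  for (const std::string &root : nsPaths) {
    std::string candidate = root;
    if (!candidate.empty() && candidate.back() != '/') {
      candidate += '/';
    }
    candidate += relative;
    if (exists(candidate)) {
      return candidate;
    }
  }
  return std::string();
}
```

As with a classpath, the order of the entries matters: an earlier path shadows a later one that contains the same namespace.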

TODO Error handling

Create proper error handling for the internal infra

TODO Replace llvm::outs() with debug statements

TODO Move the generatable logic out of its files and remove them

TODO Add a CLI option to get any extra pass

TODO Add support for sourcemgr for input files

TODO Language Spec   DOCS

TODO A proper List implementation

TODO Vector implementation

TODO Hashmap implementation

TODO Metadata support

TODO Docstring support   DOCS

  • For functions and macros
  • For namespaces and projects
  • API to interact with docstrings and help

TODO FFI interface

TODO nREPL

TODO Emacs mode   Misc

TODO Number implementation

TODO String implementation

TODO Enum implementation

TODO Protocol

TODO Struct implementation

TODO Multi arity functions

TODO QuasiQuotation

TODO Linter   Misc

TODO Document generator   DOCS Misc

TODO Spec like functionality

TODO Laziness implementation

TODO Investigate the Semantic Error tree and tracking

Basically, we should be able to create an error tree at semantic analysis time and trace semantic errors across different layers in depth. Is it a good idea?

Standard libraries

TODO IO library

TODO Test library