Serene Development

Serene's Development Resources
Resources
Considerations
- Hashmaps
  - DOS attack
Ideas
- Destructure types
- Hot function optimization
Conversations
- Solutions to link other libc rather than the default
TODOs

Serene's Development Resources

This document is dedicated to the process of developing Serene. It contains a collection of resources from the early days of the project, resources that still need to be studied, and a list of tasks and features that need to be done. This document is written in org-mode. You can use this cheatsheet as a quick guide to the format, but you will get more out of it by using org-mode.

Resources

For a generic list of resources on compiler design, take a look at the list of resources to create a programming language and this list.

Parser

First of all, you need to read All you need to know about Parser algorithms. Then, here is the list of parsers that we have considered.

Lisp

Make a Lisp

Quasiquotation

Compilers

https://bernsteinbear.com/blog/compiling-a-lisp-0/

Rust

The Rust book (in EPUB format)

LLVM

Data structures

Memory management

Concurrency

Scheduling In Go (Series)

Garbage collection GC

Boehm GC Tool

MPS Tool

MMTK Tool

Whiro Tool

This is not a GC but a tool to debug GC and memory allocation.

JIT

Optimizations

Canonicalization

Compiler

Stack frame layout on x86-64

Branch instructions

It would be cool to have a macro that tells the compiler how likely a branch in a conditional is, similar to the Linux kernel's likely and unlikely macros.
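For reference, the kernel's likely and unlikely macros are thin wrappers around the GCC/Clang builtin __builtin_expect. Clang lowers that builtin to LLVM's llvm.expect intrinsic, which ends up as branch-weight metadata, so a Serene macro could presumably target the same mechanism. C++20 offers the same hint as the [[likely]] and [[unlikely]] attributes.

The sketch below shows both forms. The macros are named LIKELY and UNLIKELY here (not the kernel's lowercase names) to avoid clashing with the attribute names. The helper functions are purely illustrative.

```cpp
#include <cstddef>

// Kernel-style branch hints built on the GCC/Clang builtin. `!!(x)` turns any
// scalar into 0 or 1 so it can be compared with the expected value.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

// Illustrative function: sums the non-negative values and treats negative
// entries as a rare (cold) case.
long sum_non_negative(const int *values, std::size_t n) {
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (UNLIKELY(values[i] < 0)) {
      continue; // cold path: skip invalid entries
    }
    total += values[i];
  }
  return total;
}

// The same kind of hint expressed with the standard C++20 attribute.
int clamp_to_zero(int v) {
  if (v >= 0) [[likely]] {
    return v;
  }
  return 0;
}
```

These are only hints. The compiler may use them for block layout and branch weights, but the program's behavior is identical with or without them.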

How to learn compilers: LLVM Edition

https://lowlevelbits.org/how-to-learn-compilers-llvm-edition/

Pointers Are Complicated III, or: Pointer-integer casts exposed

https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html

Execution Instrumentation

The compiler should be able to embed code in the program to collect data about the different execution paths, function instrumentation, and other useful data that can help the compiler optimize the program even further. For example, imagine a scenario in which we compile a program without any optimization (in debug mode). Using some test cases or real usage of the program over several iterations, we collect data about the compiled application in a file (let's call it the ADF, short for Analytic Data File). Then we can pass that ADF to the compiler so it can compile and optimize the program using the usual passes alongside some extra passes that operate on the ADF.
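A minimal sketch of the ADF round trip, assuming a simple text format of one "site count" pair per line. The ADFRecorder class, the site names and the format are all hypothetical. The instrumented function shows roughly what compiler-inserted counters could look like. parse is what a later optimizing compile might read back in.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical recorder for the Analytic Data File (ADF): every instrumented
// site reports a hit, and the counts are serialized as plain text.
class ADFRecorder {
public:
  void hit(const std::string &site) { ++counts_[site]; }

  // One "site count" pair per line; std::map keeps the output sorted by site.
  std::string serialize() const {
    std::ostringstream out;
    for (const auto &[site, count] : counts_)
      out << site << ' ' << count << '\n';
    return out.str();
  }

  // Reads an ADF back; site names must not contain whitespace in this format.
  static std::map<std::string, std::uint64_t> parse(const std::string &adf) {
    std::map<std::string, std::uint64_t> result;
    std::istringstream in(adf);
    std::string site;
    std::uint64_t count;
    while (in >> site >> count)
      result[site] += count;
    return result;
  }

private:
  std::map<std::string, std::uint64_t> counts_;
};

// What compiler-inserted instrumentation might look like for a small function.
int abs_instrumented(ADFRecorder &adf, int x) {
  if (x < 0) {
    adf.hit("abs:negative");
    return -x;
  }
  adf.hit("abs:non-negative");
  return x;
}
```

A real implementation would more likely use numeric site IDs and a binary format, as profile-guided optimization tooling typically does. Text just keeps the idea visible.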

Lang

Scheme

Utilities

Pointers Are Complicated

Emacs mode

Linker

LLVM

LLVM Internals

TableGen

Create a backend

Toolchain

https://llvm.org/docs/BuildingADistribution.html

Cross compilation

https://blog.gibson.sh/2017/11/26/creating-portable-linux-binaries/#some-general-suggestions A nice article on some of the common problems when linking statically against a non-default libc or libc++.

Useful courses and resources

Considerations

Hashmaps

DOS attack

Ideas

Destructure types

Imagine a type that is a subset of a Coll. When we pass a Coll to its type constructor, it destructures the input, constructs the type based only on the data it needs, and leaves the rest untouched.
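A rough C++ analogue of the idea follows. It assumes a Coll can be modelled as a string-keyed map, which is a stand-in for Serene's real collections. The hypothetical Point type is built from a Coll, keeps only the keys it needs, and takes the input by const reference so every other entry is left as it was.

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for a Serene collection; an assumption for illustration only.
using Coll = std::map<std::string, int>;

// A "destructuring" type: constructed from a Coll, but it picks out only the
// keys it cares about.
struct Point {
  int x;
  int y;

  // Returns std::nullopt when a required key is missing. The input is taken by
  // const reference, so the remaining entries are not consumed or modified.
  static std::optional<Point> from(const Coll &coll) {
    auto xi = coll.find("x");
    auto yi = coll.find("y");
    if (xi == coll.end() || yi == coll.end())
      return std::nullopt;
    return Point{xi->second, yi->second};
  }
};
```

Usage: Point::from(Coll{{"x", 1}, {"y", 2}, {"z", 3}}) yields a Point with x = 1 and y = 2, while "z" stays in the original collection for whoever needs it next.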

Hot function optimization

It would be nice for the JIT to add instrumentation to the compiled functions and detect hot functions, similar to how JavaScript JITs do, and then recompile those functions with more optimization passes.
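The tier-up mechanism can be sketched as a call counter in front of the compiled code, assuming a fixed call-count threshold (real JITs use more elaborate heuristics). TieredFunction and its members are hypothetical. In Serene, recompile would be the step that asks the JIT to rebuild the function with a heavier pass pipeline.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical tiered function. Calls go to a baseline version until the call
// count reaches `threshold`; then `recompile` produces an optimized version
// that replaces it. Counting stops after the swap. Not thread-safe.
class TieredFunction {
public:
  using Fn = std::function<long(long)>;

  TieredFunction(Fn baseline, std::function<Fn()> recompile,
                 std::uint64_t threshold)
      : current_(std::move(baseline)), recompile_(std::move(recompile)),
        threshold_(threshold) {}

  long operator()(long arg) {
    if (!optimized_ && ++calls_ >= threshold_) {
      current_ = recompile_(); // the function is hot: install the optimized code
      optimized_ = true;
    }
    return current_(arg);
  }

  bool isOptimized() const { return optimized_; }

private:
  Fn current_;
  std::function<Fn()> recompile_;
  std::uint64_t threshold_;
  std::uint64_t calls_ = 0;
  bool optimized_ = false;
};
```

Keeping the counter in the call path is the simplest design. The trade-off is a small overhead on every call of a cold function, which is why production JITs often sample instead.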

Conversations

Solutions to link other `libc` rather than the default

From my discussion with lhames:

I can think of a few approaches with different trade-offs:

Link your whole JIT (including LLVM) against musl rather than the default – JIT'd code uses the desired libc, there's only one libc in the JIT'd process, but the cost is high (perhaps prohibitive, depending on your constraints)

JIT out-of-process – JIT (including LLVM) uses default libc and is compiled only once, executor links the (alternative) desired libc at compile time and must be compiled each time that you want to change it – JIT'd code uses the desired libc, there's only one libc in the JIT'd process, but the config is involved (requires a cross-process setup)

JIT in process, link desired libc via JIT – Easy to set up, but now you've got two libcs in the process. I've never tested that config. It might just work, it might fail at link or runtime in weird ways.

TODOs

Strings

TODO How to concatenate two strings in a functional and immutable way?

Should a string include a pointer to another string?
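One way to answer the pointer question is structural sharing, as in a rope. Concatenation creates a new node that points at both operands instead of copying them, so neither operand ever changes. The sketch below assumes reference counting via std::shared_ptr. The IString type and concat function are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string built as a tree of shared, read-only nodes.
class IString {
  // A leaf holds characters; a concat node holds two shared children.
  struct Node {
    std::string leaf;
    std::shared_ptr<const Node> left, right;
    std::size_t length = 0;

    void appendTo(std::string &out) const {
      if (left) { // concat node: emit both halves in order
        left->appendTo(out);
        right->appendTo(out);
      } else {
        out += leaf;
      }
    }
  };

  std::shared_ptr<const Node> node_;

  explicit IString(std::shared_ptr<const Node> node) : node_(std::move(node)) {}

public:
  explicit IString(std::string s) {
    auto n = std::make_shared<Node>();
    n->length = s.size();
    n->leaf = std::move(s);
    node_ = std::move(n);
  }

  // O(1): no characters are copied; both operands stay valid and unchanged.
  friend IString concat(const IString &a, const IString &b) {
    auto n = std::make_shared<Node>();
    n->left = a.node_;
    n->right = b.node_;
    n->length = a.size() + b.size();
    return IString(std::shared_ptr<const Node>(std::move(n)));
  }

  std::size_t size() const { return node_->length; }

  // Flattens the tree into one contiguous std::string (O(n)).
  std::string str() const {
    std::string out;
    out.reserve(size());
    node_->appendTo(out);
    return out;
  }
};
```

The cost moves to reads. Flattening walks the tree recursively, so very deep chains of concatenations would need rebalancing or an iterative traversal.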

TODO Create `Catch2` generators to be used in tests, especially for the `reader` tests

TODO Investigate possible implementation for Internal Errors

One option is to use LLVM's registry functionality, like the one used in clang-doc, instead of the errorVariants variable.
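The core idea of a registry is self-registration. Each error variant adds itself to a global table from a static initializer, so no central list has to be edited when a new variant is added. The standalone sketch below imitates that pattern with the standard library only. It is NOT the llvm::Registry API. ErrorRegistry, ErrorVariant and the error codes are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

struct ErrorVariant {
  std::string id;
  std::string description;
};

class ErrorRegistry {
public:
  // Function-local static: constructed on first use, so registrations from
  // other translation units do not depend on static initialization order.
  static std::map<std::string, ErrorVariant> &entries() {
    static std::map<std::string, ErrorVariant> registry;
    return registry;
  }

  // Constructing an Add object registers a variant; declaring one at
  // namespace scope makes the registration run before main().
  struct Add {
    Add(std::string id, std::string description) {
      entries()[id] = ErrorVariant{id, std::move(description)};
    }
  };

  static const ErrorVariant *lookup(const std::string &id) {
    auto it = entries().find(id);
    return it == entries().end() ? nullptr : &it->second;
  }
};

// Hypothetical variants; in practice each could live next to the code that raises it.
static ErrorRegistry::Add unknownNs("E0001", "Can't find the given namespace");
static ErrorRegistry::Add badArity("E0002", "Wrong number of arguments");
```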

TODO In `SereneContext::getLatestJITDylib` function, make sure that the JITDylib is still valid

Make sure that the returned Dylib still exists in the JIT by calling jit->engine->getJITDylibByName(dylib_name);

TODO Provide CLI arguments to pass to `createTargetMachine`.

We need a way to tweak the target machine object. It's better to provide CLI options to do so.

TODO Walk the module and register the symbols with the engine (lazy and nonlazy) JIT

TODO Change the compilation layer to accept MLIR modules instead of LLVM IR JIT

This way we can fine-tune MLIR's passes based on the JIT settings as well.

TODO Create a pass to rename functions to include the ns name

TODO Use `const` wherever it makes sense

TODO Create different pass pipelines for different compilation phases

So that we can select them directly from the command line, for example with -O1.
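A possible shape for this is a function that maps an optimization level to an ordered list of pass names. The names used here (canonicalize, cse, inline, symbol-dce) are existing MLIR passes. Which passes belong at which level is only an assumption. pipelineFor and joinPipeline are hypothetical helpers.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mapping from -O0..-O3 to an ordered list of MLIR pass names.
std::vector<std::string> pipelineFor(int optLevel) {
  if (optLevel < 0 || optLevel > 3)
    throw std::invalid_argument("optimization level must be 0-3");

  std::vector<std::string> passes;
  if (optLevel >= 1) {
    passes.push_back("canonicalize");
    passes.push_back("cse");
  }
  if (optLevel >= 2) {
    passes.push_back("inline");
    passes.push_back("symbol-dce");
  }
  if (optLevel >= 3) {
    // Clean up again after inlining may have exposed new opportunities.
    passes.push_back("canonicalize");
    passes.push_back("cse");
  }
  return passes;
}

// Joins the names into a comma-separated form such as "canonicalize,cse".
std::string joinPipeline(const std::vector<std::string> &passes) {
  std::string out;
  for (const auto &p : passes) {
    if (!out.empty())
      out += ',';
    out += p;
  }
  return out;
}
```

Keeping the pipeline as data makes it easy to print for debugging and to extend with extra passes given on the command line.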

TODO Investigate the huge size of serenec

So far it seems that static linking and the lack of tree shaking are the cause.

DONE Add the support for `ns-paths` serenecli context

CLOSED: [2021-09-25 Sat 19:22]

State "DONE" from "TODO" [2021-09-25 Sat 19:22]

We need to add support for an array of paths in which to look up namespaces. ns-paths should be an array in which each entry is a path that Serene has to search in order to find a namespace. For instance, when Serene wants to load the foo.bar namespace, it should walk the paths in ns-paths and look for that ns, similar to the classpath in the JVM or sys.path in Python.

- Add support to the Context.
- Add support to Namespace.
- Add the CLI argument to bin/serene.cpp.
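The lookup described above can be sketched with std::filesystem. The sketch assumes "foo.bar" maps to "foo/bar.srn" (the ".srn" extension is an assumption) and that the first matching entry in ns-paths wins, as with the JVM classpath. findNamespace is a hypothetical helper. The existence check is injectable so the search order can be tested without touching the disk.

```cpp
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical ns-paths resolver: tries each entry of nsPaths in order.
std::optional<fs::path> findNamespace(
    const std::vector<fs::path> &nsPaths, const std::string &ns,
    const std::function<bool(const fs::path &)> &exists =
        [](const fs::path &p) { return fs::is_regular_file(p); }) {
  // Each dot-separated segment becomes one directory level: foo.bar -> foo/bar
  fs::path relative;
  std::string::size_type start = 0;
  while (true) {
    const auto dot = ns.find('.', start);
    relative /= ns.substr(start, dot - start);
    if (dot == std::string::npos)
      break;
    start = dot + 1;
  }
  relative += ".srn";

  for (const auto &dir : nsPaths) {
    fs::path candidate = dir / relative;
    if (exists(candidate))
      return candidate; // first match wins
  }
  return std::nullopt;
}
```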

TODO Error handling

Create proper error handling for the internal infrastructure.

TODO Replace `llvm::outs()` with debug statements

TODO Move the generatable logic out of its files and remove them

TODO Add a CLI option to get any extra pass

TODO Add support for `sourcemgr` for input files

TODO Language Spec DOCS

TODO A proper List implementation

TODO Vector implementation

TODO Hashmap implementation

TODO Meta data support

TODO Docstring support DOCS

- For functions and macros
- For namespaces and projects
- API to interact with docstrings and helps

TODO FFI interface

TODO nREPL

TODO Emacs mode Misc

TODO Number implementation

TODO String implementation

TODO Enum implementation

TODO Protocol

TODO Struct implementation

TODO Multi arity functions

TODO QuasiQuotation

TODO Linter Misc

TODO Document generator DOCS Misc

TODO Spec like functionality

TODO Laziness implementation

TODO Investigate the Semantic Error tree and tracking

The idea is to build an error tree during semantic analysis, so that semantic errors can be traced across the different layers in detail. Is it a good idea?
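To make the question concrete, here is a minimal sketch of such a tree. Each node is an error reported at one layer of the analysis, and its children are the lower-level errors that caused it. SemanticError and its methods are hypothetical, and a real version would also carry source locations.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical semantic error tree (std::vector of an incomplete type as a
// member is allowed since C++17).
struct SemanticError {
  std::string message;
  std::vector<SemanticError> causes;

  // Renders the tree with two spaces of indentation per level.
  std::string render(int depth = 0) const {
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += message;
    out += '\n';
    for (const auto &cause : causes)
      out += cause.render(depth + 1);
    return out;
  }

  // Counts every error in the tree, including this one.
  std::size_t count() const {
    std::size_t n = 1;
    for (const auto &cause : causes)
      n += cause.count();
    return n;
  }
};
```

The benefit is that a top-level message such as a failed namespace load can still point at the exact symbol or arity problem underneath it. The cost is keeping the tree alive until it is reported.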
