Systems and
Formalisms Lab

Articles in the student-projects category

  1. Explicit Memory Reclamation in Scala

    The JVM raw interpreter doesn't perform liveness analysis, so references sometimes persist much too long. Concretely in Scala, this leads to high memory consumption by lazy lists when a parent frame retains a reference to the root of the list. We added a mechanism to the Scala compiler to erase these references.

    The following Scala code (Scala 3.8.0-RC1-bin-20250731-fe6b7eb-NIGHTLY, OpenJDK 21) - used in the EPFL CS-214 course - is a simple program that indefinitely plays a sound represented by a sine wave at 440Hz.

    It first creates the sine as a LazyList, so that it is only evaluated when asked for. It can then play it. sine is effectively infinite: until it's stopped, the program will forever play the tune!

    @main def main =
      val sine: LazyList[Sample] = loop(sinePulse(440))
      play(sine)
    

    However, this program will quickly run into an issue and throw a runtime error... Can you see why?

  2. Adding lookbehinds to rust-lang/regex

    An annotated guide to adding captureless lookbehinds to the Rust linear-time regex engine.

    In a previous blogpost, Erik wrote about how he implemented the linear time matching algorithm from [RegElk+PLDI24] in the popular regex engine RE2. In this one, we're looking at how to do the same thing for the official regex engine from the Rust language. Namely, we add support for unbounded captureless lookbehinds.

  3. Software optimization for a RISC-V Accelerator

    A story about writing high-performance code for a custom accelerator in a RISC-V CPU, with the help of some semi-automated tools.

    Writing high-performance software these days is a challenging task. Today's hardware ecosystem is highly heterogeneous, with a trend towards application-tailored solutions. Despite years of effort in automation, by and large the burden still falls on the developer to write a program in a manner that can take advantage of this hardware. Oftentimes, the solution is to use a library carefully written by a performance engineer for the specific combination of application and hardware. But why is that? What happens when there is no library?

  4. Rewriting modulo SAADD for string diagrams

    Final project report about string diagram rewriting in Coq for CS-628 Interactive Theorem Proving, Spring 2024

    tl;dr: String diagrams are an intuitive diagrammatic representation of some pretty useful math. This being Interactive Theorem Proving, of course, I want to formalize that math in a proof assistant (Coq), in such a way that formal proofs still work like paper proofs using all the nifty properties of the diagrammatic representation. And, of course, a naïve formalization runs into problems that make the system painful to use. So this project explores how we might overcome those problems and restore the convenience of string diagrams in a formal setting.

  5. Abusing the Instruction Cache to Store Data

    Some programs use their data cache extensively, while underusing their instruction cache. We embed constant data, typically stored in memory, into the immediate fields of instructions to balance to use of both caches. We illustrate the impacts of such optimizations on the performances of a target program, describing the key parameters.

  6. SpecMerger

    An introduction to SpecMerger, a tree-diffing tool designed to facilitate mechanized specification audits.

    Specifications (such as RFCs, programming language standards) are typically written in natural language. To formally verify an implementation of such a spec, we need two things: a mechanization of the spec (in a theorem prover like Coq, Lean, Dafny, …), and a proof of correctness relating the implementation to the mechanized spec.

    The correctness of the proof is guaranteed by the theorem prover. But who checks the correctness of the mechanization? The original spec is in natural language, so the best we can do is an audit. To make these audits easier, some specs are written in a “literate” style, with comments from the spec interleaved throughout the mechanization. This helps a lot, but… who checks the comments? In this post, I present SpecMerger, a tool that checks that literate comments match the spec they came from.

  7. Adding linear-time lookbehinds to RE2

    An annotated guide to adding captureless lookbehinds to the RE2 linear-time regex engine.

    Modern regexes are fiendishly complex: beyond traditional features like alternations and loops, they include complex constructs such as backreferences, possessive groups, recursion, conditionals, or bounded (and sometimes even unbounded!) lookarounds.

    Surprisingly, the complexity characteristics of these modern regex features are not fully known. Take lookarounds, for example: until recently, if you wanted a regex flavor that supported lookarounds, you had to use a backtracking-based algorithm.