Systems and
Formalisms Lab

Articles tagged with regex

  1. SpecMerger

    An introduction to SpecMerger, a tree-diffing tool designed to facilitate mechanized specification audits.

    Specifications (such as RFCs, programming language standards) are typically written in natural language. To formally verify an implementation of such a spec, we need two things: a mechanization of the spec (in a theorem prover like Coq, Lean, Dafny, …), and a proof of correctness relating the implementation to the mechanized spec.

    The correctness of the proof is guaranteed by the theorem prover. But who checks the correctness of the mechanization? The original spec is in natural language, so the best we can do is an audit. To make these audits easier, some specs are written in a “literate” style, with comments from the spec interleaved throughout the mechanization. This helps a lot, but… who checks the comments? In this post, I present SpecMerger, a tool that checks that literate comments match the spec they came from.

  2. Adding linear-time lookbehinds to RE2

    An annotated guide to adding captureless lookbehinds to the RE2 linear-time regex engine.

    Modern regexes are fiendishly complex: beyond traditional features like alternations and loops, they include complex constructs such as backreferences, possessive groups, recursion, conditionals, or bounded (and sometimes even unbounded!) lookarounds.

    Surprisingly, the complexity characteristics of these modern regex features are not fully known. Take lookarounds, for example: until recently, if you wanted a regex flavor that supported lookarounds, you had to use a backtracking-based algorithm.