Lossless data compression is a classic research area of computer science. We use data compression not only to reduce the amount of space needed for big data structures, but also to accelerate algorithms or to find patterns in data. Oftentimes it is important that algorithms may operate directly on the compressed data (i.e.~without prior decompression). This is possible by using so-called grammar-based methods.
In this thesis we focus on the analysis of different grammar-based methods of tree compression.
In the first part of the thesis we prove that the average size of the DAG (directed acyclic graph) of a full binary tree with n inner nodes is asymptotically equal to (cn)/(sqrt(log n)) (where c is a constant). The DAG of a tree can be understood as a particular tree grammar, a so-called regular tree grammar. This result has only been sketched prior in the literature. We then generalize the proof to unranked trees and to a labelled setting. Then we compare the worst-case size of the DAG of a tree with the worst-case size of the so-called BDAG (for Binary DAG) of that tree. For this we introduce the HDAG (for Hybrid DAG), which is also interesting as a compression algorithm on its own. Then we discuss the use of grammar-based string compression for the construction of tree grammars. In the first method we compress certain sequences that arise during DAG compression and in the second method we use grammar-based string compression on the so-called traversal of a tree, which is the sequence of the tree’s node labels when the tree is traversed in depth-first/left-to-right manner. We call these kind of grammars traversal SLP (SLP stands for straight-line program, which are context-free grammars that generate precisely one word) and show that there exist trees such that the smallest traversal grammars for those trees are exponentially smaller than a smallest so-called tree straight-line program, which is a common grammar-based tree compression scheme. Next we discuss algorithms for compressed trees.
Finally we show a case in which tree compression can accelerate computations. We show by experiments that automatic termination proving of term rewriting systems is faster if tree compression on the involved terms is used as an intermediate step.