Peter Stadler: Dynamic Programming for Lazy Bastards

Posted on 17.06.2015

Vortragender: Peter Stadler von der Universität Leipzig
Location: EI 2 Pichelmayer HS


Dynamic Programming (DP) Algorithms are very common in bioinformatics. Key
computational problems in the field, such as sequence comparison,
RNA structure prediction, and many aspects of phylogenetics admit
exact DP solutions. Although conceptually simple, algorithms applicable to
practical scenaria are complicated by incorporating much biological detail,
and thus are tedious to implement and test. Algebraic dynamic programming
simplifies algorithm development through a strict
separating the generation of the search space, an evaluation algebra, and
a choice function. Since state space traversal is represented by grammars
in ADP, the framework is large restricted to strings and ordered trees,
and practical implementations are usually focussed on context-free grammars.

I will report on several recent advances. (1) Products of grammars allow
the efficient construction of multi-tape grammars from very simple low-
dimensional building blocks. In particular, complex alignment algorithms
can designed very easily in this manner. (2) The ideas of ADP readily
generalize to arbitrary Multi-Context-Free Grammars. This extension covers
in particular the extremely complicated algorithms solving RNA folding
problems for so-called pseudoknotted structures. (3) Re-interpreting the
notion of parsing (of strings) as partitioning of essentially arbitrary
data objects makes it possible to use the ideas of ADP on general
combinatorial data structures. In this general framework it becomes
clear that backward (in a HMM setting) outside (in a CFG setting)
recursions are uniquely determined by the easier forward/inside version
of the DP algorithms, i.e., by the generalized grammar defining the rules
of search space generation. The outside algorithm thus can be mechanically
computed from the inside “grammar” and does not need to be developed
separately. As an application we show how this can be used to compute
probability distributions over Hamiltonian paths. The key advantage of this
approach is that it becomes very simple and efficient in practise to
develop prototypical algorithms and in particular to modify the detailed
specification of the underlying models with very little effort.

The presentation will report joint work with
Christian Hoener zu Siederdissen, Sonja J. Prohaska, Maik Riechert, and
Johannes Waldmann.