## Research Overview

The biological sciences are undergoing a revolutionary change, driven by experimental techniques that produce quantitative biological data on a massive scale, data that still await interpretation. The broad goal of my research is to develop new methods to extract biological knowledge from such data. These methods are derived using the tools of statistical physics, which provides a natural framework for interpreting high-volume data. I focus on problems involving sequence data and RNA molecules. All the problems I study can be described in terms of physical systems with a complex energy landscape. Thus, beyond providing solutions to practical biological problems, the study of these systems leads to new insights into the statistical physics of disordered systems.

One particular area of my interest is **sequence alignment**. Sequence alignment is a computational technique that tries to identify similarities between two sequences, for example DNA or protein sequences. It is the most widely used computational tool in molecular biology, and the identification of any newly sequenced gene depends on the accuracy with which the alignment can be performed. The crucial problem, which sequence alignment shares with many of the common computational tools of molecular biology, is that it produces a result for any pair of sequences compared, related or not. Thus, the quality of a result has to be assessed before its biological relevance can be judged. This is usually done by quoting the probability of obtaining the alignment just by chance, which in turn requires characterizing the background, i.e., the alignments of random sequences. Characterizing the behavior of a random system is clearly the domain of statistical physics, and several of its tools can be used to tackle this problem. Many universal features of sequence alignment can be studied through a mapping between sequence alignment and surface growth as described by the Kardar-Parisi-Zhang equation. Additionally, a newly established link between sequence alignment and the Asymmetric Exclusion Process, one of the best-studied exactly solvable systems of nonequilibrium statistical physics, can be exploited. In this context, many explicit results on the assessment of statistical significance can be obtained, and the same calculations yield new and interesting quantities for the corresponding physical systems.
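The idea of judging an alignment against a random background can be illustrated with a small numerical sketch. The Python code below is only an illustration and not the analytical approach described above: it computes a Smith-Waterman local alignment score and then estimates, by shuffling one of the sequences, how often a score at least as high arises by chance. The function names, the scoring parameters (`match`, `mismatch`, `gap`), and the shuffling null model are assumptions chosen for this example.

```python
import random


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap cost.

    The scoring parameters are illustrative assumptions, not values
    taken from the text.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def empirical_p_value(a, b, trials=200, seed=0):
    """Estimate the chance of scoring at least as high against shuffled b.

    Shuffling preserves the letter composition of b, which is one simple
    (assumed) choice of random background.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    score, p = empirical_p_value("GATTACAGATTACA", "TTACAGAT")
    print(score, p)
```

Sampling of this kind can only resolve probabilities down to roughly one over the number of trials, which is one reason explicit results for the score distribution of random sequences are valuable.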

The other system of my current interest is **RNA secondary structure formation**. RNA is a single-stranded heteropolymer that can gain energy by folding onto itself and forming "base pairs" between its monomers. Since only certain pairs of monomers are allowed, finding the optimal fold for a given sequence of monomers is a nontrivial problem. RNA secondary structure formation thus shows many parallels to the popular protein folding problem; indeed, it can be understood as the Hartree approximation of the protein folding problem. While the phenomenology of RNA secondary structure formation is very similar to that of protein folding, it has the enormous advantage that the bioinformatics community has worked out an algorithm to fold RNA in polynomial time. This makes it much more accessible to numerical studies. Interpreting the folding algorithm as a set of diagrammatic equations also enables analytical approaches to the structure formation problem. RNA secondary structures are therefore an ideal model system for studying concepts of heteropolymer structure formation. Characterizing the different phases and phase transitions involved, both numerically and analytically, is an ongoing project. In addition, quantitative modeling of recent force-extension experiments on RNA molecules provides a foundation on which these experiments can be interpreted and new experimental ideas explored.
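To give a feel for how a polynomial-time folding algorithm works, here is a minimal sketch of the simplest such recursion, Nussinov-style base-pair maximization. It is a stand-in chosen for illustration and not necessarily the algorithm referred to above: realistic folding programs minimize an energy function rather than count pairs. The function name, the set of allowed pairs (Watson-Crick plus G-U wobble), and the minimum hairpin loop size `min_loop` are assumptions of this example.

```python
# Allowed pairings: Watson-Crick pairs plus the G-U wobble pair (an assumption).
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style recursion).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    Pseudoknots are excluded by construction, and every hairpin must
    enclose at least min_loop unpaired bases.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in ALLOWED:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The three nested loops give a running time that grows as the cube of the sequence length, which is what makes numerical studies of long random sequences feasible.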

**Recent Publications**

See a full list of Dr. Bundschuh's publications here.