Description

9 years ago
Searching sequence databases and building 3D models for proteins
are important tasks for biologists. When the structure of a query
protein is known, its function can be inferred. However,
experimental methods for structure determination are both expensive
and time-consuming. Fully automatic homology modeling refers to
computationally building a 3D model for a query sequence from an
alignment to homologous proteins of known structure (templates).
Current prediction servers can provide accurate models
within a few hours to days. Our group has developed HHpred, which
is one of the top performing structure prediction servers in the
field. In general, homology-based structure modeling consists of
four steps: (1) finding homologous templates in a database, (2)
selecting templates, (3) aligning them to the query, and (4)
building a 3D model based on the alignment. In part one of this
thesis, we present improvements to steps (2) and (4). Specifically,
homology modeling has been shown to work best when multiple
templates are selected instead of only a single one. Yet current
servers use rather ad hoc approaches to combine information
from multiple templates. We provide a rigorous statistical
framework for multi-template homology modeling. Given an alignment,
we employ Modeller to calculate the most probable structure for a
query. The 3D model is obtained by optimally satisfying spatial
restraints derived from the alignment and expressed as probability
density functions. We find that the query’s atomic distance
restraints can be accurately described by two-component Gaussian
mixtures. Moreover, we derive statistical weights to quantify the
redundancy among related templates. This allows us to apply the
standard rules of probability theory to combine restraints from
several templates. Together with a heuristic template selection
strategy, we have implemented this approach within HHpred and
significantly improved model quality. Furthermore, we took part in
CASP, a community-wide competition for structure prediction, where
we were ranked first in template-based modeling and, at the same
time, were more than 450 times faster than all other top servers.
Homology modeling relies heavily on detecting templates and
correctly aligning them to the query sequence (steps (1) and (3)
above). But remote homologies are difficult to detect and hard to
align at the pure sequence level. Hence, modern tools are based on
profiles instead of sequences. A profile summarizes the
evolutionary history of a given sequence and consists of
position-specific amino acid probabilities for each residue. In addition to
the similarity score between profile columns, most methods use
extra terms that compare 1D structural properties such as secondary
structure or solvent accessibility. These can be predicted from
local profile windows. In the second part of this thesis, we
develop a new score that is independent of any predefined
structural property. For this purpose, we learn a library of 32
profile patterns that are most conserved in alignments of remotely
homologous, structurally aligned proteins. Each so-called “context
state” in the library consists of a 13-residue sequence profile. We
integrate the new context score into our HMM-HMM alignment tool
HHsearch and significantly improve the sensitivity and precision of
pairwise alignments, especially difficult ones. Taken together, we
introduce probabilistic methods to improve all four main steps in
homology-based structure prediction.
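
As a rough illustration of the multi-template restraint combination described above, the following Python sketch models each template-derived distance restraint as a two-component Gaussian mixture and combines several templates through a weighted sum of log densities. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function names, the mixture parameters (`sigma_narrow`, `sigma_wide`, `w_narrow`), the use of template weights as exponents on each density, and the grid search, which merely stands in for the optimization that Modeller performs.

```python
import math


def gauss(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(d, template_distance,
                sigma_narrow=0.5, sigma_wide=2.0, w_narrow=0.7):
    """Two-component Gaussian mixture for a query distance d.

    Both components are centred on the distance observed in the template.
    The parameter values are illustrative assumptions, not fitted values.
    """
    return (w_narrow * gauss(d, template_distance, sigma_narrow)
            + (1.0 - w_narrow) * gauss(d, template_distance, sigma_wide))


def combined_neg_log_density(d, templates):
    """Weighted sum of negative log densities over all templates.

    templates: list of (template_distance, weight) pairs. Using the weight
    as an exponent on each density is an assumed combination rule.
    """
    return -sum(weight * math.log(mixture_pdf(d, dist))
                for dist, weight in templates)


def most_probable_distance(templates, lo=0.0, hi=30.0, step=0.01):
    """Grid search for the distance minimising the combined restraint."""
    n = int(round((hi - lo) / step))
    grid = (lo + i * step for i in range(n + 1))
    return min(grid, key=lambda d: combined_neg_log_density(d, templates))


if __name__ == "__main__":
    # Hypothetical example: two templates disagree, the first has more weight.
    print(most_probable_distance([(8.0, 1.0), (12.0, 0.3)]))
```

In this sketch, a template with a small weight (for example because it is redundant with another template) pulls the optimum only slightly, so the most probable distance stays close to the value suggested by the more heavily weighted template.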
