George Mason University
George Mason University Mason
George Mason University

A Data-driven Evolutionary Algorithm for Mapping Multi-basin Protein Energy Landscapes

by Amarda Shehu

Publication Details MORE LESS

  • Published Date: September 3, 2015
  • Volume/Issue: 22/9
  • Publisher: Mary Ann Liebert Inc

Abstract

Evidence is emerging that many proteins involved in proteinopathies are dynamic molecules switching between stable and semistable structures to modulate their function. A detailed understanding of the relationship between structure and function in such molecules demands a comprehensive characterization of their conformation space. Currently, only stochastic optimization methods are capable of exploring conformation spaces to obtain sample-based representations of associated energy surfaces. These methods have to address the fundamental but challenging issue of balancing computational resources between exploration (obtaining a broad view of the space) and exploitation (going deep in the energy surface). We propose a novel algorithm that strikes an effective balance by employing concepts from evolutionary computation. The algorithm leverages deposited crystal structures of wildtype and variant sequences of a protein to define a reduced, low-dimensional search space from where to rapidly draw samples. A multiscale technique maps samples to local minima of the all-atom energy surface of a protein under investigation. Several novel algorithmic strategies are employed to avoid premature convergence to particular minima and obtain a broad view of a possibly multibasin energy surface. Analysis of applications on different proteins demonstrates the broad utility of the algorithm to map multibasin energy landscapes and advance modeling of multibasin proteins. In particular, applications on wildtype and variant sequences of proteins involved in proteinopathies demonstrate that the algorithm makes an important first step toward understanding the impact of sequence mutations on misfunction by providing the energy landscape as the intermediate explanatory link between protein sequence and function.

Other Contributors

Rudy Clausen
Expertise