We propose a general framework for change-of-representation based on searching for alternative representations to improve the accuracy of an underlying induction algorithm. Representations are selected as candidates by querying a strategy component, which relies on domain knowledge to suggest which alternatives to search. An evaluation component then compares these representations by applying each one to a set of examples and running the induction algorithm on the transformed examples to empirically determine the effect of the change on accuracy. This approach provides solutions to the two characteristic problems of learning in real-world domains. First, domain knowledge is used as a heuristic to guide the search for alternative representations, enabling more intelligent decisions during change-of-representation. Second, the framework provides a flexible role for knowledge that can be used with any learning algorithm and is tolerant of uncertainty.
We apply our framework for change-of-representation to the difficult, real-world domain of protein structure prediction. The best computational method to date for determining the structure of a protein from its amino acid sequence is homology modeling, which is based on sequence alignments with a protein database. Homology modeling can fail in cases where the sequence similarity is low between proteins with similar structures. However, the physical and chemical properties of amino acids are believed to relevant to protein structure. Using an implementation of our framework, we incorporate this domain knowledge to suggest ways to change the representation of amino acid sequences. Efficient search procedures are derived that lead the discovery of representations that improve the ability to predict protein structures by homology modeling.