Middle East Technical University (Orta Doğu Teknik Üniversitesi) / Graduate School of Natural and Applied Sciences / Department of Computer Engineering
A classification system for the problem of protein subcellular localization
Proteinlerin hücre içi yerleşimlerini bulmak için bir sınıflandırma sistemi
Gökçen Alay - 2007
The full text of this thesis is not available on this site. It can be looked up in the search section of the YÖK Tez Merkezi (tez.yok.gov.tr) under thesis number 201828.
Summary:
The focus of this study is on predicting the subcellular localization of a protein. Subcellular localization information is important for protein function annotation, which is a fundamental problem in computational biology. For this problem, a classification system is built that has two main parts: a predictor based on a feature mapping technique that extracts biologically meaningful information from protein sequences, and a client/server architecture for searching and predicting subcellular localizations. In the first part of the thesis, we describe a feature mapping technique based on frequent patterns. In this technique, frequent patterns in a protein sequence dataset are identified using a search technique based on the a priori property, and the distribution of these patterns over a new sample is used as a feature vector for classification. The effect of a number of feature selection methods on the classification performance is investigated and the best one is applied. The method is assessed on the subcellular localization prediction problem with 4 compartments (endoplasmic reticulum (ER) targeted, cytosolic, mitochondrial, and nuclear), using the same dataset as P2SL. Our method improved the overall accuracy from the 81.96% obtained by P2SL to 91.71%. In the second part of the thesis, a client/server architecture is designed and implemented based on Simple Object Access Protocol (SOAP) technology, which provides a user-friendly interface for accessing the protein subcellular localization predictions. The client part is a Cytoscape plug-in that is used for functional enrichment of biological networks. Instead of using subcellular localization information for one protein at a time, this plug-in lets biologists analyze a set of genes/proteins from a system-level view.
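The frequent-pattern feature mapping summarized above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say what kind of pattern is mined, so the sketch assumes contiguous substrings, assumes support is the fraction of training sequences containing a pattern, and assumes raw occurrence counts as the feature values. The function names, `min_support`, `max_len`, and the toy sequences are all hypothetical.

```python
from typing import List, Set


def support(pattern: str, sequences: List[str]) -> float:
    """Fraction of sequences containing the pattern (assumed support definition)."""
    return sum(pattern in s for s in sequences) / len(sequences)


def mine_frequent_patterns(
    sequences: List[str], min_support: float, max_len: int
) -> List[str]:
    """Level-wise search for frequent contiguous substrings (assumed pattern type)."""
    alphabet = sorted(set("".join(sequences)))
    frequent: Set[str] = {
        a for a in alphabet if support(a, sequences) >= min_support
    }
    result = sorted(frequent)
    length = 1
    while frequent and length < max_len:
        # A priori property: a pattern can only be frequent if its
        # sub-patterns are frequent. Candidates are therefore built by
        # extending a frequent pattern, and the new suffix must also be
        # frequent; everything else is pruned without counting.
        candidates = {
            p + a
            for p in frequent
            for a in alphabet
            if (p + a)[1:] in frequent
        }
        frequent = {
            c for c in candidates if support(c, sequences) >= min_support
        }
        result.extend(sorted(frequent))
        length += 1
    return result


def count_overlapping(pattern: str, sequence: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in sequence."""
    last_start = len(sequence) - len(pattern)
    return sum(sequence.startswith(pattern, i) for i in range(last_start + 1))


def feature_vector(patterns: List[str], sequence: str) -> List[int]:
    """Map a new sequence to one occurrence count per mined pattern."""
    return [count_overlapping(p, sequence) for p in patterns]


# Toy data, invented for illustration only (not from the P2SL dataset).
training = ["MKLV", "MKLA", "AKLV"]
patterns = mine_frequent_patterns(training, min_support=0.6, max_len=3)
vector = feature_vector(patterns, "MKLVKL")
```

The resulting vectors would then go through feature selection and into a classifier; neither of those steps is sketched here because the abstract does not name the methods used.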