Abstract: Our purpose is to develop a clinical decision support system that classifies patients’ diagnoses based on features gathered from Magnetic Resonance Imaging (MRI) and the Expanded Disability Status Scale (EDSS). We studied 120 patients and 19 healthy individuals (not afflicted with multiple sclerosis, MS). The healthy individuals in the control group have no complaints or history of drug use. Because the kernel trick performs efficiently in non-linear classification, a Convex Combination of Infinite Kernels model was developed to measure the health status of patients based on features gathered from MRI and EDSS. Our calculations show that the proposed model classifies the MS diagnosis level with better accuracy than single-kernel, artificial neural network and other machine learning methods, and that it can also be used as a decision support system for identifying the MS health status of patients.
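As a reading aid, the idea of a convex combination of kernels can be written out as follows. The notation (base kernels k_i, weights beta_i, parameter set Omega, measure mu) is assumed here for illustration and is not taken from the abstract; the exact formulation used in the study may differ. The second display uses \text and therefore assumes the amsmath package.

```latex
% Finite case: a convex combination of m base kernels (illustrative notation).
\[
k_{\beta}(x, x') = \sum_{i=1}^{m} \beta_i \, k_i(x, x'),
\qquad \beta_i \ge 0, \quad \sum_{i=1}^{m} \beta_i = 1 .
\]
% Infinite case: the sum becomes an integral over a parameterized kernel family.
\[
k_{\mu}(x, x') = \int_{\Omega} k_{\omega}(x, x') \, d\mu(\omega),
\qquad \mu \text{ a probability measure on } \Omega .
\]
```

Because each base kernel is positive semidefinite and the weights are non-negative, the combined kernel is again a valid kernel.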
Abstract: This research focuses on the cooperative multi-task assignment problem for heterogeneous UAVs, where a set of tasks, each requiring a predetermined number of UAVs, has to be completed at specific locations. We modeled this as an optimization problem that minimizes the number of uncompleted tasks while also minimizing the total airtime and total distance traveled by all the UAVs, taking into account the UAV flight capacities. To solve the problem, we adopted a multi-Traveling Salesman Problem (mTSP) method and designed a new genetic structure for it so that it can be applied to cooperative multi-task assignment problems. Furthermore, we developed two domain-specific mutation operators to improve the quality of the solutions in terms of the number of uncompleted tasks, total airtime and total distance traveled by all the UAVs. The simulation experiments showed that these operators significantly improve the solution quality. Our main contributions are the application of the Multi Structure Genetic Algorithm (MSGA) to the cooperative multi-task assignment problem and the development of two novel mutation operators to improve the solutions of MSGA.
Abstract: This study presents a novel similarity metric that increases the speed of comparison operations. The new metric is also suitable for distance-based operations among strings. Most simple calculation methods, such as string length, are fast to compute but do not represent the string well. On the other hand, methods such as keeping a histogram over all characters in the string are slower but represent the string characteristics well in some areas, such as natural language. We propose a new metric that is easy to calculate and satisfactory for string comparison. The method is built on a hash function that takes a string of any size and outputs the most frequent K characters with their frequencies. The outputs can be compared directly, and our studies showed that the success rate is quite satisfactory for text mining operations.
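A minimal Python sketch of the idea of hashing a string to its most frequent K characters is given below. The function names, the tie-breaking rule (first occurrence wins) and the comparison function are assumptions made for illustration; the abstract does not specify how ties are broken or how two outputs are compared.

```python
from collections import Counter


def most_frequent_k(text, k=2):
    """Return the k most frequent characters of text with their counts.

    Ties are broken by first occurrence in the string (an assumption).
    """
    counts = Counter(text)
    first_seen = {}
    for index, char in enumerate(text):
        first_seen.setdefault(char, index)
    ranked = sorted(counts.items(),
                    key=lambda item: (-item[1], first_seen[item[0]]))
    return ranked[:k]


def shared_frequency(a, b, k=2):
    """Hypothetical comparison: sum the counts of characters that appear
    in the top-k output of both strings. Higher means more similar."""
    top_a = dict(most_frequent_k(a, k))
    top_b = dict(most_frequent_k(b, k))
    return sum(top_a[char] + top_b[char] for char in top_a.keys() & top_b.keys())


print(most_frequent_k("research", 2))           # [('r', 2), ('e', 2)]
print(shared_frequency("research", "seeking"))  # 4
```

Only K characters and K counts are kept per string, so a comparison costs a fixed amount of work regardless of the string length once the outputs are computed.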
Abstract: This paper proposes an information retrieval method for economy news. The effect of economy news is studied at the word level, and stock market values are taken as the ground truth. The correlation between stock market prices and economy news is a problem that has already been addressed for most countries. The best-known approach is to apply text mining to the news and time series analysis techniques to stock market closing values, and then to run classification or clustering algorithms over the extracted features. This study goes further and asks which time series analysis techniques are available for stock market closing values and which one is the most suitable. In this study, the news items and their dates are collected into a database and text mining is applied to the news; the text mining part has been kept simple, using only the term frequency-inverse document frequency method. For the time series analysis part, we have studied 10 different methods, such as random walk, moving average, acceleration, Bollinger band, price rate of change, periodic average, difference, momentum and relative strength index, and their variations. We have also explained these techniques in a comparative way and applied them to Turkish Stock Market closing values over a period of more than 2 years. In parallel, we have applied the term frequency-inverse document frequency method to the economy news of one of the high-circulation newspapers in Turkey.
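Four of the time series features named above can be sketched in a few lines of Python. This uses the common textbook definitions of each indicator; the window lengths, the band width of two standard deviations and the sample prices are illustrative assumptions, not values from the study.

```python
from statistics import mean, pstdev


def moving_average(prices, window):
    """Simple moving average over a sliding window of closing values."""
    return [mean(prices[i - window + 1:i + 1])
            for i in range(window - 1, len(prices))]


def bollinger_bands(prices, window, width=2.0):
    """(lower, middle, upper) bands: moving average -/+ width * std dev."""
    bands = []
    for i in range(window - 1, len(prices)):
        chunk = prices[i - window + 1:i + 1]
        middle = mean(chunk)
        spread = width * pstdev(chunk)
        bands.append((middle - spread, middle, middle + spread))
    return bands


def momentum(prices, lag):
    """Difference between the current value and the value lag steps ago."""
    return [prices[i] - prices[i - lag] for i in range(lag, len(prices))]


def rate_of_change(prices, lag):
    """Percentage change relative to the value lag steps ago."""
    return [100.0 * (prices[i] - prices[i - lag]) / prices[i - lag]
            for i in range(lag, len(prices))]


closing = [10.0, 11.0, 12.0, 11.0, 13.0]  # made-up closing values
print(moving_average(closing, 3))
print(momentum(closing, 1))
```

Each function turns the raw closing series into a derived series that can then be paired with text features for classification or clustering.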
Abstract: Secure route planning in environments with obstacles and threats, which is of first priority for single and multiple Unmanned Aerial Vehicles (UAVs), is the main focus of this study. Planning the optimum route with genetic algorithms (GA) while considering kinematic constraints and terrain conditions is investigated. The terrain used in this study consists of real 3D satellite data from NASA, which gives access to altitude data and allows mountains on the terrain to be detected. A real geographic coordinate system is used and the geometric shape of the earth is considered for high-precision calculations. While global route planning is done by the GA, local route planning is considered for transitions between waypoints. According to the experimental results, the use of multiple UAVs brings a great benefit in the fulfillment of missions in terms of time and performance.
Abstract: It is a known fact that, depending on market strength and structure, there is a correlation between stock market values and the content of newspapers. The correlation increases in weak and speculative markets, while it never drops to zero even in the strongest markets. This research focuses on the correlation between the economic news published in a high-circulation newspaper in Turkey and the stock market closing values in Turkey. Several feature extraction methodologies are implemented on both data sources, the stock market values and the economic news. Since the economic news is in natural language, the text mining technique term frequency-inverse document frequency is applied to it. Time series analysis methods such as random walk, Bollinger band, moving average and difference are applied to the stock market values. After the feature extraction step, classification is built on three well-known classifiers: support vector machine, k-nearest neighbors and decision tree. Moreover, an ensemble classifier based on majority voting is implemented on top of these classifiers. The success rates are satisfactory enough to claim that the methods implemented in this study can be extended to future research with similar data sets from other countries.
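The term frequency-inverse document frequency weighting mentioned above can be sketched as follows. This is the plain textbook variant (relative term frequency times the natural log of inverse document frequency) with whitespace tokenization; the study may have used a different variant or preprocessing, and the sample headlines are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * ln(N / docs containing term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


headlines = ["market rises", "market falls"]  # invented sample data
for row in tf_idf(headlines):
    print(row)
```

A term that occurs in every document, such as "market" here, receives weight zero, so only the words that distinguish one news item from another contribute to the feature vector.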
Abstract: The aim of this study is to apply ensemble classification methods to stock market closing values, which can be treated as a time series, and to find their relation to economy news. To keep the study background clear, the majority voting method has been applied over three classification algorithms: k-nearest neighbors, support vector machine and the C4.5 tree. The results gathered from two different feature extraction methods are combined by a majority voting meta classifier (ensemble method) running over the three classifiers. The results show that the ensemble increases the success rate by at least 2 to 3 percent.
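Majority voting over three classifiers reduces to counting labels per sample. The sketch below assumes each classifier has already produced a list of predicted labels; the label values and the prediction lists are invented for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list of voted labels.

    predictions: list of lists, one per classifier, all of equal length.
    """
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest vote count
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


knn_pred = ["up", "down", "down"]   # hypothetical k-nearest neighbors output
svm_pred = ["up", "up", "down"]     # hypothetical support vector machine output
tree_pred = ["down", "down", "up"]  # hypothetical C4.5 tree output
print(majority_vote([knn_pred, svm_pred, tree_pred]))  # ['up', 'down', 'down']
```

With three voters and two classes a tie cannot occur, which is one practical reason to use an odd number of base classifiers.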
Abstract: Author detection is a challenging problem for articles on the Internet (forums, blogs, etc.) due to the lack of substantial content. Therefore, author detection for these articles is much more difficult than for books, reports and other documents. In this study, the authors of newspaper articles are categorized. We tried to detect the authors of the articles in the test data by using the frequently used words in these categories. In doing so, we applied an improved Naïve Bayesian machine learning algorithm.
Abstract: In this study, a methodology that detects the radar make caught in real time during flight or during pre-flight route planning has been developed for autonomous unmanned aerial vehicles (UAVs). The proposed method and genetic algorithms are implemented in parallel, which reduces the duration to a large extent. The developed methodology can provide fast and safe routes for autonomous single or multiple UAVs or for operator-assisted flight.
Abstract: This is preliminary work for a project that will automatically filter comments made on news and papers. Our database contains over 1 million news items and comments. Due to the volume of our data, 30,677 comments made on 15,064 articles in 44 different categories are used as experimental data. The proposed anomaly-based method obtains fast and highly accurate results without the high storage requirements and high computational complexity of other classification-based methods in the literature.
Abstract: In this paper, a new algorithm for feature selection, a method widely used in data mining, machine learning and pattern recognition, is proposed. The classical Fukunaga-Koontz Transform is extended to a binary kernel classifier. We used cDNA microarrays to assess 11,000 gene expression profiles in 60 human cancer cell lines used in a drug discovery screen by the National Cancer Institute, and Diffuse large B-cell lymphoma data including 62 cells and more than 4,000 genes. The proposed two-stage algorithm, applied to the NCI60 and LYM datasets, is compared with other feature selection models in detail.
Abstract: Although there have been many studies on computer-aided drug design in recent years, the determination of proteins for drug candidates remains a notable area for research. The first major shortcoming of this kind of problem is selecting the features that best represent the protein structure; the second is the computational complexity. We use three datasets of different sizes: the Cherkasov dataset with 2684 examples and over 160 descriptors, the sdf-formatted DrugDataBank dataset with 7440 examples and over 300 descriptors, and Pharmeks Company’s real drug database with over 250,000 samples. A statistical multiple relief algorithm is developed in order to measure the quality of the attributes and to reduce the dimension of the dataset. We applied a new approach that works on subspaces of the dataset, called the incremental decremental kernel learning model. As a result, we found that our new approach has better accuracy and lower computational complexity than the other traditional supervised algorithms.
Abstract: In this thesis, new kernel-based supervised learning algorithms such as string kernels and incremental kernel learning models, new graph-based semi-supervised learning models, a feature reduction model that combines different feature selection methods measuring the quality of features in order to get rid of noisy and redundant data, and a new Hidden Markov Model based kernel machine method for protein structure prediction and function classification are proposed. First, four different drug datasets, the methodologies and the studies related to computer-aided drug design are given, followed by other protein databases, disease data, cancer datasets and some high-dimensional data from the literature. In the next section, the learning models are discussed briefly. In the fourth chapter, the mathematical background of linearly separable and soft-margin kernel-based Support Vector Machines is introduced. The newly proposed kernel models are given in two areas, string kernels and incremental kernel learning algorithms. The performance of these methods on the datasets introduced in the second chapter is examined in detail. In this chapter, the performance of the proposed kernel models on data from other machine learning repositories is also tested and analyzed. In the next chapter, the three main semi-supervised learning models compared in our experiments and a new active semi-supervised learning model are treated in detail. Finally, the PHA-kernel (Pairwise Hidden Markov Models Alignment kernel), which is based on the Hidden Markov Models mostly used in protein classification and on alignment scoring by dynamic programming, is mathematically defined. We use finite state machines instead of protein sequence structures in this model.
The accuracy of all proposed learning models is given for different kernel regularization parameters, penalty parameters, slack variables and other kernel parameters. Training and test errors of our algorithms are compared with those of other learning models in detail.
Abstract: The purpose of this paper is primarily to present a self-contained introduction to the ideas behind QSAR-based drug design, to give an overview of newly introduced semi-supervised learning approaches, and finally to present the results of a semi-supervised learning framework. Our experimental results show that the Gaussian Random Field Method (GRFM) gives better accuracy than traditional semi-supervised algorithms and other learning procedures on drug design. GRFM is also an appropriate model for learning from huge datasets because it needs fewer labeled data to classify.
Abstract: The purpose of this paper is primarily to present a self-contained introduction to the ideas behind feature selection methodology and kernel methods. Using these ideas, a newly improved attribute subset selection criterion is presented in detail; it combines ReliefF, a successful attribute estimator, with mRMR, which selects the attributes having the highest relevance and minimal redundancy for the target class. We compare the proposed selection criterion with seven other feature selection criteria (ReliefF, mRMR, F-statistics, F-mRMR, GNSR, A-Optimization, D-Optimization) using appropriate kernel approaches and their optimal parameters.
Abstract: Although protein classification for drug design has been one of the most widely studied areas in the past few years, it is difficult to obtain high accuracy. We used a feature weighting algorithm in order to represent the whole needed feature set. Because of scarce labeled data and the high computational complexity of supervised learning methods, a new semi-supervised learning algorithm, extended from the Gaussian Random Field methodology and combined with active query learning, is developed. The proposed approach is applied to newly extracted data from the DrugBank database, which contains nearly 4800 drug entries, including FDA-approved drugs and synthetic drugs, and 2640 non-drug proteins. We found that our new approach has better accuracy than the other traditional semi-supervised methods and lower computational complexity than the supervised methods.
Abstract: Aim 1: to teach the necessary professional standards (ISO, IEEE-Std) whose teaching was neglected in earlier courses and which were not included in course contents. Aim 2: to give the basic concepts of “Introduction to Software Engineering”. Aim 3: to give students the habit of developing software in teams through the “Software Hut” approach, which has been used in our field for decades and continues to be used. Because concepts in this area tend to get confused, the path we have adopted is to give the “software engineering course project”, which Mary Shaw & James Tomayko classified into five subsets at CMU in 1991, in the form of the third, middle element of these five levels. Aim 4: to begin publishing “Software Productivity” and “Software Quality” measurements after carrying them out for at least three years.
Abstract: Protein function prediction is one of the most important problems in bioinformatics. Advances in solutions to this problem could lead to a better understanding of how some diseases occur and how to prevent or treat them on a person-to-person basis. Gene Ontology (GO) is one of the common ways of defining protein function. There are three main GO classes: Molecular Function, Biological Process and Cellular Component. Since Molecular Function is one of the most used classes in the literature, we concentrate on it in this study. Protein function determination is usually done based on the sequences in each class. Machine learning algorithms need features extracted from each sequence in order to train a classifier. Feature extraction can be based on a) physicochemical properties, b) sequence alignment scores between a sequence and the training data, or c) the fingerprints/motifs (for example, as in the PRINTS and PROSITE databases) existing in each sequence. In this study, our aim is to compare the success of function prediction when sequence physicochemical properties, alignment scores or motifs are used.
Abstract: Recently developed information retrieval technologies are based on the concept of a vector space because it is fast and simple to deal with zeros and ones. Generally, text databases are huge matrices defined in vector spaces that are formed by indexing the titles of documents. Searching through these databases to find documents related to a desired subject takes a long time and requires many calculations. The methods in the literature apply dimension reduction by means of SVD or PCA, define the documents in the new dimensions, and then search all of the documents in the reduced dimensions one by one to find the documents relevant to a desired subject. In addition, dimension reduction also accelerates document search in a hierarchical tree representation, because fewer calculations are needed when dealing with reduced-dimension matrices. In this study, we do not use dimension reduction, but we compare the results with other information retrieval methods such as LSI and the cosine similarity algorithm. Grouping the documents in a hierarchical tree structure is proposed, so that the search can be directed to smaller groups at each step. Static M-Path and Adaptive M-Path search algorithms are proposed and applied to IEEE Transactions on Information Theory databases of three different dimensions: 544 x 301, 1218 x 801 and 1600 x 1228. With these methods the search complexity is reduced and the number of calculations gets smaller. Another finding is that the result of a search is related to the query vector. Generally, the more keywords a search vector contains, the larger the probability of finding a good result (a document closer to the query vector), but a good or poor hierarchical tree design can lead to better or worse search results.
Abstract: When querying large document collections, reducing documents to vectors and document collections to matrices makes querying much faster and easier. To avoid the high computational complexity that arises in querying because of the large dimensions of the matrices and vectors used, dimensionality reduction methods such as singular value decomposition and principal component analysis have been proposed in the literature. To reduce the computational complexity in addition to dimensionality reduction, organizing the database in a hierarchical tree structure and querying over this structure using single and multiple paths has also been proposed. In this paper, the best-first search algorithm, which allows backtracking, is combined with the static and adaptive multipath querying methods, and the tradeoffs between computational complexity and performance are examined and compared.
Abstract: The representation of large document databases, consisting of Web pages, articles and book and magazine titles, in terms of matrices for the purpose of text querying and retrieval simplifies and expedites the querying process. In the literature, dimensionality reduction techniques based on singular value decomposition and principal component analysis have been proposed to reduce the high computational complexity resulting from the use of high dimension matrices and vectors. Serkan Kaya et al. (2002) proposed the organization of the text database in the form of a hierarchical tree structure, and single path and multipath querying over this structure, as a technique to reduce the computational complexity in addition to dimensionality reduction. We analyze and compare the tradeoff between the computational complexity and the performance of the static and adaptive multipath querying methods by varying the number of paths.
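The effect of varying the number of paths can be illustrated with a small Python sketch. It assumes a tree whose internal nodes hold a representative vector (here the mean of their children) and whose leaves hold document vectors, and it keeps the m best-matching nodes at each level by cosine similarity. The Node class, the tree and the vectors are hypothetical; the actual static and adaptive algorithms in the paper may differ in how paths are selected.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    vector: List[float]
    children: List["Node"] = field(default_factory=list)
    doc_id: Optional[str] = None  # set on leaves only


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multipath_search(root, query, m):
    """Descend the tree level by level, following the m best nodes."""
    frontier, reached = [root], []
    while frontier:
        children = [c for node in frontier for c in node.children]
        children.sort(key=lambda n: cosine(n.vector, query), reverse=True)
        kept = children[:m]
        reached.extend(n for n in kept if not n.children)  # leaves reached
        frontier = [n for n in kept if n.children]         # keep descending
    best = max(reached, key=lambda n: cosine(n.vector, query), default=None)
    return best.doc_id if best else None


group_a = Node([1.0, 0.05], [Node([1.0, 0.0], doc_id="a1"),
                             Node([1.0, 0.1], doc_id="a2")])
group_b = Node([0.5, 1.0], [Node([0.0, 1.0], doc_id="b1"),
                            Node([1.0, 1.0], doc_id="b2")])
root = Node([0.0, 0.0], [group_a, group_b])

print(multipath_search(root, [1.0, 0.6], 1))  # a2 (single path)
print(multipath_search(root, [1.0, 0.6], 2))  # b2 (two paths)
```

In this toy tree the single-path search commits to the group whose representative is closest to the query and misses the closest document, while two paths recover it at the cost of more similarity computations, which is the kind of tradeoff the abstract analyzes.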