COMPARATIVE ANALYSIS OF CLUSTERING METHODS SUITABLE FOR PLANT VARIETIES MORPHOLOGICAL CHARACTERISTICS DATA PROCESSING

Keywords: hierarchical agglomerative methods, measure interval, Fisher’s Iris data set, classification, cross-tables

Abstract

Clustering, the unsupervised classification of multidimensional data into clusters, has been addressed in many
contexts and by researchers in many disciplines.
One research area in which clustering is useful is the morphological analysis of plant variety characteristics,
which helps to identify new varieties more accurately. It is therefore important to compare clustering results
obtained with different methods and proximity measures in order to determine the methods best suited to
analyzing morphological characteristics. The research used analytical, mathematical, statistical, and graphical
methods. This paper presents a comparative analysis of clustering methods on the well-known Fisher’s Iris data
set and identifies the classification methods most suitable for analyzing the morphological characteristics of
plant varieties. It surveys how plant variety clustering results are affected by the choice of hierarchical
agglomerative classification method (Between-Groups Linkage, Within-Groups Linkage, Nearest Neighbor, Furthest
Neighbor, Centroid Method, Median Method, Ward’s Method) and of Euclidean or non-Euclidean proximity measure
for interval data. Clustering results were evaluated with descriptive statistics (cross-tables). The clustering
algorithms and tools used in the research are also described. The article considers the proximity measures that
can be used in these algorithms, presents the most popular clustering algorithms, and shows their role in data
mining. Numerous techniques and algorithms have previously been proposed for clustering time-series data
streams. The clustering algorithms and their effectiveness in various applications are compared in order to
identify the method best suited to the morphological analysis and identification of new plant varieties. The
best results were obtained with Average Linkage (Between Groups) and the Pearson correlation measure, Average
Linkage (Within Groups) and the cosine measure, Average Linkage (Within Groups) and the Pearson correlation
measure, and Ward’s method and the cosine measure. Frequency statistics (cross-tables) are proposed for
evaluating the quality of the classification results. The tests showed that no single algorithm partitions
Fisher’s Iris data set into clusters perfectly. Clustering of plant varieties should therefore be carried out
iteratively: the most common clustering algorithms are applied in turn, and their results are carefully
evaluated in order to select the method and proximity measure that classify plant varieties best and allow the
classification results to be interpreted correctly.
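The comparison sets a Euclidean measure against two non-Euclidean ones, cosine and Pearson correlation. The minimal Python sketch below is an illustration, not the authors' code; the function names are hypothetical. It computes all three for two feature vectors. Cosine and Pearson correlation are similarities, and clustering tools commonly turn them into distances as 1 - similarity (SciPy's `pdist` does this for its `'cosine'` and `'correlation'` metrics). The two vectors are made-up measurements of two hypothetical flowers with identical proportions, one twice the size of the other.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms  # undefined (ZeroDivisionError) for an all-zero vector


def pearson_r(u, v):
    """Pearson correlation of the two vectors' values, taken across features.

    Equivalent to the cosine similarity of the mean-centred vectors, so it is
    undefined when either vector is constant.
    """
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


# Hypothetical four-feature measurements: same proportions, twice the size.
small = [1.0, 2.0, 3.0, 4.0]
large = [2.0, 4.0, 6.0, 8.0]

print("Euclidean distance:", euclidean(small, large))
print("Cosine distance:   ", 1 - cosine_similarity(small, large))
print("Pearson distance:  ", 1 - pearson_r(small, large))
```

For this pair the Euclidean distance is about 5.48, while the cosine and Pearson distances are zero up to rounding: the two non-Euclidean measures compare specimens by shape (proportions) rather than absolute size. This is one property that distinguishes the measures; the sketch does not claim it explains the reported results.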
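The linkage rules that are defined directly on pairwise distances (Nearest Neighbor, Furthest Neighbor, Between-Groups average), a Within-Groups average rule, and the cross-table used to judge the result can be illustrated with a short, deliberately naive Python sketch. It is not the authors' implementation. The function names and the six two-dimensional points are hypothetical. The within-groups rule follows its common description: merge the pair of clusters whose union has the smallest mean pairwise distance. The Centroid, Median, and Ward methods are omitted because they are normally computed from coordinates or Lance-Williams updates; `scipy.cluster.hierarchy.linkage` provides them as `method='centroid'`, `'median'`, and `'ward'`.

```python
import math
from collections import Counter
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_distance(a, b, dist, linkage):
    """Distance between clusters a and b (lists of point indices) under a linkage rule."""
    if linkage == "within":
        # Within-groups average: mean distance over all pairs in the would-be merged cluster.
        pairs = list(combinations(a + b, 2))
        return sum(dist[i][j] for i, j in pairs) / len(pairs)
    between = [dist[i][j] for i in a for j in b]
    if linkage == "single":    # Nearest Neighbor
        return min(between)
    if linkage == "complete":  # Furthest Neighbor
        return max(between)
    if linkage == "average":   # Between-Groups average linkage
        return sum(between) / len(between)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, n_clusters, measure=euclidean, linkage="average"):
    """Naive agglomerative clustering; returns one cluster number per point."""
    n = len(points)
    dist = [[measure(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every point in its own cluster
    while len(clusters) > n_clusters:
        # Merge the pair of clusters that is closest under the chosen linkage rule.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]], dist, linkage),
        )
        clusters[p] += clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    labels = [0] * n
    for number, members in enumerate(clusters):
        for i in members:
            labels[i] = number
    return labels


def cross_table(true_labels, cluster_labels):
    """Frequency table: one row per known class, one column per cluster number."""
    counts = Counter(zip(true_labels, cluster_labels))
    columns = sorted(set(cluster_labels))
    return {row: [counts[(row, c)] for c in columns] for row in sorted(set(true_labels))}


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
classes = ["A", "A", "A", "B", "B", "B"]

for rule in ("single", "complete", "average", "within"):
    # Each rule prints {'A': [3, 0], 'B': [0, 3]} for this toy data.
    print(rule, cross_table(classes, agglomerate(points, 2, linkage=rule)))
```

For real data such as Fisher's Iris set (available, for example, through scikit-learn's `load_iris()`), one would pass the 150 four-feature rows with `n_clusters=3` and read the cross-table against the three species: a good partition concentrates each row in a single column. `measure` can be any function of two vectors, for instance one returning 1 - cosine similarity.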

Published
2019-06-28
How to Cite
Orlenko, N. S., Mazhuha, K. M., Dushar, M. B., & Maslechkin, V. V. (2019). COMPARATIVE ANALYSIS OF CLUSTERING METHODS SUITABLE FOR PLANT VARIETIES MORPHOLOGICAL CHARACTERISTICS DATA PROCESSING. Scientific Progress & Innovations, (2), 261-269. https://doi.org/10.31210/visnyk2019.02.35
Section
TECHNICAL SCIENCES