Optimization of K-means clustering using Artificial Bee Colony Algorithm on Big Data

Written by

Afroj Alam

Analytics, Artificial Intelligence, Data Analyst

Afroj Alam¹* (alamafroj@gmail.com)

Department of Computer Application, Integral University, Lucknow (U.P.), India; Sambhram University, Jizzax, Uzbekistan

Mohd Muqeem²

Department of Computer Application, Integral University, Lucknow (U.P.), India

Introduction:

Over the past few decades, the rapid development of advanced technology and IoT-based sensor devices has resulted in explosive growth in data generation and storage. The amount of data generated keeps growing, even exponentially, so its volume cannot be predicted and its hidden information cannot be found in the traditional way. Many new applications produce this huge amount of data, especially those where users can write, upload, post and share large amounts of data, information and video, such as the social media sites Facebook, Twitter, Telegram and Instagram, where huge amounts of images, video and data are posted and shared every second. As mentioned in [1], there were approximately 45 zettabytes of digital data by 2020. In the current information technology world, this massive volume of data with many attributes is called "high-dimensional big data". Many frequent patterns, meaningful information and valuable hidden patterns can be extracted from this data, which helps organizations improve business intelligence, decision-making, fraud detection, etc. K-means clustering is one of the most important and powerful unsupervised partitioning machine learning techniques for dividing this big data into homogeneous groups, i.e. clusters [2][7][8].

K-means has several limitations on big and high-dimensional data: it converges to a locally optimal solution, the number of clusters must be defined in advance, the result depends on the initialization of the cluster centroids, and the quality of the clusters can be poor [3]. We propose a K-means hybridized with the nature-inspired Artificial Bee Colony global optimization algorithm to resolve these limitations of K-means clustering.

Nature inspired optimization:

There are many population-based metaheuristic global optimization algorithms, including Evolutionary Algorithms (EAs), that are inspired by the natural behaviour of evolving populations, such as the Genetic Algorithm, Artificial Bee Colony (ABC), Artificial Ant Colony and particle-swarm-based intelligence algorithms.

Artificial Bee Colony

ABC is a global optimization metaheuristic algorithm inspired by the intelligent behaviour of honey bees. This algorithm is popular due to its flexible computational time. In our proposed method we use the ABC algorithm for the initialization and selection of cluster centroids [6].

This algorithm is executed in 4 steps, as given below:

1. Initialization
2. Employed bee
3. Onlooker bee
4. Scout bee

The objective function of the Artificial Bee Colony (ABC) algorithm is designed according to the selection of the optimal number of clusters for K-means.

The population of ABC is initialized by equation 1, in which i = 1, 2, 3, …, BN, where BN is the total number of food sources, and j = 1, 2, 3, …, D, where D is the number of dimensions. The lower and upper bounds of variable j are x_min,j and x_max,j.
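In the standard ABC formulation, each food source is initialized as x_i,j = x_min,j + rand(0, 1) · (x_max,j − x_min,j), which matches the symbols defined above. Assuming equation 1 takes this form, a minimal Python sketch of the initialization could look like this (the helper name init_food_sources is hypothetical):

```python
import random

def init_food_sources(bn, x_min, x_max, rng=random):
    """Return bn random food sources inside the per-dimension bounds.

    x_min and x_max are lists of length D with the lower and upper bounds.
    Assumption: uniform sampling, as in the standard ABC initialization.
    """
    d = len(x_min)
    return [
        [x_min[j] + rng.random() * (x_max[j] - x_min[j]) for j in range(d)]
        for _ in range(bn)
    ]

# Example: BN = 5 food sources in D = 2 dimensions.
rng = random.Random(0)
sources = init_food_sources(bn=5, x_min=[0.0, -1.0], x_max=[10.0, 1.0], rng=rng)
```

For clustering, each food source would typically encode a full set of candidate centroids, so D would be the number of clusters times the number of attributes; that encoding is also an assumption here.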

The location of the bees is updated as given below.

In the above equation, r ∈ {1, 2, 3, …, BN} and j ∈ {1, 2, …, D} are indexes and Φ is a randomly generated number in [−1, 1]. If the new solution given by equation (2) is better than the old solution, the old solution is replaced by the new one.
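In the standard ABC formulation, the update is v_i,j = x_i,j + Φ · (x_i,j − x_r,j), with r chosen different from i. Assuming equation (2) takes this form, the update and the greedy replacement can be sketched as follows. The function names and the toy fitness are illustrative only and are not taken from the original work.

```python
import random

def neighbour_solution(foods, i, rng=random):
    """Build a candidate v_i from x_i by perturbing one random dimension j."""
    bn, d = len(foods), len(foods[0])
    j = rng.randrange(d)
    # Assumption: r is drawn from the other food sources (r != i).
    r = rng.choice([k for k in range(bn) if k != i])
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(foods[i])
    candidate[j] = foods[i][j] + phi * (foods[i][j] - foods[r][j])
    return candidate

def greedy_replace(foods, i, candidate, fitness):
    """Keep the candidate only if its fitness is higher than the current one."""
    if fitness(candidate) > fitness(foods[i]):
        foods[i] = candidate
        return True
    return False

def toy_fitness(x):
    # Hypothetical fitness for the demo: higher when x is closer to the origin.
    return 1.0 / (1.0 + sum(v * v for v in x))

rng = random.Random(1)
foods = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(4)]
before = [toy_fitness(x) for x in foods]
for i in range(len(foods)):
    greedy_replace(foods, i, neighbour_solution(foods, i, rng), toy_fitness)
after = [toy_fitness(x) for x in foods]
```

Because of the greedy rule, no food source can end a pass with a lower fitness than it started with.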

The selection probability of each solution is computed by equation (3), where fit_i is the fitness value of the i-th solution. If the fitness of the new solution is higher than that of the old solution, the old solution is replaced by the new one.
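In standard ABC, the onlooker bees choose a food source with probability p_i = fit_i / Σ_n fit_n. Assuming the probability equation has this form and that all fitness values are positive, the computation and a roulette-wheel pick can be sketched as follows (function names are hypothetical):

```python
import random

def selection_probabilities(fitness_values):
    """Normalize positive fitness values so that they sum to 1."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]

def pick_food_source(probabilities, rng=random):
    """Roulette-wheel selection: return an index drawn with the given weights."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

probs = selection_probabilities([1.0, 3.0, 4.0])
# probs == [0.125, 0.375, 0.5]
chosen = pick_food_source(probs, random.Random(0))
```

A solution with four times the fitness of another is therefore four times as likely to attract an onlooker bee.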

Proposed methodology:

In our proposed methodology we hybridize K-means with ABC (ABK). The plan is that the K-means algorithm provides the new solution for the scout bees in every iteration, generated according to the solutions of the employed bee and onlooker bee phases. This may increase the chances of finding more suitable solutions for the optimization problem, and adding a new K-means solution after every cycle may extend the reach of the ABC algorithm and improve its accuracy. Our proposed idea finds the f_i values from the distance formula given below.

distance=min(i_i,j_j) (4)

The fitness function is then calculated as the sum of all the distance_i values.

Solutions with better fitness survive in the population; the others are rejected [5].
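One way to read the scheme above is sketched below in Python. It rests on several assumptions that are not confirmed by the text: each food source is a list of K centroids, distance_i is the Euclidean distance from point i to its nearest centroid, a smaller sum of distances counts as better, and the scout's new solution is one K-means (Lloyd) iteration applied to the current best food source. All function names are hypothetical.

```python
import math
import random

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]

def total_distance(points, centroids):
    """Sum over all points of the distance to the nearest centroid."""
    return sum(nearest(p, centroids)[1] for p in points)

def kmeans_step(points, centroids):
    """One Lloyd iteration: assign points, then move centroids to cluster means."""
    groups = [[] for _ in centroids]
    for p in points:
        groups[nearest(p, centroids)[0]].append(p)
    new_centroids = []
    for c, g in zip(centroids, groups):
        if g:
            new_centroids.append([sum(col) / len(g) for col in zip(*g)])
        else:
            new_centroids.append(list(c))  # empty cluster: keep its centroid
    return new_centroids

def scout_from_kmeans(points, foods):
    """Replace the worst food source with a K-means-refined copy of the best."""
    costs = [total_distance(points, f) for f in foods]
    best = min(range(len(foods)), key=costs.__getitem__)
    worst = max(range(len(foods)), key=costs.__getitem__)
    foods[worst] = kmeans_step(points, foods[best])
    return best, worst

# Demo: two well-separated blobs, 4 food sources with K = 2 centroids each.
rng = random.Random(0)
points = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
points += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(20)]
foods = [[[rng.uniform(-1, 6), rng.uniform(-1, 6)] for _ in range(2)]
         for _ in range(4)]
best, worst = scout_from_kmeans(points, foods)
```

A Lloyd iteration is guaranteed not to increase the sum of squared distances. With plain distances, as used here, the refined solution is usually but not always better, so a greedy comparison before accepting it may be worth adding.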

TABLE 1 [4]: COMPARATIVE ANALYSIS BASED ON INTRA-CLUSTER DISTANCE

References

[1] Ilango, S. S., Vimal, S., Kaliappan, M., & Subbulakshmi, P. (2019). Optimization using artificial bee colony based clustering approach for big data. Cluster Computing, 22(5), 12169-12177.
[2] Alam, A., Muqeem, M., & Ahmad, S. (2021). Comprehensive review on Clustering Techniques and its application on High Dimensional Data. International Journal of Computer Science & Network Security, 21(6), 237-244.
[3] Saini, G., & Kaur, H. (2014). A novel approach towards K-mean clustering algorithm with PSO. Int. J. Comput. Sci. Inf. Technol, 5, 5978-5986.
[4] Krishnamoorthi, M., & Natarajan, A. M. (2013, January). A comparative analysis of enhanced Artificial Bee Colony algorithms for data clustering. In 2013 International Conference on Computer Communication and Informatics (pp. 1-6). IEEE.
[5] Bharti, K. K., & Singh, P. K. (2014, December). Chaotic artificial bee colony for text clustering. In 2014 Fourth International Conference of Emerging Applications of Information Technology (pp. 337-343). IEEE.
[6] Enríquez-Gaytán, J., Gómez-Castañeda, F., Moreno-Cadenas, J. A., & Flores-Nava, L. M. (2020, November). A Clustering Method Based on the Artificial Bee Colony Algorithm for Gas Sensing. In 2020 17th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE) (pp. 1-4). IEEE.
[7] Alam, A., Rashid, I., & Raza, K. (2021). Application, functionality, and security issues of data mining techniques in healthcare informatics. In Translational Bioinformatics in Healthcare and Medicine (pp. 149-156). Academic Press.
[8] Alam, A., Qazi, S., Iqbal, N., & Raza, K. (2020). Fog, Edge and Pervasive Computing in Intelligent Internet of Things Driven Applications in Healthcare: Challenges, Limitations and Future Use. Fog, Edge, and Pervasive Computing in Intelligent IoT Driven Applications, 1-26.
