Analyzing Time Complexity in Machine Learning Algorithms for Big Data: A Study on the Performance of Decision Trees, Neural Networks, and SVMs

Authors

  • Thirunavukkarasu Pichaimani, Molina Healthcare Inc., USA
  • Anil Kumar Ratnala, Albertsons Companies Inc.
  • Priya Ranjan Parida, Universal Music Group, USA

Keywords:

time complexity, decision trees

Abstract

This research paper presents an in-depth analysis of the time complexity associated with three prominent machine learning algorithms—decision trees, neural networks, and support vector machines (SVMs)—in the context of big data. With the growing influx of large-scale data in various sectors, the ability of machine learning algorithms to process and analyze this data efficiently has become paramount. In this study, we focus on evaluating the computational performance of these algorithms, with particular emphasis on how they scale when applied to big data environments. The paper begins by discussing the theoretical foundations of time complexity and its significance in machine learning, especially in scenarios involving extensive datasets. We highlight the importance of understanding time complexity not only from an algorithmic perspective but also in terms of real-world application where both accuracy and computational efficiency are critical for large-scale deployments.

The decision tree algorithm, known for its simplicity and interpretability, is widely used in various data mining and machine learning tasks. However, when dealing with large datasets, its performance can suffer due to its recursive nature and the need to search through many possible splits at each node. We analyze the time complexity of different types of decision trees, including classification and regression trees (CART) and random forests, to determine their scalability limits. The study examines how decision trees perform under various data distribution patterns and feature dimensionalities, providing insights into how their time complexity grows with increasing dataset size and feature space.
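The growth rate discussed above can be made concrete with a back-of-envelope operation count. The sketch below is our own illustration (not taken from the paper's experiments): it assumes a balanced CART tree in which every level scans all n samples once per feature while searching candidate splits, which yields the commonly cited O(m · n · log n) training cost for n samples and m features.

```python
# Illustrative sketch, not the paper's experiment: an approximate count of
# candidate-split evaluations when training a single balanced CART tree.
import math

def cart_split_evaluations(n_samples: int, n_features: int) -> int:
    """Approximate split evaluations for a balanced tree of depth ~log2(n)."""
    depth = max(1, math.ceil(math.log2(n_samples)))
    # Each level touches every sample once per feature while scanning splits.
    return n_features * n_samples * depth

for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  ~{cart_split_evaluations(n, 20):,} split evaluations")
```

Because the depth factor grows only logarithmically, doubling the dataset slightly more than doubles the work, which matches the near-linear scaling decision trees exhibit in practice.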

Neural networks, specifically deep learning models, have gained popularity for their ability to model complex patterns in large datasets. Despite their high accuracy, especially in tasks involving unstructured data such as images and text, their time complexity poses significant challenges. This paper provides a detailed analysis of the time complexity of feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Special attention is given to the number of layers, nodes per layer, and the impact of training algorithms, such as stochastic gradient descent (SGD) and backpropagation, on the overall time complexity. The analysis also explores how the increasing size of training data and the depth of neural networks affect computation time and memory usage, ultimately impacting their viability for big data applications.
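For feedforward networks, the dependence on layer count, layer width, and dataset size can be sketched by counting multiply-accumulate operations (MACs). The snippet below is our own illustrative model, not the paper's methodology; the 3x backpropagation factor and the MNIST-scale layer sizes are assumptions for the example.

```python
# Illustrative sketch: per-sample forward cost of a dense network is the sum
# of in_dim * out_dim MACs over its layers, so one SGD epoch scales linearly
# in both the dataset size and the total weight count.

def forward_macs(layer_sizes: list[int]) -> int:
    """MACs for one forward pass through consecutive dense layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def epoch_macs(layer_sizes: list[int], n_samples: int, backprop_factor: int = 3) -> int:
    """Approximate MACs per training epoch; backprop adds roughly 2x on top
    of the forward pass (assumed factor of 3 in total)."""
    return backprop_factor * forward_macs(layer_sizes) * n_samples

sizes = [784, 256, 128, 10]  # assumed MNIST-scale MLP for illustration
print(f"forward pass: {forward_macs(sizes):,} MACs/sample")
print(f"one epoch over 60k samples: {epoch_macs(sizes, 60_000):,} MACs")
```

This is why both deeper/wider architectures and larger training sets compound: epoch cost is the product of the two, before memory effects are even considered.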

Support vector machines (SVMs), another widely used algorithm, are known for their strong theoretical foundations and ability to provide high-accuracy results, particularly in classification tasks. However, SVMs tend to struggle with scalability when applied to large datasets, primarily due to their quadratic time complexity in the training phase. This research investigates the computational limitations of SVMs, focusing on both the primal and dual formulations of the algorithm. We analyze the impact of kernel functions, such as linear, polynomial, and radial basis function (RBF) kernels, on time complexity and performance, especially when dealing with high-dimensional data. The study further explores optimization techniques, such as the use of support vector approximation and parallelization, to improve the scalability of SVMs in big data environments.
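The quadratic behavior mentioned above comes directly from the kernel (Gram) matrix at the heart of dual SVM training: it has one entry per pair of training points. The sketch below is our own illustration, not the paper's implementation, assuming a naive solver that materializes the (upper triangle of the) full matrix.

```python
# Illustrative sketch: a dual SVM solver evaluates the kernel for every pair
# of training points, so doubling n roughly quadruples kernel evaluations.
import math

def rbf_kernel(x: list[float], z: list[float], gamma: float = 0.5) -> float:
    """Radial basis function kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix_entries(n_samples: int) -> int:
    """Kernel evaluations to fill the upper triangle of an n x n Gram matrix."""
    return n_samples * (n_samples + 1) // 2

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}  kernel evaluations: {gram_matrix_entries(n):,}")
```

At n = 100,000 the matrix already has ~5 billion entries, which is why approximation and decomposition methods (as discussed in the paper) become necessary at big-data scale.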

In addition to the theoretical analysis, this paper provides empirical results based on the implementation of these algorithms on large datasets from various domains, including healthcare, finance, and e-commerce. We compare the computational efficiency of decision trees, neural networks, and SVMs under different big data scenarios, evaluating factors such as dataset size, feature dimensionality, and class distribution. The results of these experiments offer valuable insights into the practical trade-offs between time complexity and model accuracy, enabling practitioners to make informed decisions when selecting machine learning algorithms for large-scale data analysis.

Furthermore, the paper discusses the role of hardware accelerators, such as graphics processing units (GPUs) and tensor processing units (TPUs), in mitigating the computational bottlenecks associated with these algorithms. We explore how parallelization and distributed computing frameworks, such as Apache Spark and Hadoop, can be leveraged to improve the performance of machine learning models in big data contexts. The integration of these technologies with machine learning algorithms can significantly reduce training and inference times, making it feasible to apply computationally intensive models, such as deep neural networks, to massive datasets without sacrificing performance.
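The partition-and-reduce structure that frameworks such as Apache Spark apply at cluster scale can be sketched in-process with the standard library. This is our own illustration, not the paper's setup: note that CPython threads share one interpreter (the GIL), so genuine speedups for CPU-bound work come from processes, GPUs, or a cluster; the shape of the computation is the point here.

```python
# Illustrative sketch of the map/reduce pattern behind distributed training
# pipelines: each worker computes a partial statistic over its own data
# partition, and the partial results are combined in a final reduce step.
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(partition: list[float]) -> float:
    """Map step: a statistic computed independently on one data partition."""
    return sum(x * x for x in partition)

def distributed_sum_of_squares(data: list[float], n_partitions: int = 4) -> float:
    """Split the data, map the partial statistic, then reduce the results."""
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(partial_sum_of_squares, partitions)
    return sum(partials)  # reduce step

data = [float(i) for i in range(1_000)]
print(f"sum of squares: {distributed_sum_of_squares(data):,.0f}")
```

In a real Spark or Hadoop deployment the partitions live on different machines, but the same decomposition is what lets gradient computations and kernel evaluations be spread across a cluster.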

The findings of this study contribute to a deeper understanding of the computational complexities associated with decision trees, neural networks, and SVMs, particularly in the context of big data applications. By providing both theoretical and empirical insights, the research offers a comprehensive evaluation of the trade-offs between algorithmic accuracy, computational efficiency, and scalability. Ultimately, the paper underscores the importance of selecting appropriate machine learning models based on their time complexity, especially when dealing with the growing demands of big data. The analysis presented here is intended to guide data scientists, machine learning engineers, and researchers in the development of more efficient and scalable machine learning solutions for large-scale data processing.

Downloads

Download data is not yet available.

Published

18-01-2024

How to Cite

[1]
T. Pichaimani, A. K. Ratnala, and P. R. Parida, “Analyzing Time Complexity in Machine Learning Algorithms for Big Data: A Study on the Performance of Decision Trees, Neural Networks, and SVMs”, J. Sci. Tech., vol. 5, no. 1, pp. 164–205, Jan. 2024, Accessed: Mar. 07, 2026. [Online]. Available: https://thesciencebrigade.org/jst/article/view/454
