Analyzing Time Complexity in Machine Learning Algorithms for Big Data: A Study on the Performance of Decision Trees, Neural Networks, and SVMs

Authors

  • Thirunavukkarasu Pichaimani, Molina Healthcare Inc., USA
  • Anil Kumar Ratnala, Albertsons Companies Inc.
  • Priya Ranjan Parida, Universal Music Group, USA

Keywords:

time complexity, decision trees

Abstract

This research paper presents an in-depth analysis of the time complexity associated with three prominent machine learning algorithms—decision trees, neural networks, and support vector machines (SVMs)—in the context of big data. With the growing influx of large-scale data in various sectors, the ability of machine learning algorithms to process and analyze this data efficiently has become paramount. In this study, we focus on evaluating the computational performance of these algorithms, with particular emphasis on how they scale when applied to big data environments. The paper begins by discussing the theoretical foundations of time complexity and its significance in machine learning, especially in scenarios involving extensive datasets. We highlight the importance of understanding time complexity not only from an algorithmic perspective but also in terms of real-world application where both accuracy and computational efficiency are critical for large-scale deployments.

The decision tree algorithm, known for its simplicity and interpretability, is widely used in various data mining and machine learning tasks. However, when dealing with large datasets, its performance can suffer due to its recursive nature and the need to search through many possible splits at each node. We analyze the time complexity of different types of decision trees, including classification and regression trees (CART) and random forests, to determine their scalability limits. The study examines how decision trees perform under various data distribution patterns and feature dimensionalities, providing insights into how their time complexity grows with increasing dataset size and feature space.
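The growth rate discussed above can be made concrete with a back-of-envelope operation count. The sketch below is our own illustration (not taken from the paper's experiments): it assumes a balanced CART tree in which every level scans all n samples once per feature while searching candidate splits, which yields the commonly cited O(m · n · log n) training cost for n samples and m features.

```python
# Illustrative sketch, not the paper's experiment: an approximate count of
# candidate-split evaluations when training a single balanced CART tree.
import math

def cart_split_evaluations(n_samples: int, n_features: int) -> int:
    """Approximate split evaluations for a balanced tree of depth ~log2(n)."""
    depth = max(1, math.ceil(math.log2(n_samples)))
    # Each level touches every sample once per feature while scanning splits.
    return n_features * n_samples * depth

for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  ~{cart_split_evaluations(n, 20):,} split evaluations")
```

Because the depth factor grows only logarithmically, doubling the dataset slightly more than doubles the work, which matches the near-linear scaling decision trees exhibit in practice.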

Neural networks, specifically deep learning models, have gained popularity for their ability to model complex patterns in large datasets. Despite their high accuracy, especially in tasks involving unstructured data such as images and text, their time complexity poses significant challenges. This paper provides a detailed analysis of the time complexity of feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Special attention is given to the number of layers, nodes per layer, and the impact of training algorithms, such as stochastic gradient descent (SGD) and backpropagation, on the overall time complexity. The analysis also explores how the increasing size of training data and the depth of neural networks affect computation time and memory usage, ultimately impacting their viability for big data applications.
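For feedforward networks, the dependence on layer count, layer width, and dataset size can be sketched by counting multiply-accumulate operations (MACs). The snippet below is our own illustrative model, not the paper's methodology; the 3x backpropagation factor and the MNIST-scale layer sizes are assumptions for the example.

```python
# Illustrative sketch: per-sample forward cost of a dense network is the sum
# of in_dim * out_dim MACs over its layers, so one SGD epoch scales linearly
# in both the dataset size and the total weight count.

def forward_macs(layer_sizes: list[int]) -> int:
    """MACs for one forward pass through consecutive dense layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def epoch_macs(layer_sizes: list[int], n_samples: int, backprop_factor: int = 3) -> int:
    """Approximate MACs per training epoch; backprop adds roughly 2x on top
    of the forward pass (assumed factor of 3 in total)."""
    return backprop_factor * forward_macs(layer_sizes) * n_samples

sizes = [784, 256, 128, 10]  # assumed MNIST-scale MLP for illustration
print(f"forward pass: {forward_macs(sizes):,} MACs/sample")
print(f"one epoch over 60k samples: {epoch_macs(sizes, 60_000):,} MACs")
```

This is why both deeper/wider architectures and larger training sets compound: epoch cost is the product of the two, before memory effects are even considered.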

Support vector machines (SVMs), another widely used algorithm, are known for their strong theoretical foundations and ability to provide high-accuracy results, particularly in classification tasks. However, SVMs tend to struggle with scalability when applied to large datasets, primarily due to their quadratic time complexity in the training phase. This research investigates the computational limitations of SVMs, focusing on both the primal and dual formulations of the algorithm. We analyze the impact of kernel functions, such as linear, polynomial, and radial basis function (RBF) kernels, on time complexity and performance, especially when dealing with high-dimensional data. The study further explores optimization techniques, such as the use of support vector approximation and parallelization, to improve the scalability of SVMs in big data environments.
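The quadratic behavior mentioned above comes directly from the kernel (Gram) matrix at the heart of dual SVM training: it has one entry per pair of training points. The sketch below is our own illustration, not the paper's implementation, assuming a naive solver that materializes the (upper triangle of the) full matrix.

```python
# Illustrative sketch: a dual SVM solver evaluates the kernel for every pair
# of training points, so doubling n roughly quadruples kernel evaluations.
import math

def rbf_kernel(x: list[float], z: list[float], gamma: float = 0.5) -> float:
    """Radial basis function kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix_entries(n_samples: int) -> int:
    """Kernel evaluations to fill the upper triangle of an n x n Gram matrix."""
    return n_samples * (n_samples + 1) // 2

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}  kernel evaluations: {gram_matrix_entries(n):,}")
```

At n = 100,000 the matrix already has ~5 billion entries, which is why approximation and decomposition methods (as discussed in the paper) become necessary at big-data scale.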

In addition to the theoretical analysis, this paper provides empirical results based on the implementation of these algorithms on large datasets from various domains, including healthcare, finance, and e-commerce. We compare the computational efficiency of decision trees, neural networks, and SVMs under different big data scenarios, evaluating factors such as dataset size, feature dimensionality, and class distribution. The results of these experiments offer valuable insights into the practical trade-offs between time complexity and model accuracy, enabling practitioners to make informed decisions when selecting machine learning algorithms for large-scale data analysis.

Furthermore, the paper discusses the role of hardware accelerators, such as graphics processing units (GPUs) and tensor processing units (TPUs), in mitigating the computational bottlenecks associated with these algorithms. We explore how parallelization and distributed computing frameworks, such as Apache Spark and Hadoop, can be leveraged to improve the performance of machine learning models in big data contexts. The integration of these technologies with machine learning algorithms can significantly reduce training and inference times, making it feasible to apply computationally intensive models, such as deep neural networks, to massive datasets without sacrificing performance.
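The partition-and-reduce structure that frameworks such as Apache Spark apply at cluster scale can be sketched in-process with the standard library. This is our own illustration, not the paper's setup: note that CPython threads share one interpreter (the GIL), so genuine speedups for CPU-bound work come from processes, GPUs, or a cluster; the shape of the computation is the point here.

```python
# Illustrative sketch of the map/reduce pattern behind distributed training
# pipelines: each worker computes a partial statistic over its own data
# partition, and the partial results are combined in a final reduce step.
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(partition: list[float]) -> float:
    """Map step: a statistic computed independently on one data partition."""
    return sum(x * x for x in partition)

def distributed_sum_of_squares(data: list[float], n_partitions: int = 4) -> float:
    """Split the data, map the partial statistic, then reduce the results."""
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(partial_sum_of_squares, partitions)
    return sum(partials)  # reduce step

data = [float(i) for i in range(1_000)]
print(f"sum of squares: {distributed_sum_of_squares(data):,.0f}")
```

In a real Spark or Hadoop deployment the partitions live on different machines, but the same decomposition is what lets gradient computations and kernel evaluations be spread across a cluster.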

The findings of this study contribute to a deeper understanding of the computational complexities associated with decision trees, neural networks, and SVMs, particularly in the context of big data applications. By providing both theoretical and empirical insights, the research offers a comprehensive evaluation of the trade-offs between algorithmic accuracy, computational efficiency, and scalability. Ultimately, the paper underscores the importance of selecting appropriate machine learning models based on their time complexity, especially when dealing with the growing demands of big data. The analysis presented here is intended to guide data scientists, machine learning engineers, and researchers in the development of more efficient and scalable machine learning solutions for large-scale data processing.

Downloads

Download data is not yet available.

Published

18-01-2024

How to Cite

[1]
T. Pichaimani, A. K. Ratnala, and P. R. Parida, “Analyzing Time Complexity in Machine Learning Algorithms for Big Data: A Study on the Performance of Decision Trees, Neural Networks, and SVMs”, J. Sci. Tech., vol. 5, no. 1, pp. 164–205, Jan. 2024, Accessed: Mar. 07, 2026. [Online]. Available: https://thesciencebrigade.org/jst/article/view/454
