Analysis of Software Effort Estimation by Machine Learning Techniques

Meharunnisa; Saqlain, Muhammad; Abid, Muhammad; Awais, Muhammad; Stević, Željko

Files

Analysis of Software Effort Estimation by Machine Learning Techniques.pdf (1.36 MB)

Date

2023

Authors

Publisher

LIETA

Abstract

Software effort estimation is a crucial activity in software project management that involves predicting the level of effort required to develop or maintain software applications. Accurate estimates enable effective planning and staffing, which are key to on-time and on-budget delivery of software projects. This paper presents an analysis of machine learning techniques for improving software effort estimation based on empirical datasets. Five public datasets from various sources were used: ISBSG, NASA93, COCOMO, Maxwell, and Desharnais. The data were preprocessed by handling missing values, converting categorical features, and splitting into train and test sets. Four machine learning regression algorithms were evaluated: linear regression, Gradient Boosting, Random Forest, and Decision Tree. Additionally, correlation-based feature selection was applied to select a relevant subset of features and reduce dimensionality. The comparative analysis focused on two key metrics, R² and root mean squared error (RMSE), to evaluate prediction accuracy. The results indicate that linear regression and Random Forest models perform significantly better than the other approaches for this effort estimation task when correlation is used to select features. The best R² scores were achieved for the NASA93, COCOMO, Maxwell, and Desharnais datasets. RMSE was lowest for the Desharnais dataset, indicating high accuracy. The findings suggest that correlation-based feature selection can improve machine learning models for software effort estimation. The strengths of linear regression and Random Forest models make them suitable for developing reliable estimation tools. The insights from this comparative analysis establish a strong baseline for future research. Software project planners can leverage these findings to build intelligent, data-driven effort prediction systems.
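The correlation-based feature selection and the two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.5 correlation threshold, the function names, the feature names, and the toy numbers are all assumptions made up for illustration, and the predictions are hard-coded rather than produced by one of the four regressors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the
    threshold. The threshold value is an assumption, not taken from the paper."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Toy data invented for illustration; not drawn from any of the five datasets.
columns = {
    "size_kloc": [10, 20, 30, 40, 50],  # hypothetical project size feature
    "team_id": [3, 1, 3, 1, 3],         # hypothetical, weakly related feature
}
effort = [25, 45, 70, 85, 110]          # hypothetical effort values

selected = select_features(columns, effort)

# Hard-coded stand-in predictions; a fitted regressor would supply these.
predicted = [20, 50, 70, 90, 110]
print(selected, r2_score(effort, predicted), rmse(effort, predicted))
```

On this toy data only the size feature passes the filter, since the second column is almost uncorrelated with effort. In practice the selected columns would be passed to a regressor such as linear regression or Random Forest, and R² and RMSE would be computed on the held-out test split.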

Keywords

estimation, machine learning, software, data-driven, linear regression, gradient boosting, random forest, root mean squared error (RMSE)

URI

https://vaseljena.ues.rs.ba/handle/123456789/1321

Collections

Faculty of Transport and Traffic Engineering [Scientific papers]
