Analysis of Software Effort Estimation by Machine Learning Techniques
Date
2023
Publisher
LIETA
Abstract
Software effort estimation is a crucial activity in software project management that involves predicting the level of effort required to develop or maintain software applications. Accurate estimates enable effective planning and staffing, which are key to on-time and on-budget delivery of software projects. This paper presents an analysis of machine learning techniques for improving software effort estimation based on empirical datasets. Five public datasets from various sources were used: ISBSG, NASA93, COCOMO, Maxwell, and Desharnais. The data were preprocessed by handling missing values, converting categorical features, and splitting into train and test sets. Four machine learning regression algorithms were evaluated: linear regression, Gradient Boosting, Random Forest, and Decision Tree. Additionally, correlation-based feature selection was applied to select a relevant subset of features and reduce dimensionality. The comparative analysis focused on two key metrics, R² and root mean squared error (RMSE), to evaluate prediction accuracy. The results indicate that the linear regression and Random Forest models perform significantly better than the other approaches for this effort estimation task when correlation is used to select features. The best R² scores were achieved for the NASA93, COCOMO, Maxwell, and Desharnais datasets. RMSE was lowest for the Desharnais dataset, indicating high accuracy. The findings suggest that correlation-based feature selection can improve machine learning models for software effort estimation. The strengths of the linear regression and Random Forest models make them suitable for developing reliable estimation tools. The insights from this comparative analysis establish a strong baseline for future research. Software project planners can leverage these findings to build intelligent, data-driven effort prediction systems.
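The abstract names correlation-based feature selection and two evaluation metrics, R² and RMSE, without giving their details. The following is a minimal sketch of what these steps can look like, not the paper's actual implementation. The function names, the 0.5 correlation threshold, and the toy columns `size_kloc` and `team_id` are all hypothetical and chosen only for illustration; the paper does not state its threshold or tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep the feature names whose |correlation| with the target
    meets the threshold (0.5 is an assumed, illustrative cutoff)."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


def rmse(y_true, y_pred):
    """Root mean squared error: lower means predictions are closer."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 is a perfect fit, 0 matches
    always predicting the mean of y_true (which must not be constant)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, not taken from any of the five datasets.
features = {
    "size_kloc": [10, 20, 30, 40, 50],  # tracks effort closely
    "team_id": [3, 1, 3, 1, 3],         # nearly unrelated to effort
}
effort = [24, 45, 70, 88, 115]

selected = select_features(features, effort)
```

On this toy data only `size_kloc` passes the filter, since `team_id` has almost no linear relationship with effort. In practice the selected columns would then be passed to a regressor such as linear regression or Random Forest, and `rmse` and `r2_score` would be computed on the held-out test split.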
Keywords
estimation, machine learning, software, data-driven, linear regression, gradient boosting, random forest, root mean squared error (RMSE)