A K-Prototypes-Based Approach for Modeling Student Segmentation Based on Learning Strategies to Support Academic Decision-Making
Abstract
This study models student segmentation based on learning strategies using the K-Prototypes clustering algorithm. The data consist of mixed-type variables, including categorical variables (gender and major) and numerical variables such as grade point average (GPA), learning habits, motivation, learning environment, health and social support, academic involvement, and academic achievement.
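To illustrate how K-Prototypes can compare records that mix numerical and categorical variables, the following is a minimal sketch of the commonly used mixed dissimilarity: squared Euclidean distance on the numerical part plus a weighted count of categorical mismatches. The function names, the weight `gamma`, and the example values are assumptions made for illustration; the study does not report its implementation or parameter settings.

```python
from typing import List, Sequence, Tuple


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on numerical values plus
    gamma times the number of mismatching categorical values."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: List[Tuple[Sequence[float], Sequence[str]]],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype with the smallest mixed dissimilarity."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical, already-scaled student record: (GPA, motivation), (gender, major)
student_num, student_cat = [0.8, 0.6], ["F", "Statistics"]
# Hypothetical prototypes: numerical means and categorical modes per cluster
prototypes = [
    ([0.3, 0.2], ["M", "Statistics"]),
    ([0.9, 0.7], ["F", "Statistics"]),
]
assigned = nearest_prototype(student_num, student_cat, prototypes, gamma=0.5)
```

In this sketch the numerical variables are assumed to be standardized beforehand, so that `gamma` balances the two parts on a comparable scale.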
The analysis was conducted through several stages, including data preprocessing, exploratory data analysis, and clustering using the K-Prototypes algorithm. The optimal number of clusters was determined using the Elbow and Silhouette methods, both of which indicated that four clusters provide the best clustering structure.
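As a rough illustration of the Silhouette criterion mentioned above, the sketch below computes the mean silhouette coefficient from a precomputed pairwise distance matrix, so that any dissimilarity suitable for mixed data could be plugged in. The helper name `mean_silhouette`, the toy distances, and the convention of scoring singleton clusters as 0 are assumptions; the study does not state which distance or software it used.

```python
from typing import List, Sequence


def mean_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient from a precomputed distance matrix.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    score is (b - a) / max(a, b). Requires at least two clusters.
    """
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores: List[float] = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy example: four points on a line, forming two well-separated pairs
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = mean_silhouette(dist, [0, 0, 1, 1])  # pairs grouped together
bad = mean_silhouette(dist, [0, 1, 0, 1])   # pairs split apart
```

Repeating this calculation for several candidate numbers of clusters and choosing the value with the highest mean score is one common way to apply the criterion; the Elbow method instead looks for the point where the clustering cost stops decreasing sharply.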
The results show that students fall into four clusters with distinct characteristics. Cluster 3 represents highly motivated, high-achieving students with strong engagement, while Cluster 1 consists of students with good academic performance supported by favorable learning conditions. Cluster 4 includes students with moderate characteristics, and Cluster 2 represents students with lower performance and weaker learning strategies.
The clustering results were further examined using t-SNE visualization, which shows reasonably clear separation between clusters despite some overlap. Overall, this study demonstrates that the K-Prototypes algorithm is effective in handling mixed-type educational data and can provide meaningful insights to support data-driven academic decision-making and the development of targeted learning strategies.
Copyright (c) 2026 NURUL AIN FARHANA

This work is licensed under a Creative Commons Attribution 4.0 International License.











