Unveiling Student Perspectives on Project-Based Learning through Comprehensive NLP Analysis in Statistics and Data Science Education
Conference
Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: data-science education, NLP, project-based
Session: CPS 32 Students II
Friday 5 June 11 a.m. - noon (Europe/Malta)
Abstract
Project-Based Learning (PBL) in statistics and data science courses helps students transform abstract concepts into meaningful experiences. Through team projects, students learn to design analyses, interpret results, and communicate insights—core competencies that prepare them for both academic and professional contexts. Understanding students’ perspectives on project-based learning is vital for designing effective and inclusive courses. Their reflections reveal not only attitudes toward content and methods but also deeper emotional and cognitive responses to learning through projects. Unlike closed-ended questionnaires, open-ended feedback provides rich, nuanced insights into how students experience challenges, collaboration, and personal growth, capturing the meanings and emotions behind these experiences.
To uncover these perspectives, feedback from 439 undergraduate students in non-STEM departments at a community college was analyzed using a multi-stage NLP framework that integrated text preprocessing, semantic embeddings, clustering, and sentiment analysis. Responses were cleaned to remove noise and transformed into semantic embeddings. Dimensionality reduction with UMAP and clustering with a Gaussian Mixture Model (optimized using the DBCV, silhouette, and BIC indices) revealed latent attitudinal structures. Sentiment polarity and subjectivity were assessed using VADER and TextBlob, while supervised models (Random Forest, LightGBM, SVM) validated the stability of the clusters. Subgroup analyses by gender, faculty, ADHD status, math background, and course type added interpretive depth.
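The clustering and validation steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn is available, substitutes synthetic vectors for the UMAP-reduced embeddings, selects the number of Gaussian Mixture components by BIC with silhouette as a cross-check (DBCV is omitted), and uses a Random Forest alone to stand in for the supervised stability check. The function names `select_gmm` and `cluster_stability` are hypothetical.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score


def select_gmm(X, k_range=range(2, 7), seed=0):
    """Fit one GMM per candidate k; return (best_k, labels, scores).

    best_k is the k with the lowest BIC. Silhouette is recorded for each k
    as a cross-check but does not drive the selection in this sketch.
    """
    scores, fitted = {}, {}
    for k in k_range:
        gmm = GaussianMixture(
            n_components=k, covariance_type="full", random_state=seed
        ).fit(X)
        labels = gmm.predict(X)
        # Silhouette is undefined when every point lands in one cluster.
        sil = silhouette_score(X, labels) if len(set(labels)) > 1 else float("nan")
        scores[k] = {"bic": gmm.bic(X), "silhouette": sil}
        fitted[k] = labels
    best_k = min(scores, key=lambda k: scores[k]["bic"])
    return best_k, fitted[best_k], scores


def cluster_stability(X, labels, seed=0):
    """Mean 5-fold accuracy of a classifier trained to predict cluster labels.

    High accuracy suggests the clusters are separable and reproducible.
    """
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    return cross_val_score(clf, X, labels, cv=5).mean()


# Synthetic stand-in for reduced embeddings of 439 responses (assumption:
# the real input would be UMAP-reduced sentence embeddings).
X, _ = make_blobs(n_samples=439, centers=4, n_features=5, random_state=0)

best_k, labels, scores = select_gmm(X)
accuracy = cluster_stability(X, labels)

for k, s in scores.items():
    print(f"k={k}  BIC={s['bic']:.1f}  silhouette={s['silhouette']:.3f}")
print(f"selected k={best_k}, classifier accuracy={accuracy:.3f}")
```

Selecting on BIC alone is a simplification; combining several indices, as the abstract reports, guards against any single criterion favoring an over- or under-segmented solution.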
The analysis revealed two dominant dimensions shaping students’ experiences: math or data science anxiety and an interaction or hands-on orientation. Four thematic clusters emerged, reflecting different combinations of these dimensions. About half of the students (53%) expressed some form of anxiety, while nearly three-quarters (74%) emphasized the importance of collaboration and applied hands-on learning. Sentiment analysis showed an overall positive tone, with lower positivity among anxious students. Demographic patterns indicated higher anxiety among students with weaker math or language backgrounds and greater interaction orientation among female and pre-academic students.
These findings highlight the importance of integrating students’ voices into the evaluation of teaching methods. The nuanced understanding derived from open-ended feedback enables educators to design scaffolded, authentic, and inclusive learning experiences that address both emotional and cognitive aspects of learning. Effective PBL design should combine structured support for anxious learners with real-world, collaborative experiences for confident students, emphasizing early achievements, clear milestones, and visible skill growth. Through this evidence-based understanding of students’ perspectives, educators can create data science learning environments that are both intellectually rigorous and emotionally supportive.