Analyzing Medium and Long Text Indonesian Tourism Feedback Using Topic Modeling and Sentiment Analysis

Author

Sulisetyo Puji Widodo

Conference

2026 IAOS Conference

Format: CPS Poster - IAOS 2026

Keywords: NLP, sentiment analysis

Session: Poster Session

Tuesday 12 May 12:30 p.m. - 2:30 p.m. (Europe/Vilnius)

Abstract

Tourism is a key sector in Indonesia’s economic development. While digital surveys effectively capture structured quantitative data, the rich, contextual insights in open-ended user feedback remain underutilized. Such feedback reflects real public perceptions, concerns, and expectations that quantitative results often miss, making it valuable for improving survey design and informing tourism policy. However, because it is unstructured and narrative, manual analysis at scale is inefficient. To address this challenge, NLP-based approaches, particularly topic modeling and sentiment analysis, offer an effective way to identify key themes and public attitudes in tourism feedback data. By combining these methods, this study aims to extract strategic issues, identify suitable analytical models, and support more participatory, adaptive, and evidence-based tourism policy development in Indonesia.

To achieve these objectives, the proposed workflow begins with data collection in CSV format, followed by a preprocessing stage. Short texts containing fewer than 11 words are excluded to ensure sufficient contextual information, while the remaining feedback is categorized into medium-length texts (11–30 words) and long texts (more than 30 words). Each text category is then analyzed using multiple topic modeling techniques, including GSDMM, BERTopic, Top2Vec, kBERT, kUSE, NMF, Agglomerative Clustering, and LDA. The most suitable model for each text length is selected based on coherence scores. Subsequently, sentiment analysis is conducted using RoBERTa, DistilBERT, BERT, ALBERT, and XLM-RoBERTa, with the final model chosen based on overall model agreement. This process produces sentiment-labeled topics for both medium and long texts.
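
The length-based filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how words are counted, so whitespace splitting is assumed here, and the function names `length_category` and `split_by_length` are hypothetical rather than taken from the study.

```python
from typing import Dict, Iterable, List, Optional


def length_category(text: str) -> Optional[str]:
    """Return 'medium', 'long', or None for short texts that are excluded.

    Assumption: words are counted by splitting on whitespace.
    """
    n_words = len(text.split())
    if n_words < 11:
        # Fewer than 11 words: excluded as too short.
        return None
    if n_words <= 30:
        # 11 to 30 words: medium-length text.
        return "medium"
    # More than 30 words: long text.
    return "long"


def split_by_length(texts: Iterable[str]) -> Dict[str, List[str]]:
    """Group feedback texts into medium and long buckets, dropping short ones."""
    buckets: Dict[str, List[str]] = {"medium": [], "long": []}
    for text in texts:
        category = length_category(text)
        if category is not None:
            buckets[category].append(text)
    return buckets
```

Each bucket would then be passed separately to the candidate topic models, so that model selection is done per text length.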

The results show that, for medium-length texts, BERTopic achieved the best performance by generating well-defined topics such as tourist facilities and tourism governance, with tourist facilities receiving predominantly positive feedback. In contrast, for long texts, NMF performed more consistently and was better at capturing complex issues, including travel costs, accessibility, environmental concerns, and tourism digitalization. Although tourist destinations were generally perceived positively, some recurring complaints related to costs and access were also identified.

Consistent with these findings, RoBERTa emerged as the most stable sentiment analysis model across both datasets. As a result, the combination of BERTopic for medium-length texts, NMF for long texts, and RoBERTa for sentiment analysis was the best-performing analytical configuration. Overall, this study demonstrates that user feedback can effectively reveal key thematic and sentiment patterns, providing valuable support for improving survey evaluation and enabling more evidence-based tourism policy decisions.
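
The abstract does not define how "overall model agreement" is measured. One plausible reading, shown here purely as an assumption, is to score each sentiment model by its mean pairwise label agreement with the other models and pick the highest scorer. The function names and the choice of simple label matching are hypothetical and may differ from the study's actual criterion.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def pairwise_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of texts on which two models assign the same sentiment label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_agreement_per_model(predictions: Dict[str, List[str]]) -> Dict[str, float]:
    """Average each model's agreement with every other model.

    Assumption: agreement is plain label matching, not a chance-corrected
    statistic such as Cohen's kappa.
    """
    names = list(predictions)
    if len(names) < 2:
        raise ValueError("at least two models are needed to measure agreement")
    totals = {name: 0.0 for name in names}
    for first, second in combinations(names, 2):
        score = pairwise_agreement(predictions[first], predictions[second])
        totals[first] += score
        totals[second] += score
    return {name: totals[name] / (len(names) - 1) for name in names}


def most_agreeing_model(predictions: Dict[str, List[str]]) -> str:
    """Return the model whose labels agree most with the others on average."""
    scores = mean_agreement_per_model(predictions)
    return max(scores, key=scores.get)
```

Here `predictions` would map a model name to its list of predicted labels for the same texts, in the same order. A chance-corrected measure could be swapped in for `pairwise_agreement` without changing the rest of the sketch.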

Figures/Tables

Research workflow

Medium Dataset

Long Dataset