SAPIENT: A multi-agent framework for corporate reputation intelligence through sentinel monitoring and LLM-based synthetic population simulation
Abstract
Corporate reputation teams rely on media monitoring and qualitative research, both of which are limited in speed and coverage when digital narratives form rapidly. This paper proposes SAPIENT (Sentinel-Augmented Population Intelligence for Emerging Narrative Tracking), a multi-agent system that links a sentinel layer over public text streams with a simulation layer that runs moderated, repeatable in silico focus-group sessions. The sentinel layer ingests social media, news, and forum text to produce a compact signal state (topics, sentiment, anomaly scores, risk labels), which conditions the simulation layer through an orchestrator. Persona agents and a moderator follow an Agentic Focus Group (AFG) protocol with repeated runs, variance reporting, and human review gates. We describe four sustainability communication scenarios: greenwashing backlash prediction, greenhushing risk assessment, campaign pre-testing, and crisis communication simulation. Nine experiments span 280 AFG runs across 20 conditions, three LLM backends (Claude Sonnet 4, GPT-4o, and Gemini 2.5 Flash), and a preregistered pilot human validation study with 54 participants. Signal conditioning improved simulation specificity (p = 0.012). Cross-lingual sessions revealed a sentiment asymmetry between English and Turkish (p = 0.001), while persona rank ordering was preserved (r = 0.81, p = 0.015). Cross-model comparison showed consistent persona differentiation across all three backends (Pearson r > 0.92, p < 0.002 for all pairs). Sentiment was robust to prompt paraphrasing (p = 0.061, n.s.), whereas credibility was sensitive to prompt wording (p < 0.001). All significant results from Experiments 1–8 survived Benjamini–Hochberg correction. The preregistered pilot, conducted with 54 human participants on Prolific, replicated the predicted credibility ranking across framing variants (p = 0.004) but not the sentiment ranking, which identifies a specific calibration target for future work.
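The abstract reports that significant results survived Benjamini–Hochberg correction. As a reminder of what that step-up procedure does, the following is a minimal sketch in Python. It is not the authors' analysis code: the function name `benjamini_hochberg`, the `alpha` level of 0.05, and the p-values in the usage lines are assumptions chosen purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha.

    Hypothetical helper for illustration; alpha=0.05 is an assumption.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Illustrative p-values only (not the full set of tests from the paper).
example = [0.001, 0.012, 0.04, 0.2]
print(benjamini_hochberg(example))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold, which is why the sketch searches for the largest qualifying rank rather than stopping at the first failure.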










