STP-LRL 2026 Abstracts


Area 1 - STP-LRL

Full Papers
Paper Nr: 10
Title:

Comparing Automatic Speech Recognition Quality for Brazilian Portuguese in Multimodal Large Models

Authors:

Lucas de Souza Lanaro, Gabriel K. Moraes, Pedro Augusto Luiz, Charles S. Oliveira, Pedro Henrique Rodrigues Salzani, Giovanni Carbinatti, Leonardo Feltrin, Fabiana C. Q. de O. Marucci, Isabela de Lima Santos, Renata De Paris and Wandemberg Gibaut

Abstract: This study presents a comparative evaluation of multimodal Large Language Models (MMs) for speech-to-text transcription in Brazilian Portuguese, addressing the challenges posed by accent variability, intonation, and contextual nuances. Using two benchmark datasets, CORAA ASR v1.1 and Common Voice Delta Segment 21.0, the research assesses transcription accuracy and semantic preservation across approximately 14,000 audio samples. Three models were analyzed: Whisper (OpenAI), serving as the baseline for automatic speech recognition, and two multimodal models, Qwen 2.5 Omni (Alibaba Cloud) and Phi-4 Multimodal (Microsoft). Performance was measured using Word Error Rate (WER) and Character Error Rate (CER), both derived from Levenshtein distance. Results indicate that Phi-4 consistently achieved superior accuracy across both datasets, followed by Whisper, while Qwen exhibited the highest error rates, particularly in short utterances. These findings underscore the potential of multimodal architectures to enhance speech understanding in linguistically diverse contexts, contributing to the development of inclusive and robust natural language processing systems for non-English languages.
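A minimal sketch (not the authors' evaluation code) of how WER and CER are derived from Levenshtein distance, as the abstract describes, assuming simple whitespace tokenisation for words:

```python
def levenshtein(ref, hyp):
    # Dynamic-programming edit distance between two sequences,
    # using a single rolling row for O(len(hyp)) memory.
    n = len(hyp)
    dp = list(range(n + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    # Word Error Rate: edit distance over word tokens,
    # normalised by the reference length.
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: the same computation over characters.
    return levenshtein(reference, hypothesis) / len(reference)
```

In practice, published WER/CER figures also depend on text normalisation (casing, punctuation, number formatting), which this sketch omits.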

Paper Nr: 13
Title:

A Rule-Based Computational Model for Gàidhlig Morphology

Authors:

Peter J. Barclay

Abstract: Language models and software tools are essential to support the continuing vitality of lesser-used languages; however, currently popular neural models require considerable data for training, which normally is not available for such low-resource languages. This paper describes work-in-progress to construct a rule-based model of Gàidhlig morphology using data from Wiktionary, arguing that rule-based systems effectively leverage limited sample data, support greater interpretability, and provide insights useful in the design of teaching materials. The use of SQL for querying the occurrence of different lexical patterns is investigated, and a declarative rule-base is presented that allows Python utilities to derive inflected forms of Gàidhlig words. This functionality could be used to support educational tools that teach or explain language patterns, for example, or to support higher level tools such as rule-based dependency parsers. This approach adds value to the data already present in Wiktionary by adapting it to new use-cases.
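As an illustration of the declarative style the abstract describes, a rule-base can be held as data and interpreted by a small Python utility. The rule below (lenition, which inserts "h" after an initial lenitable consonant) is a simplified stand-in, not the paper's actual rule-base:

```python
# Consonants that undergo lenition in this simplified illustration.
LENITABLE = set("bcdfgmpst")

# Declarative rule-base: rule name -> transformation function.
RULES = {
    "lenite": lambda w: (w[0] + "h" + w[1:]
                         if w[0] in LENITABLE and w[1:2] != "h"
                         else w),
}

def inflect(word, rule_name):
    # Look up the named rule and apply it to derive an inflected form.
    return RULES[rule_name](word)
```

Because the rules are plain data, the same table can back both generation utilities and teaching tools that explain which pattern applied.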

Paper Nr: 15
Title:

Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal Paradigms

Authors:

Mohammed Hamdan, Vincenzo Dentamaro, Giuseppe Pirlo and Mohamed Cheriet

Abstract: This study investigates whether the efficiency gains afforded by curriculum learning transfer across architecturally distinct document understanding models. Through a series of 24 controlled experiments comparing BERT (text-only) and LayoutLMv3 (multimodal) on the FUNSD and CORD benchmarks, we demonstrate that progressive data scheduling following a 33%→67%→100% trajectory yields consistent training speedups of 34.9% for BERT and 29.2% for LayoutLMv3, both achieving statistical significance at p < 0.001. The modest 5.8 percentage point differential between these architectures provides compelling evidence for architecture-agnostic curriculum benefits. Notably, both model families exhibit approximately twofold higher final training loss while maintaining equivalent downstream F1 performance, suggesting that the loss differential reflects optimization dynamics rather than representational degradation. Extended validation across six document domains confirms the cross-domain transferability of these findings. These results indicate that curriculum learning operates primarily at the data distribution level, enabling practitioners to deploy uniform progressive schedules across heterogeneous model portfolios without requiring architecture-specific tuning.
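The 33%→67%→100% progressive schedule can be sketched as follows; this is an assumed implementation for illustration, with the dataset pre-ordered (e.g. easy-to-hard) so each stage trains on a growing prefix:

```python
def curriculum_subset_sizes(n_samples, schedule=(0.33, 0.67, 1.0)):
    # Number of training samples available at each curriculum stage.
    return [max(1, int(round(n_samples * frac))) for frac in schedule]

def curriculum_stages(dataset, schedule=(0.33, 0.67, 1.0)):
    # Yield progressively larger prefixes of a pre-ordered dataset,
    # one prefix per curriculum stage.
    for cutoff in curriculum_subset_sizes(len(dataset), schedule):
        yield dataset[:cutoff]
```

Because the schedule only controls which data a stage sees, the same generator can feed any model's training loop, which is consistent with the abstract's claim that the benefit is architecture-agnostic.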

Short Papers
Paper Nr: 11
Title:

Comparing RAG, DPO and Agentic Approaches in Systems Performance on Q&A about Brazilian Labor Legislation

Authors:

Gabriel K. Moraes, Pedro Augusto Luiz, Gabriel Dias, Vitor G. C. B. de Farias, Fabiana C. Q. de O. Marucci, Vitor L. Fabris, Matheus H. R. Vicente, Leonardo R. do Nascimento, Charles S. Oliveira, Leonardo T. dos Santos, Renata De Paris and Wandemberg Gibaut

Abstract: This study evaluates three complementary strategies for deploying LLM-based assistants to address legal queries related to Brazil’s Consolidation of Labor Laws (CLT): Direct Preference Optimization (DPO) fine-tuning, Retrieval-Augmented Generation (RAG), and specialized multi-agent coordination. A dataset of 736 human-preference triplets was used to train a Low-Rank Adaptation (LoRA) adapter on a 4-bit quantized LLaMA-3 8B model. Performance was assessed on expert-crafted queries, both with and without RAG, using statistical metrics and human evaluations. Results show that DPO alone improves factual accuracy and semantic similarity, while RAG paired with a single agent can reduce precision. In contrast, combining RAG with a multi-agent workflow significantly improves answer precision, human-judged quality, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) scores. These findings indicate that DPO provides a lightweight method for improving accuracy, and that agent decomposition is essential for safely and effectively leveraging external knowledge, providing a scalable and reliable framework for legal question answering under the CLT and similar regulatory regimes.
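For reference, the ROUGE-L score the abstract reports is an F-measure over the longest common subsequence (LCS) of reference and candidate tokens. A minimal sketch, assuming whitespace tokenisation and the conventional beta weighting (the authors' exact settings are not stated):

```python
def lcs_length(a, b):
    # Longest common subsequence length between two sequences,
    # computed with a single rolling dynamic-programming row.
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def rouge_l_f1(reference, candidate, beta=1.2):
    # ROUGE-L: F-measure of LCS-based recall (vs. reference length)
    # and precision (vs. candidate length).
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(cand)
    return ((1 + beta ** 2) * precision * recall
            / (recall + beta ** 2 * precision))
```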

Paper Nr: 12
Title:

A Multimodal DeBERTa-Based Recommender System for Low-Resource and Sparse Data Environments

Authors:

Malek Ghanem, Wided Guezguez and Raouia Ayachi

Abstract: Recommender Systems (RS) are critical for personalization across digital platforms. However, two fundamental limitations persist: data sparsity and linguistic low-resource conditions. The first refers to insufficient user-item interactions, while the second concerns the lack of linguistic resources such as annotated data, lexicons, or pre-trained embeddings. Both problems degrade the representational capacity of models and hinder accurate predictions. In this work, we propose a multimodal recommender system based on DeBERTa, integrating multiple feature sources (ratings, textual reviews, and metadata) through an early fusion strategy to mitigate both challenges simultaneously. Experiments conducted on MovieLens, Amazon datasets, and the multilingual IndicHash dataset demonstrate the superiority of our approach over traditional models like GRU, SVD, and Transformer baselines, as well as specialized multilingual systems, particularly under extreme sparsity and low-resource conditions.
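The early fusion strategy the abstract mentions combines all modalities into a single representation before the model sees them. A schematic sketch (the paper's actual feature encoders are not specified here):

```python
def early_fusion(rating_feats, review_feats, metadata_feats):
    # Early fusion: concatenate per-modality feature vectors into one
    # joint representation that is fed to the downstream predictor.
    # Modality order must stay fixed so feature positions are stable.
    return list(rating_feats) + list(review_feats) + list(metadata_feats)
```

By contrast, late fusion would run a separate model per modality and merge their predictions; early fusion lets a single encoder such as DeBERTa attend across modalities jointly.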