Abstracts Track 2024


Area 1 - Artificial Intelligence

Nr: 385
Title:

Extraction of Changes in Language Use Through First Conversations on Social Networking Services Using Non-Negative Matrix Factorization

Authors:

Atsuko Mutoh, Koki Takahashi, Kosuke Shima, Koichi Moriyama, Azusa Yokogoshi, Eiko Yoshida and Nobuhiro Inuzuka

Abstract: In the field of sociolinguistics, it has been shown that people manipulate their social distance from others and increase the efficiency of communication by changing their language in conversation, based on accommodation theory. It can be assumed that people manage their relationships with others through their first conversations, improving their impression of the other person and changing their language as the amount of conversation increases. Meanwhile, the recent development of social networking services has had a significant impact on sociolinguistics: research using data from platforms such as X (formerly Twitter) enables large-scale observation of human behavior, and replies on X can be regarded as conversations between individuals. In this study, we propose and analyze a model to quantitatively extract changes in the stylistic characteristics of language use as the amount of conversation grows, through first conversations on X. In the proposed model, the stylistic features of the conversation data for each amount of conversation between individuals are represented as a text feature matrix, and non-negative matrix factorization (NMF) is used to extract the important text features that increase or decrease with the amount of conversation and to analyze the changes in language use. In related work, Wajima et al. [1] used NMF to extract Japanese features from a simplified corpus but did not deal with feature variation. In the experiment, Japanese conversations between individuals starting with '@account name Nice to meet you' in X replies were collected over two months; about 6,000 stylistic features were compressed by NMF into 20 features, which were found to increase and decrease over the period of the data. The experimental results yielded several linguistic findings. Features that increased as the number of conversations increased included less polite and more communicative expressions. Conversely, features that decreased as the number of conversations increased included the number of line breaks, pictograms, characters, and sentences. These findings show that, from the first conversation onward, people use language in a way that reduces social distance and streamlines communication as the number of exchanges increases. As this experiment was conducted on Japanese conversations, we would like to extend it to English in the future. [1] Koji Wajima, Kei Koqure, Toshihiro Furukawa, Tetsuji Satoh: Extract of Japanese Text Characteristics of Simplified Corpora using Non-negative Matrix Factorization, Journal of Data Intelligence, 1(1): 75-98 (2020), Rinton Press.
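As an illustration only (not the authors' implementation), the following Python sketch shows how a matrix of stylistic-feature counts per conversation stage could be compressed with scikit-learn's NMF and how components that grow or shrink with the amount of conversation could be identified; the data, dimensions, and the trend criterion are assumptions.

    # Minimal sketch (not the authors' code): compressing stylistic-feature counts
    # per conversation stage with NMF and flagging components that grow with the
    # amount of conversation. Data, dimensions, and the trend test are assumptions.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    n_stages, n_features = 30, 6000   # hypothetical: 30 conversation stages, ~6,000 features
    X = rng.poisson(lam=2.0, size=(n_stages, n_features)).astype(float)

    # Compress the ~6,000 stylistic features into 20 latent components, as in the abstract.
    model = NMF(n_components=20, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(X)        # stage-by-component activation weights
    H = model.components_             # component-by-feature loadings

    # A component whose activation increases over the stages groups stylistic features
    # that become more frequent as the conversation progresses (and vice versa).
    stages = np.arange(n_stages)
    slopes = np.array([np.polyfit(stages, W[:, k], 1)[0] for k in range(W.shape[1])])
    print("increasing components:", np.where(slopes > 0)[0])
    print("decreasing components:", np.where(slopes < 0)[0])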

Nr: 387
Title:

Reconstruction of Facial Geometry from Face-Masked Images Using Voice Cues

Authors:

Tetsumaru Akatsuka, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara and Akihiko Ohsuga

Abstract: The COVID-19 pandemic has led to widespread mask wearing, hampering facial recognition by obscuring parts of the face. We propose a novel multimodal approach that leverages voice features in addition to masked face images to estimate the complete unmasked face. The key idea is that voice has a strong correlation with the shape of the facial features obscured by masks, such as the mouth and nose, because vocal production involves structures such as the vocal cords, nasal cavity, and facial muscles that relate to facial shape. This method is based on state-of-the-art techniques that use 3D Morphable Face Models (3DMM) as an intermediate representation to complete masked faces. 3DMMs parameterize facial shape in 3D space, allowing representation of texture, illumination, pose, and other attributes. Hence, approaches utilizing 3DMMs can improve the consistency of texture quality and head pose over methods that estimate purely from masked inputs. However, while these approaches generate sharp faces, they do not guarantee accurate restoration grounded in the original identity; this can result in images of different individuals with severely compromised identifiability. We therefore hypothesized that incorporating voice, which has innate correlations with facial shape, as auxiliary information could enhance reconstruction fidelity in occluded facial regions. In the proposed model, voice embeddings extracted from a pre-trained encoder are fused with intermediate CNN features from the image stream. This conditioning aims to inject voice information to improve 3DMM shape prediction under the mask region. A skin attention mask is also introduced to focus shape losses on skin-colored face regions directly related to facial geometry, avoiding distraction from irrelevant textures such as beards or sunglasses. The network augmented with voice was trained on a newly collected dataset of over 100k face-image and voice pairs extracted from VGGFace and VoxCeleb. Since obtaining real-world masked faces is difficult, MaskTheFace was utilized to synthetically overlay masks onto face images. Experiments on approximately 20K test images demonstrate superior performance over the 3DMM baseline without voice. The method was evaluated on quantitative metrics including L1 error, SSIM, PSNR, and cosine similarity. On all metrics, the proposed approach outperformed the baseline, indicating that voice-guided 3DMM synthesis yields unmasked faces more faithful to the ground truth. Qualitative inspection also shows more accurate reconstruction of shapes around critical areas such as the nose, mouth, and contours, further validating the hypothesis that voice assists single-view shape estimation under occlusion. Future work will explore temporal video and aging effects between images and voice, as well as better network fusion techniques. Advancing the current simple voice embedding with representations capturing expression is another direction. Overall, fusing speech with images to guide 3D facial geometry shows promise for overcoming mask obstructions that hamper recognition reliability. This cross-modal synthesis could improve security and surveillance applications where subject identification is crucial.
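As an illustrative sketch only (not the authors' network), the following PyTorch snippet shows one way a pre-trained voice embedding could be fused with intermediate image features to condition 3DMM coefficient regression; all dimensions, the class name, and the additive fusion are assumptions.

    # Illustrative sketch (not the authors' network): fusing a pre-trained voice
    # embedding with intermediate image features to condition 3DMM coefficient
    # regression. All dimensions and the additive fusion scheme are assumptions.
    import torch
    import torch.nn as nn

    class VoiceConditionedRegressor(nn.Module):
        def __init__(self, voice_dim=512, img_channels=256, n_3dmm_params=257):
            super().__init__()
            # Project the voice embedding so it can be broadcast over the feature map.
            self.voice_proj = nn.Linear(voice_dim, img_channels)
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(img_channels, n_3dmm_params),
            )

        def forward(self, img_feat, voice_emb):
            # img_feat: (B, C, H, W) CNN features of the masked face image.
            # voice_emb: (B, voice_dim) embedding from a pre-trained speaker encoder.
            v = self.voice_proj(voice_emb).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
            fused = img_feat + v           # simple additive conditioning (one option)
            return self.head(fused)        # predicted 3DMM coefficients

    # Usage with dummy tensors:
    model = VoiceConditionedRegressor()
    coeffs = model(torch.randn(2, 256, 28, 28), torch.randn(2, 512))
    print(coeffs.shape)                    # torch.Size([2, 257])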

Nr: 277
Title:

Toward Reasoning About Geological Structure Using Symbolic Expression

Authors:

Yuta Taniuchi and Kazuko Takahashi

Abstract: We propose a method for creating a cross-section map from a given pair of stratigraphic columns based on a symbolic expression. Understanding geological structure is important for disaster prevention and efficient infrastructure development. Generally, it is difficult to obtain sufficient investigation data for some areas, and it is currently common for experts to create geological cross-section maps from limited data according to individual interpretations. This allows various interpretations to appear with little logical justification. In this study, we take a logical approach that does not use numerical data to create a cross-section map. We show a method to draw the borderlines of layers and to judge their inclination, and also give a symbolic expression for the cross-section between a pair of stratigraphic columns. First, we treat geological structures in which the inclinations of all the layers are the same and there is neither fault nor unconformity. The inclinations are divided into three types: even, rising to the right, and rising to the left. We create the cross-section and judge the inclination of the layers in the following manner. We draw the borderlines of the layers so that none of them cross each other: first we draw a line between the same adjacent pairs of layers in both columns, and then draw a line from the border of the layers to the upper or lower side if there is no such pair. Whether an end point lies on the upper side or the lower side is determined deterministically, and we can judge the inclination of the layers from the result. Next, we extend the method so that we can treat geological structures with angular unconformity. Since the same adjacent pairs of layers do not always exist in both columns, we draw a line from the border of a layer not only to the upper or lower side but possibly to another borderline. In this case, the borderlines are not uniquely determined in general, and multiple solutions are obtained. We use the symbolic expression defined in [Taniuchi23] to represent the structure of the cross-section between a pair of stratigraphic columns. The structure of a stratigraphic column is given as a sequence of symbols, each of which stands for a layer. The structure of a cross-section is obtained as a sequence of layer symbols encountered when tracing the frame of the section clockwise starting from the top-left, separated by parentheses for each side of the rectangle. This provides a compact symbolic expression on which logical reasoning is possible. To discriminate structures with angular unconformity, we take additional sequences from a cross-section. They are obtained by tracing the frame of the outside of the structure while incrementally excluding layers from the top. This makes it possible to represent structural differences of the cross-section at a refined level. We have implemented our method for simple geological structures, whereas the implementation for geological structures with angular unconformity remains future work. Moreover, we applied our method to practical data to evaluate it. [Taniuchi23] Taniuchi, Y. and Takahashi, K.: "Qualitative Spatial Representation and Reasoning about Fold Strata," ICAART 2023, pp. 211-220, 2023.
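As a toy illustration only (a simplification, not the authors' method), the following Python sketch treats each stratigraphic column as a top-to-bottom sequence of layer symbols, finds the adjacent layer pairs shared by both columns (where a borderline would be drawn), and infers the inclination from the relative depth of those shared boundaries; the column contents and the inclination rule are assumptions for illustration.

    # Toy illustration (our simplification, not the authors' implementation): columns
    # are top-to-bottom sequences of layer symbols; a borderline is drawn between the
    # two occurrences of an adjacent layer pair shared by both columns, and the
    # relative depth of that shared boundary suggests the inclination.
    LEFT = ["A", "B", "C", "D"]     # hypothetical columns; index 0 is the top
    RIGHT = ["B", "C", "D", "E"]

    def shared_boundaries(left, right):
        pairs_left = {(left[i], left[i + 1]): i for i in range(len(left) - 1)}
        pairs_right = {(right[i], right[i + 1]): i for i in range(len(right) - 1)}
        return {p: (pairs_left[p], pairs_right[p]) for p in pairs_left if p in pairs_right}

    def inclination(left, right):
        # 'even' if shared boundaries sit at the same depth in both columns,
        # otherwise the layers rise toward the side where the boundary is shallower.
        diffs = [l - r for (l, r) in shared_boundaries(left, right).values()]
        if all(d == 0 for d in diffs):
            return "even"
        return "rising to the right" if diffs[0] > 0 else "rising to the left"

    print(shared_boundaries(LEFT, RIGHT))   # {('B','C'): (1, 0), ('C','D'): (2, 1)}
    print(inclination(LEFT, RIGHT))         # rising to the right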

Nr: 315
Title:

Attention over Drawing Movements in Parkinson’s Disease

Authors:

Manuel Gil-Martín, Sergio Esteban-Romero, Fernando Fernández-Martínez and Rubén San-Segundo

Abstract: Parkinson’s Disease (PD) has been widely analysed using inertial signals to detect motor symptoms such as tremor or freezing of gait. However, anomalies in the kinematics of handwriting are one of the initial signs observed in PD. Moreover, patients in early stages of PD may manifest tremor only during some intervals of time instead of showing a continuous tremor. For this reason, this work focuses on producing an overall assessment at the patient level by combining information extracted from short time windows. This combination is performed by including an attention mechanism in a deep learning model that integrates tremor features extracted from short windows. For this work, we selected the "Parkinson Disease Spiral Drawings Using Digitized Graphics Tablet" dataset, which contains spiral drawings from 77 individuals: 62 with PD and 15 healthy people forming the control group. A tablet and a digital pen were used to draw different spirals, capturing information from five time series: X-Y-Z coordinates, pressure, and grip angle. This work used a signal processing module to segment these signals into consecutive windows of 3 seconds with a 0.5-second step and to compute the Fast Fourier Transform (FFT) magnitude coefficients of these windows, highlighting the tremor information in the frequency domain. The spectrum points were used (instead of the raw data directly) as inputs to the deep learning model. PD tremor becomes more apparent in the frequency domain: in some signals (X, Y) it was possible to see peaks of energy corresponding to the tremor frequency (between 3 and 9 Hz) and its harmonics. The deep learning architecture used in this work has a feature learning subset composed of convolutional and max-pooling layers, and a classification subset composed of fully connected layers. After the convolutional and fully connected layers, dropout layers are included to avoid overfitting. The last layer has a single output with a sigmoid activation to classify between the two classes. The architecture uses TimeDistributed layers to wrap the consecutive input windows and an attention layer to join the W analysed windows. The inputs of the deep learning architecture have W × N × M dimensions, where W denotes the number of windows analysed, N the number of signals used, and M the number of FFT magnitude coefficients used. The attention layer raised user-level performance to 100% accuracy in tremor detection by selectively emphasizing input windows where tremors are noticeable. Through training, the attention mechanism learns to assign higher weights to windows containing relevant tremor patterns, allowing the model to focus on crucial segments of the input sequence. This targeted attention enables the network to effectively capture and interpret features associated with tremors, maximizing sensitivity in detecting instances of interest while potentially skipping less informative windows.
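As a minimal sketch only (assumed shapes and layer sizes, not the authors' exact model), the following Keras snippet shows a TimeDistributed CNN over FFT-magnitude windows followed by attention pooling over the W windows and a sigmoid output.

    # Minimal sketch (assumed shapes and layer sizes, not the authors' exact model):
    # a TimeDistributed CNN over FFT-magnitude windows followed by attention pooling
    # over the W windows, in Keras.
    from tensorflow.keras import layers, models

    W, N, M = 10, 5, 128    # windows, signals (x, y, z, pressure, grip), FFT coefficients

    inputs = layers.Input(shape=(W, M, N))                       # one spectrum per window
    x = layers.TimeDistributed(layers.Conv1D(32, 5, activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.MaxPooling1D(2))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    x = layers.TimeDistributed(layers.Dense(64, activation="relu"))(x)
    x = layers.TimeDistributed(layers.Dropout(0.3))(x)           # (batch, W, 64)

    # Attention over windows: score each window, softmax, then a weighted sum.
    scores = layers.Dense(1)(x)                                  # (batch, W, 1)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Dot(axes=1)([weights, x])                   # (batch, 1, 64)
    context = layers.Flatten()(context)

    outputs = layers.Dense(1, activation="sigmoid")(context)     # PD vs. control
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()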

Nr: 388
Title:

Enhance Data Usefulness in Privacy Protection Under Considering IoT Measurement Error

Authors:

Riho Isawa, Yuichi Sei, Yasuyuki Tahara and Akihiko Ohsuga

Abstract: The use of big data is essential for artificial intelligence modelling and multi-agent system building. In some cases, because sensitive personal information is handled, privacy protection is necessary. Recently, privacy protection technology based on differential privacy has been widely used. The Laplace mechanism is a well-known protection method that achieves differential privacy by adding noise to the target value that needs to be protected. It is used in systems that handle big data, such as federated learning, data analysis, and agent systems, while protecting the privacy of data on the client side. The Laplace mechanism is easy to introduce into systems because its computational cost is low, but it adds noise to the data and reduces data usefulness. Therefore, it is desirable to minimize the amount of noise added for a pre-specified privacy level in order to maintain usefulness. In this research, we focus on the fact that values obtained from IoT measurements include errors, and we propose a method that minimizes the amount of noise for a pre-specified privacy budget. Due to IoT measurement errors, extra protection may be provided beyond the pre-specified privacy level. A conventional method is true-value-based differential privacy (TDP), which assumes that the IoT measurement error follows a normal distribution, finds an appropriate threshold value w, and does not add Laplace noise below the threshold w. Here, we propose a method to experimentally find an appropriate threshold w even when the IoT measurement error does not follow a normal distribution. The novelty of the proposed method is that it can accommodate various error distributions. Simulation of the IoT measurement errors is necessary to check that differential privacy is guaranteed. To confirm that differential privacy is satisfied, the probability density function of the total noise is derived by combining the measurement error and the privacy noise. Because calculating this probability density function analytically is difficult, it is derived experimentally by randomly generating samples and creating histograms. The proposed method experimentally finds the largest threshold w by trying various values of w; the conventional method corresponds to a threshold w of 0.0, and a larger threshold w increases data usefulness. In the experiment, for example, when the number of samples is 10^8, the measurement errors are randomly drawn from a lognormal distribution with mean 0.0 and standard deviation 1.0, and the Laplace-mechanism noise uses epsilon = 1.0, the average of the total error (the sum of the measurement error and the Laplace noise) was reduced by 10% by our proposed method while maintaining the pre-specified privacy level. In addition, we validated that the proposed method reduced the average noise by 36% compared with conventional location privacy protection methods, including geo-indistinguishability, in a location simulation of the behavior of 10^4 users on a map using a typical human behavior model, while maintaining the pre-specified privacy level. The proposed method was thus shown to provide efficient privacy protection while maintaining data usefulness for a pre-specified privacy budget. Future tasks are to investigate the rate of improvement in usefulness when building actual artificial intelligence models and federated learning models.
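As an illustrative sketch only (our assumptions, not the authors' exact procedure), the following Python snippet samples the total noise (IoT measurement error plus Laplace noise), builds a histogram of it as described in the abstract, and checks an epsilon-DP-style ratio bound between densities shifted by the sensitivity; the error model, sensitivity, and bin count are assumptions.

    # Illustrative sketch (our assumptions, not the authors' exact procedure): sample
    # the total noise (IoT measurement error + Laplace noise), build a histogram, and
    # check an epsilon-DP-style ratio bound between densities one sensitivity apart.
    import numpy as np

    rng = np.random.default_rng(0)
    n, eps, sensitivity = 10**6, 1.0, 1.0

    meas_err = rng.lognormal(mean=0.0, sigma=1.0, size=n)        # assumed error model
    lap = rng.laplace(loc=0.0, scale=sensitivity / eps, size=n)  # Laplace mechanism noise
    total = meas_err + lap                                       # total perturbation

    hist, edges = np.histogram(total, bins=2000, density=True)
    width = edges[1] - edges[0]
    shift = int(round(sensitivity / width))                      # bins per sensitivity step

    # Approximate check: densities at points one sensitivity apart should differ by at
    # most a factor of exp(eps); sampling noise in sparse bins can inflate this ratio.
    p, q = hist[shift:], hist[:-shift]
    mask = (p > 0) & (q > 0)
    max_log_ratio = np.max(np.abs(np.log(p[mask] / q[mask])))
    print(f"empirical max log-ratio: {max_log_ratio:.3f} (target <= eps = {eps})")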

Nr: 392
Title:

Smart Traffic Signal Optimization with an Artificial Bee Colony Algorithm

Authors:

Atlantik Limani, Kadri Sylejmani, Uran Lajçi, Elvir Misini, Erzen Krasniqi and Bujar Krasniqi

Abstract: Traffic congestion remains a critical issue in urban areas, necessitating innovative solutions for efficient traffic signal control, a central component in enhancing traffic flow and reducing delays for all participants. In the literature, this challenge is referred to as the Intersection Traffic Signal Control Problem (ITSCP) (Eom and Kim, 2020), which involves the optimization of three primary components: the cycle, representing the overall time allocated for a signalization period; the phase sequence over this period, indicating the order of individual directional movements at the intersection; and the duration, encapsulating the total green signal timings for specific phases, during which signalling remains constant. In our case, we tackle the ITSCP variant introduced in the qualification round of the Google Hash Code Competition in 2021 under the name Traffic Signaling Problem. Our approach to this problem is based on an Artificial Bee Colony (ABC) optimization method inspired by the foraging behaviour of honeybees (Pham et al., 2006). In our framework, a point in the search space, representing a solution, is defined as an array whose length corresponds to the number of intersections in the given problem instance. Each element of this array is a tuple object storing the phase order and signaling time (i.e., green time) for each incoming street at the respective intersection. The neighbourhood comprises three operators: ShufflePhaseOrder, which randomly shuffles phase orders for selected intersections, with a selection probability of 47.5%; SwapPhaseOrder, which swaps the positions of two randomly chosen incoming streets' green-time phases at selected intersections, with a probability of 47.5%; and ChangeGreenTime, which randomly modifies the green time for one incoming street at selected intersections, with a probability of 5%. In each iteration, the waggle dance mechanism is employed to distinguish between regions of the search space explored by the elite, best, and remaining scout bees. These bees identify regions deemed very promising, moderately promising, or not promising at all: the first two are used for exploitation, while the last is used for exploration, since those solutions are replaced with newly initialized ones. For experimentation, we used a challenging dataset comprising 10 problem instances, 5 introduced in the Google Hash Code Competition and another 5 generated by our group. The size of these instances ranges up to 10,000 intersections, 95,000 streets, and 1,000 cars. The results indicate that our approach is competitive with state-of-the-art solvers. Among the 10 instances, the gap from the best-known solutions is less than 0.05% for four instances. For two instances, the gap is around 1%, for three instances it is around 5%, whereas for the remaining two instances, the gaps are 12% and 39%, respectively. Furthermore, by running our approach in conjunction with the state-of-the-art solver, we were able to discover two new best solutions, one with a slight advantage of 0.001% and the other with a margin of 0.187%. These results seem promising, with potential applications in smart city initiatives and the integration of intelligent traffic management systems in real-life settings. The solver source code and solutions are available at github.com/atlantiklimani/Traffic-Signaling/tree/new-idea-for-bee-hive.
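As a minimal sketch only (our illustration of the stated encoding, not the released solver), the following Python snippet shows the per-intersection list of (incoming street, green time) tuples and the three neighbourhood operators with the probabilities given in the abstract, applied here to a single randomly selected intersection.

    # Minimal sketch (our illustration, assumed data layout): the solution encoding and
    # the three neighbourhood operators with the probabilities stated in the abstract,
    # applied to one randomly selected intersection.
    import random

    # One list per intersection: [(incoming_street, green_time), ...] in phase order.
    def neighbour(solution, max_green=10):
        new = [phases[:] for phases in solution]          # copy each intersection's phases
        phases = new[random.randrange(len(new))]          # pick one intersection
        r = random.random()
        if r < 0.475 and len(phases) > 1:                 # ShufflePhaseOrder (47.5%)
            random.shuffle(phases)
        elif r < 0.95 and len(phases) > 1:                # SwapPhaseOrder (47.5%)
            a, b = random.sample(range(len(phases)), 2)
            phases[a], phases[b] = phases[b], phases[a]
        else:                                             # ChangeGreenTime (5%)
            j = random.randrange(len(phases))
            street, _ = phases[j]
            phases[j] = (street, random.randint(1, max_green))
        return new

    # Example: a tiny instance with two intersections.
    solution = [[("st-1", 2), ("st-2", 1)], [("st-3", 3)]]
    print(neighbour(solution))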

Nr: 393
Title:

Analysis of the Echo Chamber Caused by Unexpected Opinions

Authors:

Akira Nakagawa, Yuichi Sei, Yasuyuki Tahara and Akihiko Ohsuga

Abstract: The concept of an echo chamber involves immersing oneself in opinions that align with one's own, leading to the reinforcement of those ideas. This phenomenon has induced societal issues such as the spread of misinformation about the COVID-19 vaccine and the dissemination of rumors during the 2016 U.S. presidential election. Accurate detection of echo chambers can help prevent the exacerbation of such problems. While the conventional definition of an echo chamber attributes its emergence to the occurrence of like-minded thoughts, our analysis suggests that this definition is insufficient. Instead, it should be defined as a situation where individuals become absorbed in opinions that invoke interest and curiosity, independent of their own thoughts, potentially leading to a shift or reinforcement of perspectives. As demonstrated in existing research, individuals are naturally drawn to content that is unexpected. Unexpectedness is defined here as having singularity and provoking an emotion of surprise. To assess the degree of echo chamber closure, the cosine similarity between comments within the same video was computed, and the sum of positive cosine similarities was defined as the echo chamber score. For the experiments, data were collected from 4,970,000 comments on COVID-19-related videos from the top five major Japanese news channels on YouTube. To validate unexpectedness, singularity was calculated for each video based on cosine similarity, and all video contents were classified into emotions ("joy," "sadness," "expectation," "surprise," "anger," "fear," "disgust," "trust") using the luke-japanese-large-lite model fine-tuned on the Wrime dataset. Furthermore, users' stances on the videos were determined by sentiment analysis (negative to positive), and shifts and reinforcements of positions were detected. Using these data, we compared the traditional scenario of opinions being reinforced with the proposed scenario of opinions changing, to examine the accuracy of the definition of an echo chamber. We investigated differences in singularity, emotions induced by video content, and echo chamber scores between instances where opinions were reinforced and those where they changed. Results from a one-tailed test indicated that singularity during a change in position was higher than during reinforcement. In addition, the emotion most frequently induced by video content was "surprise." Moreover, a comparison of the distributions of echo chamber scores between changing and reinforcing opinions revealed a uniform distribution when positions changed, indicating that an echo chamber is induced relatively easily. From these findings, it is evident that content with unexpected elements, unrestricted by past behaviors or interests, not only reinforces opinions but can also induce changes, making echo chambers more likely to occur. This underscores the need to reconsider the definition of echo chambers. By employing the proposed definition, it is anticipated that the occurrence of societal problems driven by echo chambers can be further reduced.
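As a minimal sketch only (our illustration; the authors' text representation may differ), the following Python snippet computes the echo chamber score as the sum of positive pairwise cosine similarities between comments on the same video, here using TF-IDF vectors on toy English comments; Japanese comments would additionally require tokenization.

    # Minimal sketch (our illustration; the authors' text representation may differ):
    # the echo chamber score as the sum of positive pairwise cosine similarities
    # between comments on the same video, here with TF-IDF vectors.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def echo_chamber_score(comments):
        vectors = TfidfVectorizer().fit_transform(comments)
        sim = cosine_similarity(vectors)
        iu = np.triu_indices(len(comments), k=1)          # unique comment pairs
        pairwise = sim[iu]
        return float(pairwise[pairwise > 0].sum())        # sum of positive similarities

    comments = [
        "the vaccine announcement was a surprise",
        "surprised by this vaccine announcement too",
        "an unrelated remark about the weather",
    ]
    print(echo_chamber_score(comments))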