Conversational AI Academic Research by AI Experts

Post-editing has proven effective in improving the quality of text generated by large language models (LLMs) such as GPT-3.5 or GPT-4, particularly when direct updating of their parameters to enhance text quality is infeasible or expensive. However, relying solely on smaller language models for post-editing can limit the LLMs’ ability to generalize across domains.

Large language models (LLMs) have demonstrated impressive capabilities in natural language generation. However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs). This task requires the generated outputs to embody the exact semantics of LFs, without missing any LF semantics or creating any hallucinations.

Most conversational AI agents in today's marketplace are unimodal: only text is exchanged between the user and the bot. However, employing additional modes (e.g., images) in the interaction improves customer experience, potentially increasing efficiency and profits in applications such as online shopping.

Eva is a multimodal conversational system that helps users accomplish their domain goals through collaborative dialogue. The system does this by inferring users’ intentions and plans to achieve those goals, detecting whether obstacles are present to their achievement, finding plans to overcome those obstacles or to achieve higher-level goals, and planning its actions, including speech acts, to help users accomplish them.

Distributed Computing (DC) involves a collection of tasks (or modules) executed in parallel on different compute nodes connected through a network. Cloud service providers (CSPs) such as Azure[1], Amazon[2], and Google[3] provide DC platforms as PaaS (Platform as a Service) offerings. These cloud platforms reduce implementation costs but have a significant drawback: they can be configured to spawn only a single type of compute node for executing all the tasks in the DC environment.

With the recent advancements in automated communication technology, many traditional businesses that rely on face-to-face communication have shifted to online portals. However, these online platforms often lack the personal touch essential for customer service. Research has shown that face-to-face communication is essential for building trust and empathy with customers.

Recent years have seen a huge increase in the popularity of information retrieval (IR) systems, which enable users to hold natural language conversations. IR systems such as conversational agents are typically goal-oriented and use predefined queries to retrieve information from backend systems. Researchers have improved these agents to adapt to different modalities, such as images, sound, and video, to enhance the conversational experience.

The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, hand and body gestures, facial expressions, writing) embedded in multimodal-multisensor interfaces.

The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, gestures, writing) embedded in multimodal-multisensor interfaces. These interfaces support smartphones, wearables, in-vehicle and robotic applications, and many other areas that are now highly competitive commercially.

Comprehensive resource that explains the W3C standards for multimodal interaction in a clear and straightforward way

Includes case studies of the use of the standards on a wide variety of devices, including mobile devices, tablets, wearables, and robots, in applications such as assisted living, language learning, and healthcare

This document describes the architecture of the Multimodal Interaction (MMI) framework [MMIF] and the interfaces between its constituents. The MMI Working Group is aware that multimodal interfaces are an area of active research and that commercial implementations are only beginning to emerge. Therefore we do not view our goal as standardizing a hypothetical existing common practice, but rather providing a platform to facilitate innovation and technical development.

AI Academic Research Matters

Recent AI Research

Eva: An Explainable Collaborative Dialogue System using a Theory of Mind

Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach

Additional AI Research

Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach (Feb. '24)

Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models (Sept. '23)

Cross-modal multi-headed attention for long multimodal conversations (May '23)

Eva: A Planning-Based Explanatory Collaborative Dialogue System (Feb. '23)

TreeOptimizer: A classifier-based task scheduling framework (Jan. '23)

Multimodal Embodied Conversational Agents: A discussion of architectures, frameworks and modules for commercial applications (Dec. '22)

Conversational Information Retrieval using Knowledge Graphs (Oct. '22)

Commercialization of multimodal systems (Jul. '19)

Standardized representations and markup languages for multimodal interaction (Jul. '19)

The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions (Jun. '19)

The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations (Jun. '17)

Multimodal Interaction with W3C Standards (Nov. '16)

Multimodal Architecture and Interfaces (Oct. '12)
