Openstream.ai Research

AI Academic Research Matters

Explore our curated collection of academic research papers authored by the visionary minds at Openstream.ai. Our research pedigree spans several academic areas, including multimodality, artificial intelligence, linguistics, and more.

Recent AI Research

Openstream.ai's experts are at the cutting edge of multimodal innovation and standardization. We are happy to share their published works with you as you learn more about Conversational AI.

Eva: A Planning-Based Explanatory Collaborative Dialogue System 
by Philip R. Cohen and Lucian Galescu

Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach

by Zhuang Li, Levon Haroutunian, Raj Tumuluri, Philip Cohen, Gholamreza Haffari 


Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach (Feb. '24)

Post-editing has proven effective in improving the quality of text generated by large language models (LLMs) such as GPT-3.5 or GPT-4, particularly when direct updating of their parameters to enhance text quality is infeasible or expensive. However, relying solely on smaller language models for post-editing can limit the LLMs’ ability to generalize across domains.

Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models (Sept. '23)

Large language models (LLMs) have demonstrated impressive capabilities in natural language generation. However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs). This task requires the generated outputs to embody the exact semantics of LFs, without missing any LF semantics or creating any hallucinations.

Cross-modal multi-headed attention for long multimodal conversations (May '23)

Most Conversational AI agents in today's marketplace are unimodal: only text is exchanged between the user and the bot. However, employing additional modes (e.g., images) in the interaction improves customer experience, potentially increasing efficiency and profits in applications such as online shopping.

Eva: A Planning-Based Explanatory Collaborative Dialogue System (Feb. '23)

Eva is a multimodal conversational system that helps users to accomplish their domain goals through collaborative dialogue. The system does this by inferring users’ intentions and plans to achieve those goals, detecting whether obstacles are present to their achievement, finding plans to overcome those obstacles or to achieve higher-level goals, and planning its actions, including speech acts, to help users accomplish them.

TreeOptimizer: A classifier-based task scheduling framework (Jan. '23)

Distributed Computing (DC) involves a collection of tasks (or modules) executed in parallel on different compute nodes connected through a network. Cloud service providers (CSPs) such as Azure[1], Amazon[2], and Google[3] provide DC platforms as PaaS (Platform as a Service) offerings. These cloud platforms reduce implementation costs but have a significant drawback: the services can be configured to spawn only a single type of compute node for executing all the tasks in the DC environment.

Multimodal Embodied Conversational Agents: A discussion of architectures, frameworks and modules for commercial applications (Dec. '22)

With the recent advancements in automated communication technology, many traditional businesses that rely on face-to-face communication have shifted to online portals. However, these online platforms often lack the personal touch essential for customer service. Research has shown that face-to-face communication is essential for building trust and empathy with customers.

Conversational Information Retrieval using Knowledge Graphs (Oct. '22)

Recent years have seen a huge increase in the popularity of information retrieval (IR) systems, which enable users to hold natural language conversations. IR systems such as conversational agents are typically goal-oriented and use predefined queries to retrieve information from backend systems. Researchers have improved these agents to adapt to different modalities, such as images, sound, and video, to enhance the conversational experience.

Commercialization of multimodal systems (Jul. '19)
Standardized representations and markup languages for multimodal interaction (Jul. '19)
The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions (Jun. '19)

The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, hand and body gestures, facial expressions, writing) embedded in multimodal-multisensor interfaces.

The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations (Jun. '17)

The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, gestures, writing) embedded in multimodal-multisensor interfaces. These interfaces support smartphones, wearables, in-vehicle and robotic applications, and many other areas that are now highly competitive commercially.

Multimodal Interaction with W3C Standards (Nov. '16)

Comprehensive resource that explains the W3C standards for multimodal interaction in a clear and straightforward way

Includes case studies of the use of the standards on a wide variety of devices, including mobile devices, tablets, wearables, and robots, in applications such as assisted living, language learning, and healthcare

Multimodal Architecture and Interfaces (Oct. '12)

This document describes the architecture of the Multimodal Interaction (MMI) framework [MMIF] and the interfaces between its constituents. The MMI Working Group is aware that multimodal interfaces are an area of active research and that commercial implementations are only beginning to emerge. Therefore we do not view our goal as standardizing a hypothetical existing common practice, but rather providing a platform to facilitate innovation and technical development. 
