
From The HIVE Lab at Indian Institute of Technology, Kanpur

Heads Up eXperience (HUX)

Your Always-On AI Companion for Human Computer Environment Interaction.

Sukanth K

Sudhiksha Kandavel Rajan

Rajashekhar V S

Gowdham Prabhakar


Abstract

While current personal smart devices excel in digital domains, they remain limited in supporting users during interactions with their physical surroundings. This paper introduces Heads Up eXperience (HUX), an AI system designed to address this limitation by serving as a continuous companion across extended reality (XR) environments. By monitoring users’ eye gaze, analyzing the surrounding environment, and interpreting verbal contexts, the system integrates and augments multimodal data to deliver comprehensive context understanding and real-time task-specific memory capture. This integrated approach enables more natural, empathetic, and intelligent interactions between the user and the HUX AI, advancing the vision of human–computer–environment interaction. Designed for deployment in smart glasses and XR headsets, HUX AI aspires to function as a personalized and effective AI companion in daily life. By unifying digital assistance with enhanced physical-world interaction, this technology has the potential to transform human–AI collaboration in both personal and professional settings, paving the way for the next generation of personal smart devices.



Table of Contents

Heads Up eXperience (HUX)
Abstract
Table of Contents
Demonstration
Demonstrated Features
Architecture and Processes
Test Environment
Cite this Work - BibTex


Demonstration

Sukanth K et al., 2024

Demonstration-1: Hello World with HUX_AI-Mk.1


Demonstrated Features


HUX AI Can Perceive Your Environment


“What do you see?” 

"Do you see any fruits?" 

"How many foods are present?"


Just ask questions about your surroundings, and let HUX AI provide the answers.

Ask Questions with Your Eye Gaze and Speech in Real Time


“What am I looking at?”

Simply look at what intrigues you and ask multiple questions without referring to the object verbally. HUX understands: just pure, intuitive interaction.

Ask About Changes in Your Environment in Real Time


“Did you observe anything change in the environment?”

We all miss things. It's human nature (selective attention). But what if you had an attentive companion?

HUX is always watching, always alert. It catches what you might miss in your immediate environment, keeping you informed and aware when you are focused on your task.

In a world where a single detail can make all the difference, HUX ensures you're always in the know.

Ask Multimodal Contextual Questions


“Does this go well with a pizza?”

(without referring to the object by speech)


“What are some recipes with it?”

(without referring to the object by speech)


By combining multiple streams of information, HUX provides answers that are relevant and insightful.

This isn't just a chatbot. It's a multimodal context-aware collaborator, enhancing your understanding and decision-making like never before.

Recollect Multimodal Context Rich Conversations in Real-Time


"Can you repeat the recipe suggestion you gave me again?"


“Go through all the multi-modal data you collected during our conversation.”


HUX doesn’t just listen and answer. It goes beyond instance-based multimodal question answering by building on both previous and current multimodal data, including speech, eye gaze, and your scene.

Need to reference something from five minutes ago and build questions on top of it? HUX has got you covered. It enriches your current conversation with immediate, relevant context, making every interaction more meaningful and productive.

Store Long Term Multimodal Context Rich Memories


"Snap person, his name is Imam. He is a new engineer in our lab."

Now, let's think bigger. Much bigger.

HUX doesn't just remember your current conversation; it builds a library of the experiences you choose to store by saying “Snap.”

Trying to recall that person you met last year in a crucial meeting? Or that brilliant idea you had on vacation while watching the sunset? Just ask. HUX will find it, complete with all the rich, multimodal context.

It's about having a powerful, searchable archive of your life's most important moments. Why search images and texts manually when HUX can create and retrieve multimodal memories from mere contextual cues?

Enhance Your Visual Feed for Task-Specific Activities and Ask Questions About It


"Are there any defects in the circuit?"


In the world of HUX, your environment adapts to you.

Whether you're performing surgery, assembling a rocket, or tinkering with a PCB, HUX transforms your visual world to highlight what matters most.

And the best part? It's always ready to answer your questions about what you see.

This isn't just augmented reality - it's your reality, optimized.

Interaction:

1. User: "What am I seeing?"  

Agent: The image depicts a complex printed circuit board (PCB) with various electronic components, including resistors, capacitors, and integrated circuits (ICs). The PCB is green, typical for PCBs, and features soldered connections, wires, and joints.


2. User: "Are there any defects in the circuit?" 

(The agent activates task-specific perception, shown in the right image, using the above interaction context.)

Agent: There appear to be three defects, all of which are mouse bites located at the bottom right and top left corners of the circuit board.


3. User: "What does this mean?"  

Agent: A mouse bite is a type of defect that occurs when there’s an imperfection or damage to the copper foil on the circuit board. It can be caused by physical stress, such as bending or flexing, which can lead to cracking or lifting of the copper layer. Mouse bites can cause electrical connections to fail, leading to errors or malfunctions in the system.


4. User: "How many problems did you see?"  

Agent: There are three defects identified as "Mouse bite" located at the bottom right and top left of the circuit board.


Architecture and Processes


System Architecture


HUX AI Architecture: the origin of multi-modal data, data processing, multi-modal context processing, and the generation of outputs. “LIOU Stack” means “Last-In-Only-Used Stack”.
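The page does not spell out how the LIOU Stack behaves beyond its name, so the following is only a minimal sketch under one reading of "Last-In-Only-Used": a holder that keeps the newest item and silently discards anything older that was never read. The class name `LIOUStack`, its methods, and the `dropped` counter are hypothetical and not taken from the HUX implementation.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LIOUStack(Generic[T]):
    """Assumed semantics: only the most recent item is ever used."""

    def __init__(self) -> None:
        self._latest: Optional[T] = None
        self.dropped = 0  # items overwritten before anyone read them

    def push(self, item: T) -> None:
        # A new item replaces whatever was waiting; stale data is discarded.
        if self._latest is not None:
            self.dropped += 1
        self._latest = item

    def pop(self) -> Optional[T]:
        # Hand out the newest item once, then leave the stack empty.
        item, self._latest = self._latest, None
        return item


# Illustrative captions, not real system output.
stack: LIOUStack[str] = LIOUStack()
for caption in ["frame 1: empty desk", "frame 2: cup appears", "frame 3: cup and apple"]:
    stack.push(caption)
latest = stack.pop()
```

A structure like this suits a real-time pipeline in which a slow consumer (for example a language model) should always answer from the freshest context instead of working through a backlog.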

Real-Time Video Feed Filtering based on Events

Real-time Event-based Video Context Processing using VLMs for detected events of interest (EOIs). 
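How events of interest are detected is not described on this page. As a rough illustration of the general idea (forwarding only frames that differ noticeably from the last forwarded one, so the VLM is not run on every frame), here is a sketch that uses plain frame differencing as a stand-in detector. The function names, the threshold value, and the tiny 2x2 "frames" are all assumptions made for the example.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale rows of 0-255 pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_event_frames(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]              # the first frame always establishes the scene
    reference = frames[0]
    for i in range(1, len(frames)):
        if mean_abs_diff(reference, frames[i]) > threshold:
            kept.append(i)
            reference = frames[i]  # later frames are compared to the new scene
    return kept


static = [[10, 10], [10, 10]]
noisy = [[11, 10], [10, 12]]      # sensor noise only
changed = [[200, 10], [10, 10]]   # something new entered the scene
event_indices = select_event_frames([static, noisy, static, changed, changed])
```

Only the selected frames would then be passed on for captioning or question answering; the noisy and repeated frames are skipped.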

Task Specific Scene Enhancement

Task-Specific Scene Enhancement and Question Answering. Task-1, Task-2, ..., Task-5 indicate the different task-specific modes. (below)


Left: PCB in the unlabelled condition. Right: PCB labelled by a computer vision model whose only goal is to find defects in the PCB.
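In the PCB interaction shown earlier, the agent switches on task-specific perception based on the preceding conversation. The page does not say how a mode is chosen, so the sketch below uses simple keyword counting purely for illustration. The mode names, the keyword lists, and `select_task_mode` are hypothetical; the page itself only labels the modes Task-1 through Task-5.

```python
from typing import Dict, List, Optional

# Hypothetical modes and trigger words, invented for this example.
TASK_KEYWORDS: Dict[str, List[str]] = {
    "pcb_defect_inspection": ["pcb", "circuit", "solder"],
    "food_inspection": ["fruit", "food", "recipe"],
}


def select_task_mode(context: List[str], question: str) -> Optional[str]:
    """Pick the mode whose keywords occur most often in the recent
    conversation plus the new question; None means no mode applies."""
    text = " ".join(context + [question]).lower()
    best_mode, best_hits = None, 0
    for mode, keywords in TASK_KEYWORDS.items():
        hits = sum(text.count(k) for k in keywords)
        if hits > best_hits:
            best_mode, best_hits = mode, hits
    return best_mode


context = [
    "The image depicts a complex printed circuit board (PCB) "
    "with various electronic components."
]
mode = select_task_mode(context, "Are there any defects in the circuit?")
```

A real system would more plausibly let the language model choose the mode from the conversation, but the routing idea is the same: the earlier turns, not just the current question, decide which specialised vision model labels the scene.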

Eye Gaze Processing

Processing of eye-gaze-based Regions of Interest in parallel with Scene Processing (Events of Interest)
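One common way to turn a gaze point into a region of interest is to crop a fixed-size box around it and keep that box inside the frame. The sketch below shows only that geometric step. It assumes gaze coordinates normalized to [0, 1] and a default box of 200 pixels; both are illustrative choices, not details from the HUX pipeline, and `gaze_roi` is a hypothetical helper.

```python
from typing import Tuple


def gaze_roi(gaze_x: float, gaze_y: float, frame_w: int, frame_h: int,
             roi_size: int = 200) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a box centred on the gaze point,
    shifted as needed so that it stays inside the frame."""
    size_w = min(roi_size, frame_w)
    size_h = min(roi_size, frame_h)
    # Assumption: gaze is normalized to [0, 1] on both axes.
    cx = int(round(gaze_x * frame_w))
    cy = int(round(gaze_y * frame_h))
    left = min(max(cx - size_w // 2, 0), frame_w - size_w)
    top = min(max(cy - size_h // 2, 0), frame_h - size_h)
    return left, top, left + size_w, top + size_h


centre_box = gaze_roi(0.5, 0.5, 640, 480)   # looking at the middle of the frame
corner_box = gaze_roi(0.0, 1.0, 640, 480)   # looking at the bottom-left corner
```

The cropped region can then be described on its own, which is what lets a question such as “What am I looking at?” be answered without the user naming the object.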

Multi-Modal Contextual Memory Creation

Process of Multi-Modal Contextual Memory Pipeline (below)

Data Structure of a multi-modal Contextual Memory (below)
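The memory data structure itself appears only as a figure, so the following is a guess at its general shape built from the modalities named on this page (speech, eye gaze, and scene). The field names, the `MemoryStore` class, and the word-overlap retrieval are assumptions for illustration, and the scene descriptions in the usage lines are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ContextualMemory:
    timestamp: float
    speech: str                              # what the user said
    gaze_region: Tuple[int, int, int, int]   # (left, top, right, bottom)
    scene_description: str                   # caption of the scene or gaze crop
    tags: List[str] = field(default_factory=list)


class MemoryStore:
    def __init__(self) -> None:
        self._items: List[ContextualMemory] = []

    def snap(self, memory: ContextualMemory) -> None:
        self._items.append(memory)

    def recall(self, cue: str) -> List[ContextualMemory]:
        """Rank memories by how many cue words appear in their text.
        Substring matching is crude; it stands in for real retrieval."""
        words = set(cue.lower().split())

        def score(m: ContextualMemory) -> int:
            text = f"{m.speech} {m.scene_description} {' '.join(m.tags)}".lower()
            return sum(1 for w in words if w in text)

        ranked = sorted(self._items, key=score, reverse=True)
        return [m for m in ranked if score(m) > 0]


store = MemoryStore()
store.snap(ContextualMemory(1.0, "His name is Imam. He is a new engineer in our lab.",
                            (220, 140, 420, 340), "a person standing near a workbench",
                            ["person"]))
store.snap(ContextualMemory(2.0, "Remember this view.", (0, 0, 200, 200),
                            "sunset over a beach", ["vacation"]))
results = store.recall("engineer Imam")
```

Keeping the speech, gaze region, and scene caption together in one record is what allows a later cue in any one modality to bring back the others.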


Test Environment



Cite this Work - BibTex


@article{rajan2024heads,
  title={Heads Up eXperience (HUX): Always-On AI Companion for Human Computer Environment Interaction},
  author={Sukanth, K and Rajan, Sudhiksha Kandavel and Rajashekhar, V S and Prabhakar, Gowdham},
  journal={arXiv preprint arXiv:2407.19492},
  year={2024},
}


© Copyright HIVE. All Rights Reserved 
