Publications

2025
Journal Papers
Chunyue Xue, Letian Chen, and Matthew Gombolay Learning Disentangled Rewards from Heterogeneous, Suboptimal Demonstrations IEEE Robotics and Automation Letters. [Accepted April 26th, 2025]
Vanya Cohen, Geraud Nangue Tasse, Nakul Gopalan, Steven James, Matthew Gombolay, Ray Mooney, Benjamin Rosman Compositional Instruction Following with Language Models and Reinforcement Learning \| Preprint \| Abstract Combining reinforcement learning with language grounding is challenging as the agent needs to explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce a novel method: the compositionally-enabled reinforcement learning language agent (CERLLA). Our method reduces the sample complexity of tasks specified with language by leveraging compositional policy representations and a semantic parser trained using reinforcement learning and in-context learning. We evaluate our approach in an environment requiring function approximation and demonstrate compositional generalization to novel tasks. Our method significantly outperforms the previous best non-compositional baseline in terms of sample complexity on 162 tasks designed to test compositional generalization. Our model attains a higher success rate and learns in fewer steps than the non-compositional baseline. It reaches a success rate equal to an oracle policy’s upper-bound performance of 92%. With the same number of environment steps, the baseline only reaches a success rate of 80%. Transactions on Machine Learning Research (TMLR).
Byeolyi Han, Minwoo Cho, Letian Chen, Rohan Paleja, Zixuan Wu, Sean Ye, Esmaeil Seraj, David Sidoti, and Matthew Gombolay Learning Multi-Agent Coordination for Replenishment At Sea \| Abstract Optimizing large-scale logistics is computationally challenging due to its scale and requirement to be robust to stochastic and time-varying weather disturbances. However, prior research in multi-agent reinforcement learning (MARL) does not address scenarios that capture complexity of logistics operations influenced by dynamic weather patterns. To address this gap, we suggest a new MARL environment, Marine that has two types of agents equipped with limited resources and integrates real wave data to model the influences of weather on the replenishment at sea (RAS) operation. To this end, we propose SchedHGNN, a novel MARL algorithm that incorporates a heterogeneous graph neural network and an intrinsic reward scheme to enhance agent coordination and mitigate challenges induced by environment non-stationarity. Our results show that the combination of effective RAS scheduling and improved communication enables our model to outperform competitive baselines by up to 37.8%. This achievement marks a significant advancement in applying MARL to complex, real-world logistics scenarios. IEEE Robotics and Automation Letters, Volume 10, Issue 2, Pages 1018-1025.
Conference Papers
Zhang Xi-Jia, Yue Guo, Shufei Chen, Simon Stepputtis, Matthew Gombolay, Katia Sycara, Joseph Campbell Model-Agnostic Policy Explanations with Large Language Models \| Preprint \| Abstract Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Intelligent agents, such as robots, are increasingly deployed in real-world, human-centric environments. To foster appropriate human trust and meet legal and ethical standards, these agents must be able to explain their behavior. However, state-ofthe-art agents are typically driven by black-box models like deep neural networks, limiting their interpretability. We propose a method for generating natural language explanations of agent behavior based only on observed states and actions – without access to the agent’s underlying model. Our approach learns a locally interpretable surrogate model of the agent’s behavior from observations, which then guides a large language model to generate plausible explanations with minimal hallucination. Empirical results show that our method produces explanations that are more comprehensible and correct than those from baselines, as judged by both language models and human evaluators. Furthermore, we find that participants in a user study more accurately predicted the agent’s future actions when given our explanations, suggesting improved understanding of agent behavior. In Proc. Conference on Language Modeling (COLM). [Accepted July 7th, 2025; To Appear]
Letian Chen and Matthew Gombolay ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Interpretable Reward Design in Robotics \| Preprint \| Abstract Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly with only text-based descriptions. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to align robot behavior with user intentions better. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and match the demonstrated behaviors optimally. ELEMENTAL also introduces an iterative feedback-loop through self-reflection to improve feature, reward, and policy learning. Our experiment results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success, and achieves 41.3% better generalization in out-of-distribution tasks, highlighting its robustness in LfD. In Proc. International Conference on Machine Learning (ICML). [Accepted May 1st, 2025; To Appear] [26.9% Acceptance Rate]
Erin Hedlund-Botti, Julianna Schalkwyk, Nina Moorman, Chuxuan Yang, Lakshmi Seelam, and Matthew Gombolay Learning Interpretable Features from Interventions In Proc. Robotics: Science and Systems (RSS). [Accepted April 11th, 2025; To Appear]
Qingyu Xiao, Zixuan Wu, Matthew Gombolay Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations \| Preprint \| Abstract Robots in dynamic environments need fast, accurate models of how objects move in their environments to support agile planning. In sports such as ping pong, analytical models often struggle to accurately predict ball trajectories with spins due to complex aerodynamics, elastic behaviors, and the challenges of modeling sliding and rolling friction. On the other hand, despite the promise of data-driven methods, machine learning struggles to make accurate, consistent predictions without precise input. In this paper, we propose an end-to-end learning framework that can jointly train a dynamics model and a factor graph estimator. Our approach leverages a GramSchmidt (GS) process to extract roto-translational invariant representations to improve the model performance, which can further reduce the validation error compared to data augmentation method. Additionally, we propose a network architecture that enhances nonlinearity by using self-multiplicative bypasses in the layer connections. By leveraging these novel methods, our proposed approach predicts the ball’s position with an RMSE of 37.2 mm of the paddle radius at the apex after the first bounce, and 71.5 mm after the second bounce. In Proc. International Conference on Robotics and Automation (ICRA).
Zixuan Wu, Zulfiqar Zaidi, Adithya Patil, Qingyu Xiao, Matthew Gombolay Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning* \| Preprint \| Abstract In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transfer constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework in the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching realworld recorded tennis ball trajectories with a physical robot wheelchair, and achieve a success rate of 68.49% in a realworld, real-time experiment on a full-sized tennis court. In Proc. International Conference on Robotics and Automation (ICRA).
Kin Man Lee, Sean Ye, Qingyu Xiao, Zixuan Wu, Zulfiqar Zaidi, David B. D’Ambrosio, Pannag R. Sanketi, Matthew Gombolay Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance \| Preprint \| Abstract Advances in robot learning have enabled robots to generate skills for a variety of tasks. Yet, robot learning is typically sample inefficient, struggles to learn from data sources exhibiting varied behaviors, and does not naturally incorporate constraints. These properties are critical for fast, agile tasks such as playing table tennis. Modern techniques for learning from demonstration improve sample efficiency and scale to diverse data, but are rarely evaluated on agile tasks. In the case of reinforcement learning, achieving good performance requires training on high-fidelity simulators. To overcome these limitations, we develop a novel diffusion modeling approach that is offline, constraint-guided, and expressive of diverse agile behaviors. The key to our approach is a kinematic constraint gradient guidance (KCGG) technique that computes gradients through both the forward kinematics of the robot arm and the diffusion model to direct the sampling process. KCGG minimizes the cost of violating constraints while simultaneously keeping the sampled trajectory in-distribution of the training data. We demonstrate the effectiveness of our approach for time-critical robotic tasks by evaluating KCGG in two challenging domains: simulated air hockey and real table tennis. In simulated air hockey, we achieved a 25.4% increase in block rate, while in table tennis, we achieved a 17.3% increase in success rate compared to imitation learning baselines. In Proc. International Conference on Robotics and Automation (ICRA).
Varshith Sreeramdass, Rohan Paleja, Letian Chen, Sanne van Waveren, and Matthew Gombolay Generalized Behavior Learning from Diverse Demonstrations \| Preprint \| Abstract Learning robot control policies through Reinforcement Learning can be challenging due to the complexity of designing rewards, which often result in unexpected behaviors. Imitation Learning overcomes this issue by using demonstrations to create policies that mimic expert behaviors. However, experts often demonstrate varied approaches to tasks. Capturing this variability is crucial for understanding and adapting to diverse scenarios. Prior methods capture variability by optimizing for behavior diversity alongside imitation. Yet, naive formulations of diversity can result in meaningless representation of latent factors, hindering generalization to novel scenarios. We propose Guided Strategy Discovery (GSD), a novel regularization method that specifically promotes expert-specified, task-relevant diversity. In the recovery of unseen expert behaviors, GSD improves 11% over the next best baseline across three continuous control tasks on average. Code is available online at https://github.com/CORE-Robotics-Lab/GSD. In Proc. International Conference on Learning Representations (ICLR). [32.08% Acceptance Rate]
Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay Generating CAD Code with Vision-Language Models for 3D Designs \| Preprint \| Abstract Generative AI has transformed the fields of Design and Manufacturing by providing efficient and automated methods for generating and modifying 3D objects. One approach involves using Large Language Models (LLMs) to generate Computer- Aided Design (CAD) scripting code, which can then be executed to render a 3D object; however, the resulting 3D object may not meet the specified requirements. Testing the correctness of CAD generated code is challenging due to the complexity and structure of 3D objects (e.g., shapes, surfaces, and dimensions) that are not feasible in code. In this paper, we introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach works by producing ameliorative feedback by prompting a Vision-Language Model (VLM) to generate and answer a set of validation questions to verify the generated object and prompt the VLM to correct deviations. To evaluate CADCodeVerify, we introduce, CADPrompt, the first benchmark for CAD code generation, consisting of 200 natural language prompts paired with expert-annotated scripting code for 3D objects to benchmark progress. Our findings show that CADCodeVerify improves VLM performance by providing visual feedback, enhancing the structure of the 3D objects, and increasing the success rate of the compiled program. When applied to GPT-4, CADCodeVerify achieved a 7.30% reduction in Point Cloud distance and a 5.0% improvement in success rate compared to prior work. In Proc. International Conference on Learning Representations (ICLR). [32.08% Acceptance Rate]
2024
Journal Papers
Nina Moorman, Aman Singh, Manisha Natarajan, Erin Hedlund-Botti, Mariah Schrum, Chuxuan Yang, Lakshmi Seelam, Matthew Gombolay, and Nakul Gopalan Investigating strategies enabling novice users to teach plannable hierarchical tasks to robots \| Abstract Learning from demonstration (LfD) seeks to democratize robotics by enabling non-experts to intuitively program robots to perform novel skills through human task demonstration. Yet, LfD is challenging under a task and motion planning (TAMP) setting, as solving long-horizon manipulation tasks requires the use of hierarchical abstractions. Prior work has studied mechanisms for eliciting demonstrations that include hierarchical specifications for robotics applications but has not examined whether non-roboticist end-users are capable of providing such hierarchical demonstrations without explicit training from a roboticist for each task. We characterize whether, how, and which users can do so. Finding that the result is negative, we develop a series of training domains that successfully enable users to provide demonstrations that exhibit hierarchical abstractions. Our first experiment shows that fewer than half (35.71%) of our subjects provide demonstrations with hierarchical abstractions when not primed. Our second experiment demonstrates that users fail to teach the robot with adequately detailed TAMP abstractions, when not shown a video demonstration of an expert’s teaching strategy. Our experiments reveal the need for fundamentally different approaches in LfD to enable end-users to teach robots generalizable long-horizon tasks without being coached by experts at every step. Toward this goal, we developed and evaluated a set of TAMP domains for LfD in a third study. Positively, we find that experience obtained in different, training domains enables users to provide demonstrations with useful, plannable abstractions on new, test domains just as well as providing a video prescribing an expert’s teaching strategy in the new domain. The International Journal of Robotics Research
Eric Cole, Mariah Schrum, Enrico Opri, Arthur Nascimento, Paola Testini, Robert Gross, Matthew Gombolay, and Svjetlana Miocinovic ID: 332654 Automated Deep Brain Stimulation Parameter Selection via Meta-Active Learning of Evoked Potentials \| Abstract Methods To evaluate the relationship between mEP and side-effects, we applied 319 different stimulation settings at both low (< 30 Hz) and high (130 Hz) frequency and respectively measured mEP’s on multiple facial/limb muscles (2-6 channels per patient) and patient-reported motor side-effects for each setting. We similarly computed DLEP and mEP biomarkers for low-frequency stimulation data collected from 52 patients during DBS implantation surgery (total 1364 settings). Results We show that the presence of mEP can predict which DBS parameters induce motor side-effects (86% accuracy), that the muscles displaying mEP are consistent with side effect location, and we validate an automated mEP detection algorithm (92% accuracy vs. visual detection). We then apply meta-learning to predict DLEP and mEP values as a function of stimulation parameters, showing that a meta-trained neural network can predict biomarker values for unseen parameters from novel patients with low cross-validation error (DLEP: p< 0.0001, mEP: p< 0.005). We then use this modeled data to provide the basis for a simulation environment where meta-AL can practice DBS programming, showing that meta-AL can learn to find optimal parameters of new simulated patients in silico within 10 tested parameters (vs. 20-40 parameters per clinic visit). Last, we demonstrate an open-source system for real-time closed-loop biomarker recording and parameter selection in Python. Conclusion We show that motor evoked potentials are an accurate biomarker for DBS-induced side-effects, and that meta-active learning can efficiently find optimal parameters in silico. Our next step … Neuromodulation: Technology at the Neural Interface, Volume 27, Issue 7, Pages S154-S155.
YeSeul Kim, Seongyong Kim, Yilong Chen, HyunJin Yang, Seungwoo Kim, Sehoon Ha, Matthew Gombolay, Yonghan Ahn, and Yong Kwon Cho Understanding human-robot proxemic norms in construction: How do humans navigate around robots? \| Abstract As an increasing number of mobile robots are envisioned to work and interact with humans in construction workspaces, it becomes critical that robots’ spatial behaviors align with the expectations of human coworkers to ensure safe and efficient co-navigation. Yet, we have a limited understanding of what robotic spatial behaviors are perceived as socially normative under different work contexts. This paper investigated perceived appropriate proxemic behaviors of robots by examining actual construction practitioners’ spatial behaviors during interactions with robots. We developed a virtual environment of a typical indoor construction job site and explored the role of work conditions and human-robot relations on their proxemic behaviors. The findings reveal that participants tend to maintain a larger separation distance in more crowded work conditions compared to normal ones and when encountering a robot as a passerby compared to encountering a human. We further discuss the implications of our results for the development of robot path planning with appropriate distancing strategies. Automation in Construction, Volume 164, Page 105455.
Roger D Dias, Ryan E Harari, Marco A Zenati, Geoffrey Rance, Rithy Srey, Letian Chen, and Matthew Gombolay A Clinician-Centered Explainable Artificial Intelligence Framework for Decision Support in the Operating Theatre The Hamlyn Symposium on Medical Robotics, Volume 16, Page 35-36.
Eric Cole, Thomas Eggers, David Weiss, Mark Connolly, Matthew Gombolay, Nealen Laxpati, and Robert Gross Irregular optogenetic stimulation waveforms can induce naturalistic patterns of hippocampal spectral activity \| Abstract Objective. Therapeutic brain stimulation is conventionally delivered using constant-frequency stimulation pulses. Several recent clinical studies have explored how unconventional and irregular temporal stimulation patterns could enable better therapy. However, it is challenging to understand which irregular patterns are most effective for different therapeutic applications given the massively high-dimensional parameter space. Approach. Here we applied many irregular stimulation patterns in a single neural circuit to demonstrate how they can enable new dimensions of neural control compared to conventional stimulation, to guide future exploration of novel stimulation patterns in translational settings. We optogenetically excited the septohippocampal circuit with constant-frequency, nested pulse, sinusoidal, and randomized stimulation waveforms, systematically varying their amplitude and frequency parameters. Main results. We first found equal entrainment of hippocampal oscillations: all waveforms provided similar gamma-power increase, whereas no parameters increased theta-band power above baseline (despite the mechanistic role of the medial septum in driving hippocampal theta oscillations). We then compared each of the effects of each waveform on high-dimensional multi-band activity states using dimensionality reduction methods. Strikingly, we found that conventional stimulation drove predominantly ‘artificial’ (different from behavioral activity) effects, whereas all irregular waveforms induced activity patterns that more closely resembled behavioral activity. Significance. Our findings suggest that irregular stimulation patterns are not useful when the desired mechanism is to suppress or enhance a single frequency band. However, novel stimulation patterns may provide the greatest benefit for neural control applications where entraining a particular mixture of bands (e.g. if they are associated with different symptoms) or behaviorally-relevant activity is desired. Journal of Neural Engineering, Volume 21, Issue 3.
Grace Y Gombolay, Andrew Silva, Mariah Schrum, Nakul Gopalan, Jamika Hallman-Cooper, Monideep Dutt, and Matthew Gombolay Effects of Explainable Artificial Intelligence in Neurology Decision Support \| Abstract Objective Artificial intelligence (AI)-based decision support systems (DSS) are utilized in medicine but underlying decision-making processes are usually unknown. Explainable AI (xAI) techniques provide insight into DSS, but little is known on how to design xAI for clinicians. Here we investigate the impact of various xAI techniques on a clinician’s interaction with an AI-based DSS in decision-making tasks as compared to a general population. Methods We conducted a randomized, blinded study in which members of the Child Neurology Society and American Academy of Neurology were compared to a general population. Participants received recommendations from a DSS via a random assignment of an xAI intervention (decision tree, crowd sourced agreement, case-based reasoning, probability scores, counterfactual reasoning, feature importance, templated language, and no explanations). Primary outcomes included test performance and perceived explainability, trust, and social competence of the DSS. Secondary outcomes included compliance, understandability, and agreement per question. Results We had 81 neurology participants with 284 in the general population. Decision trees were perceived as the more explainable by the medical versus general population (P < 0.01) and as more explainable than probability scores within the medical population (P < 0.001). Increasing neurology experience and perceived explainability degraded performance (P = 0.0214). Performance was not predicted by xAI method but by perceived explainability. Interpretation xAI methods have different impacts on a medical versus general population; thus, xAI is not uniformly beneficial, and there is no one-size-fits-all approach. Further user-centered xAI research targeting clinicians and to develop personalized DSS for clinicians is needed. Annals of Clinical and Translational Neurology, Volume 11, Issue 5, Pages 1224-1235.
Manisha Natarajan and Matthew Gombolay Trust and Dependence on Robotic Decision Support Systems \| Abstract This article investigates people’s trust and dependence on robotic decision support systems (DSSs), which provide cognitive assistance through suggestions. Robotic DSSs may not always offer optimal suggestions, requiring people to rely carefully to maximize performance. We analyze user reliance on suboptimal robots for solving instantaneous and sequential decision-making tasks with a math and card game, respectively. In instantaneous tasks, we find that the users’ perceived anthropomorphism (p<.001) and the robot's behavior after a decision support failure (p<.001) significantly impact user trust. In a sequential task where the effectiveness of the human–robot team is not revealed until after several decisions, we find that introducing a user-initiated decision proposal before the robot reveals its recommendation can mitigate overreliance (p<.05) and users' task expertise is critical in determining appropriate dependence on the robot's suggestions (p<.01). Combined, these studies are synergistic and the first to jointly examine the influence of various factors on user trust and dependence, offering guidance for designing robotic DSSs to maximize human–robot task performance. IEEE Transactions on Robotics (T-RO), Volume 40, Pages 4670-4689.
Yi Ting Sam, Erin Hedlund-Botti, Manisha Natarajan, Jamison Heard, and Matthew Gombolay The Impact of Stress and Workload on Human Performance in Robot Teleoperation Tasks. \| Abstract Advances in robot teleoperation have enabled groundbreaking innovations in many fields, such as space exploration, healthcare, and disaster relief. The human operator’s performance plays a key role in the success of any teleoperation task, with prior evidence suggesting that operator stress and workload can impact task performance. As robot teleoperation is currently deployed in safety-critical domains, it is essential to analyze how different stress and workload levels impact the operator. We are unaware of any prior work investigating how both stress and workload impact teleoperation performance. We conducted a novel study (n=24) to jointly manipulate users’ stress and workload and analyze the user’s performance through objective and subjective measures. Our results indicate that, as stress increased, over 70% of our participants performed better up to a moderate level of stress; yet, the majority of participants performed worse as the workload increased. Importantly, our experimental design elucidated that stress and workload have related yet distinct impacts on task performance, with workload mediating the effects of distress on performance (p<.05). IEEE Transactions on Robotics (T-RO), Volume 40, Pages 4725 – 4744.
Esmaeil Seraj, Rohan Paleja, Luis Pimentel, Kin Man Lee, Zheyuan Wang, Daniel Martin, Matthew Sklar, John Zhang, Zahi Kakish, and Matthew Gombolay Heterogeneous Policy Networks for Composite Robot Team Communication and Coordination \| Abstract High-performing human–human teams learn intelligent and efficient communication and coordination strategies to maximize their joint utility. These teams implicitly understand the different roles of heterogeneous team members and adapt their communication protocols accordingly. Multiagent reinforcement learning (MARL) has attempted to develop computational methods for synthesizing such joint coordination–communication strategies, but emulating heterogeneous communication patterns across agents with different state, action, and observation spaces has remained a challenge. Without properly modeling agent heterogeneity, as in prior MARL work that leverages homogeneous graph networks, communication becomes less helpful and can even deteriorate the team’s performance. In the past, we proposed heterogeneous policy networks (HetNet) to learn efficient and diverse communication models for coordinating cooperative heterogeneous teams. In this extended work, we extend HetNet to support scaling heterogeneous robot teams. Building on heterogeneous graph-attention networks, we show that HetNet not only facilitates learning heterogeneous collaborative policies, but also enables end-to-end training for learning highly efficient binarized messaging. Our empirical evaluation shows that HetNet sets a new state-of-the-art in learning coordination and communication strategies for heterogeneous multiagent teams by achieving an 5.84% to 707.65% performance improvement over the next-best baseline across multiple domains while simultaneously achieving a 200× reduction in the required communication bandwidth. IEEE Transactions on Robotics (T-RO), Volume 40, Pages 3833-3849.
Mariah Schrum, Emily Sumner, Matthew Gombolay, and Andrew Best MAVERIC: A Data-Driven Approach to Personalized Autonomous Driving \| Preprint \| Abstract Personalization of autonomous vehicles (AV) may significantly increase trust, use, and acceptance. In particular, we hypothesize that the similarity of an AV’s driving style compared to the end-user’s driving style will have a major impact on end user’s willingness to use the AV. To investigate the impact of driving style on user acceptance, we 1) develop a data-driven approach to personalize driving style and 2) demonstrate that personalization significantly impacts attitudes towards AVs. Our approach learns a high-level model that tunes low-level controllers to ensure safe and personalized control of the AV. The key to our approach is learning an informative, personalized embedding that represents a user’s driving style. Our framework is capable of calibrating the level of aggression so as to optimize driving style based upon driver preference. Across two human subject studies (n = 54), we first demonstrate our approach mimics the driving styles of end-users and can tune attributes of style (e.g., aggressiveness). Second, we investigate the factors (e.g., trust, personality etc.) that impact homophily, i.e. an individual’s preference for a driving style similar to their own. We find that our approach generates driving styles consistent with end-user styles (p < .001) and participants rate our approach as more similar to their level of aggressiveness (p = .002). We find that personality (p < .001), perceived similarity (p < .001), and high-velocity driving style (p = .0031) significantly modulate the effect of homophily. IEEE Transactions on Robotics (T-RO)., Volume 40, Pages 1952-1965.
Rayan Ebnali Harari, Roger Dias, Lauren Kennedy-Metz, Giovanna Varni, Matthew Gombolay, Steven Yule, Eduardo Salas, and Marco Zenati Deep Learning Analysis of Surgical Video Recordings to Assess Nontechnical Skills \| Abstract This cross-sectional study of 30 cardiac surgical procedures found specific OR team members’ motion features, such as average trajectory and displacement acceleration, to positively correlate with higher nontechnical skills performance as assessed by the Non-Technical Skills for Surgeons (NOTSS) assessment tool, while displacement entropy was negatively correlated with NOTSS scores. These findings suggest that certain patterns of team motion in the OR are associated with a team’s nontechnical skills. JAMA Network Open, Volume 7, Issue 7.
Pradyumna Tambwekar and Matthew Gombolay Towards Reconciling Usability and Usefulness of Explainable AI Methodologies \| Abstract Safefy-critical domains often employ autonomous agents which follow a sequential decision-making setup, whereby the agent follows a policy to dictate the appropriate action at each step. AI-practitioners often employ reinforcement learning algorithms to allow an agent to find the best policy. However, sequential systems often lack clear and immediate signs of wrong actions, with consequences visible only in hindsight, making it difficult to humans to understand system failure. In reinforcement learning, this is referred to as the credit assignment problem. To effectively collaborate with an autonomous system, particularly in a safety-critical setting, explanations should enable a user to better understand the policy of the agent and predict system behavior so that users are cognizant of potential failures and these failures can be diagnosed and mitigated. However, humans are diverse and have innate biases or preferences which may enhance or impair the utility of a policy explanation of a sequential agent. Therefore, in this paper, we designed and conducted human-subjects experiment to identify the factors which influence the perceived usability with the objective usefulness of policy explanations for reinforcement learning agents in a sequential setting. Our study had two factors: the modality of policy explanation shown to the user (Tree, Text, Modified Text, and Programs) and the “first impression” of the agent, i.e., whether the user saw the agent succeed or fail in the introductory calibration video. Our findings characterize a preference-performance tradeoff wherein participants perceived language-based policy explanations to be significantly more useable; however, participants were better able to objectively predict the agent’s behavior when provided an explanation in the form of a decision tree. Our results demonstrate that user-specific factors, such as computer science experience (p < 0.05), and situational factors, such as watching agent crash (p < 0.05), can significantly impact the perception and usefulness of the explanation. This research provides key insights to alleviate prevalent issues regarding innapropriate compliance and reliance, which are exponentially more detrimental in safety-critical settings, providing a path forward for XAI developers for future work on policy-explanations. Frontiers in Robotics and AI, Volume 11, Page 1375490.
Panagiotis Tsiotras, Matthew Gombolay, and Jakob Foerster Editorial: Decision-Making and Planning for Multi-Agent Systems \| Abstract Multi-agent systems are widely applicable to real-world applications ranging from warehouse automation to environmental monitoring, autonomous driving, and even computer game simulations. Compared to single agents, coordinated multi-agent systems have greater potential to tackle time-sensitive, complex, and large-scale problems. However, orchestrating the behaviors of multi-agent systems for cooperative or non-cooperative tasks is a difficult computational optimization problem. Even though decision-making and reinforcement learning (RL) techniques for single-agent scenarios have seen tremendous achievements in recent years, we have not yet seen a widespread translation of these single-agent techniques to the multi-agent domain. Unfortunately, translating single-agent techniques to the multi-agent setting is not straightforward, and many challenges exist, stemming primarily from the intrinsic nature of the multi-agent systems, including complex interaction dynamics, constrained inter-agent communication, various notions of optimality, heterogeneity, as well as the potential presence of adversaries. The objective of this Research Topic is to report on the recent advances in multi-agent planning and decision-making. While decision-making and planning for a single agent has been extensively studied, the multi-agent version of the problem has not received the same attention in the literature and is much less understood. In addition to sensing and planning challenges common to the single and multi-agent settings, multi-agent systems must also deal with the additional requirements of communication and coordination among the agents. Further, several multi-agent architectures assume very large numbers of agents, each having limited computational resources. Decentralized architectures that consider these limitations are essential for most practical applications, such as in low-size, -weight, and -power (low-SWaP) settings. Moreover, coordination and cooperation among the agents can be either altruistic or individualistic, and deciding between these two options (and when) is not straightforward. By reporting on the latest advances in the field, this Research Topic aims to make the community aware of the existing challenges of the multi-agent decision-making problem, and disseminate recent and novel research trends in this area. Frontiers in Robotics and AI, Volume 11, Page 1422344.
Lakshita Dodeja, Pradyumna Tambwekar, Erin Hedlund-Botti, and Matthew Gombolay Towards the Design of User-Centric Strategy Recommendation Systems for Collaborative Human-AI Tasks \| Abstract Artificial Intelligence is being employed by humans to collaboratively solve complicated tasks for search and rescue, manufacturing, etc. Efficient teamwork can be achieved by understanding user preferences and recommending different strategies for solving the particular task to humans. Prior work has focused on personalization of recommendation systems for relatively well-understood tasks in the context of e-commerce or social networks. In this paper, we seek to understand the important factors to consider while designing user-centric strategy recommendation systems for decision-making. We conducted a human-subjects experiment (n=60) for measuring the preferences of users with different personality types towards different strategy recommendation systems. We conducted our experiment across four types of strategy recommendation modalities that have been established in prior work: (1) Single strategy recommendation, (2) Multiple similar recommendations, (3) Multiple diverse recommendations, (4) All possible strategies recommendations. While these strategy recommendation schemes have been explored independently in prior work, our study is novel in that we employ all of them simultaneously and in the context of strategy recommendations, to provide us an in-depth overview of the perception of different strategy recommendation systems. We found that certain personality traits, such as conscientiousness, notably impact the preference towards a particular type of system (p < 0.01). Finally, we report an interesting relationship between usability, alignment, and perceived intelligence wherein greater perceived alignment of recommendations with one’s own preferences leads to higher perceived intelligence (p < 0.01) and higher usability (p < 0.01). International Journal of Human-Computer Studies (IJHCS).
Conference Papers
Matthew Gombolay Human-robot alignment through interactivity and interpretability: don’t assume a “spherical human” \| Abstract Interactive and interpretable robot learning can help to democratize robots, placing the power of assistive robotic systems in the hands of endusers. While machine learning-based approaches to robotics have achieved impressive results, robot learning is still a feat of costly engineering performed in controlled settings and relying upon impractical assumptions about humans. To achieve a vision in which robots can be integrated sustainably into our daily lives for robotic assistance, researchers must take a human-centered approach and develop novel approaches for human-robot alignment of robot values and behaviors. This paper amalgamates recent human factors insights and computational techniques that can support human-robot alignment through interactive and interpretable robot learning and teaming. In Proc. The Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), Issue 976, Pages 8523-8528.
Erin Hedlund-Botti, Lakshmi Seelam, Chuxuan Yang, Nathaniel Belles, Zulfiqar Zaidi, and Matthew Gombolay Developing Design Guidelines for Older Adults with Robot Learning from Demonstration \| Abstract Assistive in-home robots have the potential to enable older adults to age in place by offloading mentally or physically demanding tasks to a robot. However, one challenge for in-home robots is that each individual will have differing needs, preferences, and home environments, which can all change over time. Learning from Demonstration (LfD) is one solution to enable non-expert users to communicate their differing and changing preferences to a robot, but LfD has not been evaluated with a population of older adults. In a human-subjects experiment where participants teach a robot via LfD, we characterize disparities between older and younger adult participants in terms of robot performance, usability, and participant perceptions. We find that older adults are significantly more critical of the robot’s performance and found the LfD process less usable than younger adults. Based on participant performance and feedback, we present design guidelines that will enable roboticists to increase LfD accessibility across demographics. In Proc. Robotics: Science and Systems (RSS). [31% Acceptance Rate]
Tianyu Li, Hyunyoung Jung, Matthew Gombolay, Yong Kwon Cho, and Sehoon Ha CrossLoco: Human Motion Driven Control of Legged Robots via Guided Unsupervised Reinforcement Learning \| Abstract Human motion driven control (HMDC) is an effective approach for generating natural and compelling robot motions while preserving high-level semantics. However, establishing the correspondence between humans and robots with different body structures is not straightforward due to the mismatches in kinematics and dynamics properties, which causes intrinsic ambiguity to the problem. Many previous algorithms approach this motion retargeting problem with unsupervised learning, which requires the prerequisite skill sets. However, it will be extremely costly to learn all the skills without understanding the given human motions, particularly for high-dimensional robots. In this work, we introduce CrossLoco, a guided unsupervised reinforcement learning framework that simultaneously learns robot skills and their correspondence to human motions. Our key innovation is to introduce a cycle-consistency-based reward term designed to maximize the mutual information between human motions and robot states. We demonstrate that the proposed framework can generate compelling robot motions by translating diverse human motions, such as running, hopping, and dancing. We quantitatively compare our CrossLoco against the manually engineered and unsupervised baseline algorithms along with the ablated versions of our framework and demonstrate that our method translates human motions with better accuracy, diversity, and user preference. We also showcase its utility in other applications, such as synthesizing robot movements from language input and enabling interactive robot control. In Proc. International Conference on Learning Representations (ICLR). [31% Acceptance Rate]
Rohan R Paleja, Michael Joseph Munje, Kimberlee Chestnut Chang, Reed Jensen, and Matthew Gombolay Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems \| Abstract Collaborative robots and machine learning-based virtual agents are increasingly entering the human workspace with the aim of increasing productivity and enhancing safety. Despite this, we show in a ubiquitous experimental domain, Overcooked-AI, that state-of-the-art techniques for human-machine teaming (HMT), which rely on imitation or reinforcement learning, are brittle and result in a machine agent that aims to decouple the machine and human’s actions to act independently rather than in a synergistic fashion. To remedy this deficiency, we develop HMT approaches that enable iterative, mixed-initiative team development allowing end-users to interactively reprogram interpretable AI teammates. Our 50-subject study provides several findings that we summarize into guidelines. While all approaches underperform a simple collaborative heuristic (a critical, negative result for learning-based methods), we find that white-box approaches supported by interactive modification can lead to significant team development, outperforming white-box approaches alone, and that black-box approaches are easier to train and result in better HMT performance highlighting a tradeoff between explainability and interactivity versus ease-of-training. Together, these findings present three important future research directions: 1) Improving the ability to generate collaborative agents with white-box models, 2) Better learning methods to facilitate collaboration rather than individualized coordination, and 3) Mixed-initiative interfaces that enable users, who may vary in ability, to improve collaboration. In Proc. The Conference on Neural Information Processing Systems (NeurIPS). [25.8% Acceptance Rate] [Poster]
Sean Ye and Matthew Gombolay Efficient Trajectory Forecasting and Generation with Conditional Flow Matching \| Preprint \| Abstract Trajectory prediction and generation are vital for autonomous robots navigating dynamic environments. While prior research has typically focused on either prediction or generation, our approach unifies these tasks to provide a versatile framework and achieve state-of-the-art performance. Diffusion models, which are currently state-of-the-art for learned trajectory generation in long-horizon planning and offline reinforcement learning tasks, rely on a computationally intensive iterative sampling process. This slow process impedes the dynamic capabilities of robotic systems. In contrast, we introduce Trajectory Conditional Flow Matching (T-CFM), a novel datadriven approach that utilizes flow matching techniques to learn a solver time-varying vector field for efficient and fast trajectory generation. We demonstrate the effectiveness of TCFM on three separate tasks: adversarial tracking, real-world aircraft trajectory forecasting, and long-horizon planning. Our model outperforms state-of-the-art baselines with an increase of 35% in predictive accuracy and 142% increase in planning performance. Notably, T-CFM achieves up to 100× speedup compared to diffusion-based models without sacrificing accuracy, which is crucial for real-time decision making in robotics. In Proc. The 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). [47.5% Acceptance Rate]
Qingyu Xiao, Zulfiqar Zaidi, and Matthew Gombolay Multi-Camera Asynchronous Ball Localization and Trajectory Prediction with Factor Graphs and Human Poses \| Preprint \| Abstract The rapid and precise localization and prediction of a ball are critical for developing agile robots in ball sports, particularly in sports like tennis characterized by high-speed ball movements and powerful spins. The Magnus effect induced by spin adds complexity to trajectory prediction during flight and bounce dynamics upon contact with the ground. In this study, we introduce an innovative approach that combines a multi-camera system with factor graphs for real-time and asynchronous 3D tennis ball localization. Additionally, we estimate hidden states like velocity and spin for trajectory prediction. Furthermore, to enhance spin inference early in the ball’s flight, where limited observations are available, we integrate human pose data using a temporal convolutional network (TCN) to compute spin priors within the factor graph. This refinement provides more accurate spin priors at the beginning of the factor graph, leading to improved early-stage hidden state inference for prediction. Our result shows the trained TCN can predict the spin priors with RMSE of 5.27 Hz. Integrating TCN into the factor graph reduces the prediction error of landing positions by over 63.6% compared to a baseline method that utilized an adaptive extended Kalman filter. In Proc. The International Conference on Robotics and Automation (ICRA).
Manisha Natarajan, Chunyue Xue, Sanne van Waveren, Karen Feigh, and Matthew Gombolay Mixed-Initiative Human-Robot Teaming under Suboptimality with Online Bayesian Adaptation \| Abstract For effective human-agent teaming, robots and other artificial intelligence (AI) agents must infer their human partner’s abilities and behavioral response patterns and adapt accordingly. Most prior works make the unrealistic assumption that one or more teammates can act near-optimally. In real-world collaboration, humans and autonomous agents can be suboptimal, especially when each only has partial domain knowledge. In this work, we develop computational modeling and optimization techniques for enhancing the performance of human-agent teams, where both the human and the robotic agent have asymmetric capabilities and act suboptimally due to incomplete environmental knowledge. We adopt an online Bayesian approach that enables a robot to infer people’s willingness to comply with its assistance in a sequential decision-making game. Our user studies show that user preferences and team performance vary with robot intervention styles, and our approach for mixed-initiative collaboration enhances objective team performance ($p<.001$) and subjective measures, such as user's trust (p<.001) and perceived likeability of the robot (p<.001). In Proc. The International Conference on Autonomous Agents and Multiagent Systems (AAMAS). [Oral Talk; 25% Acceptance Rate]
Andrew Silva, Pradyumna Tambwekar, Mariah Schrum, and Matthew Gombolay Towards Balancing Preference and Performance through Adaptive Personalized Explainability \| Abstract As robots and digital assistants are deployed in the real world, these agents must be able to communicate their decision-making criteria to build trust, improve human-robot teaming, and enable collaboration. While the field of explainable artificial intelligence (xAI) has made great strides to enable such communication, these advances often assume that one xAI approach is ideally suited to each problem (e.g., decision trees to explain how to triage patients in an emergency or feature-importance maps to explain radiology reports). This fails to recognize that users have diverse experiences or preferences for interaction modalities. In this work, we present two user-studies set in a simulated autonomous vehicle (AV) domain. We investigate (1) population-level preferences for xAI and (2) personalization strategies for providing robot explanations. We find significant differences between xAI modes (language explanations, feature-importance maps, and decision trees) in both preference (p < 0.01) and performance (p < 0.05). We also observe that a participant's preferences do not always align with their performance, motivating our development of an adaptive personalization strategy to balance the two. We show that this strategy yields significant performance gains (p < 0.05), and we conclude with a discussion of our findings and implications for xAI in human-robot interactions. In Proc. ACM/IEEE International Conference on Human-Robot Interaction. [24.9% Acceptance Rate]
Yue Yang, Letian Chen, Zulfiqar Zaidi, Sanne van Waveren, Arjun Krishna, and Matthew Gombolay Enhancing Safety in Learning from Demonstration Algorithms via Control Barrier Function Shielding \| Preprint \| Abstract Learning from Demonstration (LfD) is a powerful method for non-roboticists end-users to teach robots new tasks, enabling them to customize the robot behavior. However, modern LfD techniques do not explicitly synthesize safe robot behavior, which limits the deployability of these approaches in the real world. To enforce safety in LfD without relying on experts, we propose a new framework, ShiElding with Control barrier fUnctions in inverse REinforcement learning (SECURE), which learns a customized Control Barrier Function (CBF) from end-users that prevents robots from taking unsafe actions while imposing little interference with the task completion. We evaluate SECURE in three sets of experiments. First, we empirically validate SECURE learns a high-quality CBF from demonstrations and outperforms conventional LfD methods on simulated robotic and autonomous driving tasks with improvements on safety by up to 100%. Second, we demonstrate that roboticists can leverage SECURE to outperform conventional LfD approaches on a real-world knife-cutting, meal-preparation task by 12.5% in task completion while driving the number of safety violations to zero. Finally, we demonstrate in a user study that non-roboticists can use SECURE to effectively teach the robot safe policies that avoid collisions with the person and prevent coffee from spilling. In Proc. ACM/IEEE International Conference on Human-Robot Interaction. [24.9% Acceptance Rate]
Workshop/Symposium Papers and Doctoral Consortia
Erin Hedlund-Botti, Nina Moorman, Julianna Schalkwyk, Chuxuan Yang, Lakshmi Seelam, Sanne van Waveren, Russell Perkins, Paul Robinette, Matthew Gombolay Towards Learning Interpretable Features from Interventions \| Abstract Assistive in-home robots are one solution to help an aging population. To provide effective care, robots must be adaptable and personalizable, as everyone has different needs and preferences. We want to enable people to communicate their preferences to a robot intuitively. In this work, we propose a method where users critique the robot by intervening when the robot makes a mistake or does not follow the user’s preference, in a learning from demonstration setting. The robot then learns interpretable features about the users’ goals and preferences based on the intervention. We propose a series of user studies to inform and validate our framework. In Proc. The IEEE/ACM International Conference on Human-Robot Interaction (HRI) LEAP Workshop
Julianna Schalkwyk, Sanne van Waveren, Mariah Schrum, Alex London, Paul Robinette and Matthew Gombolay Ethics of Paternalistic Robots in the Care of Older Adults In Proc. AAAI Fall Symposium Series on AI for Aging in Place.
Rynaa Grover, Aryan Vats, Nina Moorman, Aviral Agrawal, and Matthew Gombolay Better Apprenticeship Learning with LLM Explanations \| Abstract As the population ages, care robots will play an increasing role in assisting caregiving by taking on repetitive or physically cumbersome activities. To effectively provide care, robotic agents must be able to meet the needs and preferences of care receivers. However, these needs and preferences may change over time, making it intractable to pre-define the way the care robot should act before deployment. Instead, the care robot should be able to learn directly from non-expert end-user demonstrations. However, prior work investigating the feasibility of learning a policy from older adult demonstrations finds that older adult demonstrators desire a better understanding of what the robot needs them to do, and how. To help demonstrators understand how to improve on suboptimal or heterogeneous demonstrations, we propose to utilize a Large Language Model to provide human-interpretable explanations of Shapley values of a policy. These explanations enable the demonstrator to understand how the policy is performing, and what changes are needed, informing their corrective demonstrations. We showcase our framework’s performance in deterministic and stochastic versions of Wumpus World. In Proc. AAAI Fall Symposium Series on AI for Aging in Place.
Zhaoxin Li, Manisha Natarajan, Letian Chen, Zixuan Wu, Paul Ogara, Paulo Borges, Geoff Rance, Rithy Srey, Ryan E. Harari, Marco A. Zenati, Roger D. Dias, and Matthew Gombolay Using ML for Perfusionists’ Decision Prediction for Robotic-Assisted Cardiopulmonary Bypass in Cardiac Surgery In Proc. Hamlyn Symposium on Medical Robotics 2024 Workshop on Hybrid Human-Machine Interaction in Surgery.
arXiv Papers
Letian Chen and Matthew Gombolay ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics \| Abstract Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly with only text-based descriptions. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to align robot behavior with user intentions better. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and match the demonstrated behaviors optimally. ELEMENTAL also introduces an iterative feedback-loop through self-reflection to improve feature, reward, and policy learning. Our experiment results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success, and achieves 41.3% better generalization in out-of-distribution tasks, highlighting its robustness in LfD. In arXiv Preprint.
Qingyu Xiao, Zixuan Wu, and Matthew Gombolay Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations \| Abstract Robots in dynamic environments need fast, accurate models of how objects move in their environments to support agile planning. In sports such as ping pong, analytical models often struggle to accurately predict ball trajectories with spins due to complex aerodynamics, elastic behaviors, and the challenges of modeling sliding and rolling friction. On the other hand, despite the promise of data-driven methods, machine learning struggles to make accurate, consistent predictions without precise input. In this paper, we propose an end-to-end learning framework that can jointly train a dynamics model and a factor graph estimator. Our approach leverages a Gram-Schmidt (GS) process to extract roto-translational invariant representations to improve the model performance, which can further reduce the validation error compared to data augmentation method. Additionally, we propose a network architecture that enhances nonlinearity by using self-multiplicative bypasses in the layer connections. By leveraging these novel methods, our proposed approach predicts the ball’s position with an RMSE of 37.2 mm of the paddle radius at the apex after the first bounce, and 71.5 mm after the second bounce. In arXiv Preprint.
Zixuan Wu, Zulfiqar Zaidi, Adithya Patil, Qingyu Xiao, and Matthew Gombolay Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning \| Abstract In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transfer constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework in the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieve a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court. In arXiv Preprint.
Kin Man Lee, Sean Ye, Qingyu Xiao, Zixuan Wu, Zulfiqar Zaidi, David B D’Ambrosio, Pannag R Sanketi, and Matthew Gombolay Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance \| Abstract Advances in robot learning have enabled robots to generate skills for a variety of tasks. Yet, robot learning is typically sample inefficient, struggles to learn from data sources exhibiting varied behaviors, and does not naturally incorporate constraints. These properties are critical for fast, agile tasks such as playing table tennis. Modern techniques for learning from demonstration improve sample efficiency and scale to diverse data, but are rarely evaluated on agile tasks. In the case of reinforcement learning, achieving good performance requires training on high-fidelity simulators. To overcome these limitations, we develop a novel diffusion modeling approach that is offline, constraint-guided, and expressive of diverse agile behaviors. The key to our approach is a kinematic constraint gradient guidance (KCGG) technique that computes gradients through both the forward kinematics of the robot arm and the diffusion model to direct the sampling process. KCGG minimizes the cost of violating constraints while simultaneously keeping the sampled trajectory in-distribution of the training data. We demonstrate the effectiveness of our approach for time-critical robotic tasks by evaluating KCGG in two challenging domains: simulated air hockey and real table tennis. In simulated air hockey, we achieved a 25.4% increase in block rate, while in table tennis, we saw a 17.3% increase in success rate compared to imitation learning baselines. In arXiv Preprint.
Zhaoxin Li, Letian Chen, Rohan Paleja, Subramanya Nageshrao, and Matthew Gombolay Faster Model Predictive Control via Self-Supervised Initialization Learning \| Abstract Optimization for robot control tasks, spanning various methodologies, includes Model Predictive Control (MPC). However, the complexity of the system, such as non-convex and non-differentiable cost functions and prolonged planning horizons often drastically increases the computation time, limiting MPC’s real-world applicability. Prior works in speeding up the optimization have limitations on solving convex problem and generalizing to hold out domains. To overcome this challenge, we develop a novel framework aiming at expediting optimization processes. In our framework, we combine offline self-supervised learning and online fine-tuning through reinforcement learning to improve the control performance and reduce optimization time. We demonstrate the effectiveness of our method on a novel, challenging Formula-1-track driving task, achieving 3.9% higher performance in optimization time and 3.6% higher performance in tracking accuracy on challenging holdout tracks. In arXiv Preprint.
Zixuan Wu, Sean Ye, Manisha Natarajan, and Matthew Gombolay Faster Model Predictive Control via Self-Supervised Initialization Learning \| Abstract Reinforcement Learning- (RL-)based motion planning has recently shown the potential to outperform traditional approaches from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion games (PEG). These pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture themselves. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data while a low-level RL algorithm reasons about evasive versus global path-following behavior. Our approach outperforms baselines by 51.2% by leveraging the diffusion model to guide the RL algorithm for more efficient exploration and improves the explanability and predictability. In arXiv Preprint.
Books
Esmaeil Seraj, Kin Man Lee, Zulfiqar Zaidi, Qingyu Xiao, Zhaoxin Li, Arthur Scaquetti do Nascimento, Sanne Van Waveren, Pradyumna Tambwekar, Rohan Paleja, Devleena Das, and Matthew Gombolay Interactive and Explainable Robot Learning: A Comprehensive Review \| Abstract This monograph embarks on a comprehensive exploration of approaches, evaluation methods, and ethical considerations in explainable and interactive systems for robotic applications, distinctly focusing on intelligent systems that are specifically designed for learning automated agents. Given the increasing integration of robots in daily life, it is crucial to focus on intelligent systems that not only can learn and adapt, but also can offer clarity and comprehension for their actions. The interactive component of these systems is thoroughly examined, evaluating the algorithms, the modalities used in interaction, and the significance of mixed-initiative and shared autonomy. Adaptive and adaptable methods, emphasizing the centrality of user-inspired research and personalized approaches in interactive robotics are highlighted. Also included is a rigorous examination of safety and ethical considerations of these intelligent systems, including aspects of transparency, privacy, accountability, biases, and psychological well-being. The monograph evaluates existing metrics and benchmarking standards for such systems and explores their practical applications across domains such as healthcare, domestic tasks, and industrial automation. The monograph concludes with key insights and directions for future research, and design guidelines and points of consensus for each subject are included in order to equip readers with a nuanced understanding of current trends and tools in explainable and interactive robotic systems, paving the way for informed research and application in this dynamic field. Vol. 12, No. 2-3, pp 75–349.
2023
Journal Papers
Laura Strickland and Matthew Gombolay Coordinating Team Tactics for Swarm-vs.-Swarm Adversarial Games \| Abstract While swarms of UAVs have received much attention in the last few years, adversarial swarms (i.e., competitive, swarm-vs.-swarm games) have been less well studied. In this dissertation, I investigate the factors influential in team-vs.-team UAV aerial combat scenarios, elucidating the impacts of force concentration and opponent spread in the engagement space. Specifically, this dissertation makes the following contributions: (1) Tactical Analysis: Identifies conditions under which either explicitly-coordinating tactics or decentralized, greedy tactics are superior in engagements as small as 2-vs.-2 and as large as 10-vs.-10, and examines how these patterns change with the quality of the teams’ weapons; (2) Coordinating Tactics: Introduces and demonstrates a deep-reinforcement-learning framework that equips agents to learn to use their own and their teammates’ situational context to decide which pre-scripted tactics to employ in what situations, and which teammates, if any, to coordinate with throughout the engagement; the efficacy of agents using the neural network trained within this framework outperform baseline tactics in engagements against teams of agents employing baseline tactics in N-vs.-N engagements for N as small as two and as large as 64; and (3) Bio-Inspired Coordination: Discovers through Monte-Carlo agent-based simulations the importance of prioritizing the team’s force concentration against the most threatening opponent agents, but also of preserving some resources by deploying a smaller defense force and defending against lower-penalty threats in addition to high-priority threats to maximize the remaining fuel within the defending team’s fuel reservoir. Journal of Aerospace Information Systems. [Accepted on September 3rd, 2023]
Manisha Natarajan, Esmaeil Seraj, Batuhan Altundas, Rohan Paleja, Sean Ye, Letian Chen, Reed Jensen, Kimberlee Chestnut Chang, and Matthew Gombolay Human-Robot Teaming: Grand Challenges \| Abstract Purpose of Review: Current real-world interaction between humans and robots is extremely limited. We present challenges that, if addressed, will enable humans and robots to collaborate fluently. Recent Findings: Humans and robots have unique advantages best leveraged in Human-Robot Teams. However human and robot collaboration is challenging, and creating algorithmic advances to support teaming requires careful consideration. Prior research on Human-Robot Interaction, Multi-Agent Robotics, and Human-Centered Artificial Intelligence is often limited in scope or application due to unique challenges in combining humans and robots into teams. Identifying the key challenges that apply to a broad range of Human-Robot Teaming applications allows for a focused and collaborative development of a future towards a world where humans and robots can work together in every layer of society. Summary: In order to realize the potential of Human-Robot Teaming while avoiding potential societal harm, several key challenges must be addressed: 1) Communication, 2) Modeling Human Behavior, 3) Long-Term Interaction, 4) Scalability, 5) Safety, 6) Privacy, 7) Ethics, 8) Metrics and Benchmarking, 9) Human Social and Psychological Wellbeing. Current Robotics Reports.
Lina Zhou, Cynthia Rudin, Matthew Gombolay, Jim Spohrer, and Michelle Zhou From Artificial Intelligence (AI) to Intelligence Augmentation (IA): Design Principles, Potential Risks, and Emerging Issues \| Abstract We typically think of artificial intelligence (AI) as focusing on empowering machines with human capabilities so that they can function on their own, but, in truth, much of AI focuses on intelligence augmentation (IA), which is to augment human capabilities. We propose a framework for designing intelligent augmentation (IA) systems and it addresses six central questions about IA: why, what, who/whom, how, when, and where. To address the how aspect, we introduce four guiding principles: simplification, interpretability, human-centeredness, and ethics. The what aspect includes an IA architecture that goes beyond the direct interactions between humans and machines by introducing their indirect relationships through data and domain. The architecture also points to the directions for operationalizing the IA design simplification principle. We further identify some potential risks and emerging issues in IA design and development to suggest new questions for future IA research and to foster its positive impact on humanity. Transactions on Human-Computer Interaction (THCI), 15(1), Pages 111-135.
Grace Y Gombolay, Nakul Gopalan, Andrea Bernasconi, Rima Nabbout, Jonathan T Megerian, Benjamin Siegel, Jamika Hallman-Cooper, Sonam Bhalla, and Matthew Gombolay Review of Machine Learning and Artificial Intelligence (ML/AI) for the Pediatric Neurologist \| Abstract Artificial intelligence (AI) and a popular branch of AI known as machine learning (ML) are increasingly being utilized in medicine and to inform medical research. This review provides an overview of AI and ML (AI/ML), including definitions of common terms. We discuss the history of AI and provide instances of how AI/ML can be applied to pediatric neurology. Examples include imaging in neuro-oncology, autism diagnosis, diagnosis from charts, epilepsy, cerebral palsy, and neonatal neurology. Topics such as supervised learning, unsupervised learning, and reinforcement learning are discussed. Pediatric Neurology.
Pradyumna Tambwekar, Andrew Silva, Nakul Gopalan, and Matthew Gombolay Natural Language Specification of Reinforcement Learning Policies Through Differentiable Decision Trees \| Abstract Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot’s reinforcement learning policy. This procedure is comprised of two steps; (1) Policy Specification, i.e. humans specifying the behavior they would like their companion robot to accomplish, and (2) Policy Optimization, i.e. the robot applying reinforcement learning to improve the initial policy. Existing approaches to enabling collaborative policy specification are often unintelligible black-box methods, and are not catered towards making the autonomous system accessible to a novice end-user. In this letter, we develop a novel collaborative framework to allow humans to initialize and interpret an autonomous agent’s behavior. Through our framework, we enable humans to specify an initial behavior model via unstructured, natural language (NL), which we convert to lexical decision trees. Next, we leverage these translated specifications, to warm-start reinforcement learning and allow the agent to further optimize these potentially suboptimal policies. Our approach warm-starts an RL agent by utilizing non-expert natural language specifications without incurring the additional domain exploration costs. We validate our approach by showing that our model is able to produce >80% translation accuracy, and that policies initialized by a human can match the performance of relevant RL baselines in two domains. IEEE Robotics and Automation Letters, Volume 8, Issue 6, Pages 3621-3628.
Zulfiqar Zaidi, Daniel Martin, Nathaniel Belles, Viacheslav Zakharov, Arjun Krishna, Kin Man Lee, Peter Wagstaff, Sumedh Naik, Matthew Sklar, Sugju Choi, Yoshiki Kakehi, Ruturaj Patil, Divya Mallemadugula, Florian Pesce, Peter Wilson, Wendell Hom, Matan Diamond, Bryan Zhao, Nina Moorman, Rohan Paleja, Letian Chen, Esmaeil Seraj, and Matthew Gombolay Athletic Mobile Manipulator System for Robotic Wheelchair Tennis \| Abstract Athletics are a quintessential and universal expression of humanity. From French monks who in the 12th century invented jeu de paume, the precursor to modern lawn tennis, back to the K’iche’ people who played the Maya Ballgame as a form of religious expression over three thousand years ago, humans have sought to train their minds and bodies to excel in sporting contests. Advances in robotics are opening up the possibility of robots in sports. Yet, key challenges remain, as most prior works in robotics for sports are limited to pristine sensing environments, do not require significant force generation, or are on miniaturized scales unsuited for joint human-robot play. In this paper, we propose the first open-source, autonomous robot for playing regulation wheelchair tennis. We demonstrate the performance of our full-stack system in executing ground strokes and evaluate each of the system’s hardware and software components. The goal of this paper is to (1) inspire more research in human-scale robot athletics and (2) establish the first baseline for a reproducible wheelchair tennis robot for regulation singles play. Our paper contributes to the science of systems design and poses a set of key challenges for the robotics community to address in striving towards robots that can match human capabilities in sports. IEEE Robotics and Automation Letters, Volume 8, Issue 4, Pages 2245-2252.
Conference Papers
Zixuan Wu, Sean Ye, Manisha Natarajan, Letian Chen, Rohan Paleja, and Matthew Gombolay Adversarial Search and Tracking with Multiagent Reinforcement Learning in Sparsely Observable Environment \| Abstract We study a search and tracking (S&T) problem where a team of dynamic search agents must collaborate to track an adversarial, evasive agent. The heterogeneous search team may only have access to a limited number of past adversary trajectories within a large search space. This problem is challenging for both model-based searching and reinforcement learning (RL) methods since the adversary exhibits reactionary and deceptive evasive behaviors in a large space leading to sparse detections for the search agents. To address this challenge, we propose a novel Multi-Agent RL (MARL) framework that leverages the estimated adversary location from our learnable filtering model. We show that our MARL architecture can outperform all baselines and achieves a 46% increase in detection rate. In Proc. 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), Pages 43-49.
Zixuan Wu, Sean Ye, Byeolyi Han, and Matthew Gombolay Hijacking Robot Teams Through Adversarial Communication \| Preprint \| Abstract Communication is often necessary for robot teams to collaborate and complete a decentralized task. Multi-agent reinforcement learning (MARL) systems allow agents to learn how to collaborate and communicate to complete a task. These domains are ubiquitous and include safety-critical domains such as wildfire fighting, traffic control, or search and rescue missions. However, critical vulnerabilities may arise in communication systems as jamming the signals can interrupt the robot team. This work presents a framework for applying black-box adversarial attacks to learned MARL policies by manipulating only the communication signals between agents. Our system only requires observations of MARL policies after training is complete, as this is more realistic than attacking the training process. To this end, we imitate a learned policy of the targeted agents without direct interaction with the environment or ground truth rewards. Instead, we infer the rewards by only observing the behavior of the targeted agents. Our framework reduces reward by 201% compared to an equivalent baseline method and also shows favorable results when deployed in real swarm robots. Our novel attack methodology within MARL systems contributes to the field by enhancing our understanding on the reliability of multi-agent systems. In Proc. Conference on Robot Learning (CoRL). [Oral Presentation, 6.6% Acceptance Rate]
Sravan Jayanthi, Letian Chen, Nadya Balabanska, Van Duong, Erik Scarlatescu, Ezra Ameperosa, Zulfiqar Haider Zaidi, Daniel Martin, Taylor Del Matto, Masahiro Ono, and Matthew Gombolay DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation \| Abstract Offline Learning from Demonstrations (OLfD) is valuable in domains where trial-and-error learning is infeasible or specifying a cost function is difficult, such as robotic surgery, autonomous driving, and path-finding for NASA’s Mars rovers. However, two key problems remain challenging in OLfD: 1) heterogeneity: demonstration data can be generated with diverse preferences and strategies, and 2) generalizability: the learned policy and reward must perform well beyond a limited training regime in unseen test settings. To overcome these challenges, we propose Dual Reward and policy Offline Inverse Distillation (DROID), where the key idea is to leverage diversity to improve generalization performance by decomposing common-task and individual-specific strategies and distilling knowledge in both the reward and policy spaces. We ground DROID in a novel and uniquely challenging Mars rover path-planning problem for NASA’s Mars Curiosity Rover. We also curate a novel dataset along 163 Sols (Martian days) and conduct a novel, empirical investigation to characterize heterogeneity in the dataset. We find DROID outperforms prior SOTA OLfD techniques, leading to a 26% improvement in modeling expert behaviors and 92% closer to the task objective of reaching the final destination. We also benchmark DROID on the OpenAI Gym Cartpole environment and find DROID achieves 55% (significantly) better performance modeling heterogeneous demonstrations. In Proc. Conference on Robot Learning (CoRL). [39.9% Acceptance Rate]
Sean Ye, Manisha Natarajan, Zixuan Wu, Rohan Paleja, Letian Chen, and Matthew Gombolay Learning Models of Adversarial Agent Behavior under Partial Observability* \| Abstract The need for opponent modeling and tracking arises in several real-world scenarios, such as professional sports, video game design, and drug-trafficking interdiction. In this work, we present Graph based Adversarial Modeling with Mutual Information (GrAMMI) for modeling the behavior of an adversarial opponent agent. GrAMMI is a novel graph neural network (GNN) based approach that uses mutual information maximization as an auxiliary objective to predict the current and future states of an adversarial opponent with partial observability. To evaluate GrAMMI, we design two large-scale, pursuit-evasion domains inspired by real-world scenarios, where a team of heterogeneous agents is tasked with tracking and interdicting a single adversarial agent, and the adversarial agent must evade detection while achieving its own objectives. With the mutual information formulation, GrAMMI outperforms all baselines in both domains and achieves 31.68% higher log-likelihood on average for future adversarial state predictions across both domains. In Proc. International Conference on Intelligent Robots and Systems (IROS)
Pradyumna Tambwekar, Lakshita Dodeja, Nathan Vaska, Wei Xu, and Matthew Gombolay A Computational Interface to Translate Strategic Intent from Unstructured Language in a Low-Data Setting \| Abstract Many real-world tasks involve a mixed-initiative setup, wherein humans and AI systems collaboratively perform a task. While significant work has been conducted towards enabling humans to specify, through language, exactly how an agent should complete a task (i.e., low-level specification), prior work lacks on interpreting the high-level strategic intent of the human commanders. Parsing strategic intent from language will allow autonomous systems to independently operate according to the user’s plan without frequent guidance or instruction. In this paper, we build a computational interface capable of translating unstructured language strategies into actionable intent in the form of goals and constraints. Leveraging a game environment, we collect a dataset of over 1000 examples, mapping language strategies to the corresponding goals and constraints, and show that our model, trained on this dataset, significantly outperforms human interpreters in inferring strategic intent (i.e., goals and constraints) from language (p < 0.05). Furthermore, we show that our model (125M parameters) significantly outperforms ChatGPT for this task (p < 0.05) in a low-data setting. In Findings of the Association for Computational Linguistics: EMNLP 2023, Pages 12801–12819.
Andrew Silva, Pradyumna Tambwekar, and Matthew Gombolay FedPerC: Federated Learning for Language Generation with Personal and Context Preference Embeddings \| Abstract Federated learning is a training paradigm that learns from multiple distributed users without aggregating data on a centralized server, promising the ability to deploy machine-learning to a diverse population of users without first collecting large, labeled datasets.As federated learning involves averaging gradient updates across a decentralized population, there is a growing need for personalization of federated learning systems (i.e. conversational agents must personalize to individual users and the context of an interaction).In this work, we propose a new direction for personalization research within federated learning, leveraging both personal embeddings and shared context embeddings.We also present an approach to predict these “preference” embeddings, enabling personalization without backpropagation. Compared to state-of-the-art personalization baselines, our approach achieves a 50% improvement in test-time perplexity using 0.001% of the memory required by baseline approaches, and achieving greater sample- and compute-efficiency. In Findings of the Association for Computational Linguistics: EACL 2023, Pages 869–882.
Kin Man Lee, Arjun Krishna, Zulfiqar Zaidi, Rohan Paleja, Letian Chen, Erin Hedlund-Botti, Mariah Schrum, and Matthew Gombolay The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration \| Abstract As high-speed, agile robots become more commonplace, these robots will have the potential to better aid and collaborate with humans. However, due to the increased agility and functionality of these robots, close collaboration with humans can create safety concerns that alter team dynamics and degrade task performance. In this work, we aim to enable the deployment of safe and trustworthy agile robots that operate in proximity with humans. We do so by 1) Proposing a novel human-robot doubles table tennis scenario to serve as a testbed for studying agile, proximate human-robot collaboration and 2) Conducting a user-study to understand how attributes of the robot (e.g., robot competency or capacity to communicate) impact team dynamics, perceived safety, and perceived trust, and how these latent factors affect human-robot collaboration (HRC) performance. We find that robot competency significantly increases perceived trust (𝑝 < .001), extending skill-to-trust assessments in prior studies to agile, proximate HRC. Furthermore, interestingly, we find that when the robot vocalizes its intention to perform a task, it results in a significant decrease in team performance (𝑝 = .037) and perceived safety of the system (𝑝 = .009). In Proc. International Conference on Human-Robot Interaction (HRI). [25.2% Acceptance Rate]
Nina Moorman, Erin Hedlund-Botti, Mariah Schrum, Manisha Natarajan, and Matthew Gombolay Impacts of Robot Learning on User Attitude and Behavior \| Abstract With an aging population and a growing shortage of nurses and caregivers, the need for in-home robots is increasing. However, it is intractable for robots to have all functionalities pre-programmed prior to deployment. Instead, it is more realistic for robots to engage in supplemental, on-site learning about the user’s needs and preferences and particularities of the environment. As a result, robots require the ability to adapt to and learn from their users. Such learning may occur in the presence of or involve the user, and the observation of the robot learning may impact the end-user’s perceptions of the robot. In this work, we investigate the impacts on end-users of in situ robot learning through a series of human-subject experiments. We investigate how different learning methods influence both in-person and remote participants’ perceptions of the robot. While we find that the degree of user involvement in the robot’s learning method impacts perceived anthropomorphism (𝑝 = .001), we find that it is the participants’ perceived success of the robot that impacts the participants’ trust in (𝑝 < .001) and perceived usability of the robot (𝑝 < .001) rather than the robot’s learning method. Therefore, when presenting robot learning, the performance of the learning method appears more important than the degree of user involvement in the learning. Furthermore, we find that the physical presence of the robot impacts perceived safety (𝑝 < .001), trust (𝑝 < .001), and usability (𝑝 < .014). In Proc. International Conference on Human-Robot Interaction (HRI). [25.2% Acceptance Rate]
Roger Dias, Lauren Kennedy-Metz, Rithy Srey, Geoffrey Rance, Mahdi Ebnali, David Arney, Matthew Gombolay, and Marco Zenati Using Digital Biomarkers for Objective Assessment of Perfusionists’ Workload and Acute Stress during Cardiac Surgery \| Abstract The cardiac operating room (OR) is a high-risk, high-stakes environment inserted into a complex socio-technical healthcare system. During cardiopulmonary bypass (CPB), the most critical phase of cardiac surgery, the perfusionist has a crucial role within the interprofessional OR team, being responsible for optimizing patient perfusion while coordinating other tasks with the surgeon, anesthesiologist, and nurses. The aim of this study was to investigate objective digital biomarkers of perfusionists’ workload and stress derived from heart rate variability (HRV) metrics captured via a wearable physiological sensor in a real cardiac OR. We explored the relationships between several HRV parameters and validated self-report measures of surgical task workload (SURG-TLX) and acute stress (STAI-SF), as well as surgical processes and outcome measures. We found that the frequency-domain HRV parameter HF relative power – FFT (%) presented the strongest association with task workload (correlation coefficient: −0.491, p-value: 0.003). We also found that the time-domain HRV parameter RMSSD (ms) presented the strongest correlation with perfusionists’ acute stress (correlation coefficient: −0.489, p-value: 0.005). A few workload and stress biomarkers were also associated with bypass time and patient length of stay in the hospital. The findings from this study will inform future research regarding which HRV-based biomarkers are best suited for the development of cognitive support systems capable of monitoring surgical workload and stress in real time. In Proc. International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), Volume 13919, Pages 443-454.
Workshop/Symposium Papers and Doctoral Consortia
Varshith Sreeramdass, Rohan Paleja, Letian Chen, Sanne van Waveren, and Matthew Gombolay Generalized Behavior Learning from Diverse Demonstrations \| Abstract Learning robot control policies through reinforcement learning can be challenging due to the complexity of designing rewards, which often result in unexpected behaviors. Imitation Learning overcomes this issue by using demonstrations to create policies that mimic expert behaviors. However, experts often demonstrate varied approaches to tasks. Capturing this variability is crucial for understanding and adapting to diverse scenarios. Prior methods capture variability by optimizing for behavior diversity alongside imitation. Yet, naive formulations of diversity can result in meaningless representation of latent factors, hindering generalization to novel scenarios. We propose Guided Strategy Discovery (GSD), a novel regularization method that specifically promotes expert-specified, task-relevant diversity. In the recovery of unseen expert behaviors, GSD improves 11% over the next best baseline across three continuous control tasks on average. In Proc. First Workshop on Out-of-Distribution Generalization in Robotics at Conference on Robot Learning (CoRL).
Brandon Ho, Batuhan Altundas and Matthew Gombolay Towards Learning Scalable Agile Dynamic Motion Planning for Robosoccer Teams with Policy Optimization \| Abstract In fast-paced, ever-changing environments, dynamic Motion Planning for Multi-Agent Systems in the presence of obstacles is a universal and unsolved problem. Be it from path planning around obstacles to the movement of robotic arms, or in planning navigation of robot teams in settings such as Robosoccer, dynamic motion planning is needed to avoid collisions while reaching the targeted destination when multiple agents occupy the same area. In continuous domains where the world changes quickly, existing classical Motion Planning algorithms such as RRT* and A* become computationally expensive to rerun at every time step. Many variations of classical and well-formulated non-learning path-planning methods have been proposed to solve this universal problem but fall short due to their limitations of speed, smoothness, optimally, etc. Deep Learning models overcome their challenges due to their ability to adapt to varying environments based on past experience. However, current learning motion planning models use discretized environments, do not account for heterogeneous agents or replanning, and build up to improve the classical motion planners’ efficiency, leading to issues with scalability. To prevent collisions between heterogenous team members and collision to obstacles while trying to reach the target location, we present a learning-based dynamic navigation model and show our model working on a simple environment in the concept of a simple Robosoccer Game. In Proc. Conference on Robot Learning (CoRL) RoboLetics Workshop.
Rohan Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, and Matthew Gombolay Learning Interpretable, High-Performing Policies for Continuous Control \| Abstract Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for continuous control problems. While the performance of these approaches warrants real-world adoption in domains, such as in autonomous driving and robotics, these policies lack interpretability, limiting deployability in safety-critical and legally-regulated domains. Such domains require interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, RL approaches to produce high performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that parity or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. In Proc. International Conference on Intelligent Robots and Systems (IROS) RL-CONFORM Workshop.
Nina Moorman, Nakul Gopalan, Aman Singh, Erin Hedlund-Botti, Mariah Schrum, Chuxuan Yang, Lakshmi Seelam, and Matthew Gombolay Investigating the Impact of Experience on a User’s Ability to Perform Hierarchical Abstraction \| Abstract The field of Learning from Demonstration enables end-users, who are not robotics experts, to shape robot behavior. However, using human demonstrations to teach robots to solve long-horizon problems by leveraging the hierarchical structure of the task is still an unsolved problem. Prior work has yet to show that human users can provide sufficient demonstrations in novel domains without showing the demonstrators explicit teaching strategies for each domain. In this work, we investigate whether non-expert demonstrators can generalize robot teaching strategies to provide necessary and sufficient demonstrations to robots zero-shot in novel domains. We find that increasing participant experience with providing demonstrations improves their demonstration’s degree of sub-task abstraction (p<.001), teaching efficiency (p<.001), and sub-task redundancy (p<.05) in novel domains, allowing generalization in robot teaching. Our findings demonstrate for the first time that non-expert demonstrators can transfer knowledge from a series of training experiences to novel domains without the need for explicit instruction, such that they can provide necessary and sufficient demonstrations when programming robots to complete task and motion planning problems. In Proc. RSS 2023 Workshop on Learning for Task and Motion Planning.
Manisha Natarajan, Chunyue Xue, Karen Feigh, and Matthew Gombolay Design of Human-Aware Robotic Decision Support Systems \| Abstract Advances in robotics and artificial intelligence (AI) have enabled the possibility of human-robot teaming. One potential avenue for collaborative robots is to provide decision-support for human partners in complex decision-making tasks. However, such agents are imperfect in real-world scenarios and may provide incorrect or suboptimal recommendations. Thus, it is imperative for human collaborators to understand when to trust the robot’s suggestions for maximizing task performance. Explainable AI (xAI) attempts to improve user understanding by providing explanations or rationales for agent recommendations. However, constantly providing explanations is unnecessary and can induce cognitive overload among users. In this work, we propose a POMDP framework that allows the robot to infer the users’ latent trust and preferences to provide appropriate and timely explanations for maximizing human-robot team performance in a sequential decision-making game. In Proc. ICRA Workshop on Cognitive Modeling in Robot Learning for Adaptive Human-Robot Interaction.
Pradyumna Tambwekar, Andrew Silva, and Matthew Gombolay The Design and Preliminary Results of a User Study Measuring Diverse Explainability Preferences \| Abstract As robots and digital assistants are deployed in the real world, these agents must be able to communicate their decision-making criteria to build trust, improve human-robot teaming, and enable collaboration. While the field of explainable artificial intelligence has made great strides in building a set of mechanisms to enable such communication, these advancements often assume that one approach is ideally suited to one or more problems (e.g., decision trees are best for explaining how to triage patients in an emergency room), failing to recognize that individual users may have different past experiences or preferences for interaction modalities. In this work, we present the design and results of a user study in a virtual self-driving car domain, in which the car presents navigational assistance to the human and uses varying explanation modalities to justify its suggestions. We find significant differences between explanation baselines for subjective ranking preferences (𝑝 < 0.01) and objective performance with respect to incorrect compliance (𝑝 < 0.05). However, we find that some participants have strong preferences that go against our population-level findings, which makes suggesting the majority-preference an inappropriate solution. Our analysis shows that personalization is crucial to maximize the subjective and objective benefits of explanations with diverse users. In Proc. Lifelong Learning and Personalization in Long-Term Human-Robot Interaction Workshop (LEAP-HRI).
Mariah Schrum and Matthew C. Gombolay Privacy and Personalization: Transparency, Acceptance, and the Ethics of Personalized Robots \| Abstract To effectively support humans, machines must be capable of recognizing individual desires, abilities, and characteristics and adapt to account for differences across individuals. However, personalization does not come without a cost. In many domains, for robots to effectively personalize their behavior, the robot must solicit often private and intimate information about an end-user so as to optimize the interaction. However, not all end-users may be comfortable sharing this information, especially if the end-user is not provided with insight into why the robot is requesting it. As HRI researchers, we have the responsibility of ensuring the robots we create do not infringe upon the privacy rights of end-users and that end-users are provided with the means to make informed decisions about the information they share with robots. While prior work has investigated willingness to share information in the context of consumerism, no prior work has investigated the impact of domain, type of requested information, or explanations on end-user’s comfort and acceptance of a personalized robot. To gain a better understanding of these questions, we propose an experimental design in which we investigate the impact of domain, nature of personal information requested, and the role of explanations on robot transparency and end-user willingness to share information. Our goal of this study is to provide guidance for HRI researchers who are conducting work in personalization by examining the factors that may impact transparency and acceptance of personalized robots In Proc. CONCATENATE 2023 Workshop at the ACM/IEEE International Conference on Human Robot Interaction (HRI).
Erin Hedlund and Matthew C. Gombolay Investigating Learning from Demonstration in Imperfect and Real World Scenarios \| Abstract As the world’s population is aging and there are growing shortages of caregivers, research into assistive robots is increasingly important. Due to differing needs and preferences, which may change over time, end-users will need to be able to communicate their preferences to a robot. Learning from Demonstration (LfD) is one method that enables non-expert users to program robots. While a powerful tool, prior research in LfD has made assumptions that break down in real-world scenarios. In this work, we investigate how to learn from suboptimal and heterogeneous demonstrators, how users react to failure with LfD, and the feasibility of LfD with a target population of older adults. In Proc. International Conference on Human Robot Interaction (HRI) Pioneers Workshop. [25.3% Acceptance Rate]
Esmaeil Seraj and Matthew Gombolay Embodied, Intelligent Communication for Multi-Agent Cooperation \| Abstract High-performing human teams leverage intelligent and efficient communication and coordination strategies to collaboratively maximize their joint utility. Inspired by teaming behaviors among humans, I seek to develop computational methods for synthesizing intelligent communication and coordination strategies for collaborative multi-robot systems. I leverage both classical model-based control and planning approaches as well as data-driven methods such as Multi-Agent Reinforcement Learning (MARL) to provide several contributions towards enabling emergent cooperative teaming behavior across both homogeneous and heterogeneous (including agents with different capabilities) robot teams. In Proc. Association for the Advancement of Artificial Intelligence Conference (AAAI) Doctoral Consortium.
arXiv Papers
Rohan Paleja, Letian Chen, Yaru Niu, Andrew Silva, Zhaoxin Li, Songan Zhang, Chace Ritchie, Sugju Choi, Kimberlee Chestnut Chang, Hongtei Eric Tseng, Yan Wang, Subramanya Nageshrao, and Matthew Gombolay Interpretable Reinforcement Learning for Robotics and Continuous Control \| Abstract Interpretability in machine learning is critical for the safe deployment of learned policies across legally-regulated and safety-critical domains. While gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for continuous control problems such as robotics and autonomous driving, the lack of interpretability is a fundamental barrier to adoption. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, reinforcement learning approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning policies that parity or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of parameters against deep learning baselines. We prove that ICCTs can serve as universal function approximators and display analytically that ICCTs can be verified in linear time. Furthermore, we deploy ICCTs in two realistic driving domains, based on interstate Highway-94 and 280 in the US. Finally, we verify ICCT’s utility with end-users and find that ICCTs are rated easier to simulate, quicker to validate, and more interpretable than neural networks. In arXiv Preprint.
Kamel Alrashedy, Abdullah Aljasser, Pradyumna Tambwekar, and Matthew Gombolay Can LLMs Patch Security Issues? \| Abstract Large Language Models (LLMs) have shown impressive proficiency in code generation. Unfortunately, these models share a weakness with their human counterparts: producing code that inadvertently has security vulnerabilities. These vulnerabilities could allow unauthorized attackers to access sensitive data or systems, which is unacceptable for safety-critical applications. In this work, we propose Feedback-Driven Security Patching (FDSP), where LLMs automatically refine generated, vulnerable code. Our approach leverages automatic static code analysis to empower the LLM to generate and implement potential solutions to address vulnerabilities. We address the research communitys needs for safe code generation by introducing a large-scale dataset, PythonSecurityEval, covering the diversity of real-world applications, including databases, websites and operating systems. We empirically validate that FDSP outperforms prior work that uses self-feedback from LLMs by up to 17.6% through our procedure that injects targeted, external feedback. Code and data are available at \url{this https URL} In arXiv Preprint.
Sean Ye, Manisha Natarajan, Zixuan Wu, and Matthew Gombolay Diffusion Based Multi-Agent Adversarial Tracking \| Abstract Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target’s location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety. This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations by leveraging past sparse state information. To assess the effectiveness of this approach, we evaluate predictions on single-target and multi-target pursuit environments, employing Monte-Carlo sampling of the diffusion model to estimate the probability associated with each generated trajectory. We propose a novel cross-attention based diffusion model that utilizes constraint-based sampling to generate multimodal track hypotheses. Our single-target model surpasses the performance of all baseline methods on Average Displacement Error (ADE) for predictions across all time horizons. In arXiv Preprint.
Laksita Dodeja, Pradyumna Tambwekar, Erin Hedlund-Botti, and Matthew Gombolay Towards the design of user-centric strategy recommendation systems for collaborative Human-AI tasks \| Abstract Artificial Intelligence is being employed by humans to collaboratively solve complicated tasks for search and rescue, manufacturing, etc. Efficient teamwork can be achieved by understanding user preferences and recommending different strategies for solving the particular task to humans. Prior work has focused on personalization of recommendation systems for relatively well-understood tasks in the context of e-commerce or social networks. In this paper, we seek to understand the important factors to consider while designing user-centric strategy recommendation systems for decision-making. We conducted a human-subjects experiment (n=60) for measuring the preferences of users with different personality types towards different strategy recommendation systems. We conducted our experiment across four types of strategy recommendation modalities that have been established in prior work: (1) Single strategy recommendation, (2) Multiple similar recommendations, (3) Multiple diverse recommendations, (4) All possible strategies recommendations. While these strategy recommendation schemes have been explored independently in prior work, our study is novel in that we employ all of them simultaneously and in the context of strategy recommendations, to provide us an in-depth overview of the perception of different strategy recommendation systems. We found that certain personality traits, such as conscientiousness, notably impact the preference towards a particular type of system (p < 0.01). Finally, we report an interesting relationship between usability, alignment and perceived intelligence wherein greater perceived alignment of recommendations with one’s own preferences leads to higher perceived intelligence (p < 0.01) and higher usability (p <0.01). In arXiv Preprint.
Mahdi Ebnali, Marco Zenati, Matthew Gombolay, and Roger Dias Surgical Team Performance Analysis Using Computer Vision: A Methodology and Use Case Study \| Abstract Current methods for assessing the performance of surgical teams, such as observational rating scales and team self-assessments, have significant limitations. Existing methods primarily focus on subjective evaluations of surgical team performance or use an unstructured and descriptive method, resulting in inaccurate, biased evaluations. A computer vision (CV) -based method has been proposed in this study to overcome these challenges. This technique is based on estimating the position and movement of OR staff during procedures using video data captured by standard cameras to assess the performance of Operation Room (OR) staff. We examined the feasibility of this approach by extracting OR team motion metrics from 30 videos from real-life cardiac surgery operations. Further studies are needed to provide more validation of the efficiency of this method to quantify the performance of surgical teams. In PsyArXiv Preprint
Thesis
Andrew Silva Interactive and Explainable Methods in Machine Learning with Humans \| Abstract This dissertation introduces and evaluates new mechanisms for interactivity and explainability within machine learning, specifically targeting human-in-the-loop learning systems. I will also introduce and evaluate new approaches for personalization with heterogeneous populations (i.e., populations in which users have diverse, potentially conflicting, preferences). The contributions of this dissertation aim to substantiate the thesis statement: Interactive and explainable machine learning yields improved experiences for human users of intelligent systems. Specifically: 1. Machine learning with human expertise improves task performance as measured by success rates and reward. 2. Personalized machine learning improves task performance for a large heterogeneous population of users. 3. Machine learning with explainability improves human perceptions of intelligent agents and enhances user compliance with agent suggestions. I offer both novel technical methods for interactivity and explainability within machine learning, as well as user studies to empirically validate my technical contributions. Along the way, I provide guidelines for future work in interactive and explainable machine learning, drawing on insights from empirical experimentation and user studies. This dissertation begins with an overview of relevant background knowledge and related work (Chapter 2) and then presents novel technical contributions in interactive and personalized learning (Chapters 3-6). I then transition to contributions in explainable machine learning (Chapters 7-8) and a large-scale user study across various explainability approaches (Chapter 9). I present a project applying personalization techniques to explanation modalities in an in-person user study (Chapter 10), which serves to unify prior work on interactivity and explainability into a single framework for learning personalized explainability profiles across a population of heterogeneous end-users. This dissertation then closes with limitations of work presented herein and directions for future work in Chapter 11, and concluding remarks in Chapter 12. Ph.D. Thesis: Doctor of Philosophy in Interactive Computing.
Esmaeil Seraj Enhancing Teamwork in Multi-Robot Systems: Embodied Intelligence via Model- and Data-Driven Approaches \| Abstract High-performing human teams leverage intelligent and efficient communication and coordination strategies to collaboratively maximize their joint utility. Inspired by teaming behaviors among humans, I seek to develop computational methods for synthesizing intelligent communication and coordination strategies for collaborative multi-robot systems. I leverage both classical model-based control and planning approaches as well as data-driven methods such as Multi-Agent Reinforcement Learning (MARL) and Learning from Demonstration (LfD) to provide several contributions towards enabling emergent cooperative teaming behavior across robot teams. In my thesis, I first leverage model-based methods for coordinated control and planning under uncertainty for multi-robot systems to study and develop techniques for efficiently incorporating environment models in multi-robot planning and decision making. In these contributions, I design centralized and decentralized coordination frameworks, at the control-input and the high-level planning stages, which are informed by and have access to the model of the world. First, I develop an algorithm for human-centered coordinated control of multi-robot networked systems in safety-critical applications. I tackle the problems of enabling a robot team to reason about a coordinated coverage plan through active state estimation and providing probabilistic guarantees for performance. I then extended these methods to directly formulate and account for heterogeneity in robots’ characteristics and capabilities. I design a hierarchical coordination framework, which enables a composite team of robots (i.e., including robots that can only sense and robots that can only manipulate the environment) to effectively collaborate on complex missions such as aerial wildfire fighting. Model-based approaches provide the ability to derive performance and stability guarantees. However, can be sensitive to the accuracy of the model and the quality of the heuristic algorithm. As such, I leverage data-driven and Machine Learning (ML) approaches, such as MARL, to provide several contributions towards learning emergent cooperative behaviors. I design a graph-based architecture to learn efficient and diverse communication models for coordinating cooperative heterogeneous teams. Finally, inspired by the theory of mind in humans’ strategic decision-making model, I develop an iterative model to learn deep decision-rationalization for optimizing action selection in collaborative, decentralized teaming. In recent years, MARL has been predominantly used by researchers to optimize a reward signal and learning multi-robot tasks. Nevertheless, Reinforcement Learning (RL) generally suffers from key limitations such as the difficulty of designing an expressive and suitable reward function for complex tasks, and high sample complexity. As such, accurate models of human strategies and behaviors are increasingly important. Additionally, as multi-robot systems become increasingly prevalent in our communities and workplace, aligning the values motivating robot behaviors with human values is critical. LfD attempts to learn the correct behavior directly from expert-generated data demonstrations rather than a reward function. As such, in the last part of my work, I develop a multi-agent LfD framework to efficiently incorporate humans’ domain-knowledge of teaming strategies for collaborative robot teams and directly learn team coordination policies from human teachers. To this end, I propose the MixTURE framework for human training of robot teams. MixTURE enables robot teams to learn a humans’ preferred strategy to collaborate, while simultaneously learning end-to-end emergent communication for the robot team to efficiently coordinate their actions, without the need for human generated data. MixTURE benefits from the merits of LfD methods over RL while significantly alleviating the human demonstrator’s workload and time required to provide demonstrations, as well as increasing the SUS and overall collaboration performance of the robot team. Ph.D. Thesis: Doctor of Philosophy in Electrical & Computer Engineering.
Rohan Paleja Interpretable Artificial Intelligence for Personalized Human-Robot Collaboration \| Abstract Collaborative robots (i.e., “cobots”) and machine learning-based virtual agents are increasingly entering the human workspace with the aim of increasing productivity, enhancing safety, and improving the quality of our lives. These agents will dynamically interact with a wide variety of people in dynamic and novel contexts, increasing the prevalence of human-machine teams in healthcare, manufacturing, and search-and-rescue. Within these domains, it is critical that collaborators have aligned objectives and maintain awareness over other agents’ behaviors to avoid potential accidents. In my thesis, I present several contributions that push the frontier of real-world robotics systems toward those that understand human behavior, maintain interpretability, communicate efficiently, and coordinate with high performance. Specifically, I first study the nature of collaboration in simulated, large-scale multi-agent systems, exploring techniques that utilize context-based communication among decentralized robots, and find that utilizing targeted communication and accounting for teammate heterogeneity is beneficial in generating effective coordination. Next, I transition to human-machine systems and develop a data-efficient, person-specific, and interpretable tree-based apprenticeship learning framework to enable cobots to infer and understand decision-making behavior across heterogeneous human end-users. Building on this, I extend neural tree-based architectures to support learning interpretable control policies for robots via gradient-based techniques. This not only allows end-users to inspect and understand learned behavior models, but also provides developers with the means to verify control policies for safety guarantees. Lastly, I present two works that deploy Explainable AI (xAI) techniques in human-machine collaboration, aiming to 1) characterize the utility of xAI and its benefits towards shared mental development, and 2) allow end-users to interactively modify learned policies via a graphical user interface to support team development. Ph.D. Thesis: Doctor of Philosophy in Mechanical Engineering.
Mariah Schrum Data Driven Personalization Techniques to Account for Heterogeneity in Human-Machine Interaction \| Abstract As robots and AI systems become more prevalent in every-day life, humans and machines will have to work closely together. Robotic devices will be used to support human health, service robots will operate alongside humans in homes, and autonomous vehicles will have to safely drive end-users to their destination. Yet, humans exhibit a high degree of heterogeneity which poses a challenge for robotic systems that are tasked with learning from and supporting humans. For example, in a medical setting, individual patients are likely to have different needs and varying biology that must be accounted for. Autonomous Vehicles (AVs) will have to learn about the differing preferences of end-users and adapt accordingly. Because of this human heterogeneity, one-size-fits-all algorithms will not suffice in many human-machine interaction scenarios. Instead, to effectively support humans, machines must be capable of recognizing individual desires, abilities, and characteristics and adapt to account for differences across individuals. This thesis focuses on the development of personalized algorithms that enable machines to better support and work with humans. Specifically, I aim to develop and research novel techniques for safely and efficiently supporting heterogeneous humans across various robotic domains. In this work, I develop data-driven, personalized frameworks in healthcare, learning from demonstration, and autonomous driving domains to account for heterogeneity amongst end-users. In this thesis, I first investigate the question of how robots can best learn from human demonstrators who are suboptimal and heterogeneous in their suboptimality. I present my work on Mutual Information Driven Meta-Learning from Demonstration (MIND MELD), a framework enabling personalized robotic learning from heterogeneous human demonstrators via a learned, personalized embedding. I then extend this approach with Reciprocal MIND MELD and introduce a framework to provide personalized feedback to suboptimal human demonstrators to improve upon their ability to provide high quality demonstrations. Humans are not only heterogeneous in terms of their abilities when teaching machines; they also tend to differ in their preferences for various machine behaviors. To account for differing preferences amongst end-users, I draw upon our Reciprocal MIND MELD work and introduce Manipulating Autonomous Vehicle Embedding Region for Individuals’ Comfort (MAVERIC), an approach for personalizing driving styles of AVs to fit the preferences of end-users via personalized embeddings. In my final work, I consider personalization in safety critical domains such as healthcare. I introduce Safe Meta-Active Learning (Safe MetAL), an approach for determining the optimal, personalized parameter settings for a Deep Brain Stimulation (DBS) patient. Ph.D. Thesis: Doctor of Philosophy in Mechanical Engineering.
2022
Journal Papers
Mariah Schrum, Muyleng Ghuy, Erin Hedlund-Botti, Manisha Natarajan, Michael Johnson, and Matthew Gombolay Concerning Trends in Likert Scale Usage in Human-Robot Interaction: Towards Improving Best Practices \| Abstract As robots become more prevalent, the importance of the field of human-robot interaction (HRI) grows accordingly. As such, we should endeavor to employ the best statistical practices in HRI research. Likert scales are commonly used metrics in HRI to measure perceptions and attitudes. Due to misinformation or honest mistakes, many HRI researchers do not adopt best practices when analyzing Likert data. We conduct a review of psychometric literature to determine the current standard for Likert scale design and analysis. Next, we conduct a survey of five years of the International Conference on Human-Robot Interaction (HRIc) (2016 through 2020) and report on incorrect statistical practices and design of Likert scales [1–3, 5, 7]. During these years, only 4 of the 144 papers applied proper statistical testing to correctly-designed Likert scales. We additionally conduct a survey of best practices across several venues and provide a comparative analysis to determine how Likert practices differ across the field of Human-Robot Interaction. We find that a venue’s impact score negatively correlates with number of Likert related errors and acceptance rate, and total number of papers accepted per venue positively correlates with the number of errors. We also find statistically significant differences between venues for the frequency of misnomer and design errors. Our analysis suggests there are areas for meaningful improvement in the design and testing of Likert scales. Based on our findings, we provide guidelines and a tutorial for researchers for developing and analyzing Likert scales and associated data. We also detail a list of recommendations to improve the accuracy of conclusions drawn from Likert data ACM Transactions on Human-Robot Interaction (THRI), Volume 12, Issue 3, Article 33.
Andrew Silva, Mariah Schrum, Erin Hedlund-Botti, Nakul Gopalan, and Matthew C. Gombolay Explainable Artificial Intelligence: Evaluating the Objective and Subjective Impacts of xAI on Human-Agent Interaction \| Abstract Intelligent agents must be able to communicate intentions and explain their decisionmaking processes to build trust, foster confidence, and improve human-agent team dynamics. Recognizing this need, academia and industry are rapidly proposing new ideas, methods, and frameworks to aid in the design of more explainable AI. Yet, there remains no standardized metric or experimental protocol for benchmarking new methods, leaving researchers to rely on their own intuition or ad hoc methods for assessing new concepts. In this work, we present the first comprehensive (n=286) user study testing a wide range of approaches for explainable machine learning, including feature importance, probability scores, decision trees, counterfactual reasoning, natural language explanations, and case-based reasoning, as well as a baseline condition with no explanations. We provide the first large-scale empirical evidence of the effects of explainability on human-agent teaming. Our results will help to guide the future of explainability research by highlighting the benefits of counterfactual explanations and the shortcomings of confidence scores for explainability. We also propose a novel questionnaire to measure explainability with human participants, inspired by relevant prior work and correlated with human-agent teaming metrics. Published Version International Journal of Human–Computer Interaction.
Mariah Schrum, Mark J Connolly, Eric Cole, Mihir Ghetiya, Robert Gross, and Matthew C. Gombolay Meta-Active Learning in Probabilistically Safe Optimization \| Abstract When a robotic system is faced with uncertainty, the system must take calculated risks to gain information as efficiently as possible while ensuring system safety. The need to safely and efficiently gain information in the face of uncertainty spans domains from healthcare to search and rescue. To efficiently learn when data is scarce or difficult to label, active learning acquisition functions intelligently select a data point that, if the label were known, would most improve the estimate of the unknown model. Unfortunately, prior work in active learning suffers from an inability to accurately quantify information-gain, generalize to new domains, and ensure safe operation. To overcome these limitations, we develop Safe MetAL, a probabilistically-safe, active learning algorithm which meta-learns an acquisition function for selecting sample efficient data points in safety critical domains. The key to our approach is a novel integration of meta-active learning and chance-constrained optimization. We (1) meta-learn an acquisition function based on sample history, (2) encode this acquisition function in a chance-constrained optimization framework, and (3) solve for an information-rich set of data points while enforcing probabilistic safety guarantees. We present state-of-the-art results in active learning of the model of a damaged UAV and in learning the optimal parameters for deep brain stimulation. Our approach achieves a 41% improvement in learning the optimal model and a 20% speedup in computation time compared to active and meta-learning approaches while ensuring safety of the system. IEEE Robotics and Automation Letters.
Esmaeil Seraj, Andrew Silva, and Matthew Gombolay Multi-UAV Planning for Cooperative Wildfire Coverage and Tracking with Quality-of-Service Guarantees \| Abstract In recent years, teams of robot and Unmanned Aerial Vehicles (UAVs) have been commissioned by researchers to enable accurate, online wildfire coverage and tracking. While the majority of prior work focuses on the coordination and control of such multirobot systems, to date, these UAV teams have not been given the ability to reason about a fire’s track (i.e., location and propagation dynamics) to provide performance guarantee over a time horizon. Motivated by the problem of aerial wildfire monitoring, we propose a predictive framework which enables cooperation in multi-UAV teams towards collaborative field coverage and fire tracking with probabilistic performance guarantee. Our approach enables UAVs to infer the latent fire propagation dynamics for time-extended coordination in safety-critical conditions. We derive a set of novel, analytical temporal, and tracking-error bounds to enable the UAV-team to distribute their limited resources and cover the entire fire area according to the case-specific estimated states and provide a probabilistic performance guarantee. Our results are not limited to the aerial wildfire monitoring case-study and are generally applicable to problems, such as search-and-rescue, target tracking and border patrol. We evaluate our approach in simulation and provide demonstrations of the proposed framework on a physical multi-robot testbed to account for real robot dynamics and restrictions. Our quantitative evaluations validate the performance of our method accumulating 7.5× and 9.0× smaller tracking-error than state-of-the-art model-based and reinforcement learning benchmarks, respectively. Autonomous Agents and Multi-Agent Systems, Volume 36, Article 39
Sam Broida, Mariah Schrum, Eric Yoon, Aidan Sweeney, Neil Dhruv, Matthew Gombolay, and Sangwook Yoon Improving Surgical Triage in Spine Clinic: Predicting Likelihood of Surgery Using Machine Learning \| Abstract Background: Correctly triaging patients to a surgeon or non-operative provider is an important part of the referral process. Clinics typically triage new patients based on simple intake questions. This is time-consuming and does not incorporate objective data. Our goal was to use machine learning to more accurately screen surgical candidates seen in spine clinic. Methods: Using questionnaire data and MRI reports, a set of artificial neural networks were trained to predict whether a patient would be recommended for spine surgery. Questionnaire responses included demographics, chief complaint, and pain characteristics. The primary endpoint was the surgeon’s determination of whether a patient was an operative candidate. Model accuracy in predicting this endpoint was assessed using a separate subset of patients apart from the training data. Results: The retrospective dataset included 1,663 cervical and lumbar patients. Questionnaire data was available for all participants and MRI reads were available for 242 patients. Within six months of initial evaluation, 717 (43.1%) patients were deemed surgical candidates by the surgeon. Our models predict surgeons’ recommendations with AUC scores of 0.686 for lumbar (PPV 66%, NPV 80%) and 0.821 for cervical (PPV 83%, NPV 85%) patients. Conclusions: Our models use patient data to accurately predict whether patients will receive a surgical recommendation. The models’ high NPV demonstrate that this approach can reduce the burden of non-surgical patients in surgery clinic without losing many surgical candidates. This could reduce unnecessary visits for patients, increase the proport World Neurosurgery, Volume 163, Pages e192-e198
Dean D. Molinaro, Inseung Kang, Jonathan Camargo, Matthew Gombolay and Aaron Young Subject-Independent, Biological Hip Moment Estimation during Multimodal Overground Ambulation using Deep Learning \| Abstract Estimating biological joint moments using wearable sensors could enable out-of-lab biomechanical analyses and exoskeletons that assist throughout daily life. To realize these possibilities, this study introduced a subject-independent hip moment estimator using a temporal convolutional network (TCN) and validated its performance and generalizability during multimodal ambulation. Electrogoniometer and simulated IMU data from sixteen participants walking on level ground, ramps and stairs were used to evaluate our approach when benchmarked against a fully-connected neural network, a long short-term memory network, and a baseline method (i.e., using subject-average moment curves based on ambulation mode and gait phase). Additionally, the generalizability of our approach was evaluated by testing on ground slopes, stair heights, and gait transitions withheld during model training. The TCN outperformed the benchmark approaches on the hold-out data (p < 0.05), with an average RMSE of 0.131±0.018 Nm/kg and R 2 of 0.880±0.030 during steady-state ambulation. When tested on the 20 leave-one-out slope and stair height conditions, the TCN significantly increased RMSE only on the steepest (+18°) incline (p < 0.05). Finally, the TCN RMSE and R 2 was 0.152±0.027 Nm/kg and 0.786±0.055, respectively, during mode transitions. Thus, our approach accurately estimated hip moments and generalized to unseen gait contexts using data from three wearable sensors. IEEE Transactions on Biomedical Engineering (TBME), Volume 4, Issue 1, Pages 219-229
Andrew Silva, Nina Moorman, William Silva, Zulfiqar Zaidi, Nakul Gopalan, and Matthew Gombolay LanCon-Learn: Learning with Language to Enable Generalization in Multi-Task Manipulation \| Abstract Robots must be capable of learning from previously solved tasks and generalizing that knowledge to quickly perform new tasks to realize the vision of ubiquitous and useful robot assistance in the real world. While multi-task learning research has produced agents capable of performing multiple tasks, these tasks are often encoded as one-hot goals. In contrast, natural language specifications offer an accessible means both for (1) users to describe a set of new tasks to the robot and (2) robots to reason about the similarities and differences among tasks through language-based task embeddings. Until now, multi-task learning with language has been limited to navigation based tasks and has not been applied to continuous manipulation tasks, requiring precision to grasp and move objects. We present LanCon-Learn, a novel attention-based approach to language-conditioned multi-task learning in manipulation domains to enable learning agents to reason about relationships between skills and task objectives through natural language and interaction. We evaluate LanCon-Learn for both reinforcement learning and imitation learning, across multiple virtual robot domains along with a demonstration on a physical robot. LanCon-Learn achieves up to a 200% improvement in zero-shot task success rate and transfers known skills to novel tasks faster than non-language-based baselines, demonstrating the utility of language for goal specification. IEEE Robotics and Automation Letters, Volume 7, Issue 2, Pages 1635-1642.
Conference Papers
Mariah Schrum, Erin Hedlund-Botti, and Matthew Gombolay Reciprocal MIND MELD: Improving Learning From Demonstration via Personalized, Reciprocal Teaching \| Abstract Endowing robots with the ability to learn novel tasks via demonstrations will increase the accessibility of robots for non-expert, non-roboticists. However, research has shown that humans can be poor teachers, making it difficult for robots to effectively learn from humans. If the robot could instruct humans how to provide better demonstrations, then humans might be able to effectively teach a broader range of novel, out-of-distribution tasks. In this work, we introduce Reciprocal MIND MELD, a framework in which the robot learns the way in which a demonstrator is suboptimal and utilizes this information to provide feedback to the demonstrator to improve upon their demonstrations. We additionally develop an Embedding Predictor Network which learns to predict the demonstrator’s suboptimality online without the need for optimal labels. In a series of human-subject experiments in a driving simulator domain, we demonstrate that robotic feedback can effectively improve human demonstrations in two dimensions of suboptimality (p < .001) and that robotic feedback translates into better learning outcomes for a robotic agent on novel tasks (p = .045). In Proc. Conference on Robot Learning (CoRL). [39% Acceptance Rate]
Sachin Konan, Esmaeil Seraj, and Matthew Gombolay Contrastive Decision Transformers \| Abstract Decision Transformers (DT) have drawn upon the success of Transformers by abstracting Reinforcement Learning as a target-return-conditioned, sequence modeling problem. In our work, we claim that the distribution of DT’s target-returns represents a series of different tasks that agents must learn to handle. Work in multi-task learning has shown that separating the representations of input data belonging to different tasks can improve performance. We draw from this approach to construct ConDT (Contrastive Decision Transformer). ConDT leverages an enhanced contrastive loss to train a return-dependent transformation of the input embeddings, which we empirically show clusters these embeddings by their return. We find that ConDT significantly outperforms DT in Open-AI Gym domains by 10% and 39% in visually challenging Atari domains. Additionally, ConDT shows promising application to robot learning by outperforming DT by 20% in the Adroit Robotic HandGrip Experiments. In Proc. Conference on Robot Learning (CoRL). [39% Acceptance Rate]
Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Nakul Gopalan, Viacheslav Zakharov, and Matthew Gombolay Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations \| Abstract Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p < .05) and personalization (p < .05) performance. In Proc. Conference on Robot Learning (CoRL). [39% Acceptance Rate]
Batuhan Altundas, Zheyuan Wang, Josh Bishop, and Matthew Gombolay Learning Coordination Policies over Heterogeneous Graphs for Human-Robot Teams via Recurrent Neural Schedule Propagation \| Abstract As human-robot collaboration increases in the workforce, it becomes essential for human-robot teams to coordinate efficiently and intuitively. Traditional approaches for human-robot scheduling either utilize exact methods that are intractable for large-scale problems and struggle to account for stochastic, time-varying human task performance, or application-specific heuristics that require expert domain knowledge to develop. We propose a deep learning-based framework, called HybridNet, combining a heterogeneous graph-based encoder with a recurrent schedule propagator for scheduling stochastic human-robot teams under upper- and lower-bound temporal constraints. The HybridNet’s encoder leverages Heterogeneous Graph Attention Networks to model the initial environment and team dynamics while accounting for the constraints. By formulating task scheduling as a sequential decision-making process, the HybridNet’s recurrent neural schedule propagator leverages Long Short-Term Memory (LSTM) models to propagate forward consequences of actions to carry out fast schedule generation, removing the need to interact with the environment between every task-agent pair selection. The resulting scheduling policy network provides a computationally lightweight yet highly expressive model that is end-to-end trainable via Reinforcement Learning algorithms. We develop a virtual task scheduling environment for mixed human-robot teams in a multi-round setting, capable of modeling the stochastic learning behaviors of human workers. Experimental results showed that HybridNet outperformed other human-robot scheduling solutions across problem sizes for both deterministic and stochastic human performance, with faster runtime compared to pure-GNN-based schedulers. In Proc. International Conference on Intelligent Robots and Systems (IROS).
Andrew Hundt, William Agnew, Vicky Zang, Severin Kacianka, and Matthew Gombolay Robots Enact Malignant Stereotypes \| Abstract Stereotypes, bias, and discrimination have been extensively documented in Machine Learning (ML) methods such as Computer Vision (CV) [18, 80], Natural Language Processing (NLP) [6], or both, in the case of large image and caption models such as OpenAI CLIP [14]. In this paper, we evaluate how ML bias manifests in robots that physically and autonomously act within the world. We audit one of several recently published CLIP-powered robotic manipulation methods, presenting it with objects that have pictures of human faces on the surface which vary across race and gender, alongside task descriptions that contain terms associated with common stereotypes. Our experiments definitively show robots acting out toxic stereotypes with respect to gender, race, and scientifically-discredited physiognomy, at scale. Furthermore, the audited methods are less likely to recognize Women and People of Color. Our interdisciplinary sociotechnical analysis synthesizes across fields and applications such as Science Technology and Society (STS), Critical Studies, History, Safety, Robotics, and AI. We find that robots powered by large datasets and Dissolution Models (sometimes called “foundation models”, e.g. CLIP) that contain humans risk physically amplifying malignant stereotypes in general; and that merely correcting disparities will be insufficient for the complexity and scale of the problem. Instead, we recommend that robot learning methods that physically manifest stereotypes or other harmful outcomes be paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just. Finally, we discuss comprehensive policy changes and the potential of new interdisciplinary research on topics like Identity Safety Assessment Frameworks and Design Justice to better understand and address these harms. In Proc. ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT).
Roger Dias, Lauren Kennedy-Metz, Steven Yule, Matthew Gombolay, and Marco Zenati Assessing Team Situational Awareness in the Operating Room via Computer Vision \| Abstract Situational awareness (SA) at both individual and team levels, plays a critical role in the operating room (OR). During the pre-incision time-out, the entire OR team comes together to deploy the surgical safety checklist (SSC). Worldwide, the implementation of the SSC has been shown to reduce intraoperative complications and mortality among surgical patients. In this study, we investigated the feasibility of applying computer vision analysis on surgical videos to extract team motion metrics that could differentiate teams with good SA from those with poor SA during the pre-incision time-out. We used a validated observation-based tool to assess SA, and a computer vision software to measure body position and motion patterns in the OR. Our findings showed that it is feasible to extract surgical team motion metrics captured via off-the-shelf OR cameras. Entropy as a measure of the level of team organization was able to distinguish surgical teams with good and poor SA. These findings corroborate existing studies showing that computer vision-based motion metrics have the potential to integrate traditional observation-based performance assessments in the OR. In Proc. Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA).
Rohan Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, and Matthew Gombolay Learning Interpretable, High-Performing Policies for Autonomous Driving \| Abstract Gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that parity or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration. In Proc. Robotics: Science and Systems (RSS). [32% Acceptance Rate]
Nakul Gopalan, Nina Moorman, Manisha Natarajan, and Matthew Gombolay Negative Result for Learning from Demonstration: Challenges for End-Users Teaching Robots with Task And Motion Planning Abstractions \| Abstract Learning from demonstration (LfD) seeks to democratize robotics by enabling non-experts to intuitively program robots to perform novel skills through human task demonstration. Yet, LfD is challenging under a task and motion planning setting which requires hierarchical abstractions. Prior work has studied mechanisms for eliciting demonstrations that include hierarchical specifications of task and motion, via keyframes [1] or hierarchical task network specifications [2]. However, such prior works have not examined whether non-roboticist end-users are capable of providing such hierarchical demonstrations without explicit training from a roboticist showing how to teach each task [3]. To address the limitations and assumptions of prior work, we conduct two novel human-subjects experiments to answer (1) what are the necessary conditions to teach users through hierarchy and task abstractions and (2) what instructional information or feedback is required to support users to learn to program robots effectively to solve novel tasks. Our first experiment shows that fewer than half (35.71%) of our subjects provide demonstrations with sub-task abstractions when not primed. Our second experiment demonstrates that users fail to teach the robot correctly when not shown a video demonstration of an expert’s teaching strategy for the exact task that the subject is training. Not even showing the video of an analogue task was sufficient. These experiments reveal the need for fundamentally different approaches in LfD which can allow end-users to teach generalizable long-horizon tasks to robots without the need to be coached by experts at every step. In Proc. Robotics: Science and Systems (RSS). [32% Acceptance Rate] [Best Student Paper Nomination]
Zheyuan Wang and Matthew Gombolay Stochastic Resource Optimization over Heterogeneous Graph Neural Networks for Failure-Predictive Maintenance Scheduling \| Abstract Resource optimization for predictive maintenance is a challenging computational problem that requires inferring and reasoning over stochastic failure models and dynamically allocating repair resources. Predictive maintenance scheduling is typically performed with a combination of ad hoc, handcrafted heuristics with manual scheduling corrections by human domain experts, which is a labor-intensive process that is hard to scale. In this paper, we develop an innovative heterogeneous graph neural network to automatically learn an end-to-end resource scheduling policy. Our approach is fully graph-based with the addition of state summary and decision value nodes that provides a computationally lightweight and nonparametric means to perform dynamic scheduling. We augment our policy optimization procedure to enable robust learning in highly stochastic environments for which typical actor-critic reinforcement learning methods are ill-suited. In consultation with aerospace industry partners, we develop a virtual predictive-maintenance environment for a heterogeneous fleet of aircraft, called AirME. Our approach sets a new state-of-the-art by outperforming conventional, hand-crafted heuristics and baseline learning methods across problem sizes and various objective functions. In Proc. International Conference on Automated Planning and Scheduling (ICAPS). [31% Acceptance Rate]
Mariah Schrum, Erin Hedlund-Botti, Nina Moorman, and Matthew Gombolay MIND MELD: Personalized Meta-Learning for Robot-Centric Imitation Learning \| Abstract Learning from demonstration (LfD) techniques seek to enable users without computer programming experience to teach robots novel tasks. There are generally two types of LfD: human- and robot-centric. While human-centric learning is intuitive, human centric learning suffers from performance degradation due to covariate shift. Robot-centric approaches, such as Dataset Aggregation (DAgger), address covariate shift but can struggle to learn from suboptimal human teachers. To create a more human-aware version of robot-centric LfD, we present Mutual Information-driven Meta-learning from Demonstration (MIND MELD). MIND MELD meta-learns a mapping from suboptimal and heterogeneous human feedback to optimal labels, thereby improving the learning signal for robot-centric LfD. The key to our approach is learning an informative personalized embedding using mutual information maximization via variational inference. The embedding then informs a mapping from human provided labels to optimal labels. We evaluate our framework in a human-subjects experiment, demonstrating that our approach improves corrective labels provided by human demonstrators. Our framework outperforms baselines in terms of ability to reach the goal (p < .001), average distance from the goal (p = .006), and various subjective ratings (p = .008). In Proc. International Conference on Human-Robot Interaction (HRI). [25% Acceptance Rate] [Best Paper Award]
Sachin Konan, Esmaeil Seraj, and Matthew Gombolay Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming \| Abstract Information sharing is key in building team cognition and enables coordination and cooperation. High-performing human teams also benefit from acting strategically with hierarchical levels of iterated communication and rationalizability, meaning a human agent can reason about the actions of their teammates in their decision-making. Yet, the majority of prior work in Multi-Agent Reinforcement Learning (MARL) does not support iterated rationalizability and only encourage inter-agent communication, resulting in a suboptimal equilibrium cooperation strategy. In this work, we show that reformulating an agent’s policy to be conditional on the policies of its neighboring teammates inherently maximizes Mutual Information (MI) lower-bound when optimizing under Policy Gradient (PG). Building on the idea of decision-making under bounded rationality and cognitive hierarchy theory, we show that our modified PG approach not only maximizes local agent rewards but also implicitly reasons about MI between agents without the need for any explicit ad-hoc regularization terms. Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks. Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains. In Proc. Conference on Learning Representations (ICLR). [32% Acceptance Rate]
Andrew Silva, Rohit Chopra, and Matthew Gombolay Cross-Loss Influence Functions to Explain Deep Network Representations \| Abstract As machine learning is increasingly deployed in the real world, it is ever more vital that we understand the decision-criteria of the models we train. Recently, researchers have shown that influence functions, a statistical measure of sample impact, may be extended to approximate the effects of training samples on classification accuracy for deep neural networks. However, prior work only applies to supervised learning setups where training and testing share an objective function. Despite the rise in unsupervised learning, self-supervised learning, and model pre-training, there are currently no suitable technologies for estimating influence of deep networks that do not train and test on the same objective. To overcome this limitation, we provide the first theoretical and empirical demonstration that influence functions can be extended to handle mismatched training and testing settings. Our result enables us to compute the influence of unsupervised and self-supervised training examples with respect to a supervised test objective. We demonstrate this technique on a synthetic dataset as well as two Skipgram language model examples to examine cluster membership and sources of unwanted bias. In Proc. Conference on Aritifcial Intelligence and Statistics (AISTATS). [29% Acceptance Rate]
Esmaeil Seraj, Zheyuan Wang, Rohan Paleja, Daniel Martin, Matthew Sklar, Anirudh Patel, and Matthew Gombolay Learning Efficient Diverse Communication for Cooperative Heterogeneous Teaming* \| Abstract High-performing teams learn intelligent and efficient communication and coordination strategies to maximize their joint utility. These teams implicitly understand the different roles of heterogeneous team members and adapt their communication protocols accordingly. Multi-Agent Reinforcement Learning (MARL) seeks to develop computational methods for synthesizing such coordination strategies, but formulating models for heterogeneous teams with different state, action, and observation spaces has remained an open problem. Without properly modeling agent heterogeneity, as in prior MARL work that leverages homogeneous graph networks, communication becomes less helpful and can even deteriorate the cooperativity and team performance. We propose Heterogeneous Policy Networks (HetNet) to learn efficient and diverse communication models for coordinating cooperative heterogeneous teams. Building on heterogeneous graph-attention networks, we show that HetNet not only facilitates learning heterogeneous collaborative policies per existing agent-class but also enables end-to-end training for learning highly efficient binarized messaging. In Proc. Autonomous Agents and Multiagent Systems (AAMAS). [26% Acceptance Rate]
Workshop/Symposium Papers and Doctoral Consortia
Yue Yang, Letian Chen, and Matthew Gombolay Safe Inverse Reinforcement Learning via Control Barrier Function \| Abstract Learning from Demonstration (LfD) is a powerful method for enabling robots to perform novel tasks as it is often more tractable for a non-roboticist end-user to demonstrate the desired skill and for the robot to efficiently learn from the associated data than for a human to engineer a reward function for the robot to learn the skill via reinforcement learning (RL). Safety issues arise in modern LfD techniques, e.g., Inverse Reinforcement Learning (IRL), just as they do for RL; yet, safe learning in LfD has received little attention. In the context of agile robots, safety is especially vital due to the possibility of robot-environment collision, robot-human collision, and damage to the robot. In this paper, we propose a safe IRL framework, CBFIRL, that leverages the Control Barrier Function (CBF) to enhance the safety of the IRL policy. The core idea of CBFIRL is to combine a loss function inspired by CBF requirements with the objective in an IRL method, both of which are jointly optimized via gradient descent. In the experiments, we show our framework performs safer compared to IRL methods without CBF, that is ~15% and ~20% improvement for two levels of difficulty of a 2D racecar domain and ~50% improvement for a 3D drone domain. In Proc. CoRL Agility Workshop.
Arjun Krishna, Zulfiqar Zaidi, Letian Chen, Rohan Paleja, Esmaeil Seraj, and Matthew Gombolay Utilizing Human Feedback for Primitive Optimization in Wheelchair Tennis \| Abstract Agile robotics presents a difficult challenge with robots moving at high speeds requiring precise and low-latency sensing and control. Creating agile motion that accomplishes the task at hand while being safe to execute is a key requirement for agile robots to gain human trust. This requires designing new approaches that are flexible and maintain knowledge over world constraints. In this paper, we consider the problem of building a flexible and adaptive controller for a challenging agile mobile manipulation task of hitting ground strokes on a wheelchair tennis robot. We propose and evaluate an extension to the work done on learning striking behaviors using a probabilistic movement primitive (ProMP) framework by (1) demonstrating the safe execution of learned primitives on an agile mobile manipulator setup, and (2) proposing an online primitive refinement procedure that utilizes evaluative feedback from humans on the executed trajectories. In Proc. CoRL Agility Workshop.
Nina Moorman, Erin Hedlund-Botti, and Matthew Gombolay Towards Cognitive Robots That People Accept in Their Home \| Abstract It is intractable for assistive robotics to have all functionalities pre-programmed prior to deployment. Rather, it is more realistic for robots to perform supplemental, on-site learning about the user’s needs and preferences and particularities of the environment. This additional learning is especially helpful for care robots that assist with individualized caregiver activities in residential or assisted living facilities. Each patient is unique, has differing needs, and those needs will change over time, so robots require the ability to adapt and learn. Many assistive robots, ranging in complexity from Roomba to Pepper, have the ability to conduct some of their learning in the home. In this work, we propose to investigate the impacts on end-users of observing this in situ learning through a series of human-subjects experiments. We will assess end-user attitudes towards embodied robots that conduct some learning in the home as compared to a baseline condition where the robot is delivered fully capable. We will additionally compare different modes of learning and interaction to determine whether some are more likely to instill trust. In Proc. AAAI Fall Symposium Series on AI-HRI.
Batuhan Altundas, Zheyuan Wang, and Matthew Gombolay Towards Learning Fast Human-Robot Coordination with Recurrent Neural Schedule Propagation On Heterogeneous Graphs \| Abstract As human-robot collaboration increases in the workforce, it becomes essential to coordinate co-robots efficiently, intuitively and safely around human coworkers. Traditional approaches for human-robot task scheduling either utilize exact methods that are intractable for large-scale problems and struggle to account for stochastic, time varying human task performance, or application-specific heuristics that require expert domain knowledge to develop. We propose a deep learning based framework, called HybridNet, combining a heterogeneous graph-based encoder with a recurrent schedule propagator for coordinating human-robot teams under temporospatial constraints. We model stochastic human learning performance through multiple iterations of the task-allocation problem and leverage Long Short-Term Memory (LSTM) models to propagate forward consequences of actions to carry out fast schedule generation, removing the need to interact with the environment between every task-agent pair selection. The HybridNet scheduler provides a computationally lightweight yet highly expressive model that is end-to-end trainable via Reinforcement Learning algorithms. We show that HybridNet outperforms other human-robot scheduling solutions and achieves faster runtime compared to pure-GNN-based models. In Proc. American Control Conference Workshop on Recent Advancement of Human Autonomy Interaction and Integration (ACC-HAI2).
Luis Pimentel, Rohan Paleja, Zheyuan Wang, Esmaeil Seraj, James Pagan, and Matthew Gombolay Scaling Multi-Agent Reinforcement Learning via State Upsampling \| Abstract We consider the problem of scaling Multi-Agent Reinforcement Learning (MARL) algorithms toward larger environments and team sizes. While it is possible to learn a MARL-synthesized policy on these larger problems from scratch, training is difficult as the joint state-action space is much larger. Policy learning will require a large amount of experience (and associated training time) to reach a target performance. In this paper, we propose a transfer learning method that accelerates the training performance in such high-dimensional tasks with increased complexity. Our method upsamples an agent’s state representation in a smaller, less challenging, source task in order to pre-train a target policy for a larger, more challenging, target task. By transferring the policy after pre-training and continuing MARL in the target domain, the information learned within the source task enables higher performance within the target task in significantly less time than training from scratch. As such, our method enables the scalability of coordination problems. Furthermore, as our method only changes the state representation of agents across tasks, it is agnostic to the policy’s architecture and can be deployed across different MARL algorithms. We provide results showing that a policy trained under our method is able to achieve up to a 7.88$\times$ performance improvement under the same amount of training time, compared to a policy trained from scratch. Moreover, our method enables learning in difficult target task settings where training from scratch fails. In Proc. RSS 2022 Workshop on Scaling Robot Learning (RSS22-SRL).
Andrew Hundt, William Agnew, Vicky Zang, Severin Kacianka, and Matthew Gombolay Robots Enact Malignant Stereotypes \| Abstract Stereotypes, bias, and discrimination have been extensively documented in Machine Learning (ML) methods such as Computer Vision (CV) [18, 80], Natural Language Processing (NLP) [6], or both, in the case of large image and caption models such as OpenAI CLIP [14]. In this paper, we evaluate how ML bias manifests in robots that physically and autonomously act within the world. We audit one of several recently published CLIP-powered robotic manipulation methods, presenting it with objects that have pictures of human faces on the surface which vary across race and gender, alongside task descriptions that contain terms associated with common stereotypes. Our experiments definitively show robots acting out toxic stereotypes with respect to gender, race, and scientifically-discredited physiognomy, at scale. Furthermore, the audited methods are less likely to recognize Women and People of Color. Our interdisciplinary sociotechnical analysis synthesizes across fields and applications such as Science Technology and Society (STS), Critical Studies, History, Safety, Robotics, and AI. We find that robots powered by large datasets and Dissolution Models (sometimes called “foundation models”, e.g. CLIP) that contain humans risk physically amplifying malignant stereotypes in general; and that merely correcting disparities will be insufficient for the complexity and scale of the problem. Instead, we recommend that robot learning methods that physically manifest stereotypes or other harmful outcomes be paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just. Finally, we discuss comprehensive policy changes and the potential of new interdisciplinary research on topics like Identity Safety Assessment Frameworks and Design Justice to better understand and address these harms. In Proc. RSS 2022 Workshop on Learning from Diverse, Offline Data (L-DOD).
Mariah Schrum, Erin Hedlund-Botti, and Matthew Gombolay Personalized Meta-Learning for Domain Agnostic Learning from Demonstration \| Abstract For robots to perform novel tasks in the real-world, they must be capable of learning from heterogeneous, non-expert human teachers across various domains. Yet, novice human teachers often provide suboptimal demonstrations, making it difficult for robots to successfully learn. Therefore, to effectively learn from humans, we must develop learning methods that can account for teacher suboptimality and can do so across various robotic platforms. To this end, we introduce Mutual Information Driven Meta-Learning from Demonstration (MIND MELD) [12, 13], a personalized meta-learning framework which meta-learns a mapping from suboptimal human feedback to feedback closer to optimal, conditioned on a learned personalized embedding. In a human subjects study, we demonstrate MIND MELD’s ability to improve upon suboptimal demonstrations and learn meaningful, personalized embeddings. We then propose Domain Agnostic MIND MELD, which learns to transfer the personalized embedding learned in one domain to a novel domain, thereby allowing robots to learn from suboptimal humans across disparate platforms (e.g., self-driving car or in-home robot). In Proc. International Conference on Human Robot Interaction (HRI) Pioneers Workshop. [29% Acceptance Rate]
Nina Moorman and Matthew Gombolay Do People Trust Robots that Learn in the Home? \| Abstract It is not scalable for assistive robotics to have all functionalities pre-programmed prior to user introduction. Instead, it is more realistic for agents to perform supplemental on site learning. This opportunity to learn user and environment particularities is especially helpful for care robots that assist with individualized caregiver activities in residential or nursing home environments. Many assistive robots, ranging in complexity from Roomba to Pepper, already conduct some of their learning in the home, observable to the user. We lack an understanding of how witnessing this learning impacts the user. Thus, we propose to assess end-user attitudes towards the concept of embodied robots that conduct some learning in the home as compared to robots that are delivered fully-capable. In this virtual, betweensubjects study, we recruit end users (care-givers and care-takers) from nursing homes, and investigate user trust in three different domains: navigation, manipulation, and preparation. Informed by the first study where we identify agent learning as a key factor in determining trust, we propose a second study to explore how to modulate that trust. This second, in-person study investigates the effectiveness of apologies, explanations of robot failure, and transparency of learning at improving trust in embodied learning robots. In Proc. HRI 2022 Workshop on Machine Learning in Human-Robot Collaboration (MLHRC).
Max Zuo, Logan Schick, Matthew Gombolay, and Nakul Gopalan Efficient Exploration via First-Person Behavior Cloning Assisted Rapidly-Exploring Random Trees \| Abstract Modern day computer games have extremely large state and action spaces. To detect bugs in these games’ models, human testers play the games repeatedly to explore the game and find errors in the games. Such game play is exhaustive and time consuming. Moreover, since robotics simulators depend on similar methods of model specification and debugging, the problem of finding errors in the model is of interest for the robotics community to ensure robot behaviors and interactions are consistent in simulators. Previous methods have used reinforcement learning [8] and search based methods [6] including Rapidly-exploring Random Trees (RRT) to explore a game’s state-action space to find bugs. However, such search and exploration based methods are not efficient at exploring the state-action space without a pre-defined heuristic. In this work we attempt to combine a human-tester’s expertise in solving games, and the exhaustiveness of RRT to search a game’s state space efficiently with high coverage. This paper introduces humanseeded RRT (HS-RRT) and behavior-cloning-assisted RRT (CARRT) in testing the number of game states searched and the time taken to explore those game states. We compare our methods to an existing weighted RRT [18] baseline for game exploration testing studied. We find HS-RRT and CA-RRT both explore more game states in fewer tree expansions/iterations when compared to the existing baseline. In each test, CA-RRT reached more states on average in the same number of iterations as RRT. In our tested environments, CA-RRT was able to reach the same number of states as RRT by more than 5000 fewer iterations on average, almost a 50% reduction. In Proc. HRI 2022 Workshop on Machine Learning in Human-Robot Collaboration (MLHRC).
Mariah Schrum, Erin Hedlund-Botti, and Matthew Gombolay Towards Improving Life-Long Learning Via Personalized, Reciprocal Teaching \| Abstract In a world with ubiquitous robots, robots will need to be personalizable and capable of learning novel tasks from humans throughout their deployment. However, research has shown that humans can be poor teachers, making it difficult for robots to effectively learn from humans. In prior work, we introduced Mutual Information Driven Meta-Learning from Demonstration (MIND MELD), which learns to map suboptimal human demonstrations to higher-quality demonstrations. While this work effectively accounts for suboptimality on novel tasks within a set distribution of calibration tasks, MIND MELD does not convey to the demonstrator the way in which the demonstrator is suboptimal. If the human could learn how to provide better demonstrations, then the human might be able to effectively teach a broader range of novel, out-of-distribution tasks where MIND MELD does not readily account for potential demonstration suboptimality. In this work, we introduce Reciprocal MIND MELD, a framework in which the robot learns the way in which a demonstrator is suboptimal and utilizes this information to provide feedback to the demonstrator to improve their demonstrations long-term. In a human-subjects experiment, we demonstrate that the robot can effectively improve how a human provides feedback (p < .001). Additionally, we show that humans trust the robot more (p = .014) and feel more team fluency when the robot provides helpful advice (p = .014). In Proc. HRI 2022 Workshop on Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI).
David Fernandez, Guillermo Grande, Sean Ye, Nakul Gopalan, and Matthew Gombolay Interactive Learning with Natural Language Feedback \| Abstract We seek to enable non-roboticist end-users to teach robots by examining how end-users can provide natural language feedback to the robot. We hypothesized that enabling users to use language to train an agent would be more intuitive as user’s don’t have to translate their intent through another system. We build upon Deep TAMER to allow users to provide feedback through natural language to a learning agent. Our algorithm includes (1) a Transformer based language model to map natural language feedback to scalar reward values and (2) a method to synthetically assign rewards to nearby state-action pairs that were unexplored by the agent. We report our results from a 2×4 mixed-subjects experiment design to evaluate the usability, workload, and trainablitity of our system compared to Deep TAMER on simulated tasks. While the experimenters were able to train an agent on both simulated environments to achieve competitive rewards, we could not show that natural language feedback significantly lowered workload, increase usability, or train better agents than baseline Deep TAMER with human subjects. This work indicates a need for further research into the types of feedback end-users prefer to use to train agents. In Proc. HRI 2022 Workshop on Participatory Design and End-User Programming for Human-Robot Interaction (PD/EUP).
Esmaeil Seraj and Matthew Gombolay Embodied Team Intelligence in Multi-Robot Systems \| Abstract High-performing human teams leverage intelligent and efficient communication and coordination strategies to collaboratively maximize their joint utility. Inspired by teaming behaviors among humans, I seek to develop computational methods for synthesizing intelligent communication and coordination strategies for collaborative multi-robot systems. I leverage both classical model-based control and planning approaches as well as data-driven methods such as Multi-Agent Reinforcement Learning (MARL) to provide several contributions towards enabling emergent cooperative teaming behavior across both homogeneous and heterogeneous (including agents with different capabilities) robot teams. In future work, I aim to investigate efficient ways to incorporate humans’ teaming strategies for robot teams and directly learn team coordination policies from human experts. In Proc. Autonomous Agents and Multiagent Systems (AAMAS) Doctoral Consortium.
Rohan Paleja and Matthew Gombolay Mutual Understanding in Human-Machine Teaming \| Abstract Collaborative robots (i.e., “cobots”) and machine learning-based virtual agents are increasingly entering the human workspace with the aim of increasing productivity, enhancing safety, and improving the quality of our lives. These agents will dynamically interact with a wide variety of people in dynamic and novel contexts, increasing the prevalence of human-machine teams in healthcare, manufacturing, and search-and-rescue. In this research, we enhance the mutual understanding within a human-machine team by enabling cobots to understand heterogeneous teammates via person-specific embeddings, identifying contexts in which xAI methods can help improve team mental model alignment, and enabling cobots to effectively communicate information that supports high-performance human-machine teaming. In Proc. Association for the Advancement of Artificial Intelligence Conference (AAAI) Doctoral Consortium.
Andrew Silva and Matthew Gombolay Empirically Evaluating Meta Learning of Robot Explainability with Humans \| Abstract As physically-embodied robots and digital assistants are deployed in the real world, these agents must be able to communicate their decision-making criteria to build trust, improve human-robot teaming, and enable collaboration. While the field of explainable machine learning has made great strides in building a set of mechanisms to enable such communication, these advancements often assume that one approach is ideally suited to one problem (e.g., decision trees are best for explaining how to triage patients in an emergency room), failing to recognize that individual users may have different past experiences or preferences. In this work, we present the design of a user study to evaluate a novel approach to personalization of robot explainability through meta-learning with humans. Our study will be the first to evaluate meta learning with humans in the loop and with multiple approaches to robot explainability. Our results will help to pave the way for academic and industry deployments of explainable machine learning to diverse user populations. In Proc. HRI 2022 Workshop Your Study Design (WYSD) Workshop.
Sravan Jayanthi, Letian Chen, and Matthew Gombolay Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous Demonstration \| Abstract Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. A key challenge in LfD research is that users tend to provide heterogeneous demonstrations for the same task due to various strategies and preferences. Therefore, it is essential to develop LfD algorithms that ensure flexibility (the robot adapts to personalized strategies), efficiency (the robot achieves sample-efficient adaptation requiring only a few demonstrations by the user), and scalability (robot reuses a concise set of strategies to represent a large amount of behaviors). In this paper, we propose a novel algorithm, Dynamic Multi-Strategy Reward Distillation (DMSRD), which distills common knowledge between heterogeneous demonstrations, leverages learned strategies to construct mixtures policies, and continues to improve by learning from all available data. Our personalized, federated, and lifelong LfD architecture surpasses benchmarks in two continuous control problems with an average 62% improvement in policy returns, 50% improvement in log likelihood, and 36% decrease in the estimated KL divergence between learned behavior and demonstrations, alongside stronger task reward correlation and more precise strategy rewards. In Proc. 2022 AAAI Interactive Machine Learning workshop.
Thesis
Laura G. Strickland Coordinating Team Tactics for Swarm-vs.-Swarm Adversarial Games \| Abstract While swarms of UAVs have received much attention in the last few years, adversarial swarms (i.e., competitive, swarm-vs.-swarm games) have been less well studied. In this dissertation, I investigate the factors influential in team-vs.-team UAV aerial combat scenarios, elucidating the impacts of force concentration and opponent spread in the engagement space. Specifically, this dissertation makes the following contributions: (1) Tactical Analysis: Identifies conditions under which either explicitly-coordinating tactics or decentralized, greedy tactics are superior in engagements as small as 2-vs.-2 and as large as 10-vs.-10, and examines how these patterns change with the quality of the teams’ weapons; (2) Coordinating Tactics: Introduces and demonstrates a deep-reinforcement-learning framework that equips agents to learn to use their own and their teammates’ situational context to decide which pre-scripted tactics to employ in what situations, and which teammates, if any, to coordinate with throughout the engagement; the efficacy of agents using the neural network trained within this framework outperform baseline tactics in engagements against teams of agents employing baseline tactics in N-vs.-N engagements for N as small as two and as large as 64; and (3) Bio-Inspired Coordination: Discovers through Monte-Carlo agent-based simulations the importance of prioritizing the team’s force concentration against the most threatening opponent agents, but also of preserving some resources by deploying a smaller defense force and defending against lower-penalty threats in addition to high-priority threats to maximize the remaining fuel within the defending team’s fuel reservoir. Ph.D. Thesis: Doctor of Philosophy in Robotics.
Zheyuan Wang Learning Dynamic Priority Scheduling Policies with Graph Attention Networks \| Abstract The aim of this thesis is to develop novel graph attention network-based models to automatically learn scheduling policies for effectively solving resource optimization problems, covering both deterministic and stochastic environments. The policy learning methods utilize both imitation learning, when expert demonstrations are accessible at low cost, and reinforcement learning, when otherwise reward engineering is feasible. By parameterizing the learner with graph attention networks, the framework is computationally efficient and results in scalable resource optimization schedulers that adapt to various problem structures. This thesis addresses the problem of multi-robot task allocation (MRTA) under temporospatial constraints. Initially, robots with deterministic and homogeneous task performance are considered with the development of the RoboGNN scheduler. Then, I develop ScheduleNet, a novel heterogeneous graph attention network model, to efficiently reason about coordinating teams of heterogeneous robots. Next, I address problems under the more challenging stochastic setting in two parts. Part 1) Scheduling with stochastic and dynamic task completion times. The MRTA problem is extended by introducing human coworkers with dynamic learning curves and stochastic task execution. HybridNet, a hybrid network structure, has been developed that utilizes a heterogeneous graph-based encoder and a recurrent schedule propagator, to carry out fast schedule generation in multi-round settings. Part 2) Scheduling with stochastic and dynamic task arrival and completion times. With an application in failure-predictive plane maintenance, I develop a heterogeneous graph-based policy optimization (HetGPO) approach to enable learning robust scheduling policies in highly stochastic environments. Through extensive experiments, the proposed framework has been shown to outperform prior state-of-the-art algorithms in different applications. My research contributes several key innovations regarding designing graph-based learning algorithms in operations research. Ph.D. Thesis: Doctor of Philosophy in Electrical and Computer Engineering.
2021
Journal Papers
Zheyuan Wang, Chen Liu, and Matthew Gombolay Heterogeneous Graph Attention Networks for Scalable Multi-Robot Scheduling with Temporospatial Constraints \| Abstract Robot teams are increasingly being deployed in environments, such as manufacturing facilities and warehouses, to save cost and improve productivity. To efficiently coordinate multi-robot teams, fast, high-quality scheduling algorithms are essential to satisfy the temporal and spatial constraints imposed by dynamic task specification and part and robot availability. Traditional solutions include exact methods, which are intractable for large-scale problems, or application-specific heuristics, which require expert domain knowledge to develop. In this paper, we propose a novel heterogeneous graph attention network model, called ScheduleNet, to learn scheduling policies that overcome the limitations of conventional approaches. By introducing robot- and proximity-specific nodes into the simple temporal network encoding temporal constraints, we obtain a heterogeneous graph structure that is nonparametric in the number of tasks, robots and task resources or locations. We show that our model is end-to-end trainable via imitation learning on small-scale problems, and generalizes to large, unseen problems. Empirically, our method outperforms the existing state-of-the-art methods in a variety of testing scenarios involving both homogeneous and heterogeneous robot teams. Autonomous Robots.
Esmaeil Seraj, Letian Chen, and Matthew Gombolay A Hierarchical Coordination Framework for Joint Perception-Action Tasks in Composite Robot Teams \| Abstract We propose a collaborative planning and control algorithm to enhance cooperation for composite teams of autonomous robots in dynamic environments. Composite robot teams are groups of agents that perform different tasks according to their respective capabilities in order to accomplish an overarching mission. Examples of such teams include groups of perception agents (can only sense) and action agents (can only manipulate) working together to perform disaster response tasks. Coordinating robots in a composite team is a challenging problem due to the heterogeneity in the robots’ characteristics and their tasks. Here, we propose a coordination framework for composite robot teams. The proposed framework consists of two hierarchical modules: (1) a Multi-Agent State-Action-Reward-Time-State-Action (MA-SARTSA) algorithm in Multi-Agent Partially Observable Semi-Markov Decision Process (MA-POSMDP) as the high-level decision-making module to enable perception agents to learn to surveil in an environment with an unknown number of dynamic targets and (2) a low-level coordinated control and planning module that ensures probabilistically-guaranteed support for action agents. Simulation and physical robot implementations of our algorithms on a multi-agent robot testbed demonstrated the efficacy and feasibility of our coordination framework by reducing the overall operation times in a benchmark wildfire-fighting case-study. IEEE Transactions on Robotics.
Ruisen Liu, Manisha Natarajan, and Matthew Gombolay Coordinating Human-Robot Teams with Dynamic and Stochastic Task Proficiencies \| Abstract As robots become ubiquitous in the workforce, it is essential that human-robot collaboration be both intuitive and adaptive. A robot’s ability to coordinate team activities improves based on its ability to infer and reason about the dynamic (i.e., the “learning curve”) and stochastic task performance of its human counterparts. We introduce a novel resource coordination algorithm that enables robots to schedule team activities by 1) actively characterizing the task performance of their human teammates and 2) ensuring the schedule is robust to temporal constraints given this characterization. We first validate our modeling assumptions via user study. From this user study, we create a data-driven prior distribution over human task performance for our virtual and physical evaluations of human-robot teaming. Second, we show that our methods are scalable and produce high-quality schedules. Third, we conduct a between-subjects experiment (n=90) to assess the effects on a human-robot team of a robot scheduler actively exploring the humans’ task proficiency. Our results indicate that human-robot working alliance (p<0.001) and human performance (p=0.00359) are maximized when the robot dedicates more time to exploring the capabilities of human teammates. ACM Transactions on Human-Robot Interaction (THRI), Volume 11, Issue 1, Pages 1-42.
Conference Papers
Rohan Paleja, Muyleng Ghuy, Nadun Ranawaka, Reed Jensen, and Matthew Gombolay The Utility of Explainable AI in Ad Hoc Human-Machine Teaming \| Abstract Recent advances in machine learning have led to growing interest in Explainable AI (xAI) to enable humans to gain insight into the decision-making of machine learning models. Despite this recent interest, the utility of xAI techniques has not yet been characterized in human-machine teaming. Importantly, xAI offers the promise of enhancing team situational awareness (SA) and shared mental model development, which are the key characteristics of effective human-machine teams. Rapidly developing such mental models is especially critical in ad hoc human-machine teaming, where agents do not have a priori knowledge of others’ decision-making strategies. In this paper, we present two novel human-subject experiments quantifying the benefits of deploying xAI techniques within a human-machine teaming scenario. First, we show that xAI techniques can support SA ($p<0.05)$. Second, we examine how different SA levels induced via a collaborative AI policy abstraction affect ad hoc human-machine teaming performance. Importantly, we find that the benefits of xAI are not universal, as there is a strong dependence on the composition of the human-machine team. Novices benefit from xAI providing increased SA ($p<0.05$) but are susceptible to cognitive overhead ($p<0.05$). On the other hand, expert performance degrades with the addition of xAI-based support ($p<0.05$), indicating that the cost of paying attention to the xAI outweighs the benefits obtained from being provided additional information to enhance SA. Our results demonstrate that researchers must deliberately design and deploy the right xAI techniques in the right scenario by carefully considering human-machine team composition and how the xAI method augments SA. In Proc. Conference on Neural Information Processing Systems (NeurIPS). [26% Acceptance Rate]
Elias Stengel-Eskin, Andrew Hundt, Zhuohong He, Aditya Murali, Nakul Gopalan, Matthew Gombolay, and Gregory Hager Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions \| Abstract Enabling human operators to interact with robotic agents using natural language would allow non-experts to intuitively instruct these agents. Towards this goal, we propose a novel Transformer-based model which enables a user to guide a robot arm through a 3D multi-step manipulation task with natural language commands. Our system maps images and commands to masks over grasp or place locations, grounding the language directly in perceptual space. In a suite of block rearrangement tasks, we show that these masks can be combined with an existing manipulation framework without re-training, greatly improving learning efficiency. Our masking model is several orders of magnitude more sample efficient than typical Transformer models, operating with hundreds, not millions, of examples. Our modular design allows us to leverage supervised and reinforcement learning, providing an easy interface for experimentation with different architectures. Our model completes block manipulation tasks with synthetic commands 530% more often than a UNet-based baseline, and learns to localize actions correctly while creating a mapping of symbols to perceptual input that supports compositional reasoning. We provide a valuable resource for 3D manipulation instruction following research by porting an existing 3D block dataset with crowdsourced language to a simulated environment. Our method’s 25.3% absolute improvement in identifying the correct block on the ported dataset demonstrates its ability to handle syntactic and lexical variation. In Proc. Conference on Robot Learning (CoRL). [38% Acceptance Rate]
Andrew Hundt, Aditya Murali, Priyanka Hubli, Ran Liu, Nakul Gopalan, Matthew Gombolay, and Gregory Hager “Good Robot! Now Watch This!”: Repurposing Reinforcement Learning for Task-to-Task Transfer \| Abstract Modern Reinforcement Learning (RL) algorithms are not sample efficient to train on multi-step tasks in complex domains, impeding their wider deployment in the real world. We address this problem by leveraging the insight that RL models trained to complete one set of tasks can be re-purposed to complete related tasks when given just a handful of demonstrations. Based upon this insight, we propose See-SPOT-Run (SSR), a new computational approach to robot learning that enables a robot to complete a variety of real robot tasks in novel problem domains without task-specific training. SSR uses pretrained RL models to create vectors that represent model, task, and action relevance in demonstration and test scenes. SSR then compares these vectors via our Cycle Consistency Distance (CCD) metric to determine the next action to take. SSR completes 58% more task steps and 20% more trials than a baseline few-shot learning method that requires task-specific training. SSR also achieves a four order of magnitude improvement in compute efficiency and a 20% to three order of magnitude improvement in sample efficiency compared to the baseline and to training RL models from scratch. To our knowledge, we are the first to address multi-step tasks from demonstration on a real robot without task-specific training, where both the visual input and action space output are high dimensional. In Proc. Conference on Robot Learning (CoRL). [38% Acceptance Rate]
Roger Dias, Marco Zenati, Geoff Rance, Rithy Srey, David Arney, Letian Chen, Rohan Paleja, Lauren Kennedy-Metz, and Matthew Gombolay Using Machine Learning to Predict Perfusionists’ Critical Decision-Making during Cardiac Surgery \| Abstract The cardiac surgery operating room is a high-risk and complex environment in which multiple experts work as a team to provide safe and excellent care to patients. During the cardiopulmonary bypass phase of cardiac surgery, critical decisions need to be made and the perfusionists play a crucial role in assessing available information and taking a certain course of action. In this paper, we report the findings of a simulation-based study using machine learning to build predictive models of perfusionists’ decision-making during critical situations in the operating room (OR). Performing 30-fold cross-validation across 30 random seeds, our machine learning approach was able to achieve an accuracy of 78.2% (95% confidence interval: 77.8% to 78.6%) in predicting perfusionists’ actions, having access to only 148 simulations. The findings from this study may inform future development of computerised clinical decision support tools to be embedded into the OR, improving patient safety and surgical outcomes. In. Computer Methods in Biomechanics and Biomedical Engineering.
Erin Hedlund, Michael Johnson, and Matthew Gombolay The Effects of a Robot’s Performance on Human Teachers for Learning from Demonstration Tasks \| Abstract Learning from Demonstration (LfD) algorithms seek to enable end-users to teach robots new skills through human demonstration of a task. Previous studies have analyzed how robot failure affects human trust, but not in the context of the human teaching the robot. In this paper, we investigate how human teachers react to robot failure in an LfD setting. We conduct a study in which participants teach a robot how to complete three tasks, using one of three instruction methods, while the robot is pre-programmed to either succeed or fail at the task. We find that when the robot fails, people trust the robot less (p<.001) and themselves less (p=.004) and they believe that others will trust them less (p<.001). Human teachers also have a lower impression of the robot and themselves (p<.001) and found the task more difficult when the robot fails (p<.001). Motion capture was found to be a less difficult instruction method than teleoperation (p=.016), while kinesthetic teaching gave the teachers the lowest impression of themselves compared to teleoperation (p=.017) and motion capture (p<.001). Importantly, a mediation analysis showed that people’s trust in themselves is heavily mediated by what they think that others — including the robot — think of them (p<.001). These results provide valuable insights to improving the human-robot relationship for LfD. In Proc. International Conference on Human-Robot Interaction (HRI). [23% Acceptance Rate]
Mariah Schrum, Glen Neville, Michael Johnson, Nina Moorman, Rohan Paleja, Karen Feigh, and Matthew Gombolay Effects of Social Factors and Team Dynamics on Adoption of Collaborative Robot Autonomy* \| Abstract As automation becomes more prevalent, the fear of job loss due to automation increases. Workers may not be amenable to working with a robotic co-worker due to a negative perception of the technology. The attitudes of workers towards automation are influenced by a variety of complex and multi-faceted factors such as intention to use, perceived usefulness and other external variables. In an analog manufacturing environment, we explore how these various factors influence an individual’s willingness to work with a robot over a human co-worker in a collaborative Lego building task. We specifically explore how this willingness is affected by: 1) the level of social rapport established between the individual and his or her human co-worker, 2) the anthropomorphic qualities of the robot, and 3) factors including trust, fluency and personality traits. Our results show that a participant’s willingness to work with automation decreased due to lower perceived team fluency (p=0.045), rapport established between a participant and their co-worker (p=0.003), the gender of the participant being male (p=0.041), and a higher inherent trust in people (p=0.018). In Proc. International Conference on Human-Robot Interaction (HRI). [23% Acceptance Rate]
Esmaeil Seraj, Vahid Azimi, Chaouki Abdallah, Seth Hutchinson and Matthew Gombolay Adaptive Leader-Follower Control for Multi-Robot Teams with Uncertain Network Structure \| Abstract Traditionally-designed, centralized or decentralized control architectures typically rely on the availability of communication channels between neighboring robots as well as a known, static network structure to tightly coordinate their actions in order to achieve global consensus. Unfortunately, communication constraints and network disconnectivity are key bottlenecks in such approaches, leading to the failure of conventional centralized or decentralized networked controllers in achieving stability and global consensus. To overcome these limitations, we develop a centralized, coordinated-control structure for multi-robot teams with uncertain network structure. Our novel approach enables multi-robot teams to achieve consensus even with disconnected communication graphs. Leveraging model reference adaptive control framework and networked control architectures, we develop a coordinated leader-follower consensus controller capable of overcoming communication losses within the team, handling non-communicative robots, and compensating for environmental noise. We prove the stability of our controller and empirically validate our approach by analyzing the effects of reference graph structures and environmental noise on the performance of robot team for navigation tasks. Finally, we demonstrate our novel controller in a multi-robot testbed. In Proc. American Control Conference (ACC).
Yaru Niu, Rohan Paleja, and Matthew Gombolay Multi-Agent Graph-Attention Communication and Teaming \| Abstract High-performing teams learn effective communication strategies to judiciously share information and reduce the cost of communication overhead. Within multi-agent reinforcement learning, synthesizing effective policies requires reasoning about when to communicate, whom to communicate with, and how to process messages. We propose a novel multi-agent reinforcement learning algorithm, Multi-Agent Graph-attentIon Communication (MAGIC), with a graph-attention communication protocol in which we learn 1) a Scheduler to help with the problems of when to communicate and whom to address messages to, and 2) a Message Processor using Graph Attention Networks (GATs) with dynamic graphs to deal with communication signals. The Scheduler consists of a graph attention encoder and a differentiable attention mechanism, which outputs dynamic, differentiable graphs to the Message Processor, which enables the Scheduler and Message Processor to be trained end-to-end. We evaluate our approach on a variety of cooperative tasks, including Google Research Football. Our method outperforms baselines across all domains, achieving $\approx 10\%$ increase in reward in the most challenging domain. We also show MAGIC communicates $23.2\%$ more efficiently than the average baseline, is robust to stochasticity, and scales to larger state-action spaces. Finally, we demonstrate MAGIC on a physical, multi-robot testbed. In Proc. Autonomous Agents and Multiagent Systems (AAMAS). [25% Acceptance Rate]
Andrew Silva and Matthew Gombolay Encoding Human Domain Knowledge to Warm Start Reinforcement Learning \| Abstract Deep reinforcement learning has seen great success across a breadth of tasks, such as in game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily available knowledge from domain experts that could help “warm start” the learning process. Further, learning from demonstration techniques are not yet efficient enough to infer this knowledge through sampling-based mechanisms in large state and action spaces. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions amenable to gradient descent and show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of reinforcement learning challenges. We further conduct a user study to solicit expert policies from a variety of humans and find that humans are able to specify policies that provide a higher quality reward both before and after training relative to baseline methods, demonstrating the utility of our approach. In Proc. Conference on Artificial Intelligence (AAAI). [21% Acceptance Rate]
Laura Strickland, Charles Pippin, and Matthew Gombolay Learning to Steer Swarm-vs.-Swarm Engagements \| Abstract UAVs are becoming increasingly commonplace, and with their growing popularity, the question of how to counter a swarm of UAVs operated by bad actors becomes more critical. In this paper, we explore the possibility of using a team of fixed-wing UAVs to counter an adversarial swarm of fixed-wing UAVs. To learn to coordinate counter-swarm tactics, we propose Situation-Dependent Option-action Evaluation (SDOE), a distributed and scalable actor-critic RL architecture. Our approach enables each UAV to evaluate options over a set of scripted tactics as well as the option to maneuver freely, allowing for emergent team behavior. A key to the scalability of our approach is a novel, distributed neural network architecture that enables agents to share situational awareness and select tactics in a pairwise fashion, allowing agents to choose who to coordinate with, when, and how regardless of the size of the swarm. We test agents trained with our approach in simulated engagements of up to 16-vs.-16 UAVs, and find that, even as the size of the engagement increases, the agents trained using SDOE against a greedy, non-coordinating tactic win engagements against a team of greedy agents more reliably than another team of greedy agents. In Proc. American Institute of Aeronautics and Astronautics (AIAA) SciTech 2021 Forum.
Michael Johnson, Ruisen Liu, Nakul Gopalan, and Matthew Gombolay An Approach to Human-Robot Collaborative Drilling and Fastening in Aerospace Final Assembly \| Abstract The aerospace manufacturing industry is increasingly adopting automated machinery to accomplish labor intensive tasks and to meet growing production demands. Traditionally, production floors are filled with fixed-installation robots that are not easily adapted to hanging needs. In this work, we present an approach to using a collaborative robot to complete drilling and fastening tasks that can adapt to new environments by leveraging a human operator and expert demonstrator. The human trains the robot to complete the task autonomously by defining its environment and providing the robot demonstrations on how to locate, classify, and insert fasteners into a fuselage. The system begins with no information and uses offline and online learning techniques to develop a data bank of relevant information to improve the insertion process within the workspace. We show the results of unit tests that evaluate the multiple steps to the learning-execution process and draw conclusions from our observations. In Proc. American Institute of Aeronautics and Astronautics (AIAA) SciTech 2021 Forum.
Workshop/Symposium Papers
Yaru Niu, Rohan Paleja, and Matthew Gombolay MAGIC: Multi-Agent Graph-Attention Communication \| Abstract High-performing teams learn effective communication strategies to judiciously share information and reduce the cost of communication overhead. Within multi-agent reinforcement learning, synthesizing effective policies requires reasoning about when to communicate, whom to communicate with, and how to process messages. We propose a novel multi-agent reinforcement learning algorithm, Multi-Agent Graph-attentIon Communication (MAGIC), with a graph-attention communication protocol in which we learn 1) a Scheduler to help with the problems of when to communicate and whom to address messages to, and 2) a Message Processor using Graph Attention Networks (GATs) with dynamic graphs to deal with communication signals. The Scheduler consists of a graph attention encoder and a differentiable attention mechanism, which outputs dynamic, differentiable graphs to the Message Processor, which enables the Scheduler and Message Processor to be trained end-to-end. We evaluate our approach on a variety of cooperative tasks, including Google Research Football. Our method outperforms baselines across all domains, achieving $\approx 10\%$ increase in reward in the most challenging domain. We also show MAGIC communicates $23.2\%$ more efficiently than the average baseline, is robust to stochasticity, and scales to larger state-action spaces. Finally, we demonstrate MAGIC on a physical, multi-robot testbed. In Proc. ICCV 2021 Workshop on Multi-Agent Interaction and Relational Reasoning. [Spotlight Talk] [Best Paper Award]
Vanya Cohen, Geraud Nangue Tasse, Nakul Gopalan, Steven James, Matthew Gombolay, and Benjamin Rosman Learning to Follow Language Instructions with Compositional Policies \| Abstract We propose a framework that learns to execute natural language instructions in an environment consisting of goalreaching tasks that share components of their task descriptions. Our approach leverages the compositionality of both value functions and language, with the aim of reducing the sample complexity of learning novel tasks. First, we train a reinforcement learning agent to learn value functions that can be subsequently composed through a Boolean algebra to solve novel tasks. Second, we fine-tune a seq2seq model pretrained on web-scale corpora to map language to logical expressions that specify the required value function compositions. Evaluating our agent in the BabyAI domain, we observe a decrease of 86% in the number of training steps needed to learn a second task after mastering a single task. Results from ablation studies further indicate that it is the combination of compositional value functions and language representations that allows the agent to quickly generalize to new tasks. In Proc. AAAI Artificial Intelligence for Human-Robot Interaction (AI-HRI) Fall Symposium.
Letian Chen, Rohan Paleja, and Matthew Gombolay Towards Sample-efficient Apprenticeship Learning from Suboptimal Demonstration \| Abstract Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform novel tasks by providing demonstrations. However, as demonstrators are typically non-experts, modern LfD techniques are unable to produce policies much better than the suboptimal demonstration. A previously-proposed framework, SSRR, has shown success in learning from suboptimal demonstration but relies on noise-injected trajectories to infer an idealized reward function. A random approach such as noise-injection to generate trajectories has two key drawbacks: 1) Performance degradation could be random depending on whether the noise is applied to vital states and 2) Noise-injection generated trajectories may have limited suboptimality and therefore will not accurately represent the whole scope of suboptimality. We present Systematic Self-Supervised Reward Regression, S3RR, to investigate systematic alternatives for trajectory degradation. In Proc. AAAI Artificial Intelligence for Human-Robot Interaction (AI-HRI) Fall Symposium.
Mariah Schrum, Erin Hedlund, and Matthew Gombolay Improving Robot-Centric Learning from Demonstration via Personalized Embeddings \| Abstract Learning from demonstration (LfD) techniques seek to enable novice users to teach robots novel tasks in the real world. However, prior work has shown that robot-centric LfD approaches, such as Dataset Aggregation (DAgger), do not perform well with human teachers. DAgger requires a human demonstrator to provide corrective feedback to the learner either in real-time, which can result in degraded performance due to suboptimal human labels, or in a post hoc manner which is time intensive and often not feasible. To address this problem, we present Mutual Information-driven Metalearning from Demonstration (MIND MELD), which metalearns a mapping from poor quality human labels to predicted ground truth labels, thereby improving upon the performance of prior LfD approaches for DAgger-based training. The key to our approach for improving upon suboptimal feedback is mutual information maximization via variational inference. Our approach learns a meaningful, personalized embedding via variational inference which informs the mapping from human provided labels to predicted ground truth labels. We demonstrate our framework in a synthetic domain and in a human-subjects experiment, illustrating that our approach improves upon the corrective labels provided by a human demonstrator by 63%. In Proc. AAAI Artificial Intelligence for Human-Robot Interaction (AI-HRI) Fall Symposium.
Andrew Silva, Pradyumna Tambwekar, and Matthew Gombolay Towards a Comprehensive Understanding and Accurate Evaluation of Societal Biases in Pre-Trained Transformers \| Abstract The ease of access to pre-trained transformers has enabled developers to leverage large-scale language models to build exciting applications for their users. While such pre-trained models offer convenient starting points for researchers and developers, there is little consideration for the societal biases captured within these model risking perpetuation of racial, gender, and other harmful biases when these models are deployed at scale. In this paper, we investigate gender and racial bias across ubiquitous pre-trained language models, including GPT-2, XLNet, BERT, RoBERTa, ALBERT and DistilBERT. We evaluate bias within pre-trained transformers using three metrics: WEAT, sequence likelihood, and pronoun ranking. We conclude with an experiment demonstrating the ineffectiveness of word-embedding techniques, such as WEAT, signaling the need for more robust bias testing in transformers In Proc. North American Chapter of the Association for Computational Linguistics.
Rohan Paleja, Andrew Silva, Letian Chen, and Matthew Gombolay Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations \| Abstract Resource scheduling and coordination is an NP-hard optimization requiring an efficient allocation of agents to a set of tasks with upper- and lower bound temporal and resource constraints. Due to the large-scale and dynamic nature of resource coordination in hospitals and factories, human domain experts manually plan and adjust schedules on the fly. To perform this job, domain experts leverage heterogeneous strategies and rules-of-thumb honed over years of apprenticeship. What is critically needed is the ability to extract this domain knowledge in a heterogeneous and interpretable apprenticeship learning framework to scale beyond the power of a single human expert, a necessity in safety-critical domains. We propose a personalized and interpretable apprenticeship scheduling algorithm that infers an interpretable representation of all human task demonstrators by extracting decision-making criteria specified by an inferred, personalized embedding without constraining the number of decision-making strategies. We achieve near-perfect LfD accuracy in synthetic domains and 88.22% accuracy on a real-world planning domain, outperforming baselines. Further, a user study conducted shows that our methodology produces both interpretable and highly usable models (p < 0.05). In Proc. AAMAS Autonomous Robots and Multirobot Systems (ARMS) Workshop.
2020
Journal Papers
Zheyuan Wang and Matthew Gombolay Learning Scheduling Policies for Multi-Robot Coordination with Graph Attention Networks \| Abstract Increasing interest in integrating advanced robotics within manufacturing has spurred a renewed concentration in developing real-time scheduling solutions to coordinate human-robot collaboration in this environment. Traditionally, the problem of scheduling agents to complete tasks with temporal and spatial constraints has been approached either with exact algorithms, which are computationally intractable for large-scale, dynamic coordination, or approximate methods that require domain experts to craft heuristics for each application.We seek to overcome the limitations of these conventional methods by developing a novel graph attention network-based scheduler to automatically learn features of scheduling problems towards generating high-quality solutions. To learn effective policies for combinatorial optimization problems, we combine imitation learning, which makes use of expert demonstration on small problems, with graph neural networks, in a non-parametric framework, to allow for fast, near-optimal scheduling of robot teams with various sizes, while generalizing to large, unseen problems. Experimental results showed that our network-based policy was able to find high-quality solutions for ~90% of the testing problems involving scheduling 2–5 robots and up to 100 tasks, which significantly outperforms prior state-of-the-art, approximate methods. Those results were achieved with affordable computation cost and up to 100x less computation time compared to exact solvers. IEEE Robotics and Automation Letters, Volume 5, Issue 3, pages 4509-4516.
Mariah Schrum and Matthew C. Gombolay When Your Robot Breaks: Active Learning During Plant Failure \| Abstract Detecting and adapting to catastrophic failures in robotic systems requires a robot to learn its new dynamics quickly and safely to best accomplish its goals. To address this challenging problem, we propose probabilistically-safe, online learning techniques to infer the altered dynamics of a robot at the moment a failure (e.g., physical damage) occurs. We combine model predictive control and active learning within a chance-constrained optimization framework to safely and efficiently learn the new plant model of the robot. We leverage a neural network for function approximation in learning the latent dynamics of the robot under failure conditions. Our framework generalizes to various damage conditions while being computationally light-weight to advance real-time deployment. We empirically validate within a virtual environment that we can regain control of a severely damaged aircraft in seconds and require only 0.1 seconds to find safe, information-rich trajectories, outperforming state-of-the-art approaches. IEEE Robotics and Automation Letters.
Conference Papers
Mariah Schrum, Eric Yoon, Matthew Gombolay, Sangwook Yoon A Deep Learning Approach to Efficiently Triaging Spine Surgery Patients Based Upon Computerized Intake Questionnaires \| Abstract This is the first study investigating the efficacy of using intake questionnaires for triaging patients for spine surgery via deep neural networks. Our results show that we are able to tune the algorithm to prioritize capturing surgical patients or reducing nonsurgical patients. This system can be adapted to different priorities in an automated manner and to suit the needs of individual providers or spine center as a whole. Since this is an automated system, this system is also scalable without increasing per triage cost. In Proc. Lumbar Spine Research Society (LSRS). [Podium Talk]
Letian Chen, Rohan Paleja, and Matthew Gombolay Learning from Suboptimal Demonstration via Self-Supervised Reward Regression \| Abstract Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, e.g. inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations. This assumption fails to hold in most real-world scenarios. Recent attempts to learn from sub-optimal demonstration leverage pairwise rankings and following the Luce-Shepard rule. However, we show these approaches make incorrect assumptions and thus suffer from brittle, degraded performance. We overcome these limitations in developing a novel approach that bootstraps off suboptimal demonstrations to synthesize optimality-parameterized data to train an idealized reward function. We empirically validate we learn an idealized reward function with ~0.95 correlation with ground-truth reward versus ~0.75 for prior work. We can then train policies achieving ~200% improvement over the suboptimal demonstration and ~90% improvement over prior work. We present a physical demonstration of teaching a robot a topspin strike in table tennis that achieves 32% faster returns and 40% more topspin than user demonstration. In Proc. Conference on Robot Learning (CoRL). [34% Acceptance Rate] [Plenary Talk] [Best Paper Finalist]
Rohan Paleja, Andrew Silva, Letian Chen, and Matthew Gombolay Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations \| Abstract Resource scheduling and coordination is an NP-hard optimization requiring an efficient allocation of agents to a set of tasks with upper- and lower bound temporal and resource constraints. Due to the large-scale and dynamic nature of resource coordination in hospitals and factories, human domain experts manually plan and adjust schedules on the fly. To perform this job, domain experts leverage heterogeneous strategies and rules-of-thumb honed over years of apprenticeship. What is critically needed is the ability to extract this domain knowledge in a heterogeneous and interpretable apprenticeship learning framework to scale beyond the power of a single human expert, a necessity in safety-critical domains. We propose a personalized and interpretable apprenticeship scheduling algorithm that infers an interpretable representation of all human task demonstrators by extracting decision-making criteria specified by an inferred, personalized embedding without constraining the number of decision-making strategies. We achieve near-perfect LfD accuracy in synthetic domains and 88.22% accuracy on a real-world planning domain, outperforming baselines. Further, a user study conducted shows that our methodology produces both interpretable and highly usable models (p < 0.05). In Proc. Conference on Neural Information Processing Systems (NeurIPS). [20% Acceptance Rate]
Zheyuan Wang and Matthew Gombolay Heterogeneous Graph Attention Networks for Scalable Multi-Robot Scheduling with Temporospatial Constraints \| Abstract Robot teams are increasingly being deployed in environments, such as manufacturing facilities and warehouses, to save cost and improve productivity. To efficiently coordinate multi-robot teams, fast, high-quality scheduling algorithms are essential to satisfy the temporal and spatial constraints imposed by dynamic task specification and part and robot availability. Traditional solutions include exact methods, which are intractable for large-scale problems, or application-specific heuristics, which require expert domain knowledge to develop. In this paper, we propose a novel heterogeneous graph attention network model, called ScheduleNet. By introducing robot- and proximity-specific nodes into the simple temporal network encoding temporal constraints, we obtain a heterogeneous graph structure that is nonparametric in the number of tasks, robots and task resources or locations. We show that our model is end-to-end trainable via imitation learning on small-scale problems, generalizing to large, unseen problems. Empirically, our method outperforms the existing state-of-the-art methods in a variety of testing scenarios. In Proc. Robotics: Science and Systems (RSS). [32% Acceptance Rate]
Mariah L. Schrum, Michael Johnson, Muyleng Ghuy, and Matthew C. Gombolay denotes co-first authors Four Years in Review: Statistical Practices of Likert Scales in Human-Robot Interaction Studies \| Abstract As robots become more prevalent, the importance of the field of human-robot interaction (HRI) grows accordingly. As such, we should endeavor to employ the best statistical practices. Likert scales are commonly used metrics in HRI to measure perceptions and attitudes. Due to misinformation or honest mistakes, most HRI researchers do not adopt best practices when analyzing Likert data. We conduct a review of psychometric literature to determine the current standard for Likert scale design and analysis. Next, we conduct a survey of four years of the International Conference on Human-Robot Interaction (2016 through 2019) and report on incorrect statistical practices and design of Likert scales. During these years, only 3 of the 110 papers applied proper statistical testing to correctly-designed Likert scales. Our analysis suggests there are areas for meaningful improvement in the design and testing of Likert scales. Lastly, we provide recommendations to improve the accuracy of conclusions drawn from Likert data. In Proc. Companion of the International Conference on Human-Robot Interaction (HRI). [alt.HRI Track] [19% Acceptance Rate]
Letian Chen, Rohan Paleja, Muyleng Ghuy, and Matthew C. Gombolay Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation \| Abstract Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. On the other hand, inverse reinforcement learning (IRL) seeks to learn a reward function from readily-obtained human demonstrations. Yet, IRL suffers from two major limitations: 1)reward ambiguity – there are an infinite number of possible re-ward functions that could explain an expert’s demonstration and 2) heterogeneity-human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seeks to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans’ strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy’s objective (handling heterogeneity). We demonstrate our algorithm can better recover task reward and strategy rewards and imitate the strategies two simulated tasks and a real-world table tennis task. In Proc. International Conference on Human-Robot Interaction (HRI). [24% Acceptance Rate]
Manisha Natarajan and Matthew C. Gombolay Effects of Anthropormorphism and Accountability on Trust in Human Robot Interaction \| Abstract This paper examines how people’s trust and dependence on robot teammates providing decision support varies as a function of different attributes of the robot, such as perceived anthropomorphism, type of support provided by the robot, and its physical presence. We conduct a mixed-design user study with multiple robots to investigate trust, inappropriate reliance, and compliance measures in the context of a time-constrained game. We also examine how the effect of human accountability addresses errors due to over-compliance in the context of human robot interaction (HRI). This study is novel as it involves examining multiple attributes at once, thus enabling us to perform multi-way comparisons between different attributes on trust and compliance with the agent. Results from the 4x4x2x2 study show that behavior and anthropomorphism of the agent are the most significant factors in predicting the trust and compliance with the robot. Furthermore, adding a coalition-building preface, where the agent provides context to why it might make errors while giving advice, leads to an increase in trust for specific behaviors of the agent. In Proc. International Conference on Human-Robot Interaction (HRI). [24% Acceptance Rate]
Sean Ye, Glen Neville, Mariah Schrum, Matthew Gombolay, Sonia Chernova, and Ayanna Howard Human Trust after Robot Mistakes: Study of the Effects of Different Forms of Robot Communication \| Abstract Collaborative robots that work alongside humans will experience service breakdowns and make mistakes. These robotic failures can cause a degradation of trust between the robot and the community being served. A loss of trust may impact whether a user continues to rely on the robot for assistance. In order to improve the teaming capabilities between humans and robots, forms of communication that aid in developing and maintaining trust need to be investigated. In our study, we identify four forms of communication which dictate the timing of information given and type of initiation used by a robot. We investigate the effect that these forms of communication have on trust with and without robot mistakes during a cooperative task. Participants played a memory task game with the help of a humanoid robot that was designed to make mistakes after a certain amount of time passed. The results showed that participants’ trust in the robot was better preserved when that robot offered advice only upon request as opposed to when the robot took initiative to give advice. In Proc. International Conference on Robot and Human Interactive Communication (RO-MAN).
Esmaeil Seraj and Matthew C. Gombolay Coordinated Control of UAVs for Human-Centered Active Sensing of Wildfires \| Abstract Fighting wildfires is a precarious task, imperiling the lives of engaging firefighters and those who reside in the fire’s path. Firefighters need online and dynamic observation of the firefront to anticipate a wildfire’s unknown characteristics, such as size, scale, and propagation velocity, and to plan accordingly. In this paper, we propose a distributed control framework to coordinate a team of unmanned aerial vehicles (UAVs) for a human-centered active sensing of wildfires. We develop a dual-criterion objective function based on Kalman uncertainty residual propagation and weighted multi-agent consensus protocol, which enables the UAVs to actively infer the wildfire dynamics and parameters, track and monitor the fire transition, and safely manage human firefighters on the ground using acquired information. We evaluate our approach relative to prior work, showing significant improvements by reducing the environment’s cumulative uncertainty residual by more than $ 10^2 $ and $ 10^5 $ times in firefront coverage performance to support human-robot teaming for firefighting. We also demonstrate our method on physical robots in a mock firefighting exercise. In Proc. The American Control Conference (ACC). [Best Student Paper Finalist]
Andrew Silva, Ivan Rodriguez-Jimenez, Taylor Killian, Sung-Hyun Son, and Matthew Gombolay Optimization Methods for Interpretable Differentiable Decision Trees in Reinforcement Learning \| Abstract Decision trees are ubiquitous in machine learning for their ease of use and interpretability. Yet, these models are not typically employed in reinforcement learning as they cannot be updated online via stochastic gradient descent. We overcome this limitation by allowing for a gradient update over the entire tree that improves sample complexity affords interpretable policy extraction. First, we include theoretical motivation on the need for policy-gradient learning by examining the properties of gradient descent over differentiable decision trees. Second, we demonstrate that our approach equals or outperforms a neural network on all domains and can learn discrete decision trees online with average rewards up to 7x higher than a batch-trained decision tree. Third, we conduct a user study to quantify the interpretability of a decision tree, rule list, and a neural network with statistically significant results (p < 0.001). In Proc. The International Conference on Artificial Intelligence and Statistics (AISTATS).
Workshop/Symposium Papers
Ruisen Liu, Matthew Gombolay, and Stephen Balakirsky Towards Unpaired Human-to-Robot Demonstration Translation Learning Novel Tasks \| Abstract Advancements in autonomy can enhance space flight and exploration by enabling robots as cost-efficient agents when humans are unavailable. However, long-term mission success may require continuous maintenance and the ability to adapt on the fly. When encountering a novel scenario that is outside expected robot capabilities, it becomes valuable for a non-robotics expert to be able to visually demonstrate the intended task execution to the robot. Relying on visual demonstration introduces ambiguity in mapping from human to robot execution. One mapping approach is to learn unpaired image translations from human demonstrations and unrelated robot motions. In this paper, we target extensions to image translation to enable robust conveyance of desired task execution. We propose methods to ground generated images with truth in kinematic feasibility, without imposing additional data collection or computational requirements on the demonstrator. In Proc. ICSR Workshop Human Robot Interaction for Space Robotics (HRI-SR).
Yi Ting Sam, Manisha Natarajan, and Matthew Gombolay Stress and Performance in Human-Robot Space Teleoperation Tasks \| Abstract This paper investigates the relationship between stress, workload, and performance in robot teleoperation tasks. The investigation is motivated by the need to develop humanaware robot autonomy for space exploration. Based on prior work, the relationship between stress and performance follows an inverted-U, i.e. there exists an optimal level of stress where performance is maximized. We present a pilot study that utilizes real-time stress sensors on participants undergoing six rounds of stress-inducing or reducing conditions. The performance of the participants is recorded and analyzed with stress levels. We evaluate the relationship between stress (perceived and physiological), workload and performance across three teleoperation tasks. We find that the variation in stress is not significant across different rounds but do observe significance for perceived workload (p < 0.001), stress (p < 0.05), and respiration rate (p < 0.01) for a teleoperation task that requires continuous maneuvering to navigate the robotic arm through a maze. We propose an improved experimental design to better characterize the stress-performance relationship. In Proc. ICSR Workshop Human Robot Interaction for Space Robotics (HRI-SR).
Thesis
Letian Chen Robot Learning from Heterogeneous Demonstration. \| Abstract Learning from Demonstration (LfD) has become a ubiquitous and user-friendly technique to teach a robot how to perform a task (e.g., playing Ping Pong) without the need to use a traditional programming language (e.g., C++). As these systems are increasingly being placed in the hands of everyday users, researchers are faced with the reality that end-users are a heterogeneous population with varying levels of skills and experiences. This heterogeneity violates almost universal assumptions in LfD algorithms that demonstrations given by users are near-optimal and uniform in how the task is accomplished. In this thesis, I present algorithms to tackle two specific types of heterogeneity: heterogeneous strategy and heterogeneous performance. First, I present Multi-Strategy Reward Distillation (MSRD), which tackles the problem of learning from users who have adopted heterogeneous strategies. MSRD extracts separate task reward and strategy reward, which represents task specification and demonstrator’s strategic preference, respectively. We are able to extract the task reward that has 0.998 and 0.943 correlation with ground-truth reward on two simulated robotic tasks and successfully deploy it on a real-robot table-tennis task. Second, I develop two algorithms to address the problem of learning from suboptimal demonstration: SSRR and OP-AIRL. SSRR is a novel mechanism to regress over noisy demonstrations to infer an idealized reward function. OP-AIRL is a mechanism to learn a policy that more effectively teases out ambiguity from sub-optimal demonstrations. By combining SSRR with OP-AIRL, we are able to achieve a 688% and a 254% improvement over state-of-the-art on two simulated robot tasks. M.S. Thesis: Master of Science in Computer Science
2019
Journal Papers
Matthew Gombolay, Toni Golen, Neel Shah, and Julie Shah Queueing theoretic analysis of labor and delivery \| BibTeX @article{Gombolay:2019c, title={Queueing theoretic analysis of labor and delivery}, author={Gombolay, Matthew and Golen, Toni and Shah, Neel and Shah, Julie}, journal={Health Care Management Science}, pages={1–18}, year={2019}, publisher={Springer} } \| Abstract Childbirth is a complex clinical service requiring the coordinated support of highly trained healthcare professionals as well as management of a finite set of critical resources (such as staff and beds) to provide safe care. The mode of delivery (vaginal delivery or cesarean section) has a significant effect on labor and delivery resource needs. Further, resource management decisions may impact the amount of time a physician or nurse is able to spend with any given patient. In this work, we employ queueing theory to model one year of transactional patient information at a tertiary care center in Boston, Massachusetts. First, we observe that the M/G/∞ model effectively predicts patient flow in an obstetrics department. This model captures the dynamics of labor and delivery where patients arrive randomly during the day, the duration of their stay is based on their individual acuity, and their labor progresses at some rate irrespective of whether they are given a bed. Second, using our queueing theoretic model, we show that reducing the rate of cesarean section – a current quality improvement goal in American obstetrics – may have important consequences with regard to the resource needs of a hospital. We also estimate the potential financial impact of these resource needs from the hospital perspective. Third, we report that application of our model to an analysis of potential patient coverage strategies supports the adoption of team-based care, in which attending physicians share responsibilities for patients. Health Care Management Science, 22(1), pp.16-33.
Workshop/Symposium Papers
Mariah Schrum and Matthew C. Gombolay Improving Clinical Care of Pediatric Cerebral Palsy Patients with Inverse Reinforcement Learning \| Abstract Cerebral palsy (CP) patients exhibit pathological gait patterns as a result of a variety of neuromuscular defects. These gait patterns are typically used to inform therapeutic treatment, yet outcomes vary significantly among individuals within a gait class. We investigate inverse reinforcement learning as an approach to discover latent features of CP gait to help clinicians better understand an individual patient’s pathology and aid in clinical decision making. Furthermore, we develop deep reinforcement learning techniques that can prescribe ways in which a patient’s gait might be altered to help a patient better achieve their ideal gait. In Proc. ICRA Workshop Human Movement Science for Physical Human-Robot Collaboration.
Rohan Paleja and Matthew C. Gombolay Heterogeneous Learning from Demonstration \| Abstract The development of human-robot systems able to leverage the strengths of both humans and their robotic counterparts has been greatly sought after because of the foreseen, broad-ranging impact across industry and research. We believe the true potential of these systems cannot be reached unless the robot is able to act with a high level of autonomy, reducing the burden of manual tasking or teleoperation. To achieve this level of autonomy, robots must be able to work fluidly with its human partners, inferring their needs without explicit commands. This inference requires the robot to be able to detect and classify the heterogeneity of its partners. We propose a framework for learning from heterogeneous demonstration based upon Bayesian inference and evaluate a suite of approaches on a real-world dataset of gameplay from StarCraft II. This evaluation provides evidence that our Bayesian approach can outperform conventional methods by up to 12.8%. In Proc. International Conference on Human Robot Interaction (HRI) Pioneers Workshop. [32% Acceptance Rate]
arXiV Papers
Esmaeil Seraj, Andrew Silva, and Matthew C. Gombolay Safe Coordination of Human-Robot Firefighting Teams \| Abstract Wildfires are destructive and inflict massive, irreversible harm to victims’ lives and natural resources. Researchers have proposed commissioning unmanned aerial vehicles (UAVs) to provide firefighters with real-time tracking information; yet, these UAVs are not able to reason about a fire’s track, including current location, measurement, and uncertainty, as well as propagation. We propose a model-predictive, probabilistically safe distributed control algorithm for human-robot collaboration in wildfire fighting. The proposed algorithm overcomes the limitations of prior work by explicitly estimating the latent fire propagation dynamics to enable intelligent, time-extended coordination of the UAVs in support of on-the-ground human firefighters. We derive a novel, analytical bound that enables UAVs to distribute their resources and provides a probabilistic guarantee of the humans’ safety while preserving the UAVs’ ability to cover an entire fire. In: arXiv preprint arXiv:1903.06847.
2018
Journal Papers
Matthew C. Gombolay, Reed Jensen, Jessica Stigile, Toni Golen, Neel Shah, Sung-Hyun Son, and Julie A. Shah Human-Machine Collaborative Optimization via Apprenticeship Scheduling \| BibTeX @article{DBLP:journals/corr/abs-1805-04220, author = {Matthew C. Gombolay and Reed Jensen and Jessica Stigile and Toni Golen and Neel Shah and Sung{-}Hyun Son and Julie A. Shah}, title = {Human-Machine Collaborative Optimization via Apprenticeship Scheduling}, journal = {CoRR}, volume = {abs/1805.04220}, year = {2018}, url = {http://arxiv.org/abs/1805.04220}, archivePrefix = {arXiv}, eprint = {1805.04220}, timestamp = {Mon, 13 Aug 2018 16:48:02 +0200}, biburl = {https://dblp.org/rec/bib/journals/corr/abs-1805-04220}, bibsource = {dblp computer science bibliography, https://dblp.org}} \| Abstract Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this domain knowledge within a computational framework is necessary to scale beyond the “single-expert, single-trainee” apprenticeship model. However, human domain experts often have difficulty describing their decision-making processes, causing the codification of this knowledge to become laborious. We propose a new approach for capturing domain-expert heuristics through a pairwise ranking formulation. Our approach is model-free and does not require enumerating or iterating through a large state space. We empirically demonstrate that this approach accurately learns multifaceted heuristics on a synthetic data set incorporating job-shop scheduling and vehicle routing problems, as well as on two real-world data sets consisting of demonstrations of experts solving a weapon-to-target assignment problem and a hospital resource allocation problem. We also demonstrate that policies learned from human scheduling demonstration via apprenticeship learning can substantially improve the efficiency of a branch-and-bound search for an optimal schedule. We employ this human-machine collaborative optimization technique on a variant of the weapon-to-target assignment problem. We demonstrate that this technique generates solutions substantially superior to those produced by human domain experts at a rate up to 9.5 times faster than an optimization approach and can be applied to optimally solve problems twice as complex as those solved by a human demonstrator. Journal of Artificial Intelligence Research, 63, 1-49.
Rose Molina, Matthew C. Gombolay, Jennifer Jonas, Anna M. Modest, Julie A. Shah, Toni H. Golen, and Neel T. Shah Association Between Labor and Delivery Unit Census and Delays in Patient Management: Findings From a Computer Simulation Module \| Preprint \| BibTeX @article{Molina:2018, author={Rose Molina, Matthew C. Gombolay, Jennifer Jonas, Anna M. Modest, Julie A. Shah, Toni H. Golen, and Neel T. Shah} title={Association between Labor and Delivery Unit Census and delays in patient management: Findings from a computer simulation module}, journal={Obstetrics and gynecology}, url={https://pubmed.ncbi.nlm.nih.gov/29420404/}, publisher={U.S. National Library of Medicine}, year={2018}} \| Abstract OBJECTIVE: To demonstrate the association between increases in labor and delivery unit census and delays in patient care decisions using a computer simulation module. METHODS: This was an observational cohort study of labor and delivery unit nurse managers. We developed a computer module that simulates the physical layout and clinical activity of the labor and delivery unit at our tertiary care academic medical center, in which players act as clinical managers in dynamically allocating nursing staff and beds as patients arrive, progress in labor, and undergo procedures. We exposed nurse managers to variation in patient census and measured the delays in resource decisions over the course of a simulated shift. We used mixed logistic and linear regression models to analyze the associations between patient census and delays in patient care. RESULTS: Thirteen nurse managers participated in the study and completed 17 12-hour shifts, or 204 simulated hours of decision-making. All participants reported the simulation module reflected their real-life experiences at least somewhat well. We observed 1.47-increased odds (95% CI 1.18-1.82) of recommending a patient ambulate in early labor for every additional patient on the labor and delivery unit. For every additional patient on the labor and delivery unit, there was a 15.9-minute delay between delivery and transfer to the postpartum unit (95% CI 2.4-29.3). For every additional patient in the waiting room, we observed a 33.3-minute delay in the time patients spent in the waiting room (95% CI 23.2-43.5) and a 14.3-minute delay in moving a patient in need of a cesarean delivery to the operating room (95% CI 2.8-25.8). CONCLUSION: Increasing labor and delivery unit census is associated with patient care delays in a computer simulation. Computer simulation is a feasible and valid method of demonstrating the sensitivity of care decisions to shifts in patient volume. Obstetrics and Gynecology, volume 131, number 3, pages 545-552.
Matthew C. Gombolay, Ron J. Wilcox, and Julie A. Shah Fast Scheduling of Robot Teams Performing Tasks with Temporospatial Constraints \| Abstract The application of robotics to traditionally manual manufacturing processes requires careful coordination between human and robotic agents in order to support safe and efficient coordinated work. Tasks must be allocated to agents and sequenced according to temporal and spatial constraints. Also, systems must be capable of responding on-the-fly to disturbances and people working in close physical proximity to robots. In this paper, we present a centralized algorithm, named “Tercio,” that handles tightly intercoupled temporal and spatial constraints. Our key innovation is a fast, satisficing multi-agent task sequencer inspired by real-time processor scheduling techniques and adapted to leverage hierarchical problem structure. We use this sequencer in conjunction with a MILP solver and empirically demonstrate the ability to generate near-optimal schedules for real-world problems an order of magnitude larger than those reported in prior art. Finally, we demonstrate the use of our algorithm in a multi-robot hardware testbed. IEEE Transactions on Robotics (IEEE T-RO), volume 34, number 1, pages 220-239.
Conference Papers
Joseph Kim, Matthew E. Woicik, Matthew C. Gombolay, Sung-Hyun Son, and Julie A. Shah Learning to Infer Final Plans in Human Team Planning \| Abstract We envision an intelligent agent that analyzes conversations during human team meetings in order to infer the team’s plan, with the purpose of providing decision support to strengthen that plan. We present a novel learning technique to infer teams’ final plans directly from a processed form of their planning conversation. Our method employs reinforcement learning to train a model that maps features of the discussed plan and patterns of dialogue exchange among participants to a final, agreed-upon plan. We employ planning domain models to efficiently search the large space of possible plans, and the costs of candidate plans serve as the reinforcement signal. We demonstrate that our technique successfully infers plans within a variety of challenging domains, with higher accuracy than prior art. With our domain-independent feature set, we empirically demonstrate that our model trained on one planning domain can be applied to successfully infer team plans within a novel planning domain. In Proc. International Joint Conference on Artificial Intelligence (IJCAI). [20% Acceptance Rate]
2017
Journal Papers
Matthew C. Gombolay, Reed Jensen, and Sung-Hyun Son Machine Learning Techniques for Analyzing Training Behavior in Serious Gaming \| BibTeX @ARTICLE{Gombolay:2017d, author={Matthew Gombolay and Reed Jensen and Sung-Hyun Son}, title={Machine Learning Techniques for Analyzing Training Behavior in Serious Gaming} journal={IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG)}, month = {Accepted September 2017, To Appear}, year={2017} } \| Abstract Training time is a costly, scarce resource across domains such as commercial aviation, healthcare, and military operations. In the context of military applications, serious gaming – the training of warfighters through immersive, realtime environments rather than traditional classroom lectures – offers benefits to improve training not only in its hands-on development and application of knowledge, but also in data analytics via machine learning. In this paper, we explore an array of machine learning techniques that allow teachers to visualize the degree to which training objectives are reflected in actual play. First, we investigate the concept of discovery: learning how warfighters utilize their training tools and develop military strategies within their training environment. Second, we develop machine learning techniques that could assist teachers by automatically predicting player performance, identifying player disengagement, and recommending personalized lesson plans. These methods could potentially provide teachers with insight to assist them in developing better lesson plans and tailored instruction for each individual student. IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG), number 99, pages 1-12.
Matthew C. Gombolay, Anna Bair, Cindy Huang, and Julie A. Shah Computational Design of Mixed-Initiative Human-Robot Teaming that Considers Human Factors: Situational Awareness, Workload, and Workflow Preferences \| BibTeX @ARTICLE{Gombolay:2017e, author={Matthew Gombolay and Anna Bair and Cindy Huang and Julie Shah}, title={Computational Design of Mixed-Initiative Human-Robot Teaming that Considers Human Factors\: Situational Awareness, Workload, and Workflow Preferences}, journal={International Journal of Robotics Research (IJRR)}, month = {Accepted December 2016, To Appear}, year={2017} } \| Abstract Advancements in robotic technology are making it increasingly possible to integrate robots into the human workspace in order to improve productivity and decrease worker strain resulting from the performance of repetitive, arduous physical tasks. While new computational methods have significantly enhanced the ability of people and robots to work flexibly together, there has been little study into the ways in which human factors influence the design of these computational techniques. In particular, collaboration with robots presents unique challenges related to preserving human situational awareness and optimizing workload allocation for human teammates while respecting their workflow preferences. We conducted a series of three human subject experiments to investigate these human factors, and provide design guidelines for the development of intelligent collaborative robots based on our results. International Journal of Robotics Research (IJRR), volume 36, issue 5-7, pages 598-617.
Conference Papers
Matthew C. Gombolay, Xi Jessie Yang, Brad Hayes, Nicole Seo, Zixi Liu, Samir Wadhwania, Tania Yu, Neel Shah, Toni Golen, and Julie A. Shah Robotic Assistance in Coordination of Patient Care \| BibTeX @article{Gombolay:2017c, title={Queueing theoretic analysis of labor and delivery}, author={Gombolay, Matthew and Golen, Toni and Shah, Neel and Shah, Julie}, journal={Health Care Management Science}, pages={1–18}, year={2017}, publisher={Springer} } \| Abstract We conducted a study to investigate trust in and dependence upon robotic decision support among nurses and doctors on a labor and delivery floor. There is evidence that suggestions provided by embodied agents engender inappropriate degrees of trust and reliance among humans. This concern is a critical barrier that must be addressed before fielding intelligent hospital service robots that take initiative to coordinate patient care. Our experiment was conducted with nurses and physicians, and evaluated the subjects’ levels of trust in and dependence on high- and low-quality recommendations issued by robotic versus computer-based decision support. The support, generated through action-driven learning from expert demonstration, was shown to produce high-quality recommendations that were accepted by nurses and physicians at a compliance rate of 90%. Rates of Type I and Type II errors were comparable between robotic and computer-based decision support. Furthermore, embodiment appeared to benefit performance, as indicated by a higher degree of appropriate dependence after the quality of recommendations changed over the course of the experiment. These results support the notion that a robotic assistant may be able to safely and effectively assist in patient care. Finally, we conducted a pilot demonstration in which a robot assisted resource nurses on a labor and delivery floor at a tertiary care center In Proc. Robotics: Science and Systems (RSS). [24% Acceptance Rate]
Matthew C. Gombolay, Jessica Stigile, Reed Jensen, Sung-Hyun Son, and Julie A. Shah Apprenticeship Scheduling: Learning to Schedule from Human Experts \| BibTeX @inproceedings{Gombolay:2016a, author = {Matthew Gombolay and Reed Jensen and Jessica Stigile and Sung-Hyun Son and Julie Shah}, title = {Decision-Making Authority, Team Efficiency and Human Worker Satisfaction in Mixed Human-Robot Teams}, booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence ({IJCAI})}, address = {New York City, NY, U.S.A.}, month = {July 9-15}, year = {2016} } \| Abstract Training time is a costly, scarce resource across domains such as commercial aviation, healthcare, and military operations. In the context of military applications, serious gaming – the training of warfighters through immersive, realtime environments rather than traditional classroom lectures – offers benefits to improve training not only in its hands-on development and application of knowledge, but also in data analytics via machine learning. In this paper, we explore an array of machine learning techniques that allow teachers to visualize the degree to which training objectives are reflected in actual play. First, we investigate the concept of discovery: learning how warfighters utilize their training tools and develop military strategies within their training environment. Second, we develop machine learning techniques that could assist teachers by automatically predicting player performance, identifying player disengagement, and recommending personalized lesson plans. These methods could potentially provide teachers with insight to assist them in developing better lesson plans and tailored instruction for each individual student. In Proc. International Joint Conference on Artificial Intelligence (IJCAI). [25% Acceptance Rate]
Workshop/Symposium Papers
Matthew C. Gombolay, Reed Jensen, Jessica Stigile, Sung-Hyun Son, and Julie Shah Learning to Tutor from Expert Demonstrators via Apprenticeship Scheduling \| BibTeX @inproceedings{Gombolay:2017b, author = {Matthew Gombolay and Reed Jensen and Jessica Stigile and Sung-Hyun Son and Julie Shah}, title = {Learning to Tutor from Expert Demonstration via Apprenticeship Scheduling}, booktitle = {Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Workshop on Human-Machine Collaborative Learning (HMCL)}, address = {San Francisco, California}, month = {February 4}, year = {2017}, } \| Abstract We have conducted a study investigating the use of automated tutors for educating players in the context of serious gaming (i.e., game designed as a professional training tool). Historically, researchers and practitioners have developed automated tutors through a process of manually codifying domain knowledge and translating that into a human-interpretable format. This process is laborious and leaves much to be desired. Instead, we seek to apply novel machine learning techniques to, first, learn a model from domain experts’ demonstrations how to solve such problems, and, second, use this model to teach novices how to think like experts. In this work, we present a study comparing the performance of an automated and a traditional, manually-constructed tutor. To our knowledge, this is the first investigation using learning from demonstration techniques to learn from experts and use that knowledge to teach novices. In Proc. AAAI Workshop on Human-Machine Collaborative Learning.
Rose Molina, Matthew C. Gombolay, Jennifer Jonas, Julie Shah, Toni Golen, and Neel T. Shah Learning to Infer Final Plans in Human Team Planning \| Abstract INTRODUCTION: Clinical managers in charge of labor and delivery units are challenged by the need to allocate scarce resources, such as beds and nursing staff, when there are surges in patient volume. When particularly busy, managers may resort to a variety of strategies, such as calling in additional staff or delaying new admissions. The thresholds for applying strategies that preserve labor floor resources have not been previously well defined and may have significant implications for patient safety. METHODS: We developed a virtual labor floor environment in JAVA to simulate dynamic labor floor conditions, including the expected waxing and waning of patient acuity and volume over the course of a nursing shift. We recorded an inter-professional cohort of clinicians making resource allocation decisions over multiple simulated shifts. Using logistic regression, we determined the odds of delaying admission for patients in early labor when labor floor occupancy varied. RESULTS: Eight nurses played the game for 20 minutes on average, which translates to 64 simulated hours of decision-making. In a logistic regression model, we found there is an increased odds of delaying admissions for early labor as the percent of labor and delivery bed occupancy increases (unadjusted odds ratio 1.11, 95% confidence interval 1.03, 1.19). CONCLUSION: Early labor admissions were more often delayed with increasing bed occupancy, indicating that the care of patients on labor and delivery units may be sensitive to the total occupancy of the unit. Additional research is needed to understand the impact of resource management on patient safety. In: Poster Presentation, ACOG Annual Clinical and Scientific Meeting.
Thesis
Matthew C. Gombolay Human-Machine Collaborative Optimization via Apprenticeship Scheduling \| BibTeX @PHDTHESIS{Gombolay:2017a, author={Matthew Gombolay}, title={Human-Machine Collaborative Optimization via Apprenticeship Scheduling}, school={Department of Aeronautics and Astronautics, Massachusetts Institute of Technology}, month = {January}, year={2017a} } \| Abstract I envision a future where intelligent service robots become integral members of human-robot teams in the workplace. Today, service robots are being deployed across a wide range of settings; however, while these robots exhibit basic navigational abilities, they lack the ability to anticipate and adapt to the needs of their human teammates. I believe robots must be capable of autonomously learning from humans how to integrate into a team ‘ la a human apprentice. Human domain experts and professionals become experts over years of apprenticeship, and this knowledge is not easily codified in the form of a policy. In my thesis, I develop a novel computational technique, Collaborative Optimization Via Apprenticeship Scheduling (COVAS), that enables robots to learn a policy to capture an expert’s knowledge by observing the expert solve scheduling problems. COVAS can then leverage the policy to guide branch-and-bound search to provide globally optimal solutions faster than state-of-the-art optimization techniques. Developing an apprenticeship learning technique for scheduling is challenging because of the complexities of modeling and solving scheduling problems. Previously, researchers have sought to develop techniques to learn from human demonstration; however, these approaches have rarely been applied to scheduling because of the large number of states required to encode the possible permutations of the problem and relevant problem features (e.g., a job’s deadlines, required resources, etc.). My thesis gives robots a novel ability to serve as teammates that can learn from and contribute to coordinating a human-robot team. The key to COVAS’ ability to efficiently and optimally solve scheduling problems is the use of a novel policy-learning approach – apprenticeship scheduling – suited for imitating the method an expert uses to generate the schedule. This policy learning technique uses pairwise comparisons between the action taken by a human expert (e.g., schedule agent a to complete task [tau]i at time t) and each action not taken (e.g., unscheduled tasks at time t), at each moment in time, to learn the relevant model parameters and scheduling policies demonstrated in training examples provided by the human experts. I evaluate my technique in two real-world domains. First, I apply apprenticeship scheduling to the problem of anti-ship missile defense: protecting a naval vessel from an enemy attack by deploying decoys and countermeasures at the right place and time. I show that apprenticeship scheduling can learn to defend the ship, outperforming human experts on the majority of naval engagements (p < 0.011). Further, COVAS is able to produce globally optimal solutions an order of magnitude faster than traditional, state-of-the-art optimization techniques. Second, I apply apprenticeship scheduling to learn how to function as a resource nurse: the nurse in charge of ensuring the right patient is in the right type of room at the right time and that the right types of nurses are there to care for the patient. After training an apprentice scheduler on demonstrations given by resource nurses, I found that nurses and physicians agreed with the algorithm’s advice 90% of the time. Next, I conducted a series of human-subject experiments to understand the human factors consequences of embedding scheduling algorithms in robotic platforms. Through these experiments, I found that an embodied platform (i.e., a physical robot) engenders more appropriate trust and reliance in the system than an un-embodied one (i.e., computer-based system) when the scheduling algorithm works with human domain experts. However, I also found that increasing robot autonomy degrades human situational awareness. Further, there is a complex interplay between workload and workflow preferences that must be balanced to maximize team fluency. Based on these findings, I develop design guidelines for integrating service robots with autonomous decision-making capabilities into the human workplace. Ph.D. Thesis: Doctor of Philosophy in Autonomous Systems.
2016
Conference Papers
Giancarlo Sturla, Matthew C. Gombolay, and Julie A. Shah Incremental Scheduling with Upper and Lowerbound Temporospatial Constraints \| BibTeX @inproceedings{Sturla:2016, author = {Giancarlo Sturla and Matthew Gombolay and Julie Shah}, title = {Incremental Scheduling with Upper and Lowerbound Temporospatial Constraints}, booktitle = {Proceedings of AIAA SciTech}, month = {January}, year = {2016}, } In Proc. AIAA SciTech
Workshop/Symposium Papers and Doctoral Consortia
Matthew C. Gombolay and Julie A. Shah Apprenticeship Scheduling for Human-Robot Teams. In Proc. Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-16). Doctoral Consortium, 2016. [39% Acceptance Rate]
Matthew C. Gombolay and Ankit Shah Appraisal of Statistical Practices in HRI vis-a-vis the T-Test for Likert Items/Scales \| BibTeX @inproceedings{Gombolay:2016c, author = {Matthew Gombolay and Ankit Shah}, title = {Appraisal of Statistical Practices in HRI vis-\Â´{a}-vis the T-Test for Likert Items/Scales}, booktitle = {Proceedings of AAAI Fall symposium Series on Artificial Intelligence for Human-Robot Interaction (AI-HRI)}, address = {Arlington, Virginia}, month = {November 17â€“19}, year = {2016}, } \| Abstract Likert items and scales are often used in human subject studies to measure subjective responses of subjects to the treatment levels. In the field of human-robot interaction (HRI), with few widely accepted quantitative metrics, researchers often rely on Likert items and scales to evaluate their systems. However, there is a debate on what is the best statistical method to evaluate the differences between experimental treatments based on Likert item or scale responses. Likert responses are ordinal and not interval, meaning, the differences between consecutive responses to a Likert item are not equally spaced quantitatively. Hence, parametric tests like ttest, which require interval and normally distributed data, are often claimed to be statistically unsound in evaluating Likert response data. The statistical purist would use non-parametric tests, such as the Mann-Whitney U test, to evaluate the differences in ordinal datasets; however, non-parametric tests sacrifice the sensitivity in detecting differences a more conservative specificity – or false positive rate. Finally, it is common practice in the field of HRI to sum up similar individual Likert items to form a Likert scale and use the t-test or ANOVA on the scale seeking the refuge of the central limit theorem. In this paper, we empirically evaluate the validity of the ttest vs. the Mann-Whitney U test for Likert items and scales. We conduct our investigation via Monte Carlo simulation to quantify sensitivity and specificity of the tests. In Proc. AAAI Fall Symposium Series on AI-HRI.
2015
Conference Papers
Matthew C. Gombolay, Reymundo A. Gutierrez, Shanelle G. Clarke, Giancarlo F. Sturla, and Julie A. Shah Decision-Making Authority, Team Efficiency and Human Worker Satisfaction in Mixed Human-Robot Teams \| BibTeX @ARTICLE{Gombolay:2015a, author={Gombolay, Matthew and Gutierrez, Reymundo and Clarke, Shanelle and Sturla, Giancarlo and Shah, Julie}, title={Decision-making authority, team efficiency and human worker satisfaction in mixed human-robot teams}, journal={Autonomous Robots}, issn={0929-5593}, volume={39}, number={3}, doi={10.1007/s10514-015-9457-9}, url={http://dx.doi.org/10.1007/s10514-015-9457-9}, publisher={Springer US}, pages={293-312}, year={2015} } \| Abstract In manufacturing, advanced robotic technology has opened up the possibility of integrating highly autonomous mobile robots into human teams. However, with this capability comes the issue of how to maximize both team efficiency and the desire of human team members to work with these robotic counterparts. To address this concern, we conducted a set of experiments studying the effects of shared decision-making authority in human-robot and human-only teams. We found that an autonomous robot can outperform a human worker in the execution of part or all of the process of task allocation (p < 0.001 for both), and that people preferred to cede their control authority to the robot (p < 0.001). We also established that people value human teammates more than robotic teammates; however, providing robots authority over team coordination more strongly improved the perceived value of these agents than giving similar authority to another human teammate (p < 0.001). In post-hoc analysis, we found that people were more likely to assign a disproportionate amount of the work to themselves when working with a robot (p < 0.01) rather than human teammates only. Based upon our findings, we provide design guidance for roboticists and industry practitioners to design robotic assistants for better integration into the human workplace. In: Autonomous Robots, volume 39, issue 3, pages 293-312.
2014
Journal Papers
Matthew C. Gombolay and Julie A. Shah Schedulability Analysis of Task Sets with Upper- and Lower-Bound Temporal Constraints \| BibTeX @ARTICLE{Gombolay:2014d, author={Matthew Gombolay and Julie Shah}, title={Schedulability Analysis of Task Sets with Upper- and Lower-Bound Temporal Constraints}, journal={Journal of Aerospace Information Systems (JAIS)}, volume={11}, number={12}, pages={821-841}, month = {December}, year={2014} } \| Abstract Increasingly, real-time systems must handle the self-suspension of tasks (that is, lower-bound wait times between subtasks) in a timely and predictable manner. A fast schedulability test that does not significantly overestimate the temporal resources needed to execute self-suspending task sets would be of benefit to these modern computing systems. In this paper, a polynomial-time test is presented that is known to be the first to handle nonpreemptive selfsuspending task sets with hard deadlines, where each task has any number of self-suspensions. To construct the test, a novel priority scheduling policy is leveraged, the jth subtask first, which restricts the behavior of the self-suspending model to provide an analytical basis for an informative schedulability test. In general, the problem of sequencing according to both upper-bound and lower-bound temporal constraints requires an idling scheduling policy and is known to be nondeterministic polynomial-time hard. However, the tightness of the schedulability test and scheduling algorithm are empirically validated, and it is shown that the processor is able to effectively use up to 95% of the self-suspension time to execute tasks. Journal of Aerospace Information Systems, volume 11, number 12, pages 821-841.
Conference Papers
Matthew C. Gombolay, Reymundo A. Gutierrez, Giancarlo F. Sturla, and Julie A. Shah Decision-Making Authority, Team Efficiency and Human Worker Satisfaction in Mixed Human-Robot Teams \| BibTeX @inproceedings{Gombolay:2014b, author = {Matthew Gombolay and Reymundo Gutierrez and Giancarlo Sturla and Julie Shah}, title = {Decision-Making Authority, Team Efficiency and Human Worker Satisfaction in Mixed Human-Robot Teams}, booktitle = {Proceedings of Robots: Science and Systems (RSS)}, address = {Berkeley, California}, month = {July 12-16}, year = {2014} } \| Abstract In manufacturing, advanced robotic technology has opened up the possibility of integrating highly autonomous mobile robots into human teams. However, with this capability comes the issue of how to maximize both team efficiency and the desire of human team members to work with these robotic counterparts. To address this concern, we conducted a set of experiments studying the effects of shared decision-making authority in human-robot and human-only teams. We found that an autonomous robot can outperform a human worker in the execution of part or all of the process of task allocation (p < 0.001 for both), and that people preferred to cede their control authority to the robot (p < 0.001). We also established that people value human teammates more than robotic teammates; however, providing robots authority over team coordination more strongly improved the perceived value of these agents than giving similar authority to another human teammate (p < 0.001). In post-hoc analysis, we found that people were more likely to assign a disproportionate amount of the work to themselves when working with a robot (p < 0.01) rather than human teammates only. Based upon our findings, we provide design guidance for roboticists and industry practitioners to design robotic assistants for better integration into the human workplace. In Proc. Robotics: Science and Systems (RSS). [32% Acceptance Rate]
Workshop/Symposium Papers
Matthew C. Gombolay and Julie A. Shah Increasing the Adoption of Autonomous Robotic Teammates in Collaborative Manufacturing \| Abstract Advancements in robotic technology are opening up the opportunity to integrate robot workers into the labor force to increase productivity and efficiency. However, removing control from human-workers for the sake of efficiency may create resistance from the human workers, preventing this technology from being successfully integrated into the workplace. We describe our ongoing work developing an autonomous robotic teammate to work alongside human workers in a collaborative manufacturing environment. Specifically, we want to understand how to maximize team efficiency and human-worker acceptance of their robotic teammates by utilizing carefully designed human-subject experiments. In Proc. International Conference on Human-Robot Interaction (HRI) Pioneers Workshop. [36% Acceptance Rate]
Matthew C. Gombolay, Cindy Huang, and Julie A. Shah Coordination of Human-Robot Teaming with Human Task Preferences \| BibTeX @inproceedings{Gombolay:2014a, author = {Matthew Gombolay and Julie Shah}, title = {Challenges in Collaborative Scheduling of Human-Robot Teams}, booktitle = {Proceedings of AAAI Fall symposium Series on Artificial Intelligence for Human-Robot Interaction (AI-HRI)}, address = {Arlington, Virginia}, month = {November 13â€“15}, year = {2014}, } \| Abstract Advanced robotic technology is opening up the possibility of integrating robots into the human workspace to improve productivity and decrease the strain of repetitive, arduous physical tasks currently performed by human workers. However, coordinating these teams is a challenging problem. We must understand how decision-making authority over scheduling decisions should be shared between team members and how the preferences of the team members should be included. We report the results of a human-subject experiment investigating how a robotic teammate should best incorporate the preferences of human teammates into the team’s schedule. We find that humans would rather work with a robotic teammate that accounts for their preferences, but this desire might be mitigated if their preferences come at the expense of team efficiency In Proc. AAAI Fall Symposium Series on AI-HRI.
Matthew C. Gombolay and Julie A. Shah Challenges in Collaborative Scheduling of Human-Robot Teams \| BibTeX @inproceedings{Gombolay:2015c, author = {Matthew Gombolay and Cindy Huang and Julie Shah}, title = {Coordination of Human-Robot Teaming with Human Task Preferences}, booktitle = {Proceedings of AAAI Fall symposium Series on Artificial Intelligence for Human-Robot Interaction (AI-HRI)}, address = {Arlington, Virginia}, month = {November 12â€“14}, year = {2015} } \| Abstract We study the scheduling of human-robot teams where the human and robotic agents share decision-making authority over scheduling decisions. Our goal is to design AI scheduling techniques that account for how people make decisions under different control schema. In Proc. AAAI Fall Symposium Series on AI-HRI.
2013
Conference Papers
Matthew C. Gombolay, Ron J. Wilcox, and Julie A. Shah Fast Scheduling of Multi-Robot Teams with Temporospatial Constraints \| BibTeX @inproceedings{Gombolay:2013b, author = {Matthew Gombolay and Ronald Wilcox and Julie Shah}, title = {Fast Scheduling of Multi-Robot Teams with Temporospatial Constraints}, booktitle = {Proceedings of Robots: Science and Systems (RSS)}, address = {Berlin, Germany}, month = {June 24-28}, year = {2013} } \| Abstract New uses of robotics in traditionally manual manufacturing processes require the careful choreography of human and robotic agents to support safe and efficient coordinated work. Tasks must be allocated among agents and scheduled to meet temporal deadlines and spatial restrictions on agent proximity. These systems must also be capable of replanning onthe-fly to adapt to disturbances in the schedule and to respond to people working in close physical proximity. In this paper, we present a centralized algorithm, named Tercio, that handles tightly intercoupled temporal and spatial constraints and scales to larger problem sizes than prior art. Our key innovation is a fast, satisficing multi-agent task sequencer that is inspired by real-time processor scheduling techniques but is adapted to leverage hierarchical problem structure. We use this fast task sequencer in conjunction with a MILP solver, and show that we are able to generate near-optimal task assignments and schedules for up to 10 agents and 500 tasks in less than 20 seconds on average. Finally, we demonstrate the algorithm in a multi-robot hardware testbed. In Proc. Robotics: Science and Systems (RSS). [30% Acceptance Rate]
Workshop/Symposium Papers
Matthew C. Gombolay, Ron J. Wilcox, Ana Diaz, Fei Yu, and Julie A. Shah Towards Successful Coordination of Human and Robotic Work using Automated Scheduling Tools: An Initial Pilot Study \| BibTeX @inproceedings{Gombolay:2013c, author = {Matthew Gombolay and Ronald Wilcox and Ana Diaz Artiles and Fei Yu and Julie Shah}, title = {Towards Successful Coordination of Human and Robotic Work using Automated Scheduling Tools: An Initial Pilot Study}, booktitle = {Proceedings of Robots: Science and Systems (RSS) Human-Robot Collaboration Workshop}, address = {Berlin, Germany}, month = {June 24-28}, year = {2013} } \| Abstract With the latest advancements in robotic manufacturing technology, there is a desire to integrate robot workers into the labor force to increase productivity and efficiency. However, coordinating the efforts of humans and robots in close physical proximity and under tight temporal constraints poses challenges in planning and scheduling and the design of human-robot interaction. In prior work, we present a scheduling algorithm capable of performing the coordination of heterogeneous multi-agent teams. Given this capability, we now want to understand how best to implement this technology from a human-centered perspective. Humans derive purpose and identity in their roles at work, and requiring them to dynamically change roles at the direction of an automated scheduling algorithm may result in the human worker feeling devalued. Ultimately, overall productivity of the human-robot team may degrade as a result. In this paper, we report the results of a human-subject pilot study aimed at answering how best to implement such an automated scheduling system. Specifically, we test whether giving humans more control over the task allocation process improves worker satisfaction, and we empirically measure the trade-offs of giving this control in terms of overall process efficiency. In Proc. Robotics: Science and Systems (RSS), Human-Robot Collaboration Workshop
Thesis
Matthew C. Gombolay and Julie A. Shah Fast Methods for Scheduling with Applications to Real-Time Systems and Large-Scale, Robotic Manufacturing of Aerospace Structures \| BibTeX @MASTERSTHESIS{Gombolay:2013a, author={Matthew Gombolay}, title={Fast Methods for Scheduling with Applications to Real-Time Systems and Large-Scale Robotic Manufacturing of Aerospace Structures}, school={Department of Aeronautics and Astronautics, Massachusetts Institute of Technology}, month = {June}, year={2013} } S.M. Thesis: Master of Science in Aeronautics and Astronautics.
2012
Conference Papers
Matthew C. Gombolay and Julie A. Shah A Uniprocessor Scheduling Policy for Non-Preemptive Task Sets with Precedence and Temporal Constraints \| BibTeX @inproceedings{Gombolay:2012, author = {Matthew Gombolay and Julie Shah}, title = {Multiprocessor Scheduler for Task Sets with Well-formed Precedence Relations, Temporal Deadlines, and Wait Constraints}, booktitle = {Proceedings of AIAA Infotech@Aerospace}, address = {Garden Grove, California}, month = {June 19-21}, year = {2012} } \| Abstract We present an idling, dynamic priority scheduling policy for non-preemptive task sets with precedence, wait constraints, and deadline constraints. The policy operates on a well-formed task model where tasks are related through a hierarchical temporal constraint structure found in many real-world applications. In general, the problem of sequencing according to both upperbound and lowerbound temporal constraints requires an idling scheduling policy and is known to be NP-complete. However, we show through empirical evaluation that, for a given task set, our polynomial-time scheduling policy is able to sequence the tasks such that the overall duration required to execute the task set, the makespan, is within a few percent of the theoretical, lowerbound makespan. In Proc. AIAA Infotech@Aerospace. [AIAA Best Intelligent Systems Paper Award 2012]
2011
Conference Papers
Matthew C. Gombolay, Sam Beder, Robert Boggio, John Samsundar, P. Stadter, and P. Binning Scheduling of Oversubscribed Space-Based Sensors for Dynamic Objects of Interest. \| In Proc. of 9th Annual U.S. Missile Defense Conference and Exhibit, 2011.
T. Safko, D. Kelly, S. Guzewich, S. Bell, A. S. Rivkin, K. W. Kirby, R. E. Gold, A. F. Cheng, T. M. Aldridge, C. M. Colon, A. D. Colson, D. V. Lantukh, P. Pashai, D. Quinn, E. H. Yun, and the ASTERIA team. ASTERIA: A Robotic Precursor Mission to Near-Earth Asteroid 2002 TD60. \| In Proc. of Lunar and Planetary Science Conference.