Representation Learning and Reinforcement Learning for Event Cameras (Blogpost)

Event cameras are novel vision sensors that output fast, asynchronous streams of per-pixel brightness changes instead of conventional frames. This makes applying machine learning methods to event cameras challenging, as most learning-based methods rely on frame-based inputs and convolutional neural networks. In this work, we present a way to learn representations directly from asynchronous event streams. Using a PointNet-style network coupled with a variational autoencoder, we show that the context of a fast event stream can be compressed into a latent vector, and furthermore, that such compressed representations are also beneficial for reinforcement learning. We validate this approach on a simulated obstacle avoidance task in AirSim.
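As a minimal sketch of the two ingredients (not the actual network), the snippet below shows why a PointNet-style encoder suits unordered event streams: per-event features are pooled with a symmetric max, so the summary is invariant to event ordering, and a VAE-style reparameterization turns that summary into a stochastic latent vector. All layer sizes and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_events(events, W, b):
    """One-layer per-event MLP followed by symmetric max-pooling,
    so the embedding is invariant to event ordering (PointNet-style)."""
    h = np.maximum(events @ W + b, 0.0)   # (N, H) ReLU features, one row per event
    return h.max(axis=0)                  # (H,) order-invariant summary

def reparameterize(mu, log_var):
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy event stream: N events of (x, y, timestamp, polarity).
events = rng.random((512, 4))
W, b = rng.standard_normal((4, 32)), np.zeros(32)
summary = encode_events(events, W, b)

# The summary does not change if the stream arrives in a different order.
shuffled = events[rng.permutation(len(events))]
assert np.allclose(summary, encode_events(shuffled, W, b))

z = reparameterize(summary, np.zeros_like(summary))  # stochastic latent code
```

The max-pool is the key design choice: any permutation of the event rows yields the same latent input, which is what lets the encoder consume raw asynchronous streams without first binning them into frames.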

3DB: Debugging Computer Vision Models through Simulation (Blogpost)

Modern machine-learning-based computer vision models are known to fail in ways that are not anticipated during training. Several works have shown that models suffer in the face of small rotations, common corruptions (such as snow or fog), and changes to the data collection pipeline. While such brittleness is widespread, it is often hard to understand its root causes, or even to characterize the precise situations in which this unintended behavior arises. In this work, we introduce 3Debugger (3DB): a framework for automatically identifying and analyzing the failure modes of computer vision models. The framework uses a 3D simulator to render images of near-realistic scenes that can be fed into any computer vision system. Users specify a set of extendable and composable transformations within the scene, such as pose changes, background changes, or camera effects, which we refer to as “controls”. 3DB renders multiple object configurations according to these controls, records the behavior of the model on each rendered scene, and finally presents the user with an interactive, user-friendly summary of the model’s performance and vulnerabilities.
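To make the “controls” idea concrete, here is a small sketch of how composable controls expand into the set of scenes to render. The control names and values are hypothetical, not 3DB’s actual API; the point is that each rendered configuration is one element of the cross-product of all control settings.

```python
import itertools

# Hypothetical control grid; names and values are illustrative only.
controls = {
    "pose_yaw_deg":   [0, 90, 180, 270],
    "background":     ["indoor", "grass", "asphalt"],
    "camera_blur_px": [0, 2],
}

def scene_configs(controls):
    """Enumerate every combination of control settings, one per render."""
    keys = list(controls)
    for values in itertools.product(*(controls[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(scene_configs(controls))
print(len(configs))  # 4 poses * 3 backgrounds * 2 blur levels = 24 renders
```

Recording the model’s prediction per configuration then lets the framework slice failures by control (e.g., “accuracy drops only at yaw 180 on grass”), which is what the interactive summary surfaces.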

Unadversarial Examples (Blogpost)

Computer vision systems often operate in a world designed mainly for humans: floor markings direct robots’ courses, and stop signs signal self-driving cars to stop. While these might be naturally good ‘features’ for humans, that is not necessarily the case for neural networks. In scenarios where designers have some control over the target objects, what if the objects could be designed in a way that makes them more detectable by neural networks, even under conditions that normally break such systems, such as bad weather or variations in lighting? We introduce a framework that exploits computer vision systems’ well-known sensitivity to input perturbations to create robust, or unadversarial, objects: objects that are optimized specifically for better performance and robustness of vision models. Instead of using perturbations to make neural networks misclassify objects, as with adversarial examples, we use them to encourage the network to correctly classify the objects we care about with high confidence.
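The core optimization is adversarial-example machinery with the sign flipped: ascend, rather than descend, the true class’s score. Below is a toy sketch on a linear classifier (an assumption for illustration; the actual work optimizes over a deep network and object textures), using PGD-style signed gradient steps with an L-infinity budget.

```python
import numpy as np

# Toy linear classifier: logits = W @ x, 10 classes, 64-dim input.
rng = np.random.default_rng(1)
W = rng.standard_normal((10, 64))
x = rng.standard_normal(64)
true_class = 3

# Optimize a bounded perturbation delta to RAISE the true class's logit —
# the opposite direction of an adversarial attack.
delta = np.zeros_like(x)
eps, lr = 0.5, 0.1
for _ in range(100):
    grad = W[true_class]                                    # d(logit_true)/d(x)
    delta = np.clip(delta + lr * np.sign(grad), -eps, eps)  # projected ascent

before = (W @ x)[true_class]
after = (W @ (x + delta))[true_class]
assert after > before  # the "unadversarial" patch boosts the correct logit
```

For a linear model the gradient is constant, so delta saturates at `eps * sign(W[true_class])`; for a deep network one would recompute the gradient each step, but the update rule is the same.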

Uncertainty-aware Planning for Micro Aerial Vehicle Swarms

In this project, which forms the second part of my PhD thesis, I developed an algorithm for collaborative uncertainty-aware path planning for vision-based micro aerial vehicles. For vehicles that are equipped with cameras and can localize collaboratively (see below), I achieved this with a two-phase approach: first, the vehicles collaborate to improve an existing map by choosing better viewpoints, and second, they create localization-aware path plans. A heuristic-based approach estimates the “quality” of localization achievable from various viewpoints. Evolutionary algorithms were integrated with an RRT-based path planning framework, yielding plans that let the vehicles navigate intelligently towards areas that can improve their vision-based localization accuracy: for example, moving only through well-mapped locations and observing texture-rich objects.

Collaborative Localization for Micro Aerial Vehicle Swarms

As the first part of my PhD thesis, I developed a collaborative localization pipeline applicable to a swarm of multirotor aerial vehicles, with each vehicle using a monocular camera as its primary sensor. Images are captured continuously from each vehicle, and feature detection and matching are performed between the individual views, allowing reconstruction of the surrounding environment. This sparse reconstruction is then used by the vehicles for individual localization in a decentralized fashion. The vehicles can also compute relative poses between each other and occasionally fuse them with their individual pose estimates for enhanced accuracy. Even when cross-correlations between vehicles are not tracked, covariance intersection allows for robust pose estimation across the swarm.
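Covariance intersection is what makes the fusion safe without tracking inter-vehicle cross-correlations: it combines two estimates through a convex combination of their information matrices, which is consistent for any unknown correlation. A minimal sketch (with a fixed mixing weight omega; in practice omega is often chosen to minimize the fused covariance’s trace or determinant):

```python
import numpy as np

def covariance_intersection(mu_a, P_a, mu_b, P_b, omega=0.5):
    """Fuse two estimates (mu_a, P_a), (mu_b, P_b) with unknown cross-correlation:
       P^-1  = w * P_a^-1 + (1-w) * P_b^-1
       mu    = P (w * P_a^-1 mu_a + (1-w) * P_b^-1 mu_b)"""
    Pa_inv, Pb_inv = np.linalg.inv(P_a), np.linalg.inv(P_b)
    P = np.linalg.inv(omega * Pa_inv + (1 - omega) * Pb_inv)
    mu = P @ (omega * Pa_inv @ mu_a + (1 - omega) * Pb_inv @ mu_b)
    return mu, P

# Vehicle A is confident in x, vehicle B is confident in y.
mu_a, P_a = np.array([1.0, 0.0]), np.diag([0.5, 2.0])
mu_b, P_b = np.array([1.2, 0.4]), np.diag([2.0, 0.5])
mu, P = covariance_intersection(mu_a, P_a, mu_b, P_b)
```

The fused mean leans toward each source in the dimension where that source is more certain, and the fused covariance never claims more confidence than the convex combination of inputs justifies.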

Drone Detection through Depth Images

In collaboration with researchers from Universidad Politecnica de Madrid and MIT’s ACL lab, I worked on a framework for detecting and localizing multirotor UAVs using depth images. A specific advantage of depth sensing over other detection methods is that a depth map provides 3D relative localization of the objects of interest, making it easier to develop strategies such as collision avoidance. In our work, a dataset of synthetic depth maps of drones was first generated in the Microsoft AirSim UAV simulator and used to train a deep drone detection model. Domain randomization in the simulation allowed the proposed detection technique, though trained only in simulation, to perform well on several real-life trajectories. It also generalized well to multiple types of drones and achieved a record detection range of 9.5 meters.
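To illustrate why a depth map yields 3D relative localization directly, here is a hypothetical helper (not the paper’s exact pipeline) that back-projects a detection bounding box through the pinhole camera model: the median depth inside the box gives range, and the box center plus intrinsics give the lateral offsets.

```python
import numpy as np

def localize_from_depth(bbox, depth_map, fx, fy, cx, cy):
    """Turn a detection bbox (u0, v0, u1, v1) plus a depth map into a 3D
    position in the camera frame via the pinhole model."""
    u0, v0, u1, v1 = bbox
    z = np.median(depth_map[v0:v1, u0:u1])   # robust range to the drone
    u, v = (u0 + u1) / 2, (v0 + v1) / 2      # bbox center in pixels
    x = (u - cx) * z / fx                    # lateral offset
    y = (v - cy) * z / fy                    # vertical offset
    return np.array([x, y, z])

# Synthetic scene: flat 20 m background with a "drone" patch 8 m away.
depth = np.full((480, 640), 20.0)
depth[200:240, 300:340] = 8.0
p = localize_from_depth((300, 200, 340, 240), depth, 320.0, 320.0, 320.0, 240.0)
print(p)  # [ 0.  -0.5  8. ]
```

The median makes the range estimate robust to background pixels leaking into the box, which matters because detection boxes are rarely tight.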

Real Time Cancer Tumor Tracking for Proton Beam Therapy

In collaboration with Mayo Clinic Arizona, I developed a real-time computer-vision-based tracking system for markers implanted in cancer tumors. The target application is to control a state-of-the-art proton beam targeting system according to tumor motion caused by the patient’s breathing cycles and other natural organ motion. It is common practice to embed tiny fiducial markers in the tumors so that they are visible in the X-ray spectrum. Computer vision techniques such as normalized cross-correlation and image saliency maps are used in conjunction with kernelized correlation filters to track these tiny markers during X-ray fluoroscopy. The tracking method handles high amounts of noise and various types of markers to achieve accurate, real-time tracking.
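As a minimal illustration of the normalized cross-correlation step (only one component of the pipeline, and written naively rather than with the FFT-based implementations used in practice), the sketch below slides a marker template over an image and returns the best-matching corner. Zero-mean normalization is what gives NCC its robustness to the brightness and contrast swings typical of fluoroscopy.

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation between two same-size arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def match(image, template):
    """Exhaustively slide the template and return the best-scoring corner."""
    H, W = image.shape
    h, w = template.shape
    scores = np.array([[ncc(image[i:i + h, j:j + w], template)
                        for j in range(W - w + 1)]
                       for i in range(H - h + 1)])
    return np.unravel_index(scores.argmax(), scores.shape)

rng = np.random.default_rng(2)
image = rng.random((40, 40))
template = image[12:20, 25:33].copy()   # marker template cut from a known spot
print(match(image, template))           # recovers the corner (12, 25)
```

A real-time tracker restricts this search to a small window around the previous marker position (or uses correlation filters in the Fourier domain), but the matching criterion is the same.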

Ars Robotica: Robots in Theater

Ars Robotica was a collaboration between artists, theater performers, and roboticists aimed at understanding the fluidity and expressiveness of human movement and the possibility of reproducing it on robotic platforms. Using the Rethink Robotics Baxter as a test platform, we worked on defining and achieving human-like movement on the robot. We obtained movement data from expert human performers through various sensors, ranging from a Microsoft Kinect to a 12-camera high-precision OptiTrack system, which we then used as training data to construct motion “primitives”, forming a vocabulary for movement. We later expressed complex movements as temporal combinations of such primitives, helping create a framework for autonomous interpretation and expression of human-like motion through Baxter.

Micro Subglacial Lake Exploration Device

As part of a team at the Extreme Environment Robotics Laboratory, I worked on the development of onboard firmware and ground station software for a subglacial lake and aquatic exploration robot called MSLED. MSLED consists of a submersible mothership/explorer combination and is designed specifically to explore deep, remote, and chemically challenging aquatic environments. Its sensor payload consists of a camera, an inertial measurement unit, and a CTD sensor, whose data are transferred to the surface through a fiber-optic cable. MSLED was deployed successfully twice in McMurdo Sound, Antarctica as part of the WISSARD program, an expedition that resulted in the discovery of microbial life under the Antarctic ice.

BENTO Volcano Monitor

I led a team of graduate students on a project involving the hardware and software design of expendable “volcano monitor” capsules that monitor and transmit data about rapidly evolving volcanic conditions. The monitors are equipped with a number of sensors (seismic, gas, temperature, etc.) and use a minimal data packaging and transmission protocol over Iridium satellite modems, allowing real-time compilation and dissemination of scientific data. Volcano monitors were deployed in Nicaragua, Italy, Iceland, and Greenland.