Current Projects

  • PIRE: Building Decarbonization via AI-empowered District Heat Pump Systems

    NSF OISE #2230748 (co-PI)

    Increasing concerns about climate change, clean energy, and energy security demand that our society transition to a net-zero carbon economy that serves the triple bottom line of planetary health, societal well-being, and economic prosperity. This PIRE project is focused on innovating Artificial Intelligence (AI) techniques with an understanding of human needs and behaviors to enable efficient, human-centered, resilient, and socially justifiable operation of district- and community-scale heat pump systems that promote regional-scale adoption of building decarbonization. In an increasingly urbanized world, there is a pressing need to address the critical challenges of climate change through the built environment, because the building sector accounts for nearly 40% of the primary energy use in the U.S. and the associated greenhouse gas and CO2 emissions, with about 50% of that energy dedicated to heating, cooling, ventilation, and lighting. In addition, people spend more than 90% of their time indoors, which means that addressing their needs and comfort in a sustainable way is critical for climate resilience planning. Electrification of heating and cooling systems is widely acknowledged as a core and non-negotiable strategy for decarbonization, and many major U.S. metropolitan areas have put the adoption of electric heat pump technologies on their roadmaps for reaching building decarbonization in the next decade.

    Taking advantage of the wide adoption of district heating/cooling heat pump systems in Nordic countries, this project seeks to leverage the data and testbeds provided by our core international partners from Sweden and Denmark and the AI innovations provided by the U.S. team to catalyze the readiness to support the scaling up and adoption of human-centered, equity-focused, AI-empowered district system operation strategies at a regional and global scale. Findings from this project will be disseminated through two International Energy Agency Annex teams (81 and 84), which will reach researchers from more than 20 countries. Through their outreach activities, including focus group discussions and workshops, the team will work closely with partners from government and community stakeholders to promote community-focused and equitable district heat pump adoption, implementation, and operation. Two new cross-institutional education programs are designed to promote convergent international education in the human-centered sustainable built environment: a) a Summer International Graduate Bootcamp and Exchange Program; and b) a Smart Built Environment Certification Program. Leveraging partner institutions' other existing flagship programs, ranging from K-12 and undergraduate education to workforce training, the project will train a diverse, convergent workforce well-versed in science, technology, engineering, arts, and mathematics (STEAM), AI, and socioeconomics to tackle the global challenges of climate change.

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2230748&HistoricalAwards=false

  • PIPP Phase I: Predicting Emergence in Multidisciplinary Pandemic Tipping-points (PREEMPT)

    NSF PIPP #2200140 (co-PI)

    Pandemics arise from the confluence of many contributing factors. These factors may be individually inconsequential but become critical when acting together, and a complex set of seemingly unrelated factors can result in a perfect storm for pandemic emergence. Yet prevailing approaches to predicting pandemic emergence remain focused on disciplinary investigations of individual or subsets of factors. Preparing for and preventing the next pandemic will require multidisciplinary approaches that leverage knowledge of complex interdependencies across scales from molecular to social and from individual diagnosis to global surveillance. This project assembles a multi-disciplinary team of scientists, representing expertise spanning the gamut from basic biology, to social, behavioral, and economic sciences, to engineering, computer, and information sciences, to focus on understanding how to identify, recognize, and predict when emerging disease threats create a perfect storm of factors that cause an otherwise localized outbreak to “tip over” into a pandemic. The project team will work together to leverage their collective diversity of expertise, experience, and perspective to innovate a collaborative framework for knitting together disciplinary pursuits into a complete, multifaceted, and predictive understanding of pandemic tipping points. Going beyond the confines of this project, the resulting framework will serve as a blueprint for all institutions dedicated to the discovery and analysis of complex linkages and thus will improve capacity to predict and prevent coming pandemics and other emergent threats to the modern world.

    See https://www.preemptpandemics.org/ for more information

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2200140&HistoricalAwards=false

  • PanCommunity: Leveraging Data and Models for Understanding and Improving Community Response in Pandemics

    NSF SCC-IRG JST #2125246 (PI)

    The goal of this integrative research effort is to enhance the understanding of the complex relationships characterizing pandemics and interventions under crisis. The global-scale response to the COVID-19 pandemic triggered drastic measures including economic shutdowns, travel bans, stay-home orders, and even complete lockdowns of entire cities, regions, and countries. The need to effectively produce and deliver PPE, testing, and vaccines has affected different communities of stakeholders in different ways, requiring coordination from family/business units and counties/states up to federal-level entities. This project, therefore, considers communities at local, federal, and international (US and Japan) scales and investigates the impact of testing, preventative measures, and vaccines, when used in combination, to improve community and inter-agency response at these different scales. The impacts of this research include technologies to help save lives, restore basic services and community functionality, and establish a platform that supports core capabilities including planning, public information, and warning. The project organizes an interdisciplinary community, bringing together (a) computer/data scientists, (b) domain and social scientists and policy experts, (c) federal, state, and local governments, (d) industry and nonprofits, and (e) educators, to serve as a nexus for major research collaborations that will: overcome key research barriers and explore and catalyze new paradigms and practices in cross-community response to pandemics; enable development and sharing of sustainable and reusable technologies, coupled with extensive broader dissemination activities; act as a resource for public policy guidance on relevant strategies and regulations; and provide education, broadening participation, and workforce development at all levels (K-12 to postgraduate) for the next generation of scientists, engineers, and practitioners. The project involves a close collaboration between Arizona State University in the United States and Kyoto University in Japan, with interfaces to community partners in Tempe, Arizona, and Kyoto, as well as national-level civic organizations in both the U.S. and Japan.

    See https://pancommunity.org for more information.

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2125246&HistoricalAwards=false

  • Designing Nature to Enhance Resilience of Built Infrastructure in Western US Landscapes

    Army Corps of Engineers (USACE) (co-PI)

    We are developing a modeling toolkit that will allow for rapid, scenario-based assessment of outcomes of combinations of natural and built infrastructure. This toolkit will include domain science (hydrology and geohydrology) and data science (data integration, artificial intelligence, operations research and visualization).

    Specifically, we ask: “What combinations of data science are needed to advance Engineering With Nature (EWN) solutions within a particular hydrologic and water resources context?” In other words, what is the missing data science ingredient necessary to bring EWN solutions to decision makers such that those solutions are implemented, replicated, and scaled strategically? We evaluate this question in the context of impact (water resources and environmental flows), economics (cost efficacy), institutional tractability, and equity of benefit.

  • pCAR: Discovering and Leveraging Plausibly Causal (p-causal) Relationships to Understand Complex Dynamic Systems

    NSF III #1909555 (PI)

    In this project, the research team hypothesizes that data analysis provides opportunities for identifying relationships that can potentially be causal. They further hypothesize that data can be used for strengthening and weakening causal assumptions, and for pruning relationships that are certainly not causal. To validate and leverage these hypotheses, they introduce the novel concepts of ‘plausible causality’ (p-causality) and ‘plausibly causal (p-causal) relationships’ and they develop techniques to (i) discover p-causal interactions and relationships, (ii) maintain these p-causal relationships in systems where causality itself evolves over time, and (iii) use discovered p-causal relationships to support efficient and effective data analytics to study complex, dynamic systems. In particular, they develop new models to capture context-sensitive plausibly causal (p-causal) relationships in complex, dynamic systems and design novel and scalable causally-aware data analysis algorithms that leverage known or hypothesized p-causal relationships among entities within different and potentially evolving contexts to deal with data sparsity, imprecision, and noise.
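
    As a concrete illustration of the pruning step, the sketch below (a simplified assumption, not the project's actual algorithms) drops candidate relationships whose statistical dependence disappears once a context variable is conditioned on, keeping only edges that remain plausibly causal.

    ```python
    # Minimal sketch: prune relationships that are "certainly not causal" by
    # testing whether X and Y stay dependent after conditioning on each context
    # variable Z. Surviving edges are kept as p-causal candidates.
    import numpy as np
    from scipy import stats

    def partial_corr(x, y, z):
        """Correlation between x and y after linearly regressing out z."""
        rx = x - np.polyval(np.polyfit(z, x, 1), z)
        ry = y - np.polyval(np.polyfit(z, y, 1), z)
        return stats.pearsonr(rx, ry)

    def prune_p_causal(data, candidates, contexts, alpha=0.01):
        """data: dict of name -> 1-D array; candidates: (x, y) pairs; contexts: names.
        Surviving the test is a necessary, not sufficient, condition for causality."""
        kept = []
        for x, y in candidates:
            plausible = True
            for z in contexts:
                _, p = partial_corr(data[x], data[y], data[z])
                if p > alpha:        # dependence vanishes given z -> prune the edge
                    plausible = False
                    break
            if plausible:
                kept.append((x, y))
        return kept
    ```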

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1909555&HistoricalAwards=false

  • Data-Driven Services for High Performance and Sustainable Buildings

    NSF PFI-RP #1827757 (co-PI)

    The broader impact/commercial potential of this PFI project is the creation of a truly new breed of building services companies that can provide guaranteed building performance. The results from this project will contribute to both building energy efficiency research and practice. Beyond energy conservation, outcomes of the project also have an important societal benefit in advancing the role of a skilled workforce in conjunction with automation. In addition, the proposed effort builds an innovation ecosystem which spans the supply chain for building services and for building analytics and automated intelligence suppliers. The proposed research will fulfill an important role in sustainable environments and will enable services with significant economic and human impact. While the building service industry is the focus here, the results obtained have the potential to lead to a better understanding of other human-centered services driven by big data, and will influence the development of novel service platforms applicable to urban operations, such as city-wide transportation control, power grids, and public health, in which system-wide fault discovery and recovery have to be achieved through dynamic spatio-temporal data streams of varying quality and coverage.

    The proposed Building Doctor’s Medicine Cabinet (BDMC) service platform, empowered by novel multi-resolution (temporal and spatial) data analytics, high-dimensional robust modeling, and human-centered interface design, will help building doctors, who are engineers and technicians working in the building service industry, to effectively troubleshoot building problems and to identify systematic and prognostic solutions. BDMC will integrate building datasets from in-situ and control system measurements with knowledge from building doctors, and leverage the patterns and anomalies discovered in this data to (1) diagnose and prognose whole-building problems with greatly reduced engineering labor input, false alarms, and false dismissals; (2) identify building system hierarchy and develop data-driven energy models; and (3) provide human-centered data visualization and feedback (fault impact analysis and prioritization). The proposed effort will transform the current labor-intensive building service industry into a smart service industry. The main outcomes from this PFI-RP project are algorithms and codes that can be licensed to industry to be integrated with existing market products.
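
    To make the diagnostic idea concrete, the toy pass below is illustrative only; the function names, thresholds, and impact model are assumptions, not BDMC's algorithms. It flags sensor readings that drift far from a rolling baseline and ranks the resulting faults by a rough energy-impact estimate.

    ```python
    # Toy fault-detection pass in the spirit of data-driven building diagnostics:
    # flag readings far from a rolling baseline, then rank flagged sensors by a
    # crude (hypothetical) energy-impact estimate.
    import pandas as pd

    def detect_faults(readings: pd.Series, window=96, z_thresh=4.0):
        """readings: one sensor time series, e.g., 15-minute supply-air temperatures."""
        baseline = readings.rolling(window, min_periods=window // 2).mean()
        spread = readings.rolling(window, min_periods=window // 2).std()
        z = (readings - baseline) / spread
        return readings[z.abs() > z_thresh]        # timestamps flagged as anomalous

    def prioritize(faults: dict, kwh_per_flagged_interval: dict):
        """Rank faulty sensors by estimated wasted energy (hypothetical impact model)."""
        impact = {name: len(idx) * kwh_per_flagged_interval.get(name, 0.0)
                  for name, idx in faults.items()}
        return sorted(impact.items(), key=lambda kv: kv[1], reverse=True)
    ```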

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1827757&HistoricalAwards=false

  • Securing Grid-interactive Efficient Buildings (GEB) through Cyber Defense and Resilient System (CYDRES)

    DOE

    The current generation of Building Automation Systems (BAS) is designed and operated with little consideration of cyber security. Many building systems, especially the emerging Grid-interactive Efficient Buildings (GEBs), are vulnerable to cyber-attacks that may have adverse or even severe consequences, e.g., occupant discomfort, energy wastage, equipment downtime, and disruption of grid operation. Importantly, current physical-behavior-based anomaly detection methods fail to differentiate cyber-attacks from equipment or operational faults. Such a distinction is needed to ensure appropriate automated mitigation via control responses to cyber-threats and to provide actionable recommendations to the facility manager. To address these issues, the project team proposes to research, develop, and demonstrate a real-time, advanced, resilient building platform, called CYber Defense and REsilient System (CYDRES), deployable for existing and emerging BAS, to empower GEBs with cyber-attack-immune capabilities through multi-layer prevention, detection, and adaptation.

  • Deep Phenotyping for Physiologic Biomarkers of Post-Traumatic Epilepsy in Children

    DOD/PCH (co-PI)

    Traumatic brain injury (TBI) is a leading public health concern among children. Seizures are a common complication of TBI, and young children are particularly susceptible. Early post-traumatic seizures (PTS; seizures 1-7 days after injury) respond to anti-epileptic drugs, but these drugs do not prevent the development of post-traumatic epilepsy (PTE; chronic seizures more than 7 days after injury), which impacts normal brain development in children. Effective strategies to prevent PTE are not yet available due to our lack of understanding of the events after TBI that trigger the epileptogenic process. Aims and Hypotheses: We aim to (1) identify biomarkers of PTS and PTE after severe TBI and then (2) test the feasibility of predictive PTS and PTE biomarker(s) for clinical application. We hypothesize that in-depth analysis of data including longitudinal physiological parameters will enable development of a computational model that can separate (a) patients who had PTS from those who did not and (b) those who developed PTE from those who only had PTS. We further hypothesize that physiologic changes during acute critical care monitoring are associated with post-traumatic epileptogenesis and can help predict which children are likely to develop PTE.
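
    The modeling hypothesis above could be prototyped roughly as follows; this is a hedged sketch under assumed inputs (per-patient physiologic time series and PTS labels), and the feature set and logistic-regression choice are illustrative, not the study's protocol.

    ```python
    # Sketch: collapse longitudinal physiologic signals into per-patient features
    # and fit a classifier to separate patients with early post-traumatic seizures
    # (PTS) from those without; the same scaffolding could target PTE vs. PTS-only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def summarize(signal: np.ndarray) -> list:
        """Simple summary features for one monitored signal (e.g., heart rate)."""
        return [signal.mean(), signal.std(), np.percentile(signal, 95)]

    def build_features(patients):
        """patients: list of dicts mapping signal name -> sampled time series."""
        return np.array([sum((summarize(p[name]) for name in sorted(p)), [])
                         for p in patients])

    # X = build_features(patient_records)       # hypothetical inputs
    # y = pts_labels
    # model = LogisticRegression(max_iter=1000)
    # print(cross_val_score(model, X, y, cv=5, scoring="roc_auc"))
    ```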

Recently Completed Projects

  • RTEM: Rapid Testing as Multi-fidelity Data Collection for Epidemic Modeling

    NSF RAPID/DEB#2026860 (co-PI)

    The novel coronavirus (COVID-19) epidemic is generating significant social, economic, and health impacts and has highlighted the importance of real-time analysis of the spatio-temporal dynamics of emerging infectious diseases. COVID-19, which emerged out of the city of Wuhan in China in December 2019, is now spreading in multiple countries. It is particularly concerning that the case fatality rate appears to be higher for the novel coronavirus than for seasonal influenza, and especially so for older populations and those with prior health conditions such as cardiovascular disease and diabetes. Any plan for stopping the epidemic must be based on a quantitative understanding of the proportion of the at-risk population that needs to be protected by effective control measures in order for transmission to decline sufficiently and quickly enough for the epidemic to end. Different data collection and testing modalities and strategies available to help calibrate transmission models and predict the spread/severity of a disease have variable costs, response times, and accuracies. In this Rapid Response Research (RAPID) project, the team will examine the problem of establishing optimal practices for rapid testing for the novel coronavirus. The result will be Rapid Testing for Epidemic Modeling (RTEM), which will translate into science-based predictions of the COVID-19 epidemic’s characteristics, including the duration and overall size, and help the global efforts to combat the disease. RTEM will fill an important gap in data-driven decision making during the COVID-19 epidemic and, thus, will enable services with significant national economic and health impact. The educational impact of the project will be on mentoring of post-doctoral and PhD researchers and on curricula by incorporating research challenges and outcomes into existing undergraduate and graduate classes.

    Computational models for the spatio-temporal dynamics of emerging infectious diseases and data- and model-driven computer simulations of disease spreading are increasingly critical in predicting the geo-temporal evolution of epidemics as well as in designing, activating, and adapting practices for controlling epidemics. In this project, the researchers tackle a Rapid Testing for Epidemic Modeling (RTEM) problem: Given a partially known target disease model and a set of testing modalities (from surveys to surveillance testing at known disease hotspots), with varying costs, accuracies, and observational delays, what is the best rapid testing strategy that would help recover the underlying disease model? Several scientific questions arise: What is the value of testing? Should only sick people be tested for virus detection? What level of resources should be devoted to the development of highly accurate tests (low false positives, low false negatives)? Is it better to use only one type of test aiming at the best cost/effectiveness trade-off, or a non-homogeneous testing policy? Naturally, these questions need to be investigated at the interface of epidemiology, computer science, machine learning, mathematical modeling, and statistics. As part of the work, the team will develop a model of transmission dynamics and control, tailored to COVID-19 in a way that accommodates diagnostic testing with varying fidelities and delays underlying a rapid testing regimen. The investigators will further integrate the resulting RTEM-SEIR model with EpiDMS and DataStorm for executing continuous coupled simulations.
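
    A minimal sketch of the underlying idea, with placeholder parameters rather than calibrated values: a discrete-time SEIR model generates true incidence, and a testing modality with assumed sensitivity, coverage, and reporting delay determines what is actually observed.

    ```python
    # Sketch of an SEIR model whose observed case counts are filtered through an
    # imperfect, delayed testing process; all parameter values are placeholders.
    import numpy as np

    def seir(beta, sigma, gamma, N, I0=10, days=120):
        """Return daily new infections from a simple discrete-time SEIR model."""
        S, E, I, R = N - I0, 0.0, float(I0), 0.0
        incidence = []
        for _ in range(days):
            new_E = beta * S * I / N
            new_I = sigma * E
            new_R = gamma * I
            S, E, I, R = S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R
            incidence.append(new_I)
        return np.array(incidence)

    def observe(true_incidence, sensitivity=0.8, coverage=0.3, delay=4):
        """Cases a given testing strategy would actually detect and report."""
        detected = true_incidence * sensitivity * coverage
        reported = np.zeros_like(detected)
        reported[delay:] = detected[:len(detected) - delay]
        return reported

    # true_cases = seir(beta=0.4, sigma=1/5, gamma=1/7, N=1_000_000)
    # reported = observe(true_cases, sensitivity=0.7, coverage=0.2, delay=5)
    ```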

    This project is jointly funded through the Ecology and Evolution of Infectious Diseases program (Division of Environmental Biology) and the Civil, Mechanical and Manufacturing Innovation program (Engineering).

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2026860&HistoricalAwards=false

    Project website: https://rtem.live

  • Discovering Context-Sensitive Impact in Complex Systems

    NSF BIGDATA #1633381 (PI)

    Successfully tackling many urgent challenges in socio-economically critical domains (such as sustainability, public health, and biology) requires obtaining a deeper understanding of complex relationships and interactions among a diverse spectrum of entities in different contexts. In complex systems, (a) it is critical to discover how one object influences others within specific contexts, rather than seeking an overall measure of impact, and (b) the context-aware understanding of impact has the potential to transform the way people explore, search, and make decisions in complex systems. This project establishes the foundations of big data driven Context-Sensitive Impact Discovery (CSID) in complex systems and fills an important hole in big data driven decision making in many critical application domains, including epidemic preparedness, biological pathway analysis, climate, and resilient water/energy infrastructures. Thus, it enables applications and services with significant economic and health impact. The educational impacts of this project include the mentoring of graduate and undergraduate students, and the enhancement of graduate and undergraduate Computer Science curricula at both Arizona State University (ASU) and New Mexico State University (NMSU) through the incorporation of research challenges and outcomes into existing classes.

    The technical goal of this project is to establish the theoretical, algorithmic, and computational foundations of big data driven context-sensitive impact discovery in complex systems. This project develops probabilistic and tensor-based models to capture context-sensitive impact from complex systems, often modeled as graphs, and designs efficient learning algorithms that can capture both the contexts and the impact scores among entities within these different contexts. The modeling of context-sensitive impact considers the dynamic nature of relevant contexts and the diverse applications. This requires addressing several major challenges, including latent contexts of impact, heterogeneous networks of entities, dynamicity of impact in varying contexts, and high computational and I/O costs of context-sensitive impact discovery. Therefore, this project designs novel scalable probabilistic and tensor-based algorithms to capture and represent context-sensitive impact. These algorithms and the novel data platforms they are deployed in are efficient and scalable in terms of off-line and on-line running times and their space requirements. To achieve the necessary scalabilities, the developed platforms employ novel multi-resolution data partitioning and resource allocation strategies, and the research enables massive parallelism and efficient data access through new non-volatile memory based data management techniques.
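
    As a toy illustration of context sensitivity (not the project's probabilistic or tensor-based models), the sketch below computes a PageRank-style impact score separately within each context slice of a three-way interaction array, so the same entity can rank differently across contexts.

    ```python
    # Toy context-sensitive impact: an entity is impactful within a context if it
    # strongly influences entities that are themselves impactful in that context.
    import numpy as np

    def context_impact(counts: np.ndarray, damping=0.85, iters=100):
        """counts[i, j, c] = how strongly entity i influences entity j in context c."""
        n_entities, _, n_contexts = counts.shape
        scores = np.zeros((n_entities, n_contexts))
        for c in range(n_contexts):
            W = counts[:, :, c].astype(float)
            col_sums = W.sum(axis=0)
            col_sums[col_sums == 0] = 1.0
            P = W / col_sums                   # each column sums to 1
            r = np.full(n_entities, 1.0 / n_entities)
            for _ in range(iters):
                r = (1 - damping) / n_entities + damping * P @ r
            scores[:, c] = r                   # impact of every entity in context c
        return scores
    ```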

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1633381&HistoricalAwards=false

  • DataStorm: A Data Enabled System for End-to-End Disaster Planning and Response

    NSF CDS&E #1610282 (PI)

    Natural disasters affect our society in profound ways. Between 2000 and 2009, disasters killed 1 million people, affected an additional 2.5 million individuals, and caused a loss of about $1 trillion (2010 World Disasters Report). Effective disaster response requires a near-real-time effort to match available resources to shifting demands on a number of fronts. Experts today lack the means to provide emergency response agencies with validated strategies for disaster planning and response on a timely basis. Data-driven models and computer simulations for disaster preparedness and response can play a key role in predicting the evolution of disasters and effectively managing emergencies through a diverse set of intervention measures. This project will establish an approach that includes (a) planning disaster response, (b) public information and warning, (c) critical transportation services, (d) mass population care services, and (e) public health and medical services. Effective use of this integrated modeling approach may lead to enhanced safety, quality of life, and community resilience. The project also provides an excellent context for doctoral, master's, and undergraduate-level research, and students will be introduced to career pathways through their participation in research, publication, and partnership with public agencies and data-driven science and engineering researchers.

    This project will enhance disaster response and community resilience through multi-faceted research to create a big data system that supports data-driven simulations with the necessary volume, velocity, and variety and that integrates and optimizes the key aspects and decisions in disaster management. This includes (a) a novel computational infrastructure capable of executing multiple coupled simulations synergistically, under a unified probabilistic model, (b) addressing computational challenges that arise from the need to acquire, integrate, model, analyze, index, and search, in a scalable manner, large volumes of multi-variate, multi-layer, multi-resolution, and interconnected and inter-dependent spatio-temporal data that arise from disaster simulations and real-world observations, and (c) a new high-performance data processing system to support continuous observation of the numerical results of simulations from different domains with diverse resource demands and time constraints. These models, algorithms, and systems will be integrated into a disaster data management cyber-infrastructure (DataStorm) that, through close collaborations with domain experts from transportation, public health, and emergency management, will enable innovative applications and generate broad impacts in disaster planning and response.
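
    The coordination pattern, stripped of all the data management machinery, might look like the control loop below; flood_step and traffic_step are hypothetical stand-ins for coupled domain simulators, and the real system handles coupling, scaling, and uncertainty far beyond this sketch.

    ```python
    # Highly simplified coupled-simulation loop: two domain models advance in
    # lockstep under one coordinator, each step consuming the other's latest output.
    def run_coupled(flood_step, traffic_step, flood_state, traffic_state, steps=24):
        history = []
        for t in range(steps):
            # flood model advances first; its water-depth output feeds the traffic model
            flood_state = flood_step(flood_state, t)
            traffic_state = traffic_step(traffic_state, flood_state["depth_by_road"], t)
            history.append((t, flood_state, traffic_state))
        return history
    ```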

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1610282&HistoricalAwards=false

  • GEARS - An Infrastructure for Energy-Efficient Big Data Research on Heterogeneous and Dynamic Data

    NSF II #1629888 (co-PI)

    GEARS is an enerGy-Efficient big-datA Research System at Arizona State University for studying heterogeneous and dynamic data by employing heterogeneous computing and storage resources and co-designing the software and hardware components of the system.

    Big data technologies have been successfully applied in many disciplines for knowledge discovery and decision making, but the further growth and adoption of the big data paradigm face several critical challenges. First, it is challenging to meet the performance needs of modern big data problems, which are inherently more difficult, e.g., learning from heterogeneous and imprecise data, and have more stringent performance requirements, e.g., real-time analysis of dynamic data. Second, power consumption is becoming a serious limiting factor to the further scaling of big data systems and the applications that they can support. These challenges demand a new type of big data system that incorporates unconventional hardware capable of accelerating data processing and accesses while lowering the system’s power consumption. Therefore, this project is developing the needed computational infrastructure to support GEARS (an enerGy-Efficient big-datA Research System) for studying heterogeneous and dynamic data using heterogeneous computing and storage resources. GEARS is a one-of-a-kind, energy-efficient big-data research infrastructure based on cohesively co-designed software and hardware components. It enables a variety of important studies on heterogeneous and dynamic data and advances scientific knowledge in computer science as well as other data-driven disciplines. It enhances the training of a large body of undergraduate and graduate students, including many from underrepresented groups, by supporting unique research and education activities. Finally, it also benefits society by contributing new open-source solutions with potential commercial applications in support of heterogeneous and dynamic data analysis.

    The hardware of GEARS includes a cluster of data nodes equipped with heterogeneous processors and storage devices and fine-grained power management capability. The software is developed upon widely-used big data frameworks to support unified programming across CPUs, GPUs, and FPGAs and transparent data access across a deep storage hierarchy integrating DRAM, NVM, SSD, and HDD. GEARS also enables novel systems and algorithms research on learning from heterogeneous and dynamic data, including (1) new algorithm partitioning and scheduling schemes for using heterogeneous accelerators and optimizing the performance and energy efficiency of big data tasks; (2) new I/O scheduling and data staging strategies for performance and energy efficiency of the deep big-data storage hierarchy; (3) multi-phase, out-of-core decomposition techniques for large-scale tensors; (4) a real-time visual analytics system that links streaming media with simulations for anticipatory analytics; (5) multi-modal deep learning methods with heterogeneous social data; (6) new computational tools for real-time analysis of social unrest using social media; (7) a scalable, adaptive, and interactive team detection and assembly system for designing high-performing teams using big network data; (8) rare category analysis and heterogeneous learning algorithms for fast and accurate rare event discoveries with large and heterogeneous social data; and (9) a new distributed machine learning framework for learning semantic knowledge from Web-scale images/videos with incomplete/noisy textual annotations. All project results will be shared with the broader community via the project website (https://gears.asu.edu). Publications will be listed on the website with links to their publishers. Data and software downloads will be listed on the website with instructions on how to use them. Source code will be hosted on GitHub and a direct link to the repository will also be listed on the project website.

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1629888&HistoricalAwards=false

  • E-SDMS: Energy Simulation Data Management System Software

    NSF SI2-SSE #1339835 (PI)

    The building sector was responsible for nearly half of CO2 emissions in the US in 2009. According to the US Energy Information Administration, buildings consume more energy than any other sector, with 48.7% of the overall energy consumption, and building energy consumption is projected to grow faster than the consumption of the industry and transportation sectors. As a response to this, by 2030 only 18% of the US building stock is expected to be relying on current energy management technologies, with the rest either having been retrofitted or designed from the ground up using smart and cleaner energy technologies. These building energy management systems (BEMSs) need to integrate large volumes of data, including (a) continuously collected heating, ventilation, and air conditioning (HVAC) sensor and actuation data, (b) other sensory data, such as occupancy, humidity, lighting levels, air speed and quality, (c) architectural, mechanical, and building automation system configuration data for these buildings, (d) local weather and GIS data that provide contextual information, as well as (e) energy price, consumption, and cost data from electricity (such as smart grid) and gas utilities. In theory, these data can be leveraged from the initial design and/or retrofitting of buildings with data-driven building optimization (including the evaluation of the building location, orientation, and alternative energy-saving strategies) to total cost of ownership (TCO) simulation tools and day-to-day operation decisions. In practice, however, because of the size and complexity of the data and the varying spatial and temporal scales at which the key processes operate, (a) creating models to support such simulations, (b) executing simulations that involve 100s of inter-dependent parameters spanning multiple spatio-temporal frames, affected by complex dynamic processes operating at different resolutions, and (c) analyzing simulation results are extremely costly. The energy simulation data management system (e-SDMS) software will address challenges that arise from the need to model, index, search, visualize, and analyze, in a scalable manner, large volumes of multi-variate series resulting from observations and simulations. e-SDMS will, therefore, fill an important hole in data-driven building design and clean energy (an area of national priority) and will enable applications and services with significant economic and environmental impact.

    The key observation driving the research is that many data sets of urgent interest to energy simulations are (a) voluminous, (b) heterogeneous, (c) multi-variate, (d) temporal, (e) inter-related (meaning that the parameters of interest are dependent on each other and constrained by the structure of the building), and (f) multi-resolution (meaning that simulations and observations cover days to months of data and may be considered at different granularities of space, time, and parameters). Moreover, generating an appropriate ensemble of simulations for decision making often requires multiple simulations, each with different parameter settings corresponding to slightly different, but plausible, scenarios. Therefore, significant savings in modeling and analysis can be obtained through data management software supporting modular re-use of existing simulation results in new settings, such as re-contextualization and modular recomposition (or “sketching”) of building models and if-then analysis of simulation traces under new parameters, new building floorplans, and new contexts. In developing the energy simulation data management system (e-SDMS), the research addresses the key data challenges that render data-driven energy simulations, today, difficult. This requires (a) a novel building models, simulation traces, and sensor/actuation traces (BSS) data model to accommodate energy simulation data and models, (b) feature analysis and indexing of sensory data and simulation traces along with the corresponding building models, and (c) algorithms for analysis and exploration of simulation traces and re-contextualization of models for new building plans and contextual metadata. This research will, therefore, impact computational challenges that arise from the need to model, analyze, index, visualize, search, and recompose, in a scalable manner, large volumes of multi-variate series resulting from energy observations and simulations. E-SDMS consists of an (a) eViz server, which works as a frontend to e-SDMS, an (b) eDMS middleware for feature extraction, indexing, simulation analysis, and sketching, and an (c) eStore backend for data storage. To avoid waste and achieve the scalabilities needed for managing large data sets, e-SDMS employs novel multi-resolution data partitioning and resource allocation strategies. The multi-resolution data encoding, partitioning, and analysis algorithms are efficiently computable, leverage massive parallelism, and result in high quality, compact data descriptions.
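
    The multi-resolution idea can be illustrated in a very reduced form (the pooling scheme and distance are assumptions, not the e-SDMS encoding) by summarizing each trace at several temporal scales and using the coarsest summary as a cheap filter before touching raw simulation or sensor data.

    ```python
    # Sketch: mean-pool each trace at several resolutions; use the coarse level
    # as an inexpensive index for shortlisting candidate traces.
    import numpy as np

    def multi_resolution_summary(trace: np.ndarray, resolutions=(4, 16, 64)):
        """Return mean-pooled versions of the trace at increasingly coarse scales."""
        levels = {}
        for r in resolutions:
            usable = len(trace) - len(trace) % r
            levels[r] = trace[:usable].reshape(-1, r).mean(axis=1)
        return levels

    def candidate_traces(query, summaries, resolution=64, top_k=10):
        """Rank stored traces by distance between their coarsest summaries."""
        q = multi_resolution_summary(query, (resolution,))[resolution]
        dists = {name: np.linalg.norm(q[:len(s[resolution])] - s[resolution][:len(q)])
                 for name, s in summaries.items()}
        return sorted(dists, key=dists.get)[:top_k]
    ```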

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1339835&HistoricalAwards=false

  • Data Management for Real-Time Data Driven Epidemic Spread Simulations

    NSF III #1318788 (PI)

    The speed with which recent pandemics have had immense global impact highlights the importance of real-time response and public health decision making, both at local and global levels. For instance, the SARS (Severe Acute Respiratory Syndrome) epidemic is estimated to have started in China in November 2002, had spread to 29 countries by August 2003, and generated a total of 916 confirmed deaths. A pandemic similar to the swine flu in 2009 is estimated to cost the global economy $360 billion in a mild scenario and up to $4 trillion in an ultra scenario, within just the first year of the outbreak. Today, the key arsenal in the hands of decision makers who try to plan for and/or react to these outbreaks is software that enables model-driven and data-driven computer simulations of disease spreading. Such software helps predict the geo-temporal evolution of epidemics as well as the impacts of pharmaceutical and non-pharmaceutical control measures and interventions, relying on data and models including social contact networks, local and global mobility patterns of individuals, transmission and recovery rates, and outbreak conditions. Unfortunately, because of the volume and complexity of the data and the models, and the varying spatial and temporal scales at which the key transmission processes operate and relevant observations are made, running and interpreting simulations to generate actionable plans is today extremely difficult.

    If effectively leveraged, models reflecting past outbreaks, existing simulation traces obtained from simulation runs, and real-time observations arriving during an outbreak can be collectively used for obtaining a better understanding of the epidemic’s characteristics and the underlying diffusion processes, forming and revising models, and performing exploratory, if-then type hypothetical analyses of epidemic scenarios. More specifically, the proposed epidemic simulation data management system (epiDMS) will address computational challenges that arise from the need to acquire, model, analyze, index, visualize, search, and recompose, in a scalable manner, large volumes of data that arise from observations and simulations during a disease outbreak. Consequently, epiDMS fills an important hole in data-driven decision making during health-care emergencies and, thus, will enable applications and services with significant economic and health impact.

    The key observation is that modeling and execution costs can be significantly reduced using a data-driven approach that supports data and simulation reuse in new settings and contexts. Relying on this observation, in order to support data-driven modeling and execution of epidemic spread simulations, the team will develop

    + an epidemic data and model store (epiStore) to support acquisition and integration of relevant data and models.

    + a novel networks-of-traces (NT) data model to accommodate multi-resolution, interconnected and inter-dependent, incomplete/imprecise, multi-layer (networks), and temporal (time series or traces) epidemic data.

    + algorithms and data structures to support indexing of networks-of-traces (NT) data sets, including extraction of salient multi-variate temporal features from inter-dependent parameters, spanning multiple simulation layers and geo-spatial frames, driven by complex dynamic processes operating at different resolutions.

    + algorithms to support the analysis of networks-of-traces (NT) datasets, including identification of unknown dependencies across the input parameters and output variables spanning the different layers of the observation and simulation data.

    The proposed NT data model and algorithms will be brought together in an epidemic simulation data management system (epiDMS). For broadest impact, epiDMS will be designed in a way that interfaces with the popular Global Epidemic and Mobility (GLEaM) simulation engine, a publicly available software suite to explore epidemic spreading scenarios at the global scale. To achieve the necessary scalabilities, epiDMS will employ novel multi-resolution data partitioning and resource allocation strategies and will leverage massive parallelism.
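
    The bullets above describe the networks-of-traces (NT) data model; as a loose illustration only (not the project's actual model), an NT structure can be pictured as a multi-layer network whose nodes and edges each carry time series, as in the toy class below.

    ```python
    # Toy "networks-of-traces" container: mobility, contact, and infection layers
    # stay linked to the temporal traces observed or simulated on their nodes/edges.
    from dataclasses import dataclass, field

    @dataclass
    class NetworksOfTraces:
        layers: dict = field(default_factory=dict)       # layer -> {node: neighbors}
        node_traces: dict = field(default_factory=dict)  # (layer, node) -> [values]
        edge_traces: dict = field(default_factory=dict)  # (layer, u, v) -> [values]

        def add_edge(self, layer, u, v):
            self.layers.setdefault(layer, {}).setdefault(u, set()).add(v)

        def record(self, layer, node, value):
            """Append one observation to a node trace, e.g., daily case counts."""
            self.node_traces.setdefault((layer, node), []).append(value)

    # nt = NetworksOfTraces()
    # nt.add_edge("mobility", "Phoenix", "Tucson")
    # nt.record("infection", "Phoenix", 120)   # hypothetical daily new cases
    ```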

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1318788&HistoricalAwards=false

  • Intelligent AirPlane: Assessment of Onboard Data Processing Architectures and ML Algorithms for “pilot in the loop” Operations

    CASCADE I/UCRC (co-PI)

    Aviation has for decades been an early adopter of intelligent technologies, to mention just a few: coupled autopilots, full-authority digital engine control, and adaptive autopilots. The history of modern intelligent onboard systems dates back to 2003, when NASA began flight testing an artificial neural network that analyzed aircraft flight characteristics and could create alternative flight control laws. Research into intelligent onboard technologies has intensified in recent years. The objectives of this project are to (a) use real flight profile data to assess the quality and completeness of the provided flight profile data in terms of their suitability for onboard data analytics for Class E and Class D operations, and (b) use a fuel-saving use case (a Class E operation) as a practical example to demonstrate the potential of onboard machine learning.

  • Data Management Systems Support for Personalized Recommendation Applications

    NSF EAGER #1654861 (co-PI)

    A recommender system helps users to identify useful and interesting items from a considerably large search space. Recommender systems have been widely used in various commercial services. A recommender system exploits the users’ opinions in order to extract a set of interesting items for each user. This project conducts research, develops the requisite knowledge, and builds software infrastructure to support efficient, scalable, and usable data management for personalized recommendation applications. Recommender systems are already widely used, with a strong broad impact on web users, and the project aims to take personalized recommendation applications to their next stage, widening their scope to new applications. The project enhances the research infrastructure by distributing a free and portable software artifact. All proposed ideas will be realized inside an open-source recommendation-aware database system maintained at Arizona State University. It is envisioned that the proposed system will be used by researchers worldwide as a vehicle for evaluating their research and exchanging new modules related to recommender systems. It is also envisioned that several commercial database systems will adopt the ideas from this project. The project will have a significant educational component. Researchers in both data management and recommender systems will be trained through the proposed project, through curricular innovations as well as workshops and tutorials. Students will be introduced to career pathways through their participation in research.

    The project tackles the following system challenges to support recommendation applications: (1) Flexibility and Usability: The user should be able to declaratively define a variety of recommenders using popular recommendation algorithms that fit the application needs. The system must be able to integrate the recommendation functionality with other data attributes/sources as well as perform the recommendation functionality and other data access operations side by side. (2) Efficiency and scalability: The system is expected to produce personalized recommendations for a high number of users concurrently over a large pool of items. Unfortunately, recommender models are not easily updatable, and hence they are rebuilt periodically. As a result, a model loses its accuracy over time until the next rebuild process. This is not acceptable in modern applications (e.g., social media) where new items and ratings are streaming into the system. To tackle these challenges, the project injects the recommendation functionality inside the core functionality of a database system by (a) indexing the set of recommenders to efficiently answer ad-hoc recommendation queries, (b) encapsulating the recommendation functionality inside a pipeline-able query operator that integrates well with other database operators, and (c) designing query optimization techniques that include the recommendation functionality. Moreover, since a common operation to train recommendation models is to factorize multi-relational user, item, and attribute data in the form of tensors, the project develops a scalable, parallelizable data processing software framework that provides co-optimization of tensor-algebraic and relational-algebraic operations. The project also leverages database systems to support context-aware (e.g., spatial location and social network) recommendations.
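
    To give a feel for the functionality being pushed toward the database engine (a schematic sketch, not the project's operator design; the factorization routine and names are assumptions), the code below factorizes a user-item rating matrix with plain SGD and answers a top-k recommendation request that could, in principle, sit in a query pipeline next to relational operators.

    ```python
    # Sketch: train a simple latent-factor model and answer "top-k items for user u",
    # the kind of call a recommend() query operator would encapsulate.
    import numpy as np

    def factorize(R, rank=16, epochs=30, lr=0.01, reg=0.05):
        """Plain SGD matrix factorization; R is a dense array with 0 meaning unrated."""
        n_users, n_items = R.shape
        U = np.random.normal(scale=0.1, size=(n_users, rank))
        V = np.random.normal(scale=0.1, size=(n_items, rank))
        users, items = np.nonzero(R)
        for _ in range(epochs):
            for u, i in zip(users, items):
                err = R[u, i] - U[u] @ V[i]
                u_row = U[u].copy()
                U[u] += lr * (err * V[i] - reg * U[u])
                V[i] += lr * (err * u_row - reg * V[i])
        return U, V

    def recommend(u, U, V, rated, k=10):
        """Top-k unrated items for user u."""
        scores = V @ U[u]
        scores[list(rated)] = -np.inf        # never re-recommend already-rated items
        return np.argsort(scores)[::-1][:k]
    ```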

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1654861&HistoricalAwards=false

  • Fraud Detection via Visual Analytics: An Infrastructure to Support Complex Financial Patterns (CFP)-based Real-Time Services Delivery

    NSF PFI-BIC #1430144 (co-PI)

    This Partnerships for Innovation: Building Innovation Capacity (PFI:BIC) project from Arizona State University focuses on building a platform that will integrate data from multiple sources and explore data analysis techniques that can more accurately detect indications of financial fraud. According to the Federal Trade Commission’s (FTC) annual “Consumer Sentinel Network Data Book”, the most comprehensive database of U.S. fraud trends, American consumers submitted more than 1.5 million complaints, a 62 percent increase in just three years, and they reported losing over $1.6 billion to fraud in 2013. Detecting increasingly complex fraud schemes requires services that are able to integrate and enrich data from disparate financial and other data sources and hunt for recurring and often interconnected anomalous patterns in large networks. The proposed platform will enable integration and enrichment of limited private financial data with larger publicly available data sets to detect fraud and reduce losses due to fraudulent transactions. The project will also include training and research experience for undergraduate and graduate students.

    The data linkage and financial pattern discovery platform, to be developed via visual analytics, will enable “smart” fraud detection and prevention services. Today, in order to obtain a single unified view of fraud activity across the enterprise and manage fraud on a cross-institution basis, fraud detection companies collect, verify, and analyze consumer data and financial information. Researchers recognize, however, that new insights into fraud and risk patterns require the ability to integrate financial data with domain-independent data through real-time entity/identity discovery, resolution, cross-linking, and schema mapping techniques. Therefore, the importance of the research discovery underpinning this project includes solving platform and processing challenges that arise from the need to integrate, filter, analyze, and visualize, in a secure and scalable manner, large private knowledge networks, also incorporating uncontrolled, unrestricted, untrusted, unstructured, and unpredictable data from external domains. The ability to treat financial and domain-independent data together will lead to enriched unified data, unprecedented predictive accuracy in fraud prevention and detection, and an entirely new suite of risk management services and products.

    The partners at the inception of the project are Arizona State University (ASU) (School of Computing, Informatics, and Decision Systems Engineering; notably, the investigators are also members of ASU’s Information Assurance Center, certified by NSA and DHS) and a small business, Early Warning Services, LLC (Scottsdale, AZ).

    Grant info: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1430144&HistoricalAwards=false