Humanitarian & Social Informatics Lab, GMU

Realtime Data Analytics System for Social Media, Web, and IoT Streams
a human-centered AI solution for Emergency Services and Humanitarian Agencies

by Humanitarian Informatics Lab (@Human_Info_Lab)
Information Sciences & Technology department, George Mason University


Rahul Pandey | Yasas Senarath | Dr. Hemant Purohit
Alumni: Dr. Prakruthi Karuna, Gaurav Bahl, Ganesh Nalluru, Mohammad Rana


Research Initiatives


In this project, we proposed an interactive, user-feedback-based streaming analytics system, 'CitizenHelper-Adaptive', to mine Twitter streams and detect how people perceived and reacted to the COVID-19 pandemic. We do this by identifying which tweets in the stream mention prevention and risks.

One goal of this activity is to observe how volunteer annotators contributed to the Human-AI teaming process during the COVID-19 pandemic and to identify the associated parameters. Another goal is to evaluate factors that can improve CitizenHelper's learning process so that it identifies risk and prevention mentions in tweets more effectively.

Figure 1 shows a snapshot of the process of providing active feedback on COVID-19 data.


(Figure 1.) Illustration of the high-level CitizenHelper-Adaptive system in our COVID-19 effort.

  • Montgomery County CERT (Steve Peterson)
  • University of Texas at Austin (Dr. Keri Stephens)
  • Brigham Young University (Dr. Amanda Hughes)
  • Virginia Tech (Dr. Christopher Zobel)

AI-Augmented Multimodal Data Analytics for Emergency Response Training

In this project, we analyzed videos of a simulated active shooter exercise. We monitored a scenario at GMU's EagleBank Arena in which an actor playing an active shooter opened fire, causing commotion in the crowd. A Rescue Task Force team then arrived, neutralized the shooter, and rescued the hostages.

We analyzed the recorded video in two stages. First, we built a person detection system that finds the bounding box of every person in each frame, using a pre-trained Faster R-CNN model. We then trained a MobileNet model to recognize the role (actor type) of each detected person in the frames.
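
The two-stage pipeline above (detect people, then classify each person's role) can be sketched as follows. This is a minimal sketch, not the project's code: the detector and the role classifier are passed in as plain callables standing in for the pre-trained Faster R-CNN and the trained MobileNet, a frame is modeled as a nested list of pixel values, and the names `Detection`, `Actor`, `analyze_frame`, the `"person"` label string, and the 0.5 score cutoff are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]          # a frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates


@dataclass
class Detection:
    box: Box
    label: str
    score: float


@dataclass
class Actor:
    box: Box
    role: str


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region inside a bounding box out of a frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def analyze_frame(
    frame: Frame,
    detect: Callable[[Frame], Sequence[Detection]],  # stage 1, e.g. a Faster R-CNN wrapper
    classify_role: Callable[[Frame], str],           # stage 2, e.g. a MobileNet wrapper
    min_score: float = 0.5,                          # hypothetical confidence cutoff
) -> List[Actor]:
    """Detect people in a frame, then assign a role to each detected person."""
    actors = []
    for det in detect(frame):
        if det.label != "person" or det.score < min_score:
            continue  # drop non-person and low-confidence boxes
        actors.append(Actor(det.box, classify_role(crop(frame, det.box))))
    return actors


# Usage with stub models standing in for the real detector and classifier.
def stub_detect(frame: Frame) -> List[Detection]:
    return [Detection((0, 0, 2, 4), "person", 0.9),
            Detection((4, 1, 6, 5), "person", 0.3),
            Detection((5, 0, 8, 2), "chair", 0.95)]


def stub_role(patch: Frame) -> str:
    return "Responder" if len(patch[0]) == 2 else "Patient"


frame = [[0] * 8 for _ in range(6)]
print(analyze_frame(frame, stub_detect, stub_role))
# [Actor(box=(0, 0, 2, 4), role='Responder')]
```

Keeping the two models behind simple callables means either stage can be swapped (for example, a different detector) without touching the frame loop.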

  • The first visualization shows the crowd density of each actor type for every second.
  • The second visualization focuses on the target actors of interest: Patient and Responder.
  • The third and fourth visualizations show the time taken to neutralize the shooter and to rescue the patient.
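
Given per-frame role predictions, the per-second crowd density and the timing measures behind these visualizations could be computed roughly as below. The input format `(frame_index, role)`, the integer frame rate, and the timing definition (first frame showing one role to the last frame showing another, e.g., first Responder sighting to last Shooter sighting) are assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Observation = Tuple[int, str]  # (frame_index, role) for one classified person


def density_per_second(obs: List[Observation], fps: int) -> Dict[int, Dict[str, float]]:
    """Average number of people of each role visible per frame, for every second."""
    totals: Dict[int, Counter] = defaultdict(Counter)
    for frame_idx, role in obs:
        totals[frame_idx // fps][role] += 1
    return {sec: {role: n / fps for role, n in c.items()}
            for sec, c in sorted(totals.items())}


def elapsed_seconds(obs: List[Observation], start_role: str, end_role: str,
                    fps: int) -> Optional[float]:
    """Seconds from the first frame showing start_role to the last frame showing end_role.

    Hypothetical proxy for 'time to neutralize' or 'time to rescue'."""
    starts = [f for f, r in obs if r == start_role]
    ends = [f for f, r in obs if r == end_role]
    if not starts or not ends:
        return None  # one of the roles never appears
    return (max(ends) - min(starts)) / fps


obs = [(0, "Shooter"), (0, "Patient"), (1, "Shooter"), (2, "Responder"),
       (3, "Responder"), (3, "Shooter"), (5, "Responder")]
print(density_per_second(obs, fps=2))
print(elapsed_seconds(obs, "Responder", "Shooter", fps=2))  # 0.5
```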

Figure 2 shows how multimodal input data, here video, is used in the CitizenHelper architecture.


(Figure 2.) CitizenHelper utilizing video data during a simulation of an active shooting emergency response exercise.

Related Articles and Publications
  • Pandey R, Bannan B, Purohit H. CitizenHelper-training: AI-infused System for Multimodal Analytics to assist Training Exercise Debriefs at Emergency Services. In ISCRAM 2020 Conference Proceedings–17th International Conference on Information Systems for Crisis Response and Management 2020 May. (ISCRAM). [pdf - author version]
  • Bannan B, Torres EM, Purohit H, Pandey R, Cockroft JL. Sensor-Based Adaptive Instructional Systems in Live Simulation Training. In: Sottilare RA, Schwarz J, editors. Adaptive Instructional Systems, Cham: Springer International Publishing; 2020, p. 3–14. (HCII). [pdf - author version]
  • Purohit H, Dubrow S, Bannan B. Designing a multimodal analytics system to improve emergency response training. In: International Conference on Human-Computer Interaction, Cham: Springer; 2019 Jul, p. 89–100. (HCII). [pdf - author version]

EM-Assistant: A Learning Analytics System for Social and Web Data Filtering to Assist Trainees and Volunteers of Emergency Services

In this project, we proposed a learning analytics system, "EMAssistant", for emergency volunteers and practitioners (referred to as trainees). It enhances their experiential learning cycle through cause-effect reasoning about the feedback they provide to the machine learning model. Continuously linking the cause (providing feedback) and the effect (observing predictions from the updated model) in visual form is likely to improve trainees' understanding and help them provide more accurate feedback.
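
The cause-effect loop can be illustrated with a small incremental classifier: snapshot the predictions, apply one trainee label, and report which predictions changed. The source does not specify EMAssistant's model; the multinomial Naive Bayes below, and the names `IncrementalNB` and `feedback_step`, are hypothetical stand-ins chosen because its word counts can be updated cheaply after every label.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class IncrementalNB:
    """Multinomial Naive Bayes whose counts are updated one labeled post at a time."""

    def __init__(self, labels: List[str], alpha: float = 1.0):
        self.labels = labels
        self.alpha = alpha                      # Laplace smoothing
        self.doc_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def update(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def _log_score(self, words: List[str], label: str) -> float:
        n_docs = sum(self.doc_counts.values())
        prior = (self.doc_counts[label] + self.alpha) / (n_docs + self.alpha * len(self.labels))
        n_words = sum(self.word_counts[label].values())
        denom = n_words + self.alpha * max(len(self.vocab), 1)
        return math.log(prior) + sum(
            math.log((self.word_counts[label][w] + self.alpha) / denom) for w in words)

    def predict(self, text: str) -> str:
        words = text.lower().split()
        return max(self.labels, key=lambda lab: self._log_score(words, lab))


def feedback_step(model: IncrementalNB, feedback: Tuple[str, str],
                  pool: List[str]) -> List[Tuple[str, str, str]]:
    """Apply one trainee label (the cause) and return predictions that changed (the effect)."""
    before = {t: model.predict(t) for t in pool}
    model.update(*feedback)
    after = {t: model.predict(t) for t in pool}
    return [(t, before[t], after[t]) for t in pool if before[t] != after[t]]


model = IncrementalNB(["relevant", "irrelevant"])
model.update("need rescue water", "relevant")
model.update("great game roads", "irrelevant")
print(feedback_step(model, ("roads flooded closed", "relevant"), ["roads flooded", "great game"]))
# [('roads flooded', 'irrelevant', 'relevant')]
```

The returned list of changed predictions is the kind of "effect" a dashboard could highlight right after a trainee submits a label.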

Figure 3 shows a snapshot of the learning analytics system applied to social media posts about the Hurricane Sandy event.


(Figure 3.) EM-Assistant Dashboard for Hurricane Sandy

Related Articles and Publications
  • Pandey R, Bahl G, Purohit H. EMAssistant: A Learning Analytics System for Social and Web Data Filtering to Assist Trainees and Volunteers of Emergency Services. The 16th International Conference on Information Systems for Crisis Response and Management, 2019. (ISCRAM). [pdf - author version]

CitizenHelper-Adaptive: Expert-Augmented Streaming Analytics System for Emergency Services and Humanitarian Organizations

In this project, we proposed an interactive, user-feedback-based streaming analytics system, 'CitizenHelper-Adaptive', to mine social media, news, and other public Web data streams for emergency services and humanitarian organizations.

The main application of this project was implementing transfer-active learning methods for time-critical events, when abundant labeled data is available from past events but sufficient labeled data for the ongoing event is scarce.
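
One common way to combine transfer and active learning is to warm-start a model on labeled data from a past event, then ask experts to label the ongoing-event posts the model is least certain about (uncertainty sampling). The sketch below assumes that scheme, a bag-of-words logistic regression trained with SGD, and an `oracle` callable standing in for the human expert; it is not necessarily the exact method used in CitizenHelper-Adaptive, and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label), label 1 = relevant, 0 = not relevant


def tokens(text: str) -> List[str]:
    return text.lower().split()


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogReg:
    """Bag-of-words logistic regression trained with stochastic gradient descent."""

    def __init__(self, lr: float = 0.5):
        self.w: Dict[str, float] = defaultdict(float)
        self.b = 0.0
        self.lr = lr

    def prob(self, text: str) -> float:
        return sigmoid(self.b + sum(self.w[t] for t in tokens(text)))

    def fit(self, data: List[Example], epochs: int = 20) -> "LogReg":
        for _ in range(epochs):
            for text, y in data:
                g = self.prob(text) - y          # d(log-loss)/dz
                self.b -= self.lr * g
                for t in tokens(text):
                    self.w[t] -= self.lr * g
        return self


def most_uncertain(model: LogReg, pool: List[str], k: int) -> List[str]:
    """Uncertainty sampling: the k texts whose probability is closest to 0.5."""
    return sorted(pool, key=lambda t: abs(model.prob(t) - 0.5))[:k]


def transfer_active_learning(source: List[Example], target_pool: List[str],
                             oracle: Callable[[str], int], rounds: int, k: int):
    model = LogReg().fit(source)                   # transfer: start from a past event
    pool, labeled = list(target_pool), []
    for _ in range(rounds):
        for text in most_uncertain(model, pool, k):
            pool.remove(text)
            labeled.append((text, oracle(text)))   # expert labels an ongoing-event post
        model.fit(labeled, epochs=10)              # adapt the source-trained weights
    return model, labeled


source = [("flood rescue needed", 1), ("trapped need rescue", 1),
          ("nice weather today", 0), ("watching the game", 0)]
target = ["shelter needed downtown", "new phone today", "roads closed need help",
          "great movie tonight", "shelter open for evacuees"]
model, labeled = transfer_active_learning(
    source, target, oracle=lambda t: int("shelter" in t or "help" in t), rounds=2, k=2)
print(labeled)
```

Warm-starting keeps what was learned from the past event, while each round spends the experts' limited labeling effort only on the posts the current model finds most ambiguous.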

Figure 4 shows a snapshot of the dashboard for providing active feedback on Hurricane Harvey data.


(Figure 4.) Illustration of analysis using CitizenHelper-Adaptive system widgets for Hurricane Harvey

Figure 5 shows the effect of human feedback on the machine-learned model.


(Figure 5.) The effect of human feedback on machine adaptation

Related Articles and Publications
  • Pandey R, Purohit H. CitizenHelper-adaptive: expert-augmented streaming analytics system for emergency services and humanitarian organizations. 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE; 2018, p. 630–3. (ASONAM). [pdf - author version]

CitizenHelper: A Streaming Analytics System to Mine Citizen and Web Data for Humanitarian Organizations



(Figure 6.) System Architecture

System Component Details

Data Collection - Data Sources and Apache Kafka Framework: CitizenHelper collects data with Apache Kafka, an open-source distributed computing platform (see Figure 6). Kafka provides the flexibility to scale producers (information sources) and consumers (information processors), and it also acts as a streaming data buffer, which is valuable when downstream processors are slow. The system currently supports real-time data collection using Twitter's Streaming and Location APIs, as well as from Instagram and Facebook (public groups and pages), which are useful sources of situational awareness information during humanitarian disasters. Additionally, the system supports collecting news (including GDELT) and blog streams, as well as data from Web knowledge bases such as Wikipedia and OpenGov Data.
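
The decoupling described above, with producers writing into a buffer that consumers drain at their own pace, can be illustrated in-process with Python's thread-safe `queue.Queue`. This is only an analogy: Kafka persists records in partitioned topics and its consumers track their own offsets, none of which the sketch models. The names `producer`, `slow_consumer`, and `run_pipeline` are hypothetical.

```python
import queue
import threading
import time
from typing import Iterable, List

END = object()  # sentinel marking the end of the stream


def producer(source: Iterable[str], buf: "queue.Queue[object]") -> None:
    """An information source publishing records into the buffer."""
    for record in source:
        buf.put(record)      # blocks only while the buffer is full
    buf.put(END)


def slow_consumer(buf: "queue.Queue[object]", out: List[str], delay: float) -> None:
    """A downstream processor slower than the source; the buffer absorbs the gap."""
    while True:
        record = buf.get()
        if record is END:
            break
        time.sleep(delay)    # stand-in for expensive analytics
        out.append(str(record).upper())


def run_pipeline(records: List[str], capacity: int = 100, delay: float = 0.01) -> List[str]:
    buf: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
    out: List[str] = []
    workers = [threading.Thread(target=producer, args=(records, buf)),
               threading.Thread(target=slow_consumer, args=(buf, out, delay))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out


print(run_pipeline(["tweet 1", "tweet 2", "news 1"], capacity=2))
# ['TWEET 1', 'TWEET 2', 'NEWS 1']
```

A bounded buffer applies back-pressure: when the consumer falls behind, the producer waits instead of dropping records.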

Metadata Processing - Spark Streaming, Application Web Server, and Analytics Services: The system connects the data collection components to processors in the open-source stream computing framework Apache Spark. Different processors analyze the streamed content by leveraging various analytics services to extract and attach enriched metadata, such as information provider classification (e.g., gender, or user type such as organization) and content classification for topics, intent, etc.
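
A per-record enrichment function of the following shape could be applied with a `map` transformation over a Spark Streaming DStream. The keyword cue lists for user type and topic are hypothetical placeholders for the system's actual analytics services (e.g., trained classifiers for gender, user type, and intent), which are not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical keyword cues; the real system relies on trained analytics services.
ORG_CUES = {"official", "agency", "department", "news", "foundation", "ngo"}
TOPIC_CUES = {
    "shelter": {"shelter", "evacuate", "evacuation"},
    "medical": {"injured", "hospital", "medic"},
    "donation": {"donate", "donation", "volunteer"},
}


def words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def user_type(profile: str) -> str:
    """Information provider classification (crude keyword heuristic)."""
    return "organization" if words(profile) & ORG_CUES else "individual"


def topics(text: str) -> List[str]:
    """Content classification: every topic whose cue words appear in the text."""
    found = words(text)
    return sorted(topic for topic, cues in TOPIC_CUES.items() if found & cues)


def enrich(record: Dict[str, str]) -> Dict[str, object]:
    """Return the streamed record with extracted metadata attached."""
    return {**record,
            "user_type": user_type(record.get("profile", "")),
            "topics": topics(record.get("text", ""))}


print(enrich({"text": "Need to evacuate, hospital is full", "profile": "Official county agency"}))
```

Because `enrich` is a pure function of one record, it parallelizes naturally across stream partitions.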

Data Storage and Visual Dashboard: CitizenHelper stores raw data in a file system for long-term archiving, and stores processed data with extracted metadata in a database that supports Kibana, a frontend visualization dashboard for streaming analytics. Our visual dashboard is composed of different analytical widgets, such as volume trend graphs of Twitter posts (tweets) over time and the top active users by tweet frequency for a topical hashtag. These widgets have two unique features. First, when a user interacts with a widget and modifies an analysis unit on it (e.g., a time slice on a trend graph, a region of interest on the map, or a topical tag in the word cloud list), all analytical widgets are updated to reflect that change. Second, the visual dashboard supports collaborative teamwork by letting an end-user save and share the state of the dashboard, which allows a collaborating team member to study the same set of analyses as their colleague. The widgets can also be repositioned and deleted as needed to avoid visual information overload. System details with example analyses and demos, based on prior interactions with the analysis widgets, are available at the demo link.
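
The two widget features described above can be pictured as (1) recomputing every widget from one shared filter state and (2) serializing that state so a colleague can reopen it. The sketch below illustrates the idea in plain Python; it is not how Kibana implements dashboards, and the tweet fields (`user`, `hour`, `region`, `tags`) and the names `DashboardState`, `matches`, and `render` are assumptions.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional

Tweet = Dict[str, object]  # assumed fields: user, hour, region, tags


@dataclass
class DashboardState:
    """Analysis units a user can change on any widget."""
    hours: Optional[List[int]] = None   # [start, end) slice picked on the trend graph
    region: Optional[str] = None        # region of interest picked on the map
    tag: Optional[str] = None           # topical tag picked in the word cloud

    def to_json(self) -> str:           # save the state so a colleague can reopen it
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "DashboardState":
        return cls(**json.loads(payload))


def matches(t: Tweet, s: DashboardState) -> bool:
    if s.hours and not s.hours[0] <= t["hour"] < s.hours[1]:
        return False
    if s.region and t["region"] != s.region:
        return False
    return not s.tag or s.tag in t["tags"]


def render(tweets: List[Tweet], s: DashboardState) -> Dict[str, object]:
    """Recompute every widget from the same filtered subset, so one change updates all."""
    sel = [t for t in tweets if matches(t, s)]
    return {
        "volume_timeline": dict(sorted(Counter(t["hour"] for t in sel).items())),
        "top_users": Counter(t["user"] for t in sel).most_common(3),
        "tag_cloud": Counter(tag for t in sel for tag in t["tags"]).most_common(5),
    }


tweets = [
    {"user": "a", "hour": 1, "region": "TX", "tags": ["flood"]},
    {"user": "b", "hour": 2, "region": "TX", "tags": ["rescue", "flood"]},
    {"user": "a", "hour": 5, "region": "LA", "tags": ["flood"]},
]
state = DashboardState(hours=[0, 3])                 # user selects hours 0-3 on the timeline
shared = DashboardState.from_json(state.to_json())   # colleague opens the shared state
print(render(tweets, shared))
```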

Interact with each widget for a fine-grained analysis of an event or topic; for example, selecting a specific period on the timeline re-renders all other widgets accordingly. Widgets include:
  1. Volume Timeline: shows the volume trend of engagement in this topic. Select a timeline slice to analyze by selecting it with the mouse.
  2. Activity Timeline: shows the top Twitter users engaged in this topic over time.
  3. Tweet Cloud: shows tweet summaries, by user type, for analyzing public concerns and reactions to this event.
  4. User Cloud: shows user profile summaries, by user type (e.g., organization), for analyzing the demographics participating in discussions of this topic.
  5. Activity Tile Map: shows user participation across geographical locations. Select a location to constrain the analysis in other widgets.
  6. Open Data Map: shows displacement statistics across the world, to inform comparative analysis of user engagement in the concerned locations.
  7. Tweet Stream: shows the specific set of tweets, with their frequencies, for the selected analysis constraints.
  8. User Graph: shows user engagement frequency, to identify actively engaged users.
Future Analytics: user type (gender, user affiliations with organizations), emotion with concerned topics, organization network analysis, etc.

1. Demographics - Content Practices of Specific User Identities for Gender-Violence Events

(Figure 7.a.) Organizations tweet about husbands
(Figure 7.b.) Tweets by individual users describing themselves as husbands

Figure 7. Husband Portrayal: Individual user accounts whose profiles identify them as husbands post pro-women tweet content, whereas organization accounts portray husbands as threats when they describe them. This data was collected for the domain of anti-gender-based violence from Aug 4th, 2016 to Aug 28th, 2016.

2. Narratives of Diverse Sources - Analysis for Gender-Violence Events

(Figure 8.a.) Editorial-news Content Summary
(Figure 8.b.) User-generated Content Summary

Figure 8. Our tool lets us compare the research-relevant topics currently covered by world news with those that people are discussing on Twitter. In this figure, we observe the diverse narratives promoted by news media, in contrast to the diverse issues raised on Twitter under anti-gender-violence activism, during the period of Aug 4th, 2016 to Aug 28th, 2016.

3. Temporal Diffusion - Topic Activity over Time for Gender-Violence Events

(Figure 9.a.) Tweets by Topic
(Figure 9.b.) Tweets over time

Figure 9. Our tool lets us view the breakdown of topics in the current stream of tweets for a specific domain. In this figure, we observe the trend in the number of tweets by topic versus the total number of tweets for the domain of anti-gender-based violence over the period of July 31st, 2016 to September 1st, 2016.
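
The topic-versus-total breakdown in Figure 9 reduces to counting tweets per day for each topic alongside the daily total. A minimal sketch, assuming each tweet has already been assigned a date and zero or more topic labels upstream; `topic_trends`, the `_total` key, and the toy topic names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def topic_trends(tweets: Iterable[Tuple[date, List[str]]]) -> Dict[date, Dict[str, int]]:
    """Per-day tweet counts for each topic, plus the day's total under '_total'.

    A tweet with several topics counts once for each topic but once in the total,
    so topic counts can sum to more than the total."""
    per_day: Dict[date, Counter] = defaultdict(Counter)
    for day, topics in tweets:
        per_day[day]["_total"] += 1
        for topic in set(topics):
            per_day[day][topic] += 1
    return {day: dict(c) for day, c in sorted(per_day.items())}


toy = [(date(2016, 8, 4), ["awareness", "policy"]),
       (date(2016, 8, 4), ["policy"]),
       (date(2016, 8, 5), [])]
for day, counts in topic_trends(toy).items():
    print(day, counts)
```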

4. Geographical Engagement - Awareness Analysis for Gender-Violence Events

(Figure 10.a.) Tweet count by location
(Figure 10.b.) Gender-based violence counts provided by open data from the FBI UCR

Figure 10. The visual tool lets us view tweets by originating location, which indicates user participation in, and awareness of, the issues of concern. In this figure, we observe the variation by location in the number of tweets indicating social awareness from Aug 4th, 2016 to Aug 28th, 2016, and the contrasting pattern of 2014 GBV-related reports by location in the FBI Uniform Crime Reporting (UCR) data.

(Figure 11.a.) Tweet count by location
(Figure 11.b.) Global displacement counts provided by open data from IDP

Figure 11. The geographical contrast analysis capability applied to another humanitarian issue, global displacement. In this figure, we observe the variation in the number of tweets by originating location (indicating social awareness and reporting of the issue) from Jan 5th, 2017 to Feb 23rd, 2017, in contrast to the displacement reported by IDP.
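
One simple way to quantify the geographical contrasts in Figures 10 and 11 is to convert both the tweet counts and the reference counts (FBI UCR reports or displacement figures) into per-location shares and rank locations by the gap between them. This metric is an assumption for illustration; the figures themselves present the two distributions side by side, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw counts into each location's fraction of the total."""
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()} if total else {}


def contrast(tweet_counts: Dict[str, float],
             reference_counts: Dict[str, float]) -> List[Tuple[str, float]]:
    """Tweet share minus reference share per location, largest absolute gap first.

    Positive gap: more online attention than the reference data alone would suggest.
    Negative gap: the location is under-represented in the conversation."""
    tw, ref = shares(tweet_counts), shares(reference_counts)
    gaps = {loc: tw.get(loc, 0.0) - ref.get(loc, 0.0) for loc in set(tw) | set(ref)}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)


print(contrast({"A": 60, "B": 40}, {"A": 20, "B": 30, "C": 50}))
```

Comparing shares rather than raw counts keeps the two sources on the same scale even though tweet volumes and official report counts differ by orders of magnitude.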

5. Hashtag analysis - #likeagirl

  • #likeagirl - Women are portrayed in a positive light with this hashtag, and both individuals and organizations have endorsed it, likely because an organization started the movement with a pro-women message. The tag cloud of hashtags shows endorsements from other communities and related initiatives in this movement.
  • Reference about #LikeAGirl 1
  • Reference about #LikeAGirl 2
(Figure 12.a.) Hashtags Mentioned by Individuals
(Figure 12.b.) Hashtags Mentioned by Organizations
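
The hashtag comparison in Figure 12 can be approximated by counting the hashtags that co-occur with #likeagirl, separately for each user type. This sketch assumes tweets arrive as `(user_type, text)` pairs with the user type already inferred (as in the metadata processing step); the regex-based extraction, the sample tweets, and the function name are illustrative.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

HASHTAG = re.compile(r"#(\w+)")


def cooccurring_hashtags(tweets: Iterable[Tuple[str, str]], focal: str = "likeagirl",
                         top: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """Hashtags used alongside the focal hashtag, counted separately per user type."""
    focal = focal.lower()
    by_type: Dict[str, Counter] = {}
    for user_type, text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}  # case-insensitive, once per tweet
        if focal not in tags:
            continue
        by_type.setdefault(user_type, Counter()).update(tags - {focal})
    return {ut: c.most_common(top) for ut, c in by_type.items()}


sample = [("individual", "Proud to run #LikeAGirl #GirlPower"),
          ("organization", "Join us #likeagirl #GirlPower #STEM"),
          ("individual", "#GirlPower rocks"),
          ("individual", "#likeagirl #Sports")]
print(cooccurring_hashtags(sample))
```
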
Related Articles and Publications
  • Karuna P, Rana M, Purohit H. CitizenHelper: A Streaming Analytics System to Mine Citizen and Web Data for Humanitarian Organizations. In: The 11th International AAAI Conference on Web and Social Media; 2017. (ICWSM-17). [pdf - author version]
  • Stabile B, Grant A, Purohit H, Rama M. “She Lied”: Social construction, rape myth prevalence in social media, and sexual assault policy. Sexuality, Gender & Policy 2019;2(2):80–96. (SGP). [pdf - author version]