Realtime Data Analytics System for Social Media, Web, and IoT Streams
a human-centered AI solution for Emergency Services and Humanitarian Agencies
Yasas Senarath | Hossein Salemi | Anuridhi Gupta | Tarin Sultana Sharika | Dr. Hemant Purohit
Alumni: Dr. Rahul Pandey, Dr. Falah Amro, Dr. Prakruthi Karuna, Gaurav Bahl, Ganesh Nalluru, Mohammad Rana
Researcher mines social media for insights into health behaviors amid pandemic
by Ryley McGinnis on May 4th, 2020
by Rachel Smith on October 19th, 2020
CIT’s SCITI Labs and George Mason University Collaborate on Smart Building Technology Research
First-of-its kind operational test will help enhance first responders’ safety and efficiency.
by Center for Innovative Technology on November 18, 2019
Mason research and smart technology enhance emergency response
by Meredith Muckerman on May 29th, 2019
Tweet data mining tool could help emergency responders
by Volgenau School of Engineering on March 21st, 2017
In this chapter, we introduce and illustrate different human-centered AI system design applications in managing SM and IoT data to support use cases in different disaster management phases using an example of one such system called Citizen-Helper. The lessons from implementing Citizen-Helper use cases will facilitate understanding challenges and practitioner expectations for future research and development of human-centered AI applications within different phases of disaster management.
The growing adoption of sources like Social Media (SM) and Internet of Things (IoT) networks provides a unique opportunity to collect additional data to aid all disaster management phases. Artificial Intelligence (AI) technologies make it possible to design information systems that process data from SM and IoT sources at scale and in real time to enhance disaster response processes and training performance analyses. Such systems require a systematic design that employs AI-assisted data processing with Modularity, to create processing pipelines for different modalities, including text, videos, images, and numeric sensor data; Extensibility, to add newer analytical capabilities; Interactivity, for greater human control; and Adaptability, to respond to dynamic human needs. A human-centered approach makes such a design achievable.
Here is the Citizen-Helper System for human-centered AI use in disaster management (see Figure 1).
(Figure 1.) Illustration of the Citizen-Helper System for human-centered AI use in disaster management.
In this project, we proposed an interactive, user-feedback-based streaming analytics system, 'CitizenHelper-Adaptive', to mine Twitter streams and detect how people perceived and reacted to the COVID-19 pandemic. The system does this by identifying which tweets in the stream mention prevention and risks.
One of the goals of this activity is to observe how volunteer annotators contributed to the Human-AI teaming process during the COVID-19 pandemic and to identify associated parameters. Another goal is to evaluate factors that can improve the learning process of CitizenHelper so that it identifies risk and prevention mentions in tweets more effectively.
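One way to picture how annotator effort might be directed in such a Human-AI teaming loop is uncertainty sampling: the tweets the model is least sure about are shown to volunteers first. The sketch below is only an illustration under that assumption; the source does not state which selection strategy CitizenHelper-Adaptive uses, and the function name, probabilities, and example tweets are hypothetical.

```python
from typing import List, Tuple


def select_for_annotation(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the k tweets whose predicted probability is closest to 0.5.

    `scored` pairs each tweet with an assumed model probability that the
    tweet mentions prevention or risk. Probabilities near 0.5 mean the
    model is unsure, so human feedback is likely to be most informative.
    """
    ranked = sorted(scored, key=lambda item: abs(item[1] - 0.5))
    return [text for text, _ in ranked[:k]]


# Hypothetical model outputs (not real system data).
scored_tweets = [
    ("Wash your hands and wear a mask", 0.95),
    ("Is it safe to visit my grandparents?", 0.52),
    ("Great game last night", 0.03),
    ("Heard crowded buses might be a problem", 0.41),
]

annotation_queue = select_for_annotation(scored_tweets, k=2)
print(annotation_queue)
```

Confident predictions (very high or very low probability) are left out of the queue, which keeps the volunteers' limited time focused on borderline tweets.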
Here is a snapshot of the process of giving active feedback on COVID-19 data (see Figure 2).
(Figure 2.) Illustration of the high-level CitizenHelper-Adaptive system in our COVID-19 effort.
In this project, we analyzed videos of an active shooter environment. We monitored a simulated situation in which an active shooter started shooting in the Eagle Bank Arena at GMU, creating a ruckus among the crowd. A Rescue Task Force team then arrived, neutralized the shooter, and saved the hostages.
We used the recorded video for the analysis. First, we created a people detection system to detect the bounding box of every person in the frame, using a pre-trained Faster R-CNN model for detection. Then we trained a MobileNet model to recognize the different types of actors among the people in the frames.
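The two-stage flow described above (detect each person, then classify the cropped region) can be sketched independently of any deep learning library. In the sketch below the detector and classifier are stubs standing in for the Faster R-CNN and MobileNet models; the role labels, the score threshold, and the toy frame are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixel coordinates
Frame = Sequence[Sequence[int]]   # toy grayscale frame: rows of pixel values


@dataclass
class Detection:
    box: Box
    score: float  # detector confidence in [0, 1]


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut the region inside `box` out of the frame."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in frame[y1:y2]]


def label_people(
    frame: Frame,
    detect: Callable[[Frame], List[Detection]],
    classify: Callable[[List[List[int]]], str],
    min_score: float = 0.5,
) -> List[Tuple[Box, str]]:
    """Stage 1: detect people. Stage 2: classify each confident crop."""
    results = []
    for det in detect(frame):
        if det.score < min_score:
            continue  # skip low-confidence boxes
        results.append((det.box, classify(crop(frame, det.box))))
    return results


# Stub models (hypothetical): real code would call trained networks here.
def fake_detector(frame: Frame) -> List[Detection]:
    return [
        Detection((0, 0, 2, 2), 0.9),
        Detection((2, 0, 4, 2), 0.3),
        Detection((2, 2, 4, 4), 0.8),
    ]


def fake_role_classifier(patch: List[List[int]]) -> str:
    mean = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
    return "responder" if mean > 100 else "civilian"


frame = [
    [200, 200, 0, 0],
    [200, 200, 0, 0],
    [0, 0, 50, 50],
    [0, 0, 50, 50],
]
labels = label_people(frame, fake_detector, fake_role_classifier)
print(labels)
```

Keeping the detector and the classifier behind plain callables means either stage can be swapped for a different model without changing the pipeline code.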
The figure below demonstrates multimodal input data in the CitizenHelper architecture.
(Figure 3.) CitizenHelper utilizing video data during a simulation of an active shooting emergency response exercise.
In this project, we proposed a learning analytics system, "EMAssistant", for emergency volunteers or practitioners, referred to as trainees, to enhance their experiential learning cycle with cause-effect reasoning about providing relevant feedback to the machine learning model. This continuous integration between the cause (providing feedback) and the effect (observing predictions from the updated model) in a visual form is likely to improve the trainees' understanding and help them provide more accurate feedback.
Below is a snapshot of the Learning Analytics System for social media posts from the Hurricane Sandy event.
(Figure 4.) EM-Assistant Dashboard for Hurricane Sandy
In this project, we proposed an interactive, user-feedback-based streaming analytics system, 'CitizenHelper-Adaptive', to mine social media, news, and other public Web data streams for emergency services and humanitarian organizations.
The main application for this project was implementing transfer-active learning methods for time-critical events, when abundant labeled data is available from past events but labeled data for the ongoing event is scarce.
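The idea of combining transfer and active learning can be illustrated with a deliberately small model: train on labeled posts from a past event, then ask a human to label only the post from the ongoing event that the model is least sure about, and update on that answer. The bag-of-words perceptron below is a stand-in chosen for brevity, not the learner used in CitizenHelper-Adaptive, and all example posts and labels are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Example = Tuple[str, int]  # (text, label): 1 = relevant to responders, 0 = not


class BagOfWordsPerceptron:
    """Tiny online linear classifier over lowercase word tokens."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def score(self, text: str) -> float:
        tokens = text.lower().split()
        return self.bias + sum(self.weights.get(t, 0.0) for t in tokens)

    def predict(self, text: str) -> int:
        return 1 if self.score(text) > 0 else 0

    def update(self, text: str, label: int) -> None:
        """Perceptron rule: adjust weights only when the prediction is wrong."""
        error = label - self.predict(text)
        if error:
            for t in text.lower().split():
                self.weights[t] = self.weights.get(t, 0.0) + error
            self.bias += error

    def fit(self, examples: Iterable[Example], epochs: int = 5) -> None:
        examples = list(examples)
        for _ in range(epochs):
            for text, label in examples:
                self.update(text, label)


# Transfer step: abundant labels from a past event (hypothetical data).
past_event = [
    ("need water and food", 1),
    ("shelter needed urgently", 1),
    ("nice weather today", 0),
    ("watching a movie tonight", 0),
]
model = BagOfWordsPerceptron()
model.fit(past_event)

# Active step: query the unlabeled post with the score closest to zero.
ongoing_pool = ["boat rescue requested", "need water now", "nice movie tonight"]
query = min(ongoing_pool, key=lambda t: abs(model.score(t)))

before = model.predict("rescue team on the way")
model.update(query, 1)  # the annotator marks the queried post as relevant
after = model.predict("rescue team on the way")
print(query, before, after)
```

In this toy run a single piece of feedback on the new event's vocabulary ("rescue") changes the prediction for a related post, while the knowledge transferred from the past event is kept.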
Here is a snapshot of the dashboard for giving active feedback on Hurricane Harvey data.
(Figure 5.) Illustration of analysis using CitizenHelper-Adaptive system widgets for Hurricane Harvey
Figure 6 shows the effect of human feedback on the machine-learned model.
(Figure 6.) The effect of human feedback in machine adaptation
(Figure 7.) System Architecture
Data Collection - Data Sources and Apache Kafka Framework: CitizenHelper uses an open-source distributed computing platform to collect data (see Figure 7). The platform provides flexibility to scale producers (information sources) and consumers (information processors), and it adds a streaming data buffer, which is valuable when downstream processors are slow. The system currently supports real-time data collection using the Streaming and Location APIs of Twitter, as well as Instagram and Facebook (for public groups and pages), which are useful for collecting situational awareness information during humanitarian disasters. Additionally, the system supports the collection of news (including GDELT) and blog streams as well as data from Web knowledge bases, including Wikipedia and OpenGov Data.
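The buffering role described above can be illustrated without a running broker. The sketch below is an in-memory stand-in, not the Kafka client API: producers append records to a shared log, and each consumer reads from its own offset at its own pace, so a slow processor falls behind without blocking the sources or the faster processors. The class name, consumer names, and records are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class InMemoryTopic:
    """Append-only log with one read offset per consumer (toy model)."""

    def __init__(self) -> None:
        self._log: List[Any] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def produce(self, record: Any) -> None:
        """Called by any information source; never waits for consumers."""
        self._log.append(record)

    def poll(self, consumer: str, max_records: int) -> List[Any]:
        """Return up to max_records unread records and advance the offset."""
        start = self._offsets[consumer]
        batch = self._log[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def lag(self, consumer: str) -> int:
        """Number of records this consumer has not read yet."""
        return len(self._log) - self._offsets[consumer]


topic = InMemoryTopic()

# Two hypothetical sources write into the same topic.
for i in range(3):
    topic.produce({"source": "twitter", "id": i})
for i in range(2):
    topic.produce({"source": "news", "id": i})

fast_batch = topic.poll("archiver", max_records=10)    # keeps up with the stream
slow_batch = topic.poll("image-model", max_records=2)  # slow downstream processor

print(len(fast_batch), topic.lag("archiver"))
print(len(slow_batch), topic.lag("image-model"))
```

The unread records simply wait in the log until the slow consumer polls again, which is the decoupling the paragraph attributes to the streaming buffer.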
Metadata Processing - Spark Streaming, Application Web Server, and Analytics Services: The proposed system connects the data collection components to processors in the open-source stream computing framework Apache Spark. Different processors perform analytics on the streamed content by leveraging various analytics services to extract and associate enriched metadata, such as information provider classification (e.g., gender, or user type such as organization) and content classification for topics, intent, etc.
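As a rough picture of what a metadata processor does, the sketch below passes each post through a small registry of analytics services and attaches their outputs as metadata. It is plain Python rather than Spark code, and the keyword rules, field names, and example posts are hypothetical placeholders for the trained classifiers a real deployment would call.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]


def classify_user_type(post: Post) -> str:
    """Guess the information provider type from the profile description."""
    description = post.get("user_description", "").lower()
    org_markers = ("official", "agency", "organization", "news")
    if any(marker in description for marker in org_markers):
        return "organization"
    return "individual"


def classify_intent(post: Post) -> str:
    """Guess the intent of the content from a few keywords."""
    tokens = set(post.get("text", "").lower().split())
    if tokens & {"need", "needs", "help"}:
        return "seeking"
    if tokens & {"donate", "offering", "volunteer"}:
        return "offering"
    return "other"


# Registry of analytics services; new services can be added by name.
ANALYTICS_SERVICES: Dict[str, Callable[[Post], str]] = {
    "user_type": classify_user_type,
    "intent": classify_intent,
}


def enrich(post: Post) -> Dict[str, object]:
    """Return a copy of the post with a metadata field added."""
    metadata = {name: service(post) for name, service in ANALYTICS_SERVICES.items()}
    return {**post, "metadata": metadata}


stream: List[Post] = [
    {"text": "We need water in the shelter", "user_description": "Dad, runner"},
    {
        "text": "Volunteer with us this weekend",
        "user_description": "Official account of a relief agency",
    },
]
enriched = [enrich(post) for post in stream]
for item in enriched:
    print(item["metadata"])
```

Because `enrich` is a pure function of one record, the same logic could in principle be applied record by record inside a stream processing framework.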
Data Storage and Visual Dashboard: CitizenHelper stores raw data in a file system for long-term archiving, and processed data with extracted metadata in a database that supports Kibana, a frontend visualization dashboard for streaming analytics. Our visual dashboard is composed of different analytical widgets, such as volume trend graphs of Twitter posts (tweets) over time, top active users with a tweet frequency corresponding to a topical hashtag, and so on. These widgets have two unique features. First, when a user interacts with a widget and modifies an analysis unit on it (e.g., the time slice on a trend graph, the region of interest on the map, or a topical tag in the word cloud list), all analytical widgets are updated to reflect that change. Second, the visual dashboard supports collaborative teamwork by allowing an end-user to save and share a state of the dashboard, which in turn allows a collaborating team member to study the same set of analyses as their colleague. These widgets can also be repositioned and deleted as needed to avoid visual information overload. System details with exemplary analyses and demos are available at the demo link, based on prior interactions with the analysis widgets.
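The two widget features described above, linked updates and shareable dashboard state, can be sketched with a shared state object that notifies every subscribed widget and serializes to JSON. This is an illustrative model only, not Kibana's implementation; the class names, filter keys, and values are assumptions.

```python
import json
from typing import Callable, Dict, List, Optional

Filters = Dict[str, str]


class DashboardState:
    """Shared analysis unit (filters) that all widgets observe."""

    def __init__(self, filters: Optional[Filters] = None) -> None:
        self.filters: Filters = dict(filters or {})
        self._listeners: List[Callable[[Filters], None]] = []

    def subscribe(self, listener: Callable[[Filters], None]) -> None:
        self._listeners.append(listener)

    def set_filter(self, name: str, value: str) -> None:
        """Change one analysis unit and refresh every widget."""
        self.filters[name] = value
        for listener in self._listeners:
            listener(self.filters)

    def save(self) -> str:
        """Serialize the state so it can be shared with a teammate."""
        return json.dumps(self.filters, sort_keys=True)

    @classmethod
    def load(cls, payload: str) -> "DashboardState":
        return cls(json.loads(payload))


class Widget:
    """Toy widget that remembers the filters it last rendered with."""

    def __init__(self, name: str, state: DashboardState) -> None:
        self.name = name
        self.rendered_with: Filters = dict(state.filters)
        state.subscribe(self.refresh)

    def refresh(self, filters: Filters) -> None:
        self.rendered_with = dict(filters)


state = DashboardState()
trend = Widget("trend-graph", state)
region_map = Widget("map", state)

# Interacting with one widget changes the shared state for all widgets.
state.set_filter("hashtag", "#likeagirl")
state.set_filter("time_slice", "2016-08-04/2016-08-28")

# A collaborator restores the same view from the saved state.
shared = DashboardState.load(state.save())
colleague_view = Widget("trend-graph", shared)
print(colleague_view.rendered_with)
```

Routing every interaction through one state object is what keeps the widgets consistent, and it makes saving a view as simple as serializing that object.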
1. Demographics - Content Practices of Specific User Identities for Gender-Violence Events
(Figure 8.a.) Organizations tweet about husbands | (Figure 8.b.) Tweets by individual users describing themselves as husbands
Figure 8. Husband Portrayal: Individual user accounts that identify themselves as husbands in their user profiles write pro-women tweet content, whereas when Organization user accounts describe husbands, they are portrayed as threats. This data was collected for the domain of anti-gender-based violence over the period of Aug 4th, 2016 to Aug 28th, 2016.
2. Narratives of Diverse Sources - Analysis for Gender-Violence Events
(Figure 9.a.) Editorial-news Content Summary | (Figure 9.b.) User-generated Content Summary
Figure 9. Our tool allows us to compare the topics related to our research that are currently being covered by the world news with those that people are talking about on Twitter. In this figure, we observed the diverse nature of narratives being promoted on news media, in contrast to the diverse types of issues being carried under the activism for anti-gender violence, during the period of Aug 4th, 2016 to Aug 28th, 2016.
3. Temporal Diffusion - Analysis for Gender - Topic activity over time for Gender-Violence Events
(Figure 10.a.) Tweets by Topic | (Figure 10.b.) Tweets over time
Figure 10. Our tool allows us to view the breakdown of topics in the current stream of tweets for a specific domain. In this figure, we observe the trend of the number of tweets by topic vs. the total number of tweets for the domain of anti-gender-based violence over the period of July 31st, 2016 to September 1st, 2016.
4. Geographical Engagement - Awareness Analysis for Gender-Violence Events
(Figure 11.a.) Tweet count by location | (Figure 11.b.) Gender-based violence counts provided by open data from FBI UCR
Figure 11. The visual tool allows us to view tweets by originating location, which indicates user participation in and awareness of the issues of concern. In this figure, we observe the variation in the number of tweets indicating social awareness by location from Aug 4th, 2016 to Aug 28th, 2016, and the contrasting pattern of 2014 GBV-related reports by location in the FBI Uniform Crime Reporting (UCR) data.
(Figure 12.a.) Tweet count by location | (Figure 12.b.) Global Displacement counts provided by open data from IDP
Figure 12. The geographical contrast analysis capability applied to another humanitarian issue, Global Displacement. In this figure, we observe the variation in the number of tweets by originating location, indicating social awareness and reporting for the issue, during the period of Jan 5th, 2017 to Feb 23rd, 2017, in contrast to the displacement reported by IDP.
5. Hashtag analysis - #likeagirl
(Figure 13.a.) Hashtags Mentioned by Individuals | (Figure 13.b.) Hashtags Mentioned by Organizations