Fire Incident Dispatch Data Analysis
The Fire Incident Dispatch Data file contains data that is generated by the Starfire Computer Aided Dispatch System. The data spans from the time the incident is created in the system to the time the incident is closed in the system. It covers information about the incident as it relates to the assignment of resources and the Fire Department’s response to the emergency. To protect personal identifying information in accordance with the Health Insurance Portability and Accountability Act (HIPAA), specific locations of incidents are not included and have been aggregated to a higher level of detail.
In this analysis we restrict the data to the most recent 50,000 observations, which cover the period from 5 September to 30 September.
Dataset Description
The dataset contains the following columns:
- STARFIRE_INCIDENT_ID: An incident identifier comprising the 5-character Julian date, the 4-character alarm box number, the 2-character count of incidents at the box so far that day, the 1-character borough code, and a 4-character sequence number.
- INCIDENT_DATETIME: The date and time of the incident.
- ALARM_BOX_BOROUGH: The borough of the alarm box.
- ALARM_BOX_LOCATION: The location of the alarm box.
- ALARM_BOX: The alarm box number.
- INCIDENT_BOROUGH: The borough of the incident.
- ZIPCODE: The zip code of the incident.
- POLICEPRECINCT: The police precinct of the incident.
- CITYCOUNCILDISTRICT: The city council district.
- COMMUNITYDISTRICT: The community district.
- COMMUNITYSCHOOLDISTRICT: The community school district.
- CONGRESSIONALDISTRICT: The congressional district.
- ALARM_SOURCE_DESCRIPTION_TX: The description of the alarm source.
- ALARM_LEVEL_INDEX_DESCRIPTION: The alarm level index.
- HIGHEST_ALARM_LEVEL: The highest alarm level.
- INCIDENT_CLASSIFICATION: The incident classification.
- INCIDENT_CLASSIFICATION_GROUP: The incident classification roll-up group.
- FIRST_ASSIGNMENT_DATETIME: The date and time of the first unit assignment.
- FIRST_ACTIVATION_DATETIME: The date and time of the first unit acknowledgement of the assignment.
- FIRST_ON_SCENE_DATETIME: The date and time of the first unit at the scene of the incident.
- INCIDENT_CLOSE_DATETIME: The date and time that the incident was closed in the dispatch system.
- VALID_DISPATCH_RSPNS_TIME_INDC: Indicates that the components comprising the generation of the DISPATCH_RESPONSE_SECONDS_QY are valid.
- DISPATCH_RESPONSE_SECONDS_QY: The elapsed time in seconds between the INCIDENT_DATETIME and the FIRST_ASSIGNMENT_DATETIME.
- VALID_INCIDENT_RSPNS_TIME_INDC: Indicates that the components comprising the generation of the INCIDENT_RESPONSE_SECONDS_QY are valid.
- INCIDENT_RESPONSE_SECONDS_QY: The elapsed time in seconds between the INCIDENT_DATETIME and the FIRST_ON_SCENE_DATETIME.
- INCIDENT_TRAVEL_TM_SECONDS_QY: The elapsed time in seconds between the FIRST_ASSIGNMENT_DATETIME and the FIRST_ON_SCENE_DATETIME.
- ENGINES_ASSIGNED_QUANTITY: The number of engine units assigned to the incident.
- LADDERS_ASSIGNED_QUANTITY: The number of ladder units assigned to the incident.
- OTHER_UNITS_ASSIGNED_QUANTITY: The number of units that are not engines or ladders that were assigned to the incident.
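As an illustration of the STARFIRE_INCIDENT_ID layout described above, the following Python sketch splits an identifier into its five components. It assumes the components are concatenated in the listed order with no separators (16 characters in total); the function name, the field names, and the sample identifier are hypothetical and are not taken from the project's R code.

```python
from typing import NamedTuple


class StarfireId(NamedTuple):
    """Components of a STARFIRE_INCIDENT_ID, kept as strings."""
    julian_date: str       # 5 characters
    alarm_box: str         # 4 characters
    incidents_at_box: str  # 2 characters
    borough_code: str      # 1 character
    sequence: str          # 4 characters


# Field widths in the order given by the column description.
# Assumption: fields are simply concatenated, with no separators.
FIELD_WIDTHS = (5, 4, 2, 1, 4)


def parse_starfire_id(incident_id: str) -> StarfireId:
    """Split an identifier into fixed-width fields."""
    expected = sum(FIELD_WIDTHS)
    if len(incident_id) != expected:
        raise ValueError(
            f"expected {expected} characters, got {len(incident_id)}"
        )
    parts = []
    pos = 0
    for width in FIELD_WIDTHS:
        parts.append(incident_id[pos:pos + width])
        pos += width
    return StarfireId(*parts)


# Hypothetical identifier, made up for illustration only.
example = parse_starfire_id("2124801230110042")
```

The fields are left as strings because leading zeros are significant in fixed-width codes such as the alarm box number.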
Analysis Description
We carry out two separate analyses:
- The first aims to predict INCIDENT_RESPONSE_SECONDS_QY, the time difference between INCIDENT_DATETIME and FIRST_ON_SCENE_DATETIME.
- The second aims to predict EMERGENCY_TIME, the time difference between FIRST_ON_SCENE_DATETIME and INCIDENT_CLOSE_DATETIME.
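The two response variables can be sketched as simple timestamp differences. The Python snippet below is a minimal illustration, not the project's R code: it assumes the timestamps have already been parsed into datetime objects, that EMERGENCY_TIME is measured in seconds, and that it is computed as close time minus on-scene time. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime


def seconds_between(start: datetime, end: datetime) -> float:
    """Elapsed time from start to end, in seconds."""
    return (end - start).total_seconds()


def response_targets(incident: datetime,
                     first_on_scene: datetime,
                     incident_close: datetime) -> dict:
    """Compute the two response variables for one incident."""
    return {
        # Incident creation to first unit on scene.
        "INCIDENT_RESPONSE_SECONDS_QY": seconds_between(incident, first_on_scene),
        # First unit on scene to incident closure (unit of seconds is assumed).
        "EMERGENCY_TIME": seconds_between(first_on_scene, incident_close),
    }


# Made-up timestamps for illustration only.
targets = response_targets(
    incident=datetime(2021, 9, 5, 12, 0, 0),
    first_on_scene=datetime(2021, 9, 5, 12, 4, 30),
    incident_close=datetime(2021, 9, 5, 12, 30, 0),
)
```

A negative value from either difference would indicate inconsistent timestamps, which is one kind of invalid value worth flagging during cleaning.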
In both analyses we first try to fit a linear regression model. As we will see, the assumptions of linear regression are not met, so we simplify the project by moving to classification, dividing each of the two responses into two or more ranges. We also perform data exploration and cleaning, studying whether the NA values and invalid values follow any pattern.
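Dividing a continuous response into ranges can be sketched as follows. This Python snippet is only an illustration of the idea, not the project's R code; the cut points are hypothetical placeholders and are not the thresholds used in the analysis.

```python
from bisect import bisect_right
from typing import Sequence


def to_class(seconds: float, cut_points: Sequence[float]) -> int:
    """Map a response time to a class index.

    cut_points must be sorted in ascending order. With k cut points
    there are k + 1 classes; a value equal to a cut point falls into
    the higher class.
    """
    return bisect_right(cut_points, seconds)


# Hypothetical thresholds, chosen only for illustration.
TWO_CLASS_CUTS = (300.0,)          # class 0: under 300 s, class 1: 300 s or more
THREE_CLASS_CUTS = (240.0, 480.0)  # three ranges

fast_or_slow = to_class(270.0, TWO_CLASS_CUTS)
three_way = to_class(500.0, THREE_CLASS_CUTS)
```

Choosing cut points from quantiles of the response would give classes of similar size, while fixed thresholds are easier to interpret; which option suits the data is a judgment call.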
Note
If you want to run the .Rmd file, remember to set the working directory to the corresponding path on your machine.
Analysis Subdivision
- Part 1 - Dataset Analysis and Cleaning: we explore and clean the dataset
- Part 2 - Linear Regression: we try to fit a linear regression model
- Part 3 - Classification: we recast the analysis as a classification task