Crime Risk Prediction in Chicago

Introduction

Crime has long been a key factor in cities, with a large impact on land values, zoning, resource allocation, and more. Analyzing crime-related factors and predicting crime can help policymakers decide on regional policies, help police deploy resources more rationally, and help residents see the safety of a region more directly. In this analysis, I build a model that borrows experience from places where crime has been observed and test whether that experience generalizes to places that may be at risk for crime.

The test showed that in Chicago, crime locations predicted for 2019 from the previous year’s (2018) data matched the actual occurrences well, which suggests that the model is worth replicating.

Analysis

1. Map of Weapons Violations

Here, I show a map of weapons violations that occurred in Chicago in 2018. There are clear clusters in the northwest and southwest areas.

2. Map of Weapons Violations in the fishnet

Crime risk is better treated not as a phenomenon that varies across administrative units but as one that varies smoothly across the landscape, so a grid of cells (a fishnet) is more suitable for showing these crime “hotspots”.
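
As a minimal sketch of the fishnet idea, the Python snippet below bins point coordinates into square cells and counts the points in each cell. It is only an illustration: the function name `count_points_in_fishnet`, the cell size, and the coordinates are hypothetical, and it assumes projected (planar) coordinates rather than the actual Chicago data.

```python
from collections import Counter
from math import floor

def count_points_in_fishnet(points, x_min, y_min, cell_size):
    """Count points per square grid cell.

    Each cell is identified by its (column, row) index relative to the
    lower-left corner (x_min, y_min) of the grid.
    """
    counts = Counter()
    for x, y in points:
        col = floor((x - x_min) / cell_size)
        row = floor((y - y_min) / cell_size)
        counts[(col, row)] += 1
    return counts

# Hypothetical incident coordinates in projected units (e.g. feet).
incidents = [(120.0, 80.0), (130.0, 95.0), (480.0, 510.0), (20.0, 10.0)]
cell_counts = count_points_in_fishnet(
    incidents, x_min=0.0, y_min=0.0, cell_size=500.0
)
```

Cells with no incidents are simply absent from the counter and read as zero, which matters because empty cells are part of the grid too.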

3. Risk factors in the fishnet

Here, I collected several independent variables that I expected to be strongly associated with crime and mapped them onto the spatial grid established above.

## Reading layer `chicago' from data source 
##   `https://raw.githubusercontent.com/blackmad/neighborhoods/master/chicago.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 98 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## Geodetic CRS:  WGS 84

I also calculated their average distance from the three closest crimes and created new distance variables, which are shown in the map.
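
The averaging step behind such a distance variable can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name `mean_knn_distance` and the coordinates are hypothetical, distances are Euclidean in planar units, and which points play the role of origin and neighbors depends on the analysis.

```python
from math import dist

def mean_knn_distance(origin, points, k=3):
    """Average Euclidean distance from origin to its k nearest points."""
    if len(points) < k:
        raise ValueError("need at least k points")
    nearest = sorted(dist(origin, p) for p in points)[:k]
    return sum(nearest) / k

# Hypothetical origin and point locations in planar units.
origin = (0.0, 0.0)
neighbors = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0), (30.0, 40.0)]
avg_distance = mean_knn_distance(origin, neighbors, k=3)
```

Averaging over three neighbors rather than using only the single nearest one makes the variable less sensitive to one isolated point.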

4. Local Moran’s I map

The map shows that Local Moran’s I takes higher values in regions where more crimes occur, indicating relatively strong local spatial autocorrelation of crime in those regions.
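
For readers unfamiliar with the statistic, here is a small Python sketch of Local Moran’s I on a rectangular grid. It assumes queen contiguity with row-standardized weights, which may differ from the weights used for the map; the function name `local_morans_i` and the toy counts are hypothetical. For cell i the statistic is (z_i / m2) times the mean of its neighbors’ z values, where z is the mean-centered count and m2 is the variance of the counts.

```python
def local_morans_i(grid):
    """Local Moran's I for each cell of a rectangular grid of counts.

    Uses queen contiguity (up to 8 neighbors) with row-standardized
    weights, so each cell's spatial lag is the mean of its neighbors'
    mean-centered values.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance of the counts
    if m2 == 0:
        raise ValueError("all cells have the same value")

    result = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Mean-centered values of the adjacent cells (queen contiguity).
            neighbors = [
                grid[rr][cc] - mean
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            lag = sum(neighbors) / len(neighbors)
            result[r][c] = (grid[r][c] - mean) / m2 * lag
    return result

# Hypothetical 4x4 grid of counts with a cluster in the top-left corner.
counts = [
    [9, 8, 0, 0],
    [8, 7, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
local_i = local_morans_i(counts)
```

In this toy grid the clustered high-count cells get large positive values, while a zero-count cell bordering the cluster gets a negative value, marking it as a spatial outlier.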

5. Small multiple scatter plots with correlations

Here, we can examine the correlation between each selected variable and the crime count. The highest correlation is for vacant houses.
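
The correlation reported in each scatter plot is presumably the Pearson coefficient; the sketch below shows how it is computed for one variable, written in Python for illustration. The function name `pearson_r` and the per-cell numbers are hypothetical and are not taken from the Chicago data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-cell values: a risk factor count and a crime count.
risk_factor = [0, 1, 3, 2, 5, 4]
crime_count = [0, 0, 2, 1, 4, 2]
r = pearson_r(risk_factor, crime_count)
```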

6. Histogram of the dependent variable

Here, I counted how many crimes occurred in each grid cell.

7. Table of MAE and standard deviation of MAE by regression

Here, I conducted spatial cross-validation to test the spatial accuracy of the model’s predictions.

## This hold out fold is Riverdale 
## This hold out fold is Hegewisch 
## This hold out fold is West Pullman 
## This hold out fold is South Deering 
## This hold out fold is Morgan Park 
## This hold out fold is Mount Greenwood 
## This hold out fold is Roseland 
## This hold out fold is Pullman 
## This hold out fold is East Side 
## This hold out fold is Beverly 
## This hold out fold is Washington Heights 
## This hold out fold is Burnside 
## This hold out fold is Calumet Heights 
## This hold out fold is Chatham 
## This hold out fold is South Chicago 
## This hold out fold is Auburn Gresham 
## This hold out fold is Ashburn 
## This hold out fold is Avalon Park 
## This hold out fold is West Lawn 
## This hold out fold is Grand Crossing 
## This hold out fold is South Shore 
## This hold out fold is Chicago Lawn 
## This hold out fold is Englewood 
## This hold out fold is Woodlawn 
## This hold out fold is Clearing 
## This hold out fold is Jackson Park 
## This hold out fold is Washington Park 
## This hold out fold is Garfield Ridge 
## This hold out fold is West Elsdon 
## This hold out fold is Gage Park 
## This hold out fold is Hyde Park 
## This hold out fold is New City 
## This hold out fold is Fuller Park 
## This hold out fold is Archer Heights 
## This hold out fold is Brighton Park 
## This hold out fold is Grand Boulevard 
## This hold out fold is Kenwood 
## This hold out fold is Oakland 
## This hold out fold is Little Village 
## This hold out fold is Mckinley Park 
## This hold out fold is Bridgeport 
## This hold out fold is Armour Square 
## This hold out fold is Douglas 
## This hold out fold is Lower West Side 
## This hold out fold is North Lawndale 
## This hold out fold is Chinatown 
## This hold out fold is Near South Side 
## This hold out fold is Museum Campus 
## This hold out fold is Little Italy, UIC 
## This hold out fold is West Loop 
## This hold out fold is Austin 
## This hold out fold is Printers Row 
## This hold out fold is Garfield Park 
## This hold out fold is Grant Park 
## This hold out fold is United Center 
## This hold out fold is Greektown 
## This hold out fold is Loop 
## This hold out fold is Millenium Park 
## This hold out fold is Humboldt Park 
## This hold out fold is West Town 
## This hold out fold is River North 
## This hold out fold is Streeterville 
## This hold out fold is Ukrainian Village 
## This hold out fold is East Village 
## This hold out fold is Rush & Division 
## This hold out fold is Wicker Park 
## This hold out fold is Gold Coast 
## This hold out fold is Galewood 
## This hold out fold is Old Town 
## This hold out fold is Lincoln Park 
## This hold out fold is Belmont Cragin 
## This hold out fold is Hermosa 
## This hold out fold is Logan Square 
## This hold out fold is Bucktown 
## This hold out fold is Montclare 
## This hold out fold is Sheffield & DePaul 
## This hold out fold is Dunning 
## This hold out fold is Avondale 
## This hold out fold is North Center 
## This hold out fold is Lake View 
## This hold out fold is Portage Park 
## This hold out fold is Irving Park 
## This hold out fold is Boystown 
## This hold out fold is Wrigleyville 
## This hold out fold is Uptown 
## This hold out fold is Albany Park 
## This hold out fold is Lincoln Square 
## This hold out fold is Norwood Park 
## This hold out fold is Jefferson Park 
## This hold out fold is Sauganash,Forest Glen 
## This hold out fold is North Park 
## This hold out fold is Andersonville 
## This hold out fold is Edgewater 
## This hold out fold is West Ridge 
## This hold out fold is Edison Park 
## This hold out fold is Rogers Park

I calculated the mean and standard deviation of the errors (MAE) for each regression. The results confirm that the spatial process features improve the model: they lower the mean MAE under both random k-fold and spatial LOGO-CV.

Regression	Mean_MAE	SD_MAE
Random k-foldCV:JustRiskFactors	0.46	0.36
Random k-foldCV:SpatialProcess	0.41	0.36
Spatial LOGO-CV:JustRiskFactors	0.81	0.84
Spatial LOGO-CV:SpatialProcess	0.52	0.52
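
The leave-one-group-out procedure and the two summary columns can be sketched in Python as follows. This is an illustration under stated assumptions, not the regression used above: a placeholder model that predicts the mean count of the training cells stands in for the real model, the group labels and counts are hypothetical, and the sample standard deviation is assumed for SD_MAE.

```python
from statistics import mean, stdev

def logo_cv_mae(cells):
    """Leave-one-group-out CV: return the MAE of each held-out group.

    `cells` is a list of (group, observed_count) pairs. The "model" is a
    placeholder that predicts the mean count of the training cells.
    """
    groups = sorted({group for group, _ in cells})
    fold_mae = {}
    for held_out in groups:
        train = [y for group, y in cells if group != held_out]
        test = [y for group, y in cells if group == held_out]
        prediction = mean(train)  # fit on every group except the held-out one
        fold_mae[held_out] = mean(abs(y - prediction) for y in test)
    return fold_mae

# Hypothetical grid cells labeled with the neighborhood they fall in.
cells = [("A", 0), ("A", 2), ("B", 1), ("B", 1), ("C", 4), ("C", 6)]
fold_mae = logo_cv_mae(cells)
mean_mae = mean(fold_mae.values())  # analogous to Mean_MAE
sd_mae = stdev(fold_mae.values())   # analogous to SD_MAE
```

Holding out a whole neighborhood at a time forces the model to predict for places it has not seen, which is why LOGO-CV errors tend to be larger than random k-fold errors.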

8. Small multiple map of model errors by random k-fold and spatial cross-validation

The maps show that the largest errors are in the hotspot locations.

9. Table of raw errors by race context for random k-fold vs. spatial cross-validation regressions

On average, the model under-predicts in Majority_Non_White neighborhoods and over-predicts in Majority_White neighborhoods. Adding the spatial process features shrinks these errors but does not remove them.

Mean Error by neighborhood racial context
Regression	Majority_Non_White	Majority_White
Random k-foldCV:JustRiskFactors	-0.3868085	0.4241534
Random k-foldCV:SpatialProcess	-0.1467751	0.1620336
Spatial LOGO-CV:JustRiskFactors	-0.4406241	0.4337530
Spatial LOGO-CV:SpatialProcess	-0.1721860	0.1661009
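
A table like this can be built by averaging signed errors within each group. The Python sketch below assumes the error is defined as predicted minus observed, which is consistent with reading negative values as under-prediction; the function name `mean_error_by_group` and the rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_error_by_group(rows):
    """Mean signed error (predicted - observed) for each group label."""
    errors = defaultdict(list)
    for group, predicted, observed in rows:
        errors[group].append(predicted - observed)
    return {group: mean(values) for group, values in errors.items()}

# Hypothetical (group, predicted count, observed count) rows, one per cell.
rows = [
    ("Majority_Non_White", 1.5, 2),
    ("Majority_Non_White", 2.5, 3),
    ("Majority_White", 1.0, 0),
    ("Majority_White", 0.5, 0),
]
bias = mean_error_by_group(rows)
```

Unlike the MAE, the signed mean error lets positive and negative errors cancel, so it measures systematic bias rather than overall accuracy.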

10. Map comparing kernel density to risk predictions for the next year’s crime

I mapped predicted crime for 2019 based on 2018 data. Compared with the maps of actual 2019 crime, the predictions match closely in spatial distribution.
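
For reference, the kernel density baseline named in the heading can be sketched in Python as below. It evaluates a two-dimensional Gaussian kernel at grid-cell centers; the function name `gaussian_kde_grid`, the bandwidth, and the coordinates are hypothetical, and the kernel and bandwidth used for the maps may differ.

```python
from math import exp, pi

def gaussian_kde_grid(points, centers, bandwidth):
    """Gaussian kernel density estimate at each grid-cell center.

    Each point contributes a 2D Gaussian bump with standard deviation
    `bandwidth`; the result is averaged over the number of points.
    """
    if not points:
        raise ValueError("need at least one point")
    norm = 1.0 / (len(points) * 2.0 * pi * bandwidth ** 2)
    densities = []
    for cx, cy in centers:
        total = sum(
            exp(-((cx - px) ** 2 + (cy - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points
        )
        densities.append(norm * total)
    return densities

# Hypothetical incident locations and cell centers in planar units.
incidents = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
density = gaussian_kde_grid(incidents, centers, bandwidth=2.0)
```

Kernel density uses only the locations of past incidents, whereas the risk prediction model also draws on the risk factors and spatial process features.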

11. Bar plot making this comparison

Finally, the bar plot comparing the predicted data with the actual data also shows that they are very close, which supports the conclusion that the model is successful.
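
One way to put a number on such a comparison is to ask what share of next-year incidents falls in the cells each method ranks as highest risk. The Python sketch below is an assumption about a possible metric, not a description of the bar plot above; the function name `share_in_top_cells`, the scores, and the counts are all hypothetical.

```python
from math import ceil

def share_in_top_cells(scores, next_year_counts, top_fraction=0.1):
    """Share of next-year incidents located in the highest-scoring cells."""
    if len(scores) != len(next_year_counts):
        raise ValueError("scores and counts must have the same length")
    n_top = max(1, ceil(len(scores) * top_fraction))
    # Indices of cells ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(next_year_counts[i] for i in ranked[:n_top])
    total = sum(next_year_counts)
    return captured / total if total else 0.0

# Hypothetical scores for ten cells and next-year incident counts.
counts_next_year = [5, 3, 0, 1, 0, 0, 1, 0, 0, 0]
kde_scores = [0.9, 0.2, 0.8, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0]
model_scores = [0.9, 0.7, 0.3, 0.2, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0]

kde_share = share_in_top_cells(kde_scores, counts_next_year, top_fraction=0.2)
model_share = share_in_top_cells(model_scores, counts_next_year, top_fraction=0.2)
```

A method is more useful for allocating resources when its highest-risk cells capture a larger share of the following year's incidents.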

Conclusion

In general, I highly recommend my model for predicting urban crime; its predictions can be used to allocate police response across space.

First, comparing the predictions with the actual results shows that the model can accurately identify areas with a higher risk of crime. Second, it can provide strong evidence to help the relevant departments deploy urban resources and carry out policies, for example as part of a cost-benefit analysis. If the model provides sufficient evidence that an area needs stronger policing and more social resources, that area will be more likely to receive government-backed investment and to benefit from it. Finally, the model can be applied to predict not only crime but also other spatially dispersed events, such as store deployments. Therefore, I recommend my model.