February 22, 2013

The source code of this blog

http://code.google.com/p/the-criminality-usa/


February 1, 2013

Pennsylvania expenses and Chester City crime


As this picture shows, total expenses in Pennsylvania follow the curve of violent crimes in Chester City. Health and human services account for most of these expenses.
The curve of expenses for the protection of persons also follows violent crimes. In 2011, however, this expense decreased, which probably contributed to the growth of violent crime that same year.



January 9, 2013

Predicting violent crime for 2013 in Chester City, PA

This is a study of crime prediction in Chester City, PA.
Prediction is a hard task. With so little data, the best option is an exponential method to predict the violent crime rate.
For 2013, violent crime should decrease slightly.
This is only a trend.

Predicted crime values per 100k inhabitants:

Offense                                  Real value for 2012   Predicted for 2013
Violent crime                            3174                  3165.4009559846
Murder and nonnegligent manslaughter     64                    64.2627077813
Forcible rape                            50                    51.7887222942
Robbery                                  637                   635.089397512
Aggravated assault                       2423                  2414.2601283972
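The post does not say which exponential method produced these figures or how it was parameterised. As a rough illustration only, the sketch below applies simple exponential smoothing to the 2003 to 2012 violent crime rates from the raw data (rounded to two decimals). The smoothing factor alpha=0.9 is an arbitrary assumption, and the sketch is not expected to reproduce the predicted values in the table.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing `series`.

    `series` is ordered from oldest to newest; `alpha` in (0, 1] sets how
    strongly the most recent observations are weighted.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        # New level = alpha * latest observation + (1 - alpha) * previous level
        level = alpha * value + (1 - alpha) * level
    return level  # flat forecast for the next period


# Violent crime per 100k inhabitants in Chester City, 2003 to 2012 (rounded)
violent_2003_2012 = [1491.29, 1968.28, 2524.94, 2652.93, 2568.00,
                     2879.52, 2647.21, 2441.84, 3148.47, 3174.88]
forecast_2013 = simple_exponential_smoothing(violent_2003_2012, alpha=0.9)  # assumed alpha
print(round(forecast_2013, 1))
```

A high alpha keeps the forecast close to the last observed years; a low alpha averages over a longer history.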





January 8, 2013

Chester City criminality evolution


I continue my study of Chester City, Pennsylvania, a very tough city with an extremely high crime rate.
Below are the violent crime and property crime values extracted from the FBI and the Pennsylvania Uniform Crime Reporting Program.
We can see a long evolution of crime over the last decade. Apparently, more effort was made in 1999 and 2000 to resolve security issues, but crime has been increasing since then.


Raw data (rates per 100k inhabitants)

Year: 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990 1985 1980
Violent crime: 3174.882629108 3148.4741784038 2441.8355312534 2647.2118043199 2879.5239914843 2568.0045653415 2652.9346741797 2524.9384985537 1968.2813215219 1491.2918248136 1847.8495785606 1540.4643089607 3022.7383730396 3178.8539264549 5484.1647273095 5191.6093272171 3107.1683309558 3216.278306531
Murder and nonnegligent manslaughter: 64.5539906103 61.6197183099 64.3293663557 38.3257138164 46.399912659 73.3715590098 48.5292928204 40.5504041524 51.1591588357 18.8430374976 51.32915496 35.2571056628 51.5547837412 22.3164472216 55.6511180816 35.8371559633 15.6027104137 35.0071108194
Forcible rape: 49.882629108 67.4882629108 83.0920982095 68.4387746722 114.635078334 67.9366287127 56.6175082904 91.914249412 83.4702065214 88.8314624889 83.747568619 43.3933608158 70.5486514354 128.9394728359 204.8972983912 222.1903669725 187.2325249643 126.9007767203
Robbery: 636.7370892019 627.9342723005 552.1603945534 531.0848914561 646.8693705988 627.7344493057 649.7533094282 619.0695033927 457.7398422144 288.029287464 437.6485843959 368.8435669343 697.3462853422 662.0546009075 1558.2313062835 1017.7752293578 815.7988587732 1207.7453232688
Aggravated assault: 2423.7089201878 2391.4319248826 1742.2536721347 2009.3624243752 2071.6196298925 1798.9619283133 1898.0345636408 1773.4043415966 1375.9121139503 1095.5880373631 1275.1242705857 1092.9702755478 2203.2886525208 2365.5434054899 3665.3850045533 3915.8065749236 2088.5342368046 1846.6250957226
Property crime: 3606.220657277 3879.1079812207 3355.8486115578 3556.0787319664 3788.416398275 3614.2286475176 3957.8334366828 4000.9732096997 3473.437626215 2543.810062182 3317.4843311001 3221.9570405728 4862.430129701 5482.4072007736 8714.4591723161 7530.5810397554 6851.8188302425 7679.6849360026
Burglary: 1364.4366197183 1625.5868544601 1249.0618634073 1335.9248816009 1228.2329821497 1187.5322698986 1197.0558895689 1297.6129328756 1066.2645736288 619.1283749226 645.6667387076 659.0366673899 1240.0282194606 1522.4776215627 2592.8361833451 2119.1704892966 2342.6355206847 3369.4344163658
Larceny-theft: 1866.1971830986 1836.8544600939 1710.0889889568 1730.1322237127 1948.7963316775 1725.5903693035 1995.0931492815 1919.3857965451 1658.6337812004 1152.117149856 1745.1912686406 1659.7960512042 2243.9897975796 2534.1565622753 4062.5316199535 3741.3990825688 3586.3944365193 3435.0727491522
Motor vehicle theft: 299.2957746479 416.6666666667 396.6977591937 490.0216266528 611.3870844478 701.1060083154 765.6843978324 783.974480279 748.5392713859 699.8842499125 842.8787551329 816.3376003471 1191.1868453899 1157.9756502765 2059.0913690175 1670.0114678899 922.7888730385 875.1777704846
Arson (2012 to 1999 only): 76.2910798122 76.2910798122 88.4528787391 109.5020394755 81.88219881 84.2414196038 67.4017955838 97.3209699657 72.6998572929 72.6802874909 83.747568619 86.7867216316 187.2252672709 267.7973666592
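To put a number on the remark that crime has been increasing since the early 2000s, here is a small sketch that finds the year with the lowest violent crime rate since 2000 and computes the relative change to 2012. The values are copied from the violent crime row above; the function name is only illustrative.

```python
# Violent crime per 100k inhabitants in Chester City, 2000 to 2012 (from the raw data)
violent_crime = {
    2012: 3174.882629108, 2011: 3148.4741784038, 2010: 2441.8355312534,
    2009: 2647.2118043199, 2008: 2879.5239914843, 2007: 2568.0045653415,
    2006: 2652.9346741797, 2005: 2524.9384985537, 2004: 1968.2813215219,
    2003: 1491.2918248136, 2002: 1847.8495785606, 2001: 1540.4643089607,
    2000: 3022.7383730396,
}


def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100


# Year with the lowest violent crime rate since 2000
low_year = min(violent_crime, key=violent_crime.get)
change = percent_change(violent_crime[low_year], violent_crime[2012])
print(f"Lowest year: {low_year}, change to 2012: {change:+.1f}%")
```

With these values the lowest year is 2003, and the rate more than doubles by 2012 (about +113%).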



January 6, 2013

Improving housing can decrease violent crime



Introduction

This study produced a linear model. A model is well suited to testing or stress-testing a system, so I propose some test scenarios.
I chose to follow two cases: the city of Chester, Pennsylvania, which has the highest violent crime value in our US data, and the mean of violent crimes across the USA.

In our data, the violent crime rate of Chester City is 4877 per 100k inhabitants. The figure is probably different today.
Our model estimates it at 3916 violent crimes per 100k, an error of about 20%.
Poverty and crime have risen as Chester City declined since 1950.
The racial makeup of the city was 17.2% White, 74.7% Black, 0.4% Native American, 0.6% Asian, 0.1% Native Hawaiian, 3.9% of some other race, and 3.0% from two or more races. 9.0% were Hispanic or Latino of any race.

The model estimates the mean violent crime rate in the USA at 1174 per 100k inhabitants.

Several questions can be asked to stress-test our model.
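Each question below changes one input of the linear model by 10% and compares the predictions. A minimal sketch of such a stress test, assuming a model of the form intercept + sum of coefficient × variable; the coefficients and community values are invented for illustration, and only the variable names come from this study.

```python
def linear_predict(intercept, coefs, features):
    """Linear model prediction: intercept + sum(coefficient * value)."""
    return intercept + sum(coefs[name] * features[name] for name in coefs)


def stress_test(intercept, coefs, features, name, factor):
    """Scale one variable by `factor`; return (before, after, percent change)."""
    before = linear_predict(intercept, coefs, features)
    changed = dict(features, **{name: features[name] * factor})
    after = linear_predict(intercept, coefs, changed)
    return before, after, (after - before) / before * 100


# Hypothetical coefficients and (normalised) community values, illustration only
coefs = {"PctRecentImmig": 0.05, "PctVacantBoarded": 0.30, "PctEmploy": -0.20}
community = {"PctRecentImmig": 0.40, "PctVacantBoarded": 0.60, "PctEmploy": 0.30}

# Scenario: 10% fewer boarded-up houses
before, after, pct = stress_test(0.10, coefs, community, "PctVacantBoarded", 0.9)
print(f"{before:.3f} -> {after:.3f} ({pct:+.2f}%)")
```

Applying the same change to every community and averaging gives the "mean in USA" figure; for a linear model, that average moves by coefficient × (factor - 1) × the mean of the variable.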

Reducing immigration

The first question: is reducing recent immigration a solution?
If we reduce recent immigration by 10%, the model gives this result:
for Chester City, violent crime stays stable at 3917 violent crimes per 100k;
for the USA, the mean increases to 1186 violent crimes per 100k (+1%).
Recent immigration does not appear to be a security problem.

Second: is reducing immigration across all generations a solution?
Chester City's violent crime decreases to 3915 violent crimes per 100k.
The US mean decreases to 1171 violent crimes per 100k.
The result is not significant.

Reducing immigration is therefore not a good idea: it does not produce a meaningful result.

Education

Is better education a solution?
If we decrease by 10% the share of people with no diploma or a low level of education, the model gives this result:
Chester City's violent crime value increases to 3922,
and the US mean increases to 1181.

This result does not match the simple idea that better education can really change current crime levels.

Ethnicity

Is increasing the ethnic mix a solution?
If we increase the white population by 10%, the model gives:
for Chester City: 3916 violent crimes per 100k, no impact on crime;
for the US mean: 1166 violent crimes per 100k.

For the US mean, by contrast, the model shows a small decrease in violent crime (from 1174 to 1166) when the white population is increased.

Housing

Is reducing the number of boarded-up houses a solution?
Boarded-up houses blight a city: they create derelict areas and give the neighborhood a bad image, so they need to be cleaned up.
If we decrease the share of boarded-up houses by 10%, crime decreases:
for Chester City we obtain 3888 violent crimes per 100k, that is -0.7%;
for the US mean, the value decreases by 0.5% to 1168 violent crimes.

Employment

Is increasing employment a solution?
Employment brings greater stability to a family.
Yet if we increase employment by 10%, crime increases:

for Chester City we obtain 3946 violent crimes per 100k, that is +0.8%;
for the US mean, the value increases by 5.3% to 1235 violent crimes.


This is an interesting point to note: how can more employment lead to more crime? The model does not explain the relationship between employment and violent crime.

Mathematical approach

I have another, more mathematical approach: for example, we can look for the extreme values in our model.
I start by reviewing the linear model and finish by testing each variable independently.

Best regression values

There are two extreme values in our model: the largest positive coefficient, whose variable should reduce violent crime when it is decreased, and the most negative coefficient, whose variable should reduce violent crime when it is increased.
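Finding the two extremes is a max/min over the fitted coefficients. The numbers below are hypothetical and arranged only so that the output names the two variables discussed in this post. A coefficient measures the effect per unit of its variable, so the variable with the largest coefficient is not necessarily the one whose 10% change matters most; that also depends on the variable's own value.

```python
# Hypothetical fitted coefficients (variable name -> coefficient), illustration only
coefficients = {
    "OwnOccMedVal": 0.45,
    "PctKids2Par": -0.30,
    "numbUrban": -0.52,
    "PctIlleg": 0.21,
}

# Largest positive coefficient: decreasing this variable lowers the prediction most per unit
most_positive = max(coefficients, key=coefficients.get)
# Most negative coefficient: increasing this variable lowers the prediction most per unit
most_negative = min(coefficients, key=coefficients.get)
print(most_positive, most_negative)  # OwnOccMedVal numbUrban
```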

The largest positive coefficient belongs to OwnOccMedVal, the median value of owner-occupied housing. If we decrease this value in all counties or in Chester City, the result decreases.
For Chester City, crime decreases to 3902 (-0.35%).
For the US mean, the effect is large: the value decreases to 1084 violent crimes per 100k inhabitants (-7.60%).


The most negative coefficient belongs to numbUrban, the number of people living in areas classified as urban.
If we increase this value by 10 percent, violent crime decreases in Chester City and in the USA:
for Chester City, the value decreases to 3895 violent crimes (-0.52%), which is better than the previous result;
for the US mean, the result is 1155 violent crimes, a decrease of 1.60%.

Testing each variable independently

Testing variable by variable is more mechanical, but it lets us search the model and the data for the variable whose change gives the best result.

First, I increased each variable by 10%.
This gives two best variables, one for Chester City and one for the US mean.
For Chester City, the variable is PctLargHouseOccup, the percentage of all occupied households that are large (6 or more people). Crime decreases to 3786 (-3.31%).
For the US mean, the variable is PctPersOwnOccup, the percentage of people in owner-occupied households. The mean decreases to 1016 (-13.43%).

Decreasing each variable by 10% gives different best variables for Chester City and for the US mean.
For Chester City, the most influential variable is racepctblack, the percentage of Black people in the city, which is more than 74% in Chester City. Decreasing this value by 10 percent lowers violent crime to 3757 (-4.05%).
For the US mean, the variable is PersPerOccupHous, the mean number of persons per household. Decreasing this value by 10% lowers the result to 1048 (-10.72%).
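The one-at-a-time scan described in this section can be sketched as follows, assuming the same linear form as the model: each variable is scaled by the same factor in every row, the predictions are averaged, and the variable giving the lowest mean is kept. With a single row this is the Chester City case; with all communities it is the US mean. The coefficients and rows are hypothetical.

```python
def linear_predict(intercept, coefs, row):
    """Linear model prediction: intercept + sum(coefficient * value)."""
    return intercept + sum(c * row[name] for name, c in coefs.items())


def best_single_change(intercept, coefs, rows, factor):
    """Scale each variable by `factor` (one at a time, in every row) and return
    the variable giving the lowest mean prediction, together with that mean."""
    mean_by_variable = {}
    for name in coefs:
        predictions = [
            linear_predict(intercept, coefs, dict(row, **{name: row[name] * factor}))
            for row in rows
        ]
        mean_by_variable[name] = sum(predictions) / len(predictions)
    best = min(mean_by_variable, key=mean_by_variable.get)
    return best, mean_by_variable[best]


# Hypothetical coefficients and community rows (illustration only)
coefs = {"PctPersOwnOccup": -0.6, "PersPerOccupHous": 0.2, "PctLargHouseOccup": -0.1}
rows = [
    {"PctPersOwnOccup": 0.5, "PersPerOccupHous": 0.4, "PctLargHouseOccup": 0.2},
    {"PctPersOwnOccup": 0.7, "PersPerOccupHous": 0.6, "PctLargHouseOccup": 0.4},
]
print(best_single_change(0.5, coefs, rows, 1.1))  # increase each variable by 10%
print(best_single_change(0.5, coefs, rows, 0.9))  # decrease each variable by 10%
```

Because the best variable depends on both the coefficient and the variable's values in the rows, the winner can differ between a single city and the national mean, as it does in the results above.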


Conclusion

It is difficult to draw a firm conclusion from the fact that the variables most able to decrease violent crime are related to housing. It is easy to imagine that giving more people the opportunity to own their home could improve the socio-economic environment. But to meet this need, the population must first have a better economic situation, which is not the case in Chester City.
In Chester City, according to the model, reducing the concentration of the Black population and increasing the share of large households could have a lasting effect on crime. It is not impossible for policymakers to act on these parameters.


December 30, 2012

Four groups: a visual point of view


This post follows the study described in the post on the groups resulting from my classification study.

We start with the first map, which colors the first group described there. This group describes the poorest and hardest-hit people, where we find violent crime problems.
The group is spread across the entire map. Some states are more represented, such as California, New York, Michigan, Delaware and Kansas.

Group around violent crime

The fourth group is the ideal family: white, with two children, and so on. This group is spread across the United States of America. The most representative states are in the north, while the south is less marked.

The group of the ideal family

The group of workers is found in all states, with roughly equal representation.

The group of poor workers

The group of managers is really interesting because it is strongly represented in California, New Jersey, Connecticut and Massachusetts. Idaho is among the least represented, along with Kentucky, Louisiana, ...

The group of the managers 


Images of violent crimes by state in the USA



The first image shows, for each state, the maximum county value.
The second image shows the mean violent crime value of the state's counties.
Maximum county violent crime value by state
Mean county violent crime value by state
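Both maps aggregate county values per state, one with the maximum and one with the mean. A minimal sketch of that aggregation; the state codes, county names and rates below are placeholders, not data from this study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county records: (state, county, violent crimes per 100k)
counties = [
    ("PA", "County A", 2000.0),
    ("PA", "County B", 1000.0),
    ("NY", "County C", 1200.0),
    ("NY", "County D", 600.0),
    ("NY", "County E", 300.0),
]

# Collect the county rates of each state
by_state = defaultdict(list)
for state, _county, rate in counties:
    by_state[state].append(rate)

max_by_state = {s: max(rates) for s, rates in by_state.items()}    # first map
mean_by_state = {s: mean(rates) for s, rates in by_state.items()}  # second map
print(max_by_state, mean_by_state)
```

The maximum map highlights states with a single extreme county, while the mean map reflects the typical county of each state.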

Prediction and model estimation

Short summary

To estimate a good model, we need a scientific approach: split the data into values used to build the model and values used to test it.
For some methods, such as neural networks, we also need validation values to decide when to stop training and to check the stability of the model.

Our exercise is to extract, from the existing data, a stable model that predicts the number of violent crimes per 100k inhabitants. The definition of violent crime is broad: each county, community and country has its own approach. It can include murders, sexual assaults, ...
My study considers several approaches: regression, SVM and neural networks.


Data segmentation

The data are split as follows:

  • 1094 values for the model estimation
  • 401 values for the validation of the model
  • 101 values to verify and test the precision of the model
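The post gives the sizes of the three subsets but not how records were assigned to them. The sketch below assumes a random, reproducible shuffle before cutting; `split_dataset` and the seed value are illustrative choices, not the study's own code.

```python
import random


def split_dataset(records, n_train=1094, n_valid=401, n_test=101, seed=0):
    """Shuffle records and cut them into training, validation and test sets."""
    if len(records) < n_train + n_valid + n_test:
        raise ValueError("not enough records for the requested split")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle (assumed, not from the post)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test


train, valid, test = split_dataset(range(1596))
print(len(train), len(valid), len(test))  # 1094 401 101
```

Fixing the seed keeps the three subsets identical between runs, so different methods are compared on the same test values.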


The first strategies

The data are really hard to separate, which is why classification may play only a limited role in our strategies.

The first analysis estimates models in the classical way, which gives a first baseline to compare with the data-reduction strategies.
A second analysis splits the data into several classes (2 or 3), with one model per class. To get a more accurate estimate, we can split our model and use an SVM to classify new values automatically.
A third approach removes unnecessary data, such as extreme values or unconnected individuals or variables.
The last study mixes these strategies.
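The class-based strategy (one model per class, with a classifier routing new values to a class) can be sketched without external libraries. Two substitutions are made for brevity and should be read as assumptions: a nearest-centroid rule stands in for the SVM classifier, and each class model is a one-variable least-squares line rather than the full regression. Class labels for the training data are taken as given.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one variable, for brevity)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_per_class(xs, ys, labels):
    """Fit one line per class and remember each class centroid."""
    models, centroids = {}, {}
    for c in set(labels):
        cx = [x for x, lab in zip(xs, labels) if lab == c]
        cy = [y for y, lab in zip(ys, labels) if lab == c]
        models[c] = fit_line(cx, cy)
        centroids[c] = mean(cx)
    return models, centroids


def nearest_centroid(x, centroids):
    """Stand-in for the SVM classifier: pick the class with the closest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def predict(x, models, centroids):
    """Route a new value to a class, then apply that class's model."""
    a, b = models[nearest_centroid(x, centroids)]
    return a + b * x


xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
ys = [0.0, 2.0, 4.0, 90.0, 89.0, 88.0]
labels = ["low", "low", "low", "high", "high", "high"]
models, centroids = fit_per_class(xs, ys, labels)
print(predict(1.5, models, centroids), predict(11.5, models, centroids))  # 3.0 88.5
```

A single global line would fit these two groups badly; routing each value to its own class model is what lets the split strategy gain accuracy, provided the classifier assigns new values correctly.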

The groups resulting from my classification study

Short intro

I want to present the final results of my review of the variables in my study on violent crime.
Why group variables? Because we need to know how variables interact and coexist. Understanding these groups also makes it possible to apply research strategies such as vectorization or map-reduce.
When I started my study, I did not set out to define groups of people, but the study led me to identify some human groups. By human groups, I mean groups defined by social relationships.
It was a real surprise for me to distinguish social and cultural groups so clearly.
Here is my final analysis.

Results

The first group: grouped around the violent crimes per 100k inhabitants variable

HousVacant, LandArea, LemasPctOfficDrugUn, numbUrban, NumIlleg, NumImmig, NumInShelters, NumStreet, NumUnderPov, PctForreignBorn, PctHousNoPhone, PctIlleg, PctLargHouseFam, PctLargHouseOccup, PctLess9thGrade, PctNotSpeakEnglWell, PctPersDenseHous, PctPopUnderPov, PctRecentImmig, PctRecImmig10, PctRecImmig5, PctRecImmig8, PctVacantBoarded, PctWOFullPlumb, pctWPubAsst, PopDens, population, racepctblack, racePctHisp, ViolentCrimesPerPop.

I know that this group is not easy to read, but I can give some interpretation.
This group describes people living in very urban, very dense areas where houses are vacant and/or boarded up, with children born to unmarried parents, immigrants, households without a phone, people living in large households on public assistance, and people without a diploma. Black and Hispanic populations are represented.
This group contains all the main variables around violent crime. Reducing one of these factors could have a real impact on crime.

The second group: the poor workers

agePct12t21, agePct12t29, agePct16t24, agePct65up, FemalePctDiv, householdsize, indianPerCap, MalePctDivorce, MalePctNevMar, MedOwnCostPctInc, MedOwnCostPctIncNoMtg, MedRentPctHousInc, MedYrHousBuilt, PctEmplManu, PctEmplProfServ, PctHousLess3BR, PctImmigRec10, PctImmigRec5, PctImmigRec8, PctImmigRecent, PctNotHSGrad, PctOccupManu, PctUnemployed, PctUsePubTrans, PctVacMore6Mo, pctWFarmSelf, pctWSocSec, PersPerFam, PersPerOccupHous, PersPerOwnOccHous, PersPerRentOccHous, racePctAsian, TotalPctDiv
This group corresponds to the average population: people who use public transport, do manual work or are unemployed, without social security, immigrants, without a diploma. Indian and Asian populations are represented.


The third group: the managers

AsianPerCap, blackPerCap, HispPerCap, medFamInc, medIncome, MedNumBR, MedRent, OwnOccHiQuart, OwnOccLowQuart, OwnOccMedVal, PctBSorMore, PctOccuptMgmtProf, perCapInc, RenLowQ, RentHighQ, RentMedian, white per cap
This group is interesting because it mixes white, Black, Asian and Hispanic populations. It is composed of managers who own or rent their homes. We could say that this group manages the second group.


The fourth group: the ideal family

PctBornSameState, PctEmploy, PctFam2Par, PctKids2Par, PctSameCity85, PctSameHouse85, PctSameState85, PctSpeakEnglOnly, PctTeen2Par, pctUrban, pctWInvInc, PctWorkMom, PctWorkMomYoungKids, PctWRetire, pctWWage, PctYoungKids2Par, racePctWhite, PctHouseOccup, PctHouseOwnOccup, PctPersOwnOccup
This remarkable group is the perfect family as depicted in idealized literature:
two kids, living in the same area for a long time, speaking English, working or retired rather than unemployed, and white.
The group's long stability in its area supports employment, and television dictates ideas such as two children per family, ...






November 19, 2012

French study

Prediction by mixing strategies

My latest prediction analysis gives the following errors (lower is better for all four measures):


Method / variant                   RMSE      MAE       MSE       ARV
Likelihood, Gaussian mixture       0.1241    0.087554  0.015401  0.42164
  Full data set                    0.13475   0.098797  0.018157  0.49709
  Full data set cut in 2 classes   0.13686   0.097323  0.01873   0.51276
  Full data set cut in 3 classes   0.13521   0.094228  0.018282  0.50052
  Removed variables                0.13406   0.097274  0.017972  0.49202
  Removed communities              0.12757   0.092739  0.016275  0.44557
  Mixed with 2 classes             0.1241    0.087554  0.015401  0.42164
Linear regression                  0.12437   0.087327  0.015467  0.42344
  Full data set                    0.13499   0.099144  0.018222  0.49888
  Full data set cut in 2 classes   0.13763   0.099092  0.018942  0.51857
  Full data set cut in 3 classes   0.13501   0.096144  0.018227  0.49899
  Removed variables                0.134     0.097173  0.017957  0.49161
  Removed communities              0.12747   0.092553  0.016248  0.44483
  Mixed with 2 classes             0.12437   0.087327  0.015467  0.42344
PLS regression 1st                 0.12438   0.08572   0.015472  0.42357
  Full data set                    0.13347   0.09774   0.017815  0.48772
  Full data set cut in 2 classes   0.13245   0.094019  0.017542  0.48025
  Full data set cut in 3 classes   0.13047   0.091678  0.017021  0.466
  Removed variables                0.13291   0.09554   0.017665  0.48362
  Removed communities              0.12764   0.091114  0.016292  0.44602
  Mixed with 2 classes             0.12438   0.08572   0.015472  0.42357
PLS regression advanced            0.1207    0.085773  0.01457   0.39888
  Full data set                    0.12743   0.093526  0.016238  0.44455
  Full data set cut in 2 classes   0.12396   0.089755  0.015366  0.42067
  Full data set cut in 3 classes   0.12021   0.087285  0.014451  0.39562
  Removed variables                0.12829   0.094293  0.016458  0.45057
  Removed communities              0.12429   0.088444  0.015448  0.4229
  Mixed with 2 classes             0.1207    0.085773  0.01457   0.39888
SVM polynomial                     0.12175   0.08589   0.014822  0.40579
  Full data set                    0.12985   0.092377  0.01686   0.46268
  Full data set cut in 2 classes   0.12911   0.088887  0.01667   0.45637
  Full data set cut in 3 classes   0.13302   0.092129  0.017695  0.48444
  Removed variables                0.12925   0.089951  0.016705  0.45733
  Removed communities              0.12797   0.090735  0.017175  0.47019
  Mixed with 2 classes             0.12175   0.08589   0.014822  0.40579
Neural network                     0.11787   0.086258  0.013893  0.40909
  Full data set                    0.11787   0.086258  0.013893  0.40909
  Full data set cut in 2 classes   0.13692   0.10066   0.018747  0.51323
  Full data set cut in 3 classes   0.13393   0.094034  0.017938  0.4911
  Removed variables                0.13351   0.095503  0.017824  0.48797
  Removed communities              0.13552   0.094944  0.018367  0.50283
  Mixed with 2 classes             0.13283   0.097711  0.017645  0.48306
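The four columns are standard error measures. Below is a sketch of how they can be computed, assuming ARV means the average relative variance, that is MSE divided by the variance of the observed values. That reading is consistent with the table: in most rows RMSE is the square root of MSE (the square root of 0.015401 is about 0.1241), and MSE/ARV is close to 0.0365 in almost every row. An ARV of 1 means the model does no better than always predicting the mean.

```python
from math import sqrt


def error_metrics(observed, predicted):
    """Return RMSE, MAE, MSE and ARV for two equal-length sequences.

    ARV is taken here as the average relative variance: MSE divided by the
    (population) variance of the observed values. 0 is a perfect fit and 1
    means no better than always predicting the observed mean.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    if variance == 0:
        raise ValueError("observed values are constant; ARV is undefined")
    return {"RMSE": sqrt(mse), "MAE": mae, "MSE": mse, "ARV": mse / variance}


# Predicting the mean everywhere gives ARV = 1
print(error_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Under this reading, an ARV around 0.40 for the best models means they remove roughly 60% of the variance of the target.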