NYU Center for Urban Science & Progress - 21DOCS Test Area

http://www.cusp.nyu.edu

City of New Orleans Emergency Medical Services Resource Optimization

Matt Sloane

and 3 more

July 30, 2017

NYU CENTER FOR URBAN SCIENCE AND PROGRESS CAPSTONE PROJECT

A Data-Driven Evaluation of Delays in Criminal Prosecution

Hrafnkell Hjörleifsson

and 6 more

April 15, 2019

ABSTRACT The District Attorney’s office of Santa Clara County, California has observed long durations for its prosecution processes. It is interested in assessing the drivers of prosecutorial delays and determining whether there is evidence of disparate treatment of accused individuals in pre-trial detention and criminal charging practices. A recent report from the county's civil grand jury found that only 47% of cases from 2013 were resolved in less than a year, far less than the statewide average of 88%. We describe a visualization tool and analytical models to identify factors affecting delays in the prosecutorial process and any characteristics that are associated with disparate treatment of defendants. Using prosecutorial data from January through June of 2014, we find that the time to close the initial phase of prosecution (the entering of a plea), the initial plea entered, the type of court in which a defendant is tried and the main charged offense are important predictors of whether a case will extend beyond one year. Prosecution durations are not found to differ significantly across racial and ethnic populations, and race and ethnicity do not appear as important features in our models predicting case durations longer than one year. Further, we find that, in this data, 81% of felony cases were resolved in less than one year, far greater than the value reported by the civil grand jury.

Parallel implementations of a TV-\(L^{1}\) image-denoising algorithm

Christopher Prince

May 12, 2017

INTRODUCTION Image acquisition involves many hardware and software stages that introduce error sources. These errors appear as visual artifacts in the image, typically recognized as noise. Noise can be especially noticeable in images acquired with low levels of illumination, such as night photography. Two common manifestations of noise in digital images are Gaussian noise and salt-and-pepper noise. Gaussian noise is typically associated with errors in detection. It produces pixel values that vary within a quasi-normally distributed range about the “true” value at that point in the image. Salt-and-pepper noise typically arises from transmission errors, and the pixel value is recorded as either fully on or fully off (in grayscale, white or black). Removing these artifacts can be desirable from an aesthetic perspective or in order to pre-process images for other workflows. Common techniques to address this include Gaussian blurring, which takes the convolution of an image with a Gaussian kernel window, and median filtering, which replaces pixels with the median value of a sliding window. These techniques can reduce noise; however, they also tend to blur the edges of features. The total variation technique was introduced in 1992 by Rudin, Osher and Fatemi (ROF) as an alternative denoising method. The method works by iteratively constructing a function u on a domain Ω that minimizes its energy with an input function f: \[ \min_{u} \int_{\Omega} \left\Vert \nabla u \right\Vert + \lambda (u-f)^2 \] with ‖ ⋅ ‖ being the L² norm. The first term is the total variation of the image. It is a regularizer for the minimization function. The second term also includes an L² norm, which means that the problem is convex and has a unique solution.
Replacing the second term above with an L¹ norm leads to the TV-L¹ model: \[ \min_{u} \int_{\Omega} \left\Vert \nabla u \right\Vert + \lambda \left\vert u-f \right\vert \] The TV-L¹ model is convex but not strictly convex, so its minimizer is not necessarily unique; however, when applied to discrete images, the model tends to offer better performance at removing salt-and-pepper noise from images than the ROF model and is thus an important image processing technique. It is a standard method implemented in many computer vision packages, including OpenCV.
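To make the TV-L¹ energy concrete (total variation of u plus λ|u − f|), the following is a minimal serial sketch that minimizes its discrete form with a Chambolle-Pock style primal-dual iteration. The choice of solver, the NumPy dependency, the function names, the step sizes and the synthetic test image are all assumptions made for illustration; the excerpt does not say which scheme the author parallelized.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero (Neumann) boundary differences."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def tv_l1_denoise(f, lam=1.0, n_iter=200):
    """Approximately minimize TV(u) + lam * |u - f|_1 (illustrative sketch)."""
    tau = sigma = 1.0 / np.sqrt(8.0)  # step sizes with tau * sigma * 8 <= 1
    u = f.copy()
    u_bar = f.copy()
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iter):
        # Dual ascent step, then projection onto the unit ball per pixel.
        gx, gy = grad(u_bar)
        px = px + sigma * gx
        py = py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px = px / norm
        py = py / norm
        # Primal descent step, then soft-thresholding around the input f.
        v = u + tau * div(px, py)
        u_new = f + np.sign(v - f) * np.maximum(np.abs(v - f) - tau * lam, 0.0)
        u_bar = 2.0 * u_new - u
        u = u_new
    return u

# Synthetic example: a flat gray image corrupted by salt-and-pepper noise.
rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.5)
noisy = clean.copy()
mask = rng.random(clean.shape)
noisy[mask < 0.025] = 0.0   # "pepper"
noisy[mask > 0.975] = 1.0   # "salt"
denoised = tv_l1_denoise(noisy)
```

Each pixel update depends only on its immediate neighbours, which is the property that makes schemes of this kind natural candidates for parallel implementation.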

WiFind: Analyzing Wi-Fi Density around NYCHA Housing Projects

Charlie Mydlarz

and 7 more

May 03, 2017

EXECUTIVE SUMMARY

Tips and impressions from CUSP hackathon 2016-2017 series

federica B bianco

April 09, 2017

At the NYU Center for Urban Science and Progress (CUSP) we hold regular hackdays for our students. As the 2016-2017 hackathon season ends, here are some of my impressions from the events we had, and some tips on what you, students, should work on and improve.

Hypertemporal Imaging: An alternative technology to monitor grid dynamics

Victor Sette Gripp

and 5 more

March 21, 2017

Context
Motivation
The motivation of this report is to explore the team's contribution to Dr. Bianco's "Hypertemporal Imaging of NYC Grid Dynamics" proof-of-concept project, an alternative technology for monitoring grid dynamics and energy consumption patterns, in contrast to the traditional approach based on in-situ monitoring of energy grids and buildings. By analysing the lights of a city landscape in images taken by a single camera, this approach can infer results similar to those obtained by sensors.
In the traditional approach, in order to provide reliable and affordable energy distribution, cities have to monitor the health of the electric grid and energy consumption patterns. Those measurements are fundamental to providing good service during peak hours, to guaranteeing that the electrical grid is working in a healthy condition and to supporting plans for increasing demand as cities grow and citizens use more electrical equipment.
Measurements of the electric grid are collected by a Phasor Measurement Unit (PMU), a synchrophasor measurement device that captures information about the voltage and phase angle of the system, which makes it possible to identify phase shifts and measure grid stability. In addition, access to individualized energy consumption information requires the deployment of smart meters, devices that have to be installed in buildings to report real-time energy consumption.
The deployment of sensors and equipment is expensive and not feasible for all cities worldwide. The overall cost of a single PMU ranges from $40,000 to $180,000 (U.S. Department of Energy, 2009), and it is estimated that NYC will spend $1.5B to monitor the energy consumption of 1 million buildings during the next five years, values that can be prohibitive for many cities in the world.
The Hypertemporal Imaging technology offers an indirect, real-time, and affordable way to obtain electric grid health information, and a non-intrusive, indirect way to achieve energy disaggregation and observe energy consumption patterns. In this context, this study explores the improvement expected from using a new camera to capture images of the city landscape, and examines the cost of and the area covered by a single camera in order to estimate the ideal number of cameras needed to cover NYC.
Previous Research
\citet{bianco_hypertemporal_2016} showed a proof of concept that hypertemporal visible imaging can be used to monitor grid dynamics and identify phase changes in individual light sources from the city landscape. This technique relies on the fact that the United States grid provides electricity as an alternating current (AC) with a frequency of 60 Hz (some countries use a 50 Hz standard instead). The alternating current at 60 Hz induces a flickering twice as fast, at 120 Hz, in most of the lights in the city, including incandescent, halogen, and some fluorescent lights. LED and more modern fluorescent lights behave differently.
Analyzing a signal with a frequency of 120 Hz would require a sampling rate at the very least four times faster, at 480 Hz, and ideally eight times faster, which would require specialized and more expensive equipment. Since one of the main goals of developing this alternative technology is to provide an affordable way for cities in developing nations to monitor the dynamics of their electric grid, the equipment costs had to be kept low and, therefore, the solution was to use a liquid crystal shutter mounted at the lens aperture of the camera. The shutter is then set to oscillate between the states 'open' and 'closed' at a frequency (119.75 Hz) close to the one corresponding to the flickering of the lights (120 Hz), which, in turn, down-converts the flicker to 0.25 Hz, the beat frequency given by Equation 1.
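The down-conversion arithmetic can be checked with a few lines of code. The sketch below assumes the standard beat-frequency relation (the absolute difference of the two frequencies), since Equation 1 itself is not reproduced in this excerpt, and reuses the four-times sampling rule of thumb quoted in the text; the function names are hypothetical.

```python
def flicker_frequency(grid_hz):
    """Lights driven by AC flicker at twice the grid frequency."""
    return 2.0 * grid_hz

def beat_frequency(flicker_hz, shutter_hz):
    """Beat frequency between the light flicker and the shutter oscillation."""
    return abs(flicker_hz - shutter_hz)

flicker_hz = flicker_frequency(60.0)            # 120 Hz for the US grid
beat_hz = beat_frequency(flicker_hz, 119.75)    # 0.25 Hz after down-conversion
min_direct_sampling_hz = 4.0 * flicker_hz       # 480 Hz without the shutter
min_beat_sampling_hz = 4.0 * beat_hz            # 1 Hz with the shutter

print(beat_hz)  # 0.25
```

Under the same rule of thumb, the shutter lowers the minimum sampling rate from 480 Hz to 1 Hz, which is why an inexpensive camera can suffice.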

Prediction of Natural Gas Leak Events in New York City from Open Data

Christopher Prince

and 3 more

March 10, 2017

Introduction
In New York City, there were more than 60,000 emergency calls made by the New York City Fire Department (FDNY) related to gas leaks between 2013 and 2015. While most of these calls resulted in no damage or injury, several incidents did end with fatalities. Infrastructure in Manhattan leaks three to five times more natural gas than cities with newer infrastructure \citep{gallagher_natural_2015}, and of the 6,400 miles of gas main lines running under New York City's streets, 53% were installed prior to 1960. In 2012, Con Edison experienced 83 leaks for every 100 miles of gas main. Furthermore, replacing a main in NYC can cost from $2.2 million to $8 million per mile, so prioritizing investment is critical \citep{forman_caution_2014}.

PUI2016 Extra Credit

Ozgur L. Akkas

and 1 more

December 16, 2016

2016 U.S. Election Exit Poll Results Modeling

Xianbo Gao

December 16, 2016

PUI2015 Extra Credit Project
2016 U.S. Election Exit Poll Results Modeling <Xianbo Gao, gaogxb, xg656>
Abstract: Using PCA and Lasso regression to build a regression model of the 2016 U.S. Election Exit Poll Results, in order to find which factors contribute to the result and to what extent.
Introduction: In this project, I aim to discover the main factors that influence the percentage of people voting for Trump and Clinton at the state level in the 2016 U.S. Election Exit Poll Results, to estimate how much each factor contributes to that percentage, and to build a model that fits the state-level voting percentages. I can then use the exit poll results to explain why Trump won the election.
Data: County-level election results and population information provided by the United States Department of Agriculture Economic Research Service; election results and population information in Excel format provided by uselectionatlas.org. The data include population figures for 2014 only. In addition, there is information for only 37 states, not all states. There are 51 columns, each a factor or variable. The column names are codes, so I renamed them with descriptions. I tried to convert all the data into percentage format: 30 factors are, or can be converted into, percentages (such as the percentage of people under 18), while the 21 factors that cannot be expressed as percentages (such as mean travel time to work) are normalized. After that, the data are aggregated to the state level using a weighted average based on the population of each county. The format of the data is shown below.
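The county-to-state aggregation described above can be sketched as a population-weighted average. The field names (`state`, `population`, `pct_under_18`) and the three sample rows are hypothetical and only illustrate the calculation; they are not taken from the author's data.

```python
from collections import defaultdict

def state_weighted_average(rows, value_key, weight_key="population", group_key="state"):
    """Aggregate county rows to state level, weighting each county by population."""
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for row in rows:
        weight = row[weight_key]
        weighted_sum[row[group_key]] += weight * row[value_key]
        total_weight[row[group_key]] += weight
    return {state: weighted_sum[state] / total_weight[state] for state in weighted_sum}

# Made-up counties: two in state "A", one in state "B".
counties = [
    {"state": "A", "population": 100, "pct_under_18": 20.0},
    {"state": "A", "population": 300, "pct_under_18": 40.0},
    {"state": "B", "population": 50, "pct_under_18": 10.0},
]

state_values = state_weighted_average(counties, "pct_under_18")
print(state_values)  # {'A': 35.0, 'B': 10.0}
```

Weighting by population keeps a small county from counting as much as a large one, which an unweighted mean of county percentages would not do.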

PUI2016 Extra Credit Project

Chunqing Xu

December 16, 2016

Time Series Analysis of Beijing Air Pollution <Chunqing Xu, cx495, cx495>

PUI2016 Extra Credit Project Report

Yue Cai

December 15, 2016

The impact of urban qualities on the popularity of Taxi (main report)

Pooneh Famili

and 1 more

December 15, 2016

Pooneh Famili, pf910, Github: poonehfamili
Abstract: This research project seeks to determine the impact of socioeconomic factors (age, income), city facilities (proximity to a subway station), and street safety scores on the taxi trip rate at the census tract level. I used multivariate regression to analyze the data. The results indicate that the most important factor affecting the popularity of taxis in an area is the median income of the neighborhood. There is also a significant negative correlation between the number of taxi pickups and both distance to the subway and age.
Key words: Taxi pickup numbers, socioeconomic factors, Multivariate regression, NYC
Introduction: The question this study seeks to answer is: “How much impact do socioeconomic indices and city facilities have on the popularity of taxis at the neighborhood level in NYC?” This question matters because the answer could help taxi agencies and transportation organizations plan more effectively, help taxi drivers find the neighborhoods where the chance of getting more trips is higher, and give a good view of the difference between pickups by taxis and by their main competitor today, Uber. To do so, I first picked four indices to analyze: median age, neighborhood income, distance from a subway station, and street safety score. Then, after wrangling and cleaning the data, I ran a multivariate regression to find the correlation between each of these factors and the number of pickups.
Data: In the data collection phase, I first got the data for all taxi trips for one summer month (June) and one winter month (January) in 2012 and 2014 from https://github.com/toddwschneider/nyc-taxi-data/blob/master/raw_data_urls.txt.
Since the Uber data is only available from April 2014, I used the data for June 2014, to be comparable with the taxi trips, from https://github.com/fivethirtyeight/uber-tlc-foil-response/tree/master/uber-trip-data. For the socioeconomic metrics, I used median age and income data from http://nyu.policymap.com/, which are available at the census tract level for 2010. For accessibility, I got the subway location data for all of NYC from https://data.ny.gov/Transportation/NYC-Transit-Subway-Entrance-And-Exit-Data/i9wp-a4ja/data. Finally, for safety, I used the safety scores of NYC streets (available at the link below).
The safety score data include points that are not at intersections. To get the safety of each taxi pickup point, I calculated the distance between the pickup point and every safety score point, then selected the closest one. I used the same strategy to find the proximity of each point to subway stations. For age and median income, some census tracts have more than one value. Since I could not find any accompanying documentation explaining why, I averaged the values and used the average as the tract's age or income. Each taxi data set includes more than 10 million trips (a mean of 15 million for 2012 and 13,800,000 for 2014), and the Uber data set reports 700,000 trips. To keep the computation manageable, I randomly picked 20,000 trips from each of them.
As mentioned above, the subway and safety score data include points (lat, lon) as their coordinates; I matched them on the common points and merged them into the data frame I was using as the main data set.
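The closest-point matching described above (used for both safety scores and subway stations) can be sketched as a brute-force search. The haversine distance, the function names and the sample coordinates are assumptions for illustration; the author's notebooks may compute distance differently.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest(point, candidates):
    """Return (index, distance_km) of the candidate closest to point."""
    best_index, best_distance = None, float("inf")
    for index, (lat, lon) in enumerate(candidates):
        distance = haversine_km(point[0], point[1], lat, lon)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance

# Hypothetical pickup point and reference points (e.g. subway entrances).
pickup = (40.7580, -73.9855)
reference_points = [
    (40.7527, -73.9772),
    (40.7590, -73.9845),
    (40.6892, -74.0445),
]
closest_index, closest_km = nearest(pickup, reference_points)
```

This search costs one distance evaluation per pickup-reference pair, which is workable for 20,000 sampled trips; a spatial index would be the usual alternative for the full data sets.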
Since I wanted to do my analysis at the census tract level, I found the intersection of every point with the census tracts, grouped my data by tract, and used the mean of the point attributes in each census tract.
Then I merged the age and income data, which were available at the census tract level, with my data set.
Since I wanted to run a multivariate regression on my data sets, I normalized all four features.
Some of the values (fewer than 10 in each data set) are null, and I dropped them.
I carried out all the steps above in a separate notebook for each taxi and Uber data set and saved the resulting data frame in csv format for use in the “analysis notebook”.
All the notebooks are available at the link below.
In the comparison phase, I also merged the related data sets.
To get a good understanding of my data, I calculated the number of pickups in each census tract, added it to my data, and mapped the frequency of pickups for each census tract (Figures 1 and 2). The maps show that all my data are from Manhattan, so this research applies only to Manhattan.
Methodology: This research used multivariate regression to find the correlation between each of the independent variables and the frequency of trips at the census tract level. Since these variables are independent of each other, multivariate regression works for our purpose. This method has been used in several studies that evaluate the correlation coefficient of an independent variable with the dependent one. In the ADS class, we had a real-world example that used multivariate regression to evaluate the impact of different factors, such as residents' income and unit size, on building prices. This method cannot eliminate multicollinearity between the explanatory variables, a problem that PCA does not have; if more data sets had been available, PCA would have been the better option.
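A minimal sketch of the regression step, assuming z-score normalization and ordinary least squares via NumPy, is given below. The data are synthetic stand-ins for the four features, and the sketch does not compute the p-values reported in the findings; a statistics package would be needed for those.

```python
import numpy as np

def zscore(features):
    """Normalize each column to zero mean and unit standard deviation."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def fit_ols(features, target):
    """Fit target ~ intercept + normalized features; return (coefficients, R^2)."""
    normalized = zscore(features)
    design = np.column_stack([np.ones(len(normalized)), normalized])
    coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
    predicted = design @ coefficients
    ss_res = np.sum((target - predicted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return coefficients, 1.0 - ss_res / ss_tot

# Synthetic census tracts: columns stand in for income, age,
# distance to subway and safety score (made-up values).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
pickups = (5.0 + 3.0 * features[:, 0] - 1.0 * features[:, 1]
           - 0.5 * features[:, 2] + rng.normal(scale=0.1, size=200))

coefficients, r_squared = fit_ols(features, pickups)
```

Because the features are normalized, the fitted coefficients are on a common scale, which is what makes their magnitudes comparable across income, age and distance.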
Conclusions:
Findings: The multivariate regression on the taxi January 2014 data indicates that income, age and distance to subway have significant coefficients with respect to the number of taxi pickups (p-values: 0, 0.02, 0.008, all smaller than 0.05); their coefficients are (221, -72, -51), with R^2 = 0.241 (Figures 3, 4, 5, 6, 7). The other four data sets show the same trend, except for taxi June 2012 and Uber June 2014, whose distance to subway has a p-value greater than 0.5.
Interpretation: My findings show that, irrespective of the time differences between the data sets (2012 or 2014, winter or summer), income is the most important factor and has a positive coefficient with taxi pickups in Manhattan. Age and distance from a subway station have a negative impact in all of my analyses, which makes sense since the farther you are from the subway, the more likely you are to take a taxi, and the older you are, the more money you have to take a taxi and the less energy you have to walk.
Based on the maps (Figures 1, 2), the other interesting result is that the most popular spot for taxis is in Midtown around Times Sq, a tourist attraction, while for Uber it is in Midtown West, which is poorly served by public transportation and has no tourist attractions (mostly stores and vehicle stores). This confirms the previous study on this subject (newsroom.uber.com), which claims that most Uber trips are destined for transportation hubs.
Although I drew the same number of trips (20,000) from each data set, they are not distributed equally across the most popular census tracts: for example, the top census tracts differ between Uber June 2014 and taxi for the same month (Figure 9), and they also differ between winter and summer even for taxi alone (Figure 8).
Future work: Adding more independent variables to the model, such as the number of sightseeing spots, the number of people above 18 instead of the average age of all people, and the number of building units in each census tract (density), and using PCA to eliminate multicollinearity, would identify the important factors with more certainty. Running a spatial analysis to find the autocorrelation between census tract taxi trip rates, or clustering the census tracts by their taxi trip rates, would also give interesting results. Finding the exact characteristics of census tracts that differ significantly in their pickup numbers, for Uber or taxi, or for taxi alone at different times of the year, could help us better understand the reasons behind those differences.
Links: To make my code reproducible, I have put all my data sets on Github: https://github.com/poonehfamili/PUI2016_pf910/tree/master/extra%20credit
Bibliography:
http://toddwschneider.com/posts/taxi-uber-lyft-usage-new-york-city/
https://newsroom.uber.com/us-new-york/top-destinations-in-nyc-according-to-the-data/
https://data.ny.gov/Transportation/NYC-Transit-Subway-Entrance-And-Exit-Data/i9wp-a4ja/data
https://github.com/fivethirtyeight/uber-tlc-foil-response/tree/master/uber-trip-data
https://github.com/toddwschneider/nyc-taxi-data/blob/master/raw_data_urls.txt
http://nyu.policymap.com/
Appendix:

Study of impact on mobility: The impact of construction sites on pedestrian traffic i...

Ekaterina Levitskaya

December 15, 2016

Ekaterina Levitskaya, github: el2666, NYU ID: el2666

Determining Factors that Affect a Restaurant’s Yelp Rating

Kevin Han

and 3 more

December 12, 2016

KEYWORDS: Logistic regression, principal component analysis, lasso regression, hot spot analysis, kernel density

NYC Subways Safety Study

Sunny Kulkarni

November 28, 2016

PUI2016 Extra Credit Project Proposal

PUI2016 Extra Credit Project Proposal <yc2839>

Yue Cai

November 28, 2016

Problem Description: Why is New York City’s Mental Health Service Indispensable? The approach to exploring this question is to dig into NYC open data. There are now 836 mental health facilities located in the 5 Boroughs of NYC; Manhattan has the highest number, 284, and Staten Island the lowest, 57. First, the 311 complaint data will be analyzed to determine whether the number of facilities is associated with the total number of complaints, with population density, or with the economic status of each Borough. In other words, this investigation aims to determine which factors make people complain the most, and whether they are potentially related to mental stress at all. For this part, linear regressions will help identify the correlations between those factors. Second, the NYC leading causes of death data provide important information indicating that, from 2007 to 2011, mental health problems were the fourth highest risk factor for death in NYC. Finally, if all the data sets work well and point to correlations among the potential factors, we can confirm, supported by the data, that NYC's Mental Health Service is indeed required.

dlk253 PUI2016 Extra Credit Proposal

danak

and 1 more

November 28, 2016

UPDATE: This proposal was updated to reflect the results seen in the report located here. After speaking with Prof. Fedhere, the analysis shifted from histograms to plotting geographic features onto the raster image. The tiling of the images and hillside were completed as originally proposed.
PUI2016 Extra Credit Project Proposal
Anthropogenic Impacts found the Bedrock Layer <Dana Karwas, dlk253, dlk253>
Problem Description: Coastal urban ecosystems are under the constant pressure of natural and man-made forces. How much is the bedrock layer affected by urban coastal ecosystems? By identifying patterns at the bedrock layer, is it possible to identify urban coastal areas through their bedrock profile? Can an algorithm to measure anthropogenic impacts on urban coastal ecosystems be established by applying visual synthesis and analysis techniques to bedrock models? What can the bedrock tell us about the current state of our coastal cities? Can a metric for human impact be established by looking at the shape and topographic details of the bedrock?
Data: The dataset that is available and suitable is the Earth 2014 arcmin global topography and relief models from Curtin University. The data includes a global bedrock-only layer, which is what I would like to start with. It is available as gridded data and degree‐10,800 spherical harmonic. The bedrock (BED) includes Earth's relief without water and ice masses. This data was found in the paper linked below, with its accompanying data gateway.
Paper: http://ddfe.curtin.edu.au/models/Earth2014/Hirt_Rexer2015_Earth2014.pdf
Data Gateway: http://ddfe.curtin.edu.au/models/Earth2014/
Bedrock Layer: http://ddfe.curtin.edu.au/models/Earth2014/Earth2014_visualisation_Antarctica.jpg
Data Source: Western Australian Center for Geodesy, Curtin University Perth
Data Contact: [email protected]
This data is suitable for my questions because it has an isolated bedrock layer for the entire globe. The analysis will be made on three coastal urban cities in the US (New York City, Los Angeles, and New Orleans). I will have to pay close attention to land and ocean stitching and may need to find additional data to fill gaps in resolution. I will also need to pay close attention to the coordinate system transformation for my datasets. I will look for topographic anomalies by applying image processing techniques to the shape files.
I will establish search criteria through image processing (histogram matching/analysis) for "man-made" interventions, ultimately leading to machine learning (this is very ambitious, and I would be happy if I could just begin to compare a few histograms).
Other data of interest:
Earth
1- ETOPO1 (1 arc minute) http://www.ngdc.noaa.gov/mgg/global/global.html
2- SRTM30_PLUS (0.5 arc minute ~ 900 meters) and SRTM15_PLUS (0.25 arc minute ~ 450 meters) http://topex.ucsd.edu/WWW_html/srtm30_plus.html
Mars
The MOLA Mission Experiment Gridded Data Records (MEGDRs) are global topographic maps of Mars http://pds-geosciences.wustl.edu/missions/mgs/megdr.html
Analysis: Image processing techniques such as histogram matching could be used as a way to compare the datasets. Finding patterns in the histograms would be one way to begin identifying the impacts.
References
Technical References:
http://www.machinalis.com/blog/python-for-geospatial-data-processing/
https://en.wikipedia.org/wiki/Histogram_matching
http://geospatialpython.com/
https://github.com/GeospatialPython/pyshp
https://code.google.com/archive/p/pyshp/wikis/CreatePRJfiles.wiki
Theoretical References:
http://press.uchicago.edu/ucp/books/book/chicago/S/bo18295743.html
http://www.nyu.edu/classes/bkg/methods/daston.pdf
Deliverable: The expected deliverable would be an algorithm that can stitch together topographic bedrock data of ANY planetary body, such as Mars (http://astrogeology.usgs.gov/search/map/Mars/GlobalSurveyor/MOLA/Mars_MGS_MOLA_DEM_mosaic_global_463m), and search for human impact in that dataset. This algorithm can be used by agencies and students to search for "unnatural impacts". This will be interesting, I think, when the impact includes errors from the sensing device, such as those discussed in the Mars MOLA dataset, and impacts created by human (or other) intervention.
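As a point of reference for the histogram matching mentioned under Analysis, here is a minimal NumPy sketch of the standard technique: each source value is mapped to the reference value at the same cumulative-distribution level. The function name and the random test tiles are hypothetical, and nothing here is specific to the Earth2014 grids.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source values so their distribution follows that of reference."""
    src_values, src_inverse, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_inverse].reshape(source.shape)

# Made-up elevation tiles standing in for two bedrock rasters.
rng = np.random.default_rng(0)
tile_a = rng.normal(loc=100.0, scale=20.0, size=(16, 16))
tile_b = rng.normal(loc=-50.0, scale=5.0, size=(16, 16))
matched = match_histogram(tile_a, tile_b)
```

After matching, remaining differences between two tiles reflect spatial structure rather than overall elevation range, which is one way the histogram comparison proposed above could be set up.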

PUI2016 Urban Informatics Class Project Proposal

Ozgur L. Akkas

and 1 more

November 27, 2016

PUI2016 Urban Informatics Class Project Proposal