New York University (NYU) - 21DOCS Test Area

http://www.nyu.edu/

Analysis of Citibike Demand Northern Manhattan Border in June 2016

Jonathan Pichot

and 1 more

October 16, 2016

ABSTRACT The Citi Bike bike-share system in New York City has been aggressively expanding its coverage in 2016. Bike-share systems work by building clusters of bicycle docks within their 'service area'. This service area has a border outside of which there are no stations. To understand where pent-up demand might still exist, we investigated whether there was increased usage of bike-share stations along the border in Northern Manhattan as of June 2016. We selected 10 stations roughly along 84th Street and compared their average usage with that of 20 stations directly south of this border, which we call the 'non-boundary' stations. We hypothesize that boundary stations will see higher average usage than non-boundary stations, since anyone living north of the border is likely to pick up and drop off bikes along the system's edge.
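The abstract does not say how "usage" was measured. The reasoning that riders both pick up and drop off bikes at the edge suggests one plausible reading: count trip starts plus trip ends at each station. The sketch below assumes that reading. Trips are given as hypothetical (start station, end station) pairs, and the station IDs and trips are toy data, not the study's. The two group means would then go into a one-sided comparison.

```python
from collections import Counter
from statistics import mean

def station_usage(trips):
    """Count pickups plus drop-offs per station from (start_id, end_id) pairs."""
    usage = Counter()
    for start_id, end_id in trips:
        usage[start_id] += 1  # a bike was picked up here
        usage[end_id] += 1    # a bike was dropped off here
    return usage

def group_mean_usage(usage, station_ids):
    """Average usage over a group of stations; stations with no trips count as 0."""
    return mean(usage.get(s, 0) for s in station_ids)

# Hypothetical toy data: station IDs and trips are invented for illustration.
trips = [(1, 2), (1, 3), (2, 1), (3, 4), (4, 1), (1, 4)]
boundary = [1, 2]
non_boundary = [3, 4]

usage = station_usage(trips)
print(group_mean_usage(usage, boundary), group_mean_usage(usage, non_boundary))
```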

CitiBike Project: Do men take longer bike trips than women?

Ekaterina Levitskaya

October 16, 2016

Do men take longer CitiBike trips than women? The goal of this project was to test whether men take longer CitiBike trips than women. The null hypothesis is that women's Citi Bike trips are as long as or longer than men's; the alternative is that women's trips are shorter. The significance level was set at 0.05. The null and alternative hypotheses can be written as $H_0: W_{\text{trip time}} \geq M_{\text{trip time}}$ and $H_1: W_{\text{trip time}} < M_{\text{trip time}}$, where $W$ and $M$ denote the trip times of women and men.
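The abstract states the hypotheses but not the test used. If both groups contain many trips, one reasonable choice is a large-sample two-sample z-test on mean trip duration. This is an assumption, not necessarily the author's method. The sketch uses sample standard deviations in place of population ones. It returns the one-sided p-value for the alternative H1, that women's mean trip time is below men's. The durations are hypothetical and far too few for the large-sample approximation; they only show the call.

```python
import math
from statistics import mean, stdev

def one_sided_two_sample_z(men, women):
    """Large-sample z-test of H0: mean(women) >= mean(men) vs H1: mean(women) < mean(men).

    Sample standard deviations stand in for population ones, which is
    reasonable only when both samples are large.
    """
    se = math.sqrt(stdev(men) ** 2 / len(men) + stdev(women) ** 2 / len(women))
    z = (mean(men) - mean(women)) / se
    # Upper-tail probability of the standard normal: P(Z >= z).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Hypothetical trip durations in seconds (illustrative only, not CitiBike data).
men = [600, 720, 900, 840, 660, 780]
women = [540, 600, 660, 720, 580, 610]
z, p = one_sided_two_sample_z(men, women)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```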

not-yet-known not-yet-known not-yet-known...

Dongjie Fan

and 1 more

October 16, 2016

This study aims to find out whether young people ride bikes on weekends more often than middle-aged people do. The analysis performs a hypothesis test (Z-test) comparing the ratio of the number of young people using Citi Bikes on weekends to the number on weekdays with the same ratio for middle-aged people. The result shows that, at the 5% significance level, the ratio of the number of young people biking on weekends to weekdays (7 days) is greater than that of middle-aged people.
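The abstract does not give the Z-test formula. A weekend-to-weekday ratio r and the weekend share of trips r/(1 + r) rise and fall together. So one way to test the claim is a one-sided pooled two-proportion z-test on each group's weekend share. The sketch below assumes that approach and treats trips as independent. All counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """One-sided pooled two-proportion z-test of H1: p1 > p2.

    x1 of n1 trips by group 1 fall on weekends; likewise x2 of n2 for group 2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return z, p_value

# Hypothetical counts: weekend trips and total trips for each age group.
young_weekend, young_total = 3_000, 10_000
mid_weekend, mid_total = 2_500, 10_000
z, p = two_proportion_z(young_weekend, young_total, mid_weekend, mid_total)
print(f"z = {z:.2f}, p = {p:.2e}")
```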

Exploring the relationship between the usage of the CitiBike(s) when used by Custome...

Achilles Edwin Alfred Saxby

and 5 more

October 15, 2016

Co-Authors (Team): Anastasia Shegay, Priyanshi Singh, Aaron D'Souza, Vishwajeet Shelar, Akshay Penmatcha, Achilles Saxby

Does CitiBike sell more 24-hour passes on weekdays or on the weekend?

Jordan Vani

and 1 more

October 15, 2016

ABSTRACT This project investigated CitiBike rider and membership usage data to determine whether CitiBike sold more 24-Hour passes on weekends or on weekdays during the second quarter of 2016 (April to June). The statistical test used in this investigation was a t-test, which yielded a t-statistic of 5.5 and a p-value of 6.4 × 10^-6. Accordingly, the null hypothesis was rejected in favor of the alternative hypothesis. More 24-Hour passes were indeed purchased on weekends (Saturday and Sunday) than on weekdays from April to June 2016. This analysis can be seen on GitHub: https://github.com/jvani/PUI2016_jmv423/blob/master/HW6_jmv423/Assignment2_jmv423.ipynb
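The abstract does not say which t-test variant was used or how days were grouped. The sketch below assumes daily counts of 24-hour passes keyed by date and uses Welch's unequal-variance t-test. It splits the days with Python's `date.weekday()` (Saturday is 5, Sunday is 6) and returns the t statistic with its Welch-Satterthwaite degrees of freedom. The sales figures are hypothetical. A p-value would then come from the t distribution, for example `scipy.stats.t.sf(t, df)` for a one-sided test.

```python
import datetime as dt
import math
from statistics import mean, variance

def split_weekend(daily_sales):
    """Split {date: passes sold} into (weekend, weekday) lists; Saturday=5, Sunday=6."""
    weekend = [n for d, n in daily_sales.items() if d.weekday() >= 5]
    weekday = [n for d, n in daily_sales.items() if d.weekday() < 5]
    return weekend, weekday

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical daily 24-hour pass sales for two weeks (not real CitiBike counts).
start = dt.date(2016, 4, 4)  # a Monday
sales = [120, 130, 125, 140, 150, 310, 290, 118, 135, 128, 142, 160, 330, 300]
daily = {start + dt.timedelta(days=i): n for i, n in enumerate(sales)}

weekend, weekday = split_weekend(daily)
t, df = welch_t(weekend, weekday)
print(f"t = {t:.2f}, df = {df:.1f}")
```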

Citi Bike Project #By Laura Gladson, Santiago Carrillo, Alexey Kalinin, Nonie Mathur,...

Santiago Carrillo

and 5 more

October 14, 2016

ABSTRACT New York City keeps records of Citi Bike services, including demographics of users and statistics on bike use. Here, we performed a statistical analysis to determine the relationship between biker age and trip duration, testing the alternative hypothesis that Citi Bike users under age 35 are more likely to bike for longer durations than the average user. Through a simple Z-test, we were able to reject our null hypothesis, concluding that the trip duration of bikers under 35 is significantly greater than that of the average user.

DATA For this project, our research question was: _Are Citi Bike users under 35 years of age significantly more likely to bike for longer durations than the average user?_ For this analysis, we formed the following hypotheses: _Null Hypothesis:_ The mean trip duration of Citi Bike users under the age of 35 is the same as or less than the mean trip duration of an average user, significance level = 0.05. _Alternative Hypothesis:_ The mean trip duration of Citi Bike users under the age of 35 is more than the mean trip duration of an average user, significance level = 0.05. To test these hypotheses, we chose Citi Bike data from December 2015. The information downloaded from the data facility contained more variables than needed to compare age and trip duration. Additionally, it was not organized in columns, which could lead to errors such as interpreting variable names as observations. We therefore first organized our data into columns, then dropped 13 of the 15 categories. We were left with "birth year" as our independent variable and "trip duration" in seconds as our dependent variable. After plotting both variables, we identified several outliers of impossibly old users, i.e., those born before 1910. Plot 1 shows a scatter plot of the raw data, plotting birth year against trip duration. Histogram 1 shows the raw distribution of age across the data set. In Histogram 3, the distributions of trip duration for the entire data set (in blue) and for the group of those 35 and under (in green) are compared.

ANALYSIS Our peer reviews suggested we perform a Z-test to compare users under 35 with the total population. This test is possible because we know the population parameters (the dataset itself represents the entire population of Citi Bike users). Given the size of our sample, and the fact that we know the mean and standard deviation for both groups, we chose to test our hypothesis with a Z-test. We first calculated the mean and standard deviation of trip duration for the two groups, then plugged these values into the Z-test formula.

RESULTS From our Z-test, we obtained a Z-statistic of 17.79. The Z-table gave an area of over 0.9998 for this value, so our p-value is below 1 - 0.9998 = 0.0002. In other words, if the null hypothesis were true, a difference this large would arise by chance less than 0.02% of the time. This p-value is much smaller than our alpha level of 0.05, so we reject our null hypothesis and conclude that Citi Bike users under age 35 take longer trips than the average user.

LINK TO ORIGINAL NOTEBOOK https://github.com/jc7344/PUI2016_jc7344/blob/master/HW6_jc7344/HW6_Assignment2.ipynb
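The text mentions the means and standard deviations of both groups but does not show the formula it used. One common form treats the under-35 group as a sample of size n drawn from a population with known mean and standard deviation: z = (sample mean - population mean) / (population SD / sqrt(n)). The sketch below assumes that form, and the first set of inputs is hypothetical. It also shows why a table lookup only bounds the p-value. Evaluating the normal tail with `math.erfc` gives a value for z = 17.79 far below the 0.0002 that a printed table can resolve.

```python
import math

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z statistic comparing a subgroup mean with a known population mean and SD."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z, computed with erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical inputs: subgroup mean 900 s, population mean 850 s, SD 600 s, n = 10,000.
z = one_sample_z(900, 850, 600, 10_000)
print(round(z, 3), upper_tail_p(z) < 0.05)

# The reported statistic: a printed Z-table tops out near 0.9998, but the exact
# upper-tail probability for z = 17.79 is far smaller than 0.0002.
print(upper_tail_p(17.79) < 0.0002)  # prints True
```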

Exploring the Relationship between the Hour of Day and CitiBike Ridership

Kevin Han

and 3 more

October 14, 2016

ABSTRACT In this analysis, we explore whether there is a difference between the number of CitiBike rides during the rush hours of New York City and during non-rush hours. We define the rush hours of New York City as 7 to 9 A.M. and 4 to 9 P.M. on business days. We state our hypothesis and test it using a two-sided t-test. The test indicates that there is indeed a difference.
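Before a two-sided t-test can be run, each hour of ride data has to be labeled as rush or non-rush. The sketch below performs that split under several assumptions. Ride counts are hourly and keyed by the hour's start time. The windows are treated as half-open, [07:00, 09:00) and [16:00, 21:00). Business days are taken as Monday to Friday, with public holidays ignored. The counts are hypothetical. The two resulting lists are what the two-sided test would compare.

```python
import datetime as dt

def is_rush_hour(ts: dt.datetime) -> bool:
    """True if ts falls on a weekday within 7-9 A.M. or 4-9 P.M.

    Windows are treated as half-open ([07:00, 09:00) and [16:00, 21:00)),
    and public holidays are not excluded; both are assumptions.
    """
    if ts.weekday() >= 5:  # Saturday or Sunday
        return False
    return 7 <= ts.hour < 9 or 16 <= ts.hour < 21

def split_by_rush(hourly_counts):
    """Split {hour_start: ride count} into (rush, non_rush) lists of counts."""
    rush = [n for ts, n in hourly_counts.items() if is_rush_hour(ts)]
    non_rush = [n for ts, n in hourly_counts.items() if not is_rush_hour(ts)]
    return rush, non_rush

# Hypothetical hourly ride counts (illustrative only).
counts = {
    dt.datetime(2016, 6, 1, 8): 900,    # Wednesday, morning rush
    dt.datetime(2016, 6, 1, 12): 400,   # Wednesday, midday
    dt.datetime(2016, 6, 1, 18): 1100,  # Wednesday, evening rush
    dt.datetime(2016, 6, 4, 8): 300,    # Saturday morning
}
rush, non_rush = split_by_rush(counts)
print(rush, non_rush)
```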

Back In My Day They Were Citi-Penny-Farthings: An Analysis of Citi-Bike Ridership by...

Benjamin Miller

and 1 more

October 13, 2016

Abstract: In this project we set out to determine whether older Citi-Bike riders take longer trips than younger riders. Our statistical comparison of means indicates that they do.

Citibike Sharing Analysis

Henry Lin

October 06, 2016

INTRODUCTION This is an article about the CitiBike Sharing Project in the PUI2016 class. The report includes four parts (Abstract, Data, Analysis, and Result), which are described in the following sections.

Spatial Analysis of Dengue Fever Risk in Singapore

Sarah

July 08, 2016

Social attitudes cannot be predicted from federal court decisions and judge character...

Will Adler

and 4 more

May 11, 2016

QUESTION Federal US circuit courts often make rulings in areas that are socially relevant to the American public, such as capital punishment, affirmative action, or racial discrimination. In this project, we seek to measure how these rulings may affect Americans' social and political attitudes. Our goal is to determine whether court rulings tend to move attitudes in the direction "intended" by the ruling, or whether they tend to move attitudes in the opposite direction or polarize them. We focus on rulings about gender discrimination and their impact on attitudes about gender roles.

DATASETS To address this question we used two datasets. First, we used the US General Social Survey (GSS), a long-running (1972–) survey on the social attitudes and behaviors of US citizens. In addition to social attitudes, the GSS provides demographic and life-course data about respondents. Each row represents one respondent. Second, we used a database of federal appeals court cases decided at the circuit court level. The cases were separated by issue (e.g. affirmative action, gender discrimination, racial discrimination). The federal court system is divided into 12 circuits, each of which establishes legal precedent for a group of several states. A circuit court case is decided by a randomly assigned panel of three judges chosen from the pool of judges appointed to that circuit. The dataset records the outcome of each case as the number of judges who voted in favor of the outcome that can be considered more "progressive" (for example, pro-affirmative action, and against racial or gender discrimination). Another dataset was used to assign judge characteristics to each case. This included, for instance, the number of panel judges who were female or who were appointed by a Democratic president. It also included the average number of female judges (and other characteristics) in the pool of judges for that circuit at the time of the ruling. For the analyses reported below, we focused on the issue of gender discrimination and restricted ourselves to court case data pertaining to that issue. In total, we used 100 cases (one case per row) decided between 1995 and 2004. We chose this subset of cases because it had the longest period of overlap with several relevant questions on the GSS.
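The text does not describe how the two datasets were linked. The sketch below assumes each GSS respondent has already been assigned a circuit and a survey year; the column names `circuit` and `year` are hypothetical. It reduces the gender-discrimination cases to the share of progressive rulings per circuit and year. A case counts as progressive when at least 2 of its 3 panel judges voted that way, that is, a panel majority. That share is then attached to each respondent. Matching on the same year is a simplification; an actual analysis might use lagged rulings instead.

```python
from collections import defaultdict

def progressive_share(cases):
    """Share of cases per (circuit, year) where at least 2 of 3 judges voted progressive."""
    totals = defaultdict(int)
    progressive = defaultdict(int)
    for circuit, year, progressive_votes in cases:
        totals[(circuit, year)] += 1
        if progressive_votes >= 2:  # majority of the three-judge panel
            progressive[(circuit, year)] += 1
    return {key: progressive[key] / totals[key] for key in totals}

def attach_rulings(respondents, shares):
    """Attach the circuit-year progressive share to each respondent row (None if no cases)."""
    return [
        {**r, "progressive_share": shares.get((r["circuit"], r["year"]))}
        for r in respondents
    ]

# Hypothetical rows: (circuit, decision year, progressive votes out of 3).
cases = [(2, 1998, 3), (2, 1998, 1), (9, 1998, 2)]
respondents = [{"id": 1, "circuit": 2, "year": 1998}, {"id": 2, "circuit": 5, "year": 1998}]
linked = attach_rulings(respondents, progressive_share(cases))
print(linked)
```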

Practical Statistics for the LHC

Kyle Cranmer

March 07, 2015

This document is a pedagogical introduction to statistics for particle physics. Emphasis is placed on the terminology, concepts, and methods being used at the Large Hadron Collider. The document addresses both the statistical tests applied to a model of the data and the modeling itself. The document lives on GitHub and Authorea; the initial arXiv version is 1503.07622.

Cato

September 17, 2013

Some notes outlining progress on a question concerning Gibbs’ Paradox. Not everything will be totally cogent / concise, as they are working notes. Project supervised by A. Grosberg at NYU, Autumn 2013.

Research Notes, Autumn 2013

Cato

September 07, 2013

These are some ideas stimulated by my research; I write them to make sure I think about / understand certain questions. Some sections will be quite rough / unintelligible.

Ordering My Thoughts (Summer 2013)

Cato

May 27, 2013

ABOUT THIS DOCUMENT In this document I outline my ideas and goals for a handful of projects to work on over the summer (2013). The purpose is to maintain ordered thoughts and to be constructive/directed about tackling problems. I HAVE ABANDONED THESE QUESTIONS (AT LEAST FOR THE MEANTIME) TO PURSUE SOME MORE “USEFUL” WORK.

What to Keep and How to Analyze It: Data Curation and Data Analysis with Multiple Pha...

Alyssa Goodman

and 12 more

April 22, 2013

Overview This open document is being used to describe and record the events at the Radcliffe Exploratory Seminar on Data Curation and Analysis, to be held at the Radcliffe Institute for Advanced Study, May 9-10, 2013. This Google Drive Directory should be used to deposit all files contributed by participants before and during the meeting. (Click "Open in Drive" on your browser to make a new folder, e.g. with your name as its name.) This Google Doc is used for collaborative real-time note-taking. ABSTRACT: Rapid advances in technology have allowed us to collect vast amounts of data in myriad fields and forms, but our ability to manage and analyze these data has not kept pace. As a result, the amount of data collected far exceeds what can be analyzed and, often, what can be archived. These issues only become more pressing as data collection accelerates. Astronomers and astrophysicists, for example, collect terabytes of data per night; the phrase “drowning in a data tsunami” is increasingly used to describe this situation. The issues of what to keep and what to distribute are surprisingly complex, even when we put aside technological issues such as long-term storage and retrieval. A central challenge is the fundamental conflict between reducing the size of data and preserving information for future scientific inquiries and statistical analyses. Complicating matters further, the parties/teams involved in the entire data collection, curation, and analysis process often have only limited communication with each other owing to the sequential nature of this process. This seminar brings together a core group of leading experts and emerging scholars in information and natural sciences to discuss, debate, and design principles and strategies to address this grand challenge, which increasingly affects almost every aspect of science and society.
GOAL: By gathering experts from information and natural sciences, we aim to start building a set of principles and methods that will allow us to understand such problems and to provide better preprocessing, analyses, and data preservation, especially in the context of the natural sciences. The ultimate goals of this research include providing methods for assessing the validity of such collaborative analyses, guidance on statistically principled preprocessing, and a rich new theory of statistical learning and inference with multiple parties. We believe that this collaboration will simultaneously sow the seeds for innovative mathematical theory and shed light on directly usable guidelines for the construction and curation of scientific databases.