Bay Area Craigslist Posts, 2000-2018

Like many cities, San Francisco doesn’t track rents. I created a panel of historical Craigslist rents by scraping posts archived by the Wayback Machine. Please feel free to use the data, and if you do, please cite it.

Here’s an overview of the methodology and choices about variable creation in the clean data, along with some sample Python code.

Raw Bay Area Craigslist Posts, 2000-2012

This is the raw data scraped from Craigslist posts from 2000 to 2012, as archived by the Wayback Machine. For every post, I’ve extracted the posting date, title, and neighborhood.

Citation: Pennington, Kate (2018). Raw Bay Area Craigslist Rental Housing Posts, 2000-2012. Retrieved from https://github.com/katepennington/historic_bay_area_craigslist_housing_posts/blob/master/raw_2000_2012.csv.zip.

Variables: date, title, neighborhood

Observations: 167,090
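
As a minimal sketch of how a downloaded archive could be read with only the Python standard library: the helper name `read_zipped_csv` is hypothetical, and the code assumes the zip holds at least one UTF-8 CSV file whose header row matches the variable names listed above. Neither assumption is confirmed here, so check the file before relying on it.

```python
import csv
import io
import zipfile


def read_zipped_csv(path):
    """Return the rows of the first CSV file inside a zip archive as dicts.

    `path` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        # Assumption: the archive contains a .csv member; skip macOS metadata files.
        names = [
            name for name in archive.namelist()
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX")
        ]
        if not names:
            raise ValueError("no CSV file found in archive")
        with archive.open(names[0]) as raw:
            # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
            return list(csv.DictReader(text))
```

After downloading the file, `rows = read_zipped_csv("raw_2000_2012.csv.zip")` would give one dictionary per post, keyed by the column names in the header.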


Raw Bay Area Craigslist Posts, 2013-2018

For posts from 2013 to 2018, it was often possible to open the individual listings and extract more detailed data.

Citation: Pennington, Kate (2018). Raw Bay Area Craigslist Rental Housing Posts, 2013-2018. Retrieved from https://github.com/katepennington/historic_bay_area_craigslist_housing_posts/blob/master/raw_2013_2018.csv.zip.

Variables: post_id, date, neighborhood, price, square footage, number of bedrooms, address, lat, lon, description, title, details, year

Observations: 58,551
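
Because the detailed fields were only available when an individual listing could be opened, some columns may be sparsely filled. The sketch below estimates the share of rows with a value in each column. The function name `field_coverage` is hypothetical, and treating empty strings as missing is an assumption; the files may encode missing values differently (for example as `NA`), in which case pass those tokens in `missing_tokens`.

```python
from collections import Counter


def field_coverage(rows, missing_tokens=("",)):
    """Return, for each column, the share of rows holding a non-missing value.

    `rows` is an iterable of dicts, such as the output of csv.DictReader.
    """
    filled = Counter()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            is_missing = value is None or str(value).strip() in missing_tokens
            # Adding 0 still registers the column, so empty columns show up as 0.0.
            filled[column] += 0 if is_missing else 1
    if total == 0:
        return {}
    return {column: count / total for column, count in filled.items()}
```

Running this on the 2013-2018 rows before an analysis shows which variables, such as `lat` and `lon`, are populated often enough to use.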


Clean Bay Area Craigslist Posts, 2000-2018

Please read the methodology for important information about how the data was cleaned and how variables were defined.

Citation: Pennington, Kate (2018). Bay Area Craigslist Rental Housing Posts, 2000-2018. Retrieved from https://github.com/katepennington/historic_bay_area_craigslist_housing_posts/blob/master/clean_2000_2018.csv.zip.

Variables: post_id, date, year, neighborhood, city, county, price, number of bedrooms, number of bathrooms, square footage, dummy for being a room in an apartment/house, address, lat, lon, title, description, details

Observations: 200,796
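
One simple use of the clean panel is a median asking rent per year. This is an illustrative sketch rather than part of the methodology: the function name `median_price_by_year` is hypothetical, and it assumes the columns are named `year` and `price` as in the variable list and that rows with missing or non-positive prices should be skipped.

```python
from collections import defaultdict
from statistics import median


def median_price_by_year(rows):
    """Return {year: median price} for rows with a usable year and price."""
    prices = defaultdict(list)
    for row in rows:
        try:
            year = int(float(row["year"]))
            price = float(row["price"])
        except (KeyError, TypeError, ValueError, OverflowError):
            # Skip rows where either field is absent or not numeric.
            continue
        if price > 0:  # also drops NaN, since NaN > 0 is False
            prices[year].append(price)
    return {year: median(values) for year, values in sorted(prices.items())}
```

The result is ordered by year, so it can be printed or plotted directly. Medians are less sensitive than means to the occasional mistyped price.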