Empirical social scientists are often confronted with a variety of data sources and formats that extend beyond structured, easily handled survey data. With the emergence of Big Data, data from web sources in particular play an increasingly important role in scientific research. However, the potential of new data sources comes with the need for comprehensive computational skills in order to deal with large volumes of potentially unstructured information. Against this background, the first part of this course provides an introduction to web scraping and APIs for gathering data from the web and then discusses how to store and manage (big) data from diverse sources efficiently. The second part of the course demonstrates techniques for exploring and finding patterns in (non-standard) data, with a focus on data visualization. Tools for reproducible research will be introduced to facilitate transparent and collaborative programming. The course focuses on R as the primary computing environment, with excursions into SQL and Big Data processing tools.
Fundamentals of Computing and Data Display
Instructor(s): Brian Kim
Prerequisites: Some basic experience with programming in R or Python is helpful, but not strictly necessary. Students without any R knowledge are encouraged to work through one or more R tutorials prior to or during the first weeks of the course. Two graduate-level courses in statistical methods.