CAP394 - Introduction to Data Science

With Gilberto Ribeiro de Queiroz
These are the lecture notes and additional material for the course CAP394 - Introduction to Data Science, part of the Graduate Program in Applied Computing offered by the Brazilian National Institute for Space Research. The course is offered in the second term of every year.
In this course, students learn the basic concepts of Data Science through a practical approach. Students must complete the assigned exercises and present a complete project, related to their research field, that collects and processes data and produces a data product as a result.
Course material and additional notes are in English. Lectures may be presented in Portuguese. Notes are frequently updated!
See below for the course schedule, references, and additional material. See also the R notebooks for the lectures and projects.
Course Schedule for 2019
Lectures will be held in the second term (June 21st - September 6th), on Fridays, from 8:30 to 12:00, in room "A" at the Rotunda, except when noted.
June 21st | There will be no class on this day. Some reading material will be posted before June 24th. You can also watch the videos on the YouTube channel; those videos cover the material from the 2018 lectures. Changes to the material will be presented in the classroom. |
June 28th | Vitor Gomes: Introduction to Python and Jupyter Notebooks. |
July 5th | Rolf Simões, Gilberto Ribeiro: Data Science Notebooks Examples. |
July 12th | Introduction to Data Science: definition, motivation, examples. See the Lecture Notes. More material will be posted soon. |
July 19th | Introduction to R. Instructions and suggestions on the class projects. Examples of notebooks. Lecture Notes. |
July 26th | Meetings about the projects. Location: Meeting Room #31 at LABAC. |
August 2nd | Tips on R, Python and Jupyter notebooks (Felipe, Leonardo). Very good material on data analysis in R (in Portuguese); see also their textbook (also in Portuguese). |
August 9th | EDA in R, through code examples. Lecture Notes. See some of the examples here. Let's talk about your projects! |
August 16th | There will be no class on this day. |
August 23rd | Leonardo: Introduction to GeoPandas (notebooks). |
August 30th | A very gentle introduction to machine learning. Lecture Notes. |
September 6th |
See also the official schedule for the graduate programs at INPE.
References
Repositories
- Notebooks for examples and projects for this course.
- Links to the students' repositories.
- Introduction to R notebooks used in the lectures.
Books
- Schutt, Rachel; O'Neil, Cathy. Doing Data Science: Straight talk from the frontline. O'Reilly Media, Inc., 2013.
- Cielen, Davy; Meysman, Arno; Ali, Mohamed. Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools. Manning Publications Company, 2016.
- Harris, Harlan; Murphy, Sean; Vaisman, Marck. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O'Reilly Media, Inc., 2013.
- Gutierrez, Sebastian. Data Scientists at Work. Apress, 2014.
- Reese, Richard M; Reese, Jennifer L. Java for Data Science. Packt Publishing, 2017.
Papers, Articles, etc.
Video Lectures
- Learning Data Science: Ask Great Questions (verified in June 2019)
Data from Elsewhere
- Kaggle Competitions.
- Codalab Competitions.
- Awesome Public Datasets.
- How can I download the whole dblp dataset?.
- Springboard - 19 Free Public Data Sets For Your First Data Science Project.
- DataQuest - 18 places to find data sets for data science projects.
- The catalog on Data.gov - see also applications made with open government data.
- My NASA Data.
- Southern Photometric Local Universe Survey (S-PLUS) Data Access.
- ADS-B Exchange - World's largest co-op of unfiltered flight data.
- Data Science Central - A Plethora of Data Set Repositories.
- AWS Public Datasets.
- Amazon product data.
- 8 Useful Databases to Dig for Data (and 100 more).
- Stanford Large Network Dataset Collection.
- A topic-centric list of high-quality open datasets in public domains.
- r-dir Free Datasets.
- KDNuggets Datasets for Data Mining and Data Science.
- Data Society Data Sets.
- Big Data: 33 Brilliant And Free Data Sources Anyone Can Use (Forbes).
- Augmented Intel: Searchable List of Public Data Mining Data Sets.
- Open Research Datasets in Software Engineering.
- Awesome Empirical Software Engineering: A curated repository of software engineering repository mining data sets.
Project Ideas from Elsewhere
- Illinois Tech - Data Science Research Projects.
- Columbia University - Data Science Institute Research - Projects.
- Data Science Research (DSR) Lab at the University of Florida - Projects.
- Brown University - Data Science Initiative - Research Projects.
- University of Minnesota - Master's of Science in Data Science - Research Projects.
- Northeastern University - College of Computer and Information Science - Data Science Projects.
- New York University Center for Data Science - Research Projects.
- University of Edinburgh - EPSRC Centre for Doctoral Training - Research.
- University of Sussex - Data Science Research Group.
- Sheffield University - Open Data Science Initiative Thesis Projects.
- Radboud University - Faculty of Science - Data Science Projects.
- Charles Sturt University - Data Science Research Unit - Research and Industry Projects.
- The University of Sydney - Centre for Translational Data Science - Projects.
- Quora - What are some good data science projects?.
- DataKind Projects.
- Data Science for Social Good.
- Data Science Weekly - Aspiring Data Scientist? Here Are Some At Work Project Ideas.
- Analytics Vidhya - 17 Ultimate Data Science Projects To Boost Your Knowledge and Skills (& can be accessed freely).
- Text Visualization Browser - A Visual Survey of Text Visualization Techniques (IEEE PacificVis 2015 short paper).