- Install Postgres & the AdventureWorks data set (see the Getting Started page).
- Work through the code on the Basic page, experimenting with the queries and exploring the data yourself.
- Try to answer the questions on the Basic Test page, then request the solutions & review your answers.
- Repeat these steps for the Intermediate, Advanced, and Feature Engineering pages.
Welcome to SQL for Data Science. This is an open source course designed to teach you the SQL skills necessary for data science as quickly as possible. In practice, data science frequently involves taking a company’s data and figuring out how to use it to make more money. What does this entail? Something along these lines:
- Receive data (e.g., a dump of CSV files over SFTP).
- Load the data into a database (Postgres, SQL Server, Hive, Redshift, or one of many other possibilities).
- Explore the data with SQL.
- Figure out how to solve a business problem with SQL and optionally Python or R.
- Implement the solution with SQL and optionally Python or R.
Notice that SQL is not optional at any step of this process. That is the reality of the job, and generally speaking, becoming good at SQL requires using it in practice. Personally, I completed several introductory SQL courses before breaking into the industry but still felt they had not prepared me. This course uses Microsoft’s AdventureWorks dataset because this artificial data is designed to mimic a real company’s data.
The course is presented as though you were actually working with a company’s data in practice, across four lessons: Basic, Intermediate, Advanced, & Feature Engineering. Each lesson has a matching test (Basic Test, Intermediate Test, etc.) listing a set of questions for you to answer. If you have worked through the lesson page thoroughly enough, you should be able to complete each test without referring back to it. Once you feel you have successfully answered the test questions, use the form at the bottom of the page to request the solutions for self-review.
Rather than MS SQL Server, we will use Postgres because, like Python and R, it’s open source and will always be available to you. In short, you should take this course because:
- It mimics the reality of data science in practice by using (fairly) realistic data.
- It uses a (popular) open source database (Postgres).
- It is the fastest way to ramp up (focusing on what is necessary for analysis).
Have fun & enjoy!