Data Exploration in SQL — A Study Case

Aprilia
5 min readDec 3, 2022

--

As I am going to dive-in-depth into data analysis, I realize that SQL is inevitable. So I take an online course about fundamental SQL and here I am going to record the material from that course.

I will be acted as a data analyst in a company and there will be some problems needs to be solved using SQL. Here is the datasets that I am working with: online_courses datasets.

Problem #1.

This is my first day at work, the directors wants to know the youngest author that he has just met, a PhD from Spain. I need to find a name that matches this criteria.

Solution: The information I get is country, age, and academic degree while I need a name. Exploring the schema, I only need authors table. The nameI get is Zofia Juan.

SELECT first_name, last_name, birth_date
FROM online_courses.authors
WHERE country_of_residence = 'Spain'
AND academic_degree = 'PhD'
ORDER BY birth_date DESC;

Problem #2.

Marketing director asked me about course pricing. His team is about to create video series of authors that price their work the highest. I need to find out this author.

Solution: Beside author table I also need courses table as I have to get information about price. Joining these 2 tables and limit only 1 result by ordering course price descending I got a name: Aldous Vandelin.

SELECT a.first_name, a.last_name
FROM online_courses.authors a
JOIN online_courses.courses c
ON a.author_id = c.author_id
ORDER BY c.course_price_usd DESC
LIMIT 1;

Problem #3.

During the weekly meeting, the company owner wants a report on number of countries where students have spent more than $6000 altogether on courses released after 2015 and discounted by less than 15% at the moment of purchase. How many countries matches this criteria?

Solution: To solve this problem, I need 3 tables: course_purchases, students, and courses. I aggregated the course_value_usd to get the revenue for each country then filter it to get countries spent >$6000. Not no mention using where clause to filter release-date and discount_pctg. The result shows 2 countries matches this criteria: Italy and Saudi Arabia.

SELECT s.country,
SUM(cp.course_value_usd) total_purchases
FROM online_courses.students s
JOIN online_courses.course_purchases cp
ON s.student_id = cp.student_id
JOIN online_courses.courses c
ON cp.course_id = c.course_id
WHERE c.release_date >= 2016-01-01
AND cp.discount_pctg < 15
GROUP BY s.country
HAVING SUM(cp.course_value_usd) > 6000;

Problem #4.

A colleagues told me that course authors are also allowed to purchase online courses. While predicting the overall student demand for knowledge, the project add feels it would be useful to have some more details about this surprising trend. Based on the online courses database, how many courses have authors purchased in total?

Solution: First I need to know the authors that also bought the courses. Then I use this result as a subquery to calculate the total courses by using count. The result is clearly stated: 5 courses.

SELECT COUNT(cp.course_purchase_id)
FROM online_courses.course_purchases CP
WHERE cp.student_id IN (
SELECT a.student_id
FROM online_courses.authors a
JOIN online_courses.course_purchases cp
ON a.student_id = cp.student_id
)
;

Problem #5.

Average Revenue Per Unit (ARPU) represents the average revenue per course sale. The marketing campaign will be conducted based on the sales made to people aged over 40, in the period between 7th of July 2018 and the 11th of November 2020, sorted by country. So, I should obtain a list with the sales in all countries for the given period and age range for marketing option 0. Find out the country with the lowest ARPU.

Solution: Table course_purchases is clearly needed. To get information about the target of the campaign, I also need students and courses table. To get the ARPU, aggregating course_value_usd using average, then filtering release_date, age_range, and marketing_option to match the criteria. By grouping country and ordering the average course value, I get a list with Austria shows the lowest ARPU.

SELECT s.country,
c.release_date,
AVG(cp.course_value_usd) rev
FROM online_courses.students s
JOIN online_courses.course_purchases cp
ON s.student_id = cp.student_id
JOIN online_courses.courses c
ON cp.course_id = c.course_id
WHERE c.release_date BETWEEN '2018-07-07' AND '2020-11-11'
AND s.age_range = '40+'
AND s.marketing_option = 0
GROUP BY s.country
ORDER BY rev
;

Problem #5.

While analyzing the ARPU results, the executives notice an increasing sales trend in the countries at the bottom of the list. So, they decide to explore the effect of sales made under marketing option number 1. Therefore, I was asked to focus on the average revenue per unit generated from course sales in the last 4 countries on the list, while preserving the rest of the conditions-people aged over 40 in the period between 2018–07–07 and 2020–11–11. What can be concluded about the generated ARPU in the four countries?

Solution: Using the result from the last problem and changing the marketing option to 1, I can add a condition with in clause referring to the last 4 countries.

SELECT s.country,
c.release_date,
AVG(cp.course_value_usd) rev
FROM online_courses.students s
JOIN online_courses.course_purchases cp
ON s.student_id = cp.student_id
JOIN online_courses.courses c
ON cp.course_id = c.course_id
WHERE c.release_date BETWEEN '2018-07-07' AND '2020-11-11'
AND s.age_range = '40+'
AND s.marketing_option = 1
AND s.country IN ('Austria','USA','Germany','Italy')
GROUP BY s.country
ORDER BY rev
;

Conclusions:

This study case helped me a lot to practice using SQL. Explore critical thinking to get the correct answer for each problems and also helped me to be more familiarize with SQL syntax.

--

--

Aprilia
0 Followers

Portfolio about Data Analytics by Aprilia. Any comments or suggestions are welcome.