Designing Database — Secondhand Car Website

12 min readJul 7, 2024

The ERD, code and syntax in this article can be accessed through this Github link.

Designing a database is not merely creating tables with columns and inserting data as their rows — it should follow some principles. With an aim to provide an accurate information, a database designer — can be Data Engineer, Data Architect, or Software Engineer — should determine what data must be stored and how the data elements interrelate.

“Analyzing database is one thing, but designing a database is another.”

There are some principles that must be followed to achieve a good database. First, errors and consistency must be minimize by reducing redundant data or duplicate information. Secondly, the information about the data should be complete an correct. Any false information will lead to false conclusion. Garbage in garbage out. Thus a good database is one that:

Provide subject-base table to reduce redundant data.
There should be an access to information needed to join the tables together for further analysis.
Has an accuracy and integrity about its information.
Can accommodate data processing and reporting needs.

Background Project

This case study is about creating a database for secondhand car website. A relational database is created to store cars that are going to be offered, users who can both sell and bid, as well as to record any advertisement and bidding process. The objective of this secondhand car website are that users can offer any of their used cars and buyers can find cars depends of some criteria.

Finding and organizing the required information

While designing database, a Data Engineer — or any other role, some important information should be gathered. This can be done by discussing with other stakeholders. For secondhand car website, there are several rules that should be followed:

Every user can offer more than 1 cars.
Before selling cars, user should complete his data, such as name, phone, email and location.
Users offer cars by advertising on the website.
Advertisement include title, detail of the product and information about the seller.
Some information about the car that should on the advertisement: Car brand: Toyota, Daihatsu, Honda, etc. Model: Toyota Camry, Toyota Corolla Altis, Toyota Vios, Toyota Camry Hybrid, etc. Car type: manual or automatic. Car year: 2005, 2010, 2011, 2020. Other information such as color, distance on odometer, etc can be added but not mandatory.
Every user can search for an advertisement based on the brand of the car, type of the car and location of sellers.
If buyers interested in an advertisement, he can bid a price.
Transactions are done outside the application thus not included in this project.

Creating Table Structures

Determine Table Relationship

Determine logical relation for table relationship using Primary Key and Foreign Key.

Determine Business Rules

Business rules are applied using constraints on required fields. For example, user information needs name, phone, and city thus cannot be blank or null.

Implement a Relational Database

Using PostgreSQL and Data Definition Language (DDL), the designed database is then used to generate Entity Relationship Diagram (ERD).

Generating Dummy Dataset

Once the database is designed, dataset should be inserted into the database. As there is no data available and the database should be tested for its functionality, sample data can be generated.

In this case library Faker from Python is used to generate dummy data. Library Faker and Random make generating dummy data looks easy and simple, however some logical should be carefully considered:

Every user can have more than 1 car advertised, but 1 car can only advertised once.
Date for advertising cars should be later than car year. Date for bidding the cars should be later than the advertisements of the car. It would be illogical if this approach is not adhered to.
Bid price should be higher than car price on the advertisement.

Table cars and cities are already available. It just needs to modify table cars to meets the business rules and generating data for users, ads, and bids.

Table dependency is then determined as generating dummy data should in order: first generated table should be the ones with zero dependency to the others.

Table users

# Import needed libraries
from faker import Faker
from tabulate import tabulate
import random
from datetime import datetime, timedelta
import csv

#Using Indonesian as localized providers
FAKER = Faker('id_ID')

def generate_name(n_name):
    """
    Fungsi untuk membuat nama dummy
    arg:
        - n_name (int) : jumlah data nama yang ingin dibuatreturn:
        names (list) : list nama yang sudah dibuat
    """

    names = list()
    while len(names) < n_name:
        first_name = FAKER.first_name()
        last_name = FAKER.last_name()
        full_name = (f'{first_name} {last_name}')
        if full_name not in names:
            names.append(full_name)

    return names

def generate_name(n_name):
    """
    Fungsi untuk membuat nama dummy
    arg:
        - n_name (int) : jumlah data nama yang ingin dibuat
    return:
        names (list) : list nama yang sudah dibuat
    """

    names = list()
    while len(names) < n_name:
        first_name = FAKER.first_name()
        last_name = FAKER.last_name()
        full_name = (f'{first_name} {last_name}')
        if full_name not in names:
            names.append(full_name)

    return names

users = users_table(n_users = 100, is_print = True)

2. Table cars

#define function to modify cars table
def cars_table(car_product, user_table, is_print):
    """
    Fungsi untuk membuat dummy data cars table
    header:
        - car_id
        - brand
        - model
        - body_type
        - car_type
        - car_year
        - color
        - km_distance
        - owner_idarg:
        - n_cars (int)  : Tabel cars yang sudah disediakan
        - is_print  (bool) : Jika True akan menampilkan hasil data

    return:
        - table  (list) :
    """

    # list warna mobil
    color = ['White', 'Silver', 'Red', 'Black', 'Blue']

    # list warna mobil
    car_type = ['MT', 'AT']

    # list tahun mobil
    car_year = [2005, 2010, 2011, 2020]

    # Buat table
    table = {}
    table["car_id"] = cars_csv["\ufeffproduct_id"]
    table['brand'] = cars_csv["brand"]
    table['model'] = cars_csv["model"]
    table['body_type'] = cars_csv["body_type"]
    table['car_type'] = [random.choice(car_type) for i in range(len(cars_csv["brand"]))]
    table['car_year'] = [random.choice(car_year) for i in range(len(cars_csv["brand"]))]
    table['color'] = [random.choice(color) for i in range(len(cars_csv["brand"]))]
    table['km_distance'] = [FAKER.random_int(10_000, 50_000, 100) \
                            for i in range(len(cars_csv["brand"]))]
    table["owner_id"] = [random.choice(user_table['user_id']) for i in range(len(cars_csv["brand"]))]
    
    # Print table
    if is_print:
        show_data(table)
    return table

cars = cars_table(car_product=cars_csv, user_table=users, is_print = True)

3. Table ads

def ad_date_generator():
    '''
    Fungsi untuk membuat tanggal iklanargs:
        None
 return:
  - ad_date (str)    : tanggal iklan
    '''
    # definisikan awal tanggal
    start_date = datetime(2022, 1, 1)

    # definisikan akhir tanggal
    end_date = datetime(2023, 12, 31, 23, 59, 59)

    # membuat dummy berdasarkan start_date dan end_date
    ad_date = FAKER.date_time_between(start_date = start_date,
                                        end_date = end_date)

    # mengubah object datetime ke string
    # reservation_date = reservation_date.strftime('%d-%m-%Y %H-%M-%S')
    # start_date = start_date.strftime('%d-%m-%Y %H-%M-%S')
    # end_date = end_date.strftime('%d-%m-%Y %H-%M-%S')

    return ad_date

def ads_table(n_ads, car_table, is_print):
    """
    Fungsi untuk membuat dummy data reservation table
    header:
        - ad_id
        - car_id
        - ad_date
        - price

    arg:
        - user_table (list)  : list of dictionary data users
        - car_table (list)  : list of dictionary data cars
        - is_print  (bool) : Jika True akan menampilkan hasil data
        - n_ads (int)  : Jumlah iklan yang ingin dibuat

    return:
        - table  (list) :
    """

    # Buat table
    table = {}
    table["ad_id"] = [i+1 for i in range(n_ads)]

    car_ids = cars['car_id']
    random.shuffle(car_ids)
    table["car_id"] = car_ids[:n_ads]
    date = [ad_date_generator() for i in range(n_ads)]
    table['ad_date'] = date

    price = []
    for i in range(len(table["car_id"])):
      for j in range(len(car_pd['product_id'])):
        if table["car_id"][i] == car_pd['product_id'][j].astype(str):
          price.append(car_pd['price'][j])
    table['price'] = price

    # Print table
    if is_print:
        show_data(table)
    return table

ads = ads_table(n_ads = 50, car_table = cars, is_print = True)

4. Table bids

def bid_table(n_bid, users_table, ads_table, is_print):
    """
    Fungsi untuk membuat dummy data bid table
    header:
        - bid_id
        - ad_id
        - bidder_id
        - bid_price
        - date_bid
        - bid_status
    arg:
        - n_bid : jumlah bid yang ingin dibuat
        - users_table (list)  : list of dictionary data users
        - ads_table (list)  : list of dictionary data ads
        - is_print  (bool) : Jika True akan menampilkan hasil data

    return:
        - table  (list) :
    """

    # Buat table
    table = {}
    table["bid_id"] = [i+1 for i in range(n_bid)]
    table["ad_id"] = [random.choice(ads['ad_id']) for i in range(n_bid)]
    table['bidder'] = [random.choice(users['user_id']) for i in range(n_bid)]

    bid_price = []
    for i in range(len(table["ad_id"])):
      for j in range(len(ads["ad_id"])):
        if table["ad_id"][i] == ads["ad_id"][j]:
          bid_price.append(FAKER.random_int(ads["price"][j], ads["price"][j]+300_000_000, 10_000_000))
    table['bid_price'] = bid_price

    date_bid = []
    for i in range(len(table["ad_id"])):
      for j in range(len(ads["ad_id"])):
        if table["ad_id"][i] == ads["ad_id"][j]:
          date_bid.append(FAKER.date_time_between(start_date = ads["ad_date"][j], end_date = datetime(2024, 6, 30, 23, 59, 59)))
    table['date_bid'] = date_bid

    table['bid_status'] = ['Sent' for i in range(n_bid)]

    # Print table
    if is_print:
        show_data(table)
    return table

bids = bid_table(n_bid = 100, users_table = users, ads_table = ads, is_print = True)

The data then saved into csv.

def save_to_csv(data, nama_file):
    '''
    Fungsi untuk menyimpan data dummy ke csv

  args:
        - data (list)     : list of dictionary data yang akan dijadikan csv
        - nama_file (str) : nama untuk file csv

 return:
  - None
    '''

    # Membuat file csv
    with open(file = f"{nama_file}.csv", mode = 'w', newline = '') as csv_file:
        # Membuat writer csv
        writer = csv.writer(csv_file)

        # write header csv
        writer.writerow(list(data.keys()))

        # mengetahui panjang data
        len_data = len(list(data.items())[0][1])

        # write data ke file csv
        for i in range(len_data):
            row = []
            for key in data.keys():
                row.append(data[key][i])
            writer.writerow(row)

save_to_csv(data = users,
            nama_file = 'users')

save_to_csv(data = cities,
            nama_file = 'cities')

save_to_csv(data = cars,
            nama_file='cars')

save_to_csv(data = ads,
            nama_file ='ads')

Once all the data is completed, the CSV files can be used to import data into the PostgreSQL database, ensuring adherence to the previously established business rules and constraints.

CREATE TABLE cities
(
 kota_id INTEGER PRIMARY KEY,
 nama_kota VARCHAR(100) NOT NULL,
 long_lat POINT NOT NULL
);

CREATE TABLE users
(
 user_id SERIAL PRIMARY KEY,
 name VARCHAR(100) NOT NULL,
 phone VARCHAR(100) NOT NULL,
 email VARCHAR(100) NOT NULL,
 city INTEGER NOT NULL
);

CREATE TABLE cars
(
 car_id SERIAL PRIMARY KEY,
 brand VARCHAR(100) NOT NULL,
 model VARCHAR(100) NOT NULL,
 body_type VARCHAR(100) NOT NULL,
 car_type VARCHAR(100) NOT NULL,
 car_year INTEGER NOT NULL,
 color VARCHAR(100),
 km_distance INTEGER,
 owner_id INTEGER NOT NULL
);

CREATE TABLE ads
(
 ad_id SERIAL PRIMARY KEY,
 car_id INTEGER NOT NULL,
 ad_date DATE NOT NULL,
 price INTEGER NOT NULL
);

CREATE TABLE bids
(
 bid_id SERIAL PRIMARY KEY,
 ad_id INTEGER NOT NULL,
 bidder INTEGER NOT NULL,
 bid_price INTEGER NOT NULL,
 date_bid DATE NOT NULL,
 bid_status VARCHAR(100) DEFAULT 'sent'
);

--ADDING RELATIONSHIP
ALTER TABLE users
ADD CONSTRAINT fk_cities_users
 FOREIGN KEY(city)
 REFERENCES cities(kota_id)
;

ALTER TABLE cars
ADD CONSTRAINT fk_cars_users
 FOREIGN KEY(owner_id)
 REFERENCES users(user_id)
;

ALTER TABLE bids
ADD CONSTRAINT fk_bids_users
 FOREIGN KEY(bidder)
 REFERENCES users(user_id)
;

ALTER TABLE ads
ADD CONSTRAINT fk_cars_ads
 FOREIGN KEY(car_id)
 REFERENCES cars(car_id)
;

ALTER TABLE bids
ADD CONSTRAINT fk_bids_ads
 FOREIGN KEY(ad_id)
 REFERENCES ads(ad_id)
;

-- inserting the dummy data from csv file
COPY cities
FROM 'D:\APRILIA\Pacmann - SQL\FINAL PROJECT - Popop Cars\Dummy Data\cities.csv'
CSV
HEADER;
COPY users
FROM 'D:\APRILIA\Pacmann - SQL\FINAL PROJECT - Popop Cars\Dummy Data\users.csv'
CSV
HEADER;
COPY cars
FROM 'D:\APRILIA\Pacmann - SQL\FINAL PROJECT - Popop Cars\Dummy Data\cars.csv'
CSV
HEADER;
COPY ads
FROM 'D:\APRILIA\Pacmann - SQL\FINAL PROJECT - Popop Cars\Dummy Data\ads.csv'
CSV
HEADER;
COPY bids
FROM 'D:\APRILIA\Pacmann - SQL\FINAL PROJECT - Popop Cars\Dummy Data\bids.csv'
CSV
HEADER;

--checking data
SELECT * FROM cities;
SELECT * FROM users;
SELECT * FROM cars;
SELECT * FROM ads;
SELECT * FROM bids;

Here, the steps for creating a database is already finish. Now the database can be checked whether all rules and constraint is followed. Also see if the database is ready and able to answers the business questions by undertake transactional query and analytical query.

Creating Transactional Query

Find cars with year > 2015:

--Find cars with year > 2015
SELECT *
FROM cars
WHERE car_year > 2015
ORDER BY car_year
;

2. Add a new bid:

--2 Add a new bid
INSERT INTO bids(bid_id,ad_id,bidder,bid_price,date_bid)
VALUES (101,50,50,500_000_000,'2022-04-15')
;

SELECT * FROM bids
ORDER BY 1 DESC
;

3. Show all cars sold by an account, sorted by the newest. Example: Cars sold by ‘Adiarja Permadi’:

SELECT c.car_id,
 brand,
 model,
 car_year,
 price,
 ad_date
FROM cars c
JOIN ads a USING (car_id)
JOIN users u
 ON u.user_id = c.owner_id
WHERE name = 'Adiarja Permadi'
ORDER BY 6 DESC
;

4. Find the cheapest cars based on a keyword. Example: “Yaris”:

SELECT c.car_id,
 brand,
 model,
 car_year,
 price
FROM cars c
JOIN ads a USING (car_id)
WHERE model LIKE '%Yaris%'
ORDER BY 5
;

5. Find cars based on the nearest location. Example: Nearest to location id 3173:

--Create function to calculate distance using euclidean 
CREATE OR REPLACE FUNCTION euclidean_distance(point1 POINT, point2 POINT)
RETURNS FLOAT AS $$
DECLARE
    lon1 FLOAT := point1[0];
    lat1 FLOAT := point1[1];
    lon2 FLOAT := point2[0];
    lat2 FLOAT := point2[1];
    lon_diff FLOAT;
    lat_diff FLOAT;
    distance FLOAT;
BEGIN
    lon_diff := lon1 - lon2;
    lat_diff := lat1 - lat2;
    distance := SQRT(lon_diff * lon_diff + lat_diff * lat_diff);
    
    RETURN distance;
END;
$$ LANGUAGE plpgsql;

--Find cars with nearest distance with kota_id = 3173
SELECT c.car_id,
 brand,
 model,
 car_year,
 price,
 euclidean_distance(long_lat, (SELECT long_lat FROM cities WHERE kota_id = 3173)) distance
FROM cars c
JOIN users u
 ON c.owner_id = u.user_id
JOIN cities ci
 ON ci.kota_id = u.city
JOIN ads a
 ON a.car_id = c.car_id
WHERE euclidean_distance(long_lat, (SELECT long_lat FROM cities WHERE kota_id = 3173)) = 0
;

Creating Analytical Query

Car brand popularity:

SELECT model,
 COUNT(a.car_id) count_product,
 COUNT(bid_id) count_bid
FROM cars c
FULL OUTER JOIN ads a USING(car_id)
FULL OUTER JOIN bids b USING(ad_id)
GROUP BY 1
ORDER BY 3 DESC
;

2. Comparing car price depends on average price in every city:

SELECT nama_kota,
 brand,
 model,
 car_year,
 price,
 AVG(price) OVER(PARTITION BY nama_kota) avg_car_city
FROM cars c
JOIN users u
 ON c.owner_id = u.user_id
JOIN cities ci
 ON ci.kota_id = u.city
JOIN ads a
 ON a.car_id = c.car_id
;

3. Depends on a car model, find comparison between date bid and its price with next bid:

SELECT model,
 user_id,
 FIRST_VALUE(date_bid) OVER(ORDER BY date_bid) first_bid_date,
 LEAD(date_bid) OVER(ORDER BY date_bid) next_bid_date,
 FIRST_VALUE(price) OVER(ORDER BY date_bid) first_bid_price,
 LEAD(price) OVER(ORDER BY date_bid) next_bid_price
FROM cars c
JOIN users u
 ON c.owner_id = u.user_id
JOIN ads a
 ON a.car_id = c.car_id
JOIN bids b
 ON b.ad_id = a.ad_id
WHERE model = 'Toyota Yaris'
;

4. Comparing the percentage difference in the average car price based on the model and the average bid price offered by customers in the last 6 months:

WITH avg_6mo AS
 (
  SELECT model,
   AVG(price)::float avg_price_6mo
  FROM cars c
  JOIN ads a
   USING(car_id)
  WHERE ad_date >= (SELECT MAX(ad_date) FROM ads) - INTERVAL '6 months' 
  -- WHERE ad_date >= '2023-06-01'
  GROUP BY 1
 ),

avg_price AS
 (
  SELECT model,
   AVG(price)::float avg_price
  FROM cars c
  JOIN ads a
   USING(car_id)
  GROUP BY 1
 )
SELECT *,
 avg_price - avg_price_6mo difference,
 (avg_price - avg_price_6mo)/avg_price*100 difference_pct
FROM avg_price
JOIN avg_6mo
 USING(model)
;

5. Create a window function for the average bid price of a car brand and model over the last 6 months. Example: Toyota Yaris cars over the last 6 months:

WITH avg_bid_price_data AS (
 SELECT 
  brand, 
  model, 
  date_bid, 
  price,
  AVG(price::float) OVER (PARTITION BY brand, model ORDER BY DATE_TRUNC('month', date_bid) ASC) AS avg_price,
  EXTRACT(MONTH FROM (SELECT MAX(date_bid) FROM bids)) - EXTRACT(MONTH FROM DATE_TRUNC('month', date_bid)) AS month_diff
 FROM bids b
 INNER JOIN ads a
 USING (ad_id)
 INNER JOIN cars c
 USING (car_id)
 WHERE date_bid >= (SELECT MAX(date_bid) FROM bids) - INTERVAL '6 months' AND model = 'Honda CR-V'
)

SELECT
  brand,
  model,
  MAX(CASE WHEN month_diff = 6 THEN avg_price ELSE NULL END) AS avg_bid_price_m_6,
  MAX(CASE WHEN month_diff = 5 THEN avg_price ELSE NULL END) AS avg_bid_price_m_5,
  MAX(CASE WHEN month_diff = 4 THEN avg_price ELSE NULL END) AS avg_bid_price_m_4,
  MAX(CASE WHEN month_diff = 3 THEN avg_price ELSE NULL END) AS avg_bid_price_m_3,
  MAX(CASE WHEN month_diff = 2 THEN avg_price ELSE NULL END) AS avg_bid_price_m_2,
  MAX(CASE WHEN month_diff = 1 THEN avg_price ELSE NULL END) AS avg_bid_price_m_1
FROM avg_bid_price_data
GROUP BY brand, model;