Movie Sales — Data Analysis Project

Aprilia
Oct 21, 2022


Ok, this is my very first published data analysis.

The dataset covers game sales over a period of several years. I used only Python for everything from data cleaning to visualization. Here are the steps I took:

Importing libraries and dataset.

Take a glance at the data

Checking the columns, the value counts and duplicated data.

I found 2 duplicated rows, so I decided to remove them.
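A minimal sketch of the check-and-remove step. The column names and values below are made-up stand-ins, not the real dataset, and `df_clean` is just an illustrative name.

```python
import pandas as pd

# Toy stand-in for the real dataset; names and values are invented.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game B", "Game C"],
    "Sales": [1.0, 2.5, 2.5, 0.8],
})

# Quick look at which columns exist.
print(df.columns.tolist())

# Count rows that are exact copies of an earlier row.
n_dupes = int(df.duplicated().sum())
print("duplicated rows:", n_dupes)

# Keep the first occurrence of each row and renumber the index.
df_clean = df.drop_duplicates().reset_index(drop=True)
```

By default `duplicated()` only flags the second and later copies, so the count matches the number of rows that `drop_duplicates()` removes.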

The dataset info shows that the column containing the game release date has type object, so I converted it to a datetime type using pd.to_datetime().

I also extracted the release year from that column, since I only need the year rather than the whole date.
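The conversion and year extraction could look roughly like this. The column name `Release` and the date strings are assumptions for illustration; only `Year` is a name used in this post.

```python
import pandas as pd

# Toy data; "Release" is an assumed column name and the dates are invented.
df = pd.DataFrame({"Release": ["2016-09-01", "2012-05-15", "not a date"]})

# Convert strings to datetime; unparseable values become NaT instead of raising.
df["Release"] = pd.to_datetime(df["Release"], errors="coerce")

# Pull out just the year. The nullable Int64 type keeps years as whole
# numbers even when some rows are missing.
df["Year"] = df["Release"].dt.year.astype("Int64")
```

Using `errors="coerce"` is a design choice: it turns bad dates into missing values that can be counted and handled later, rather than stopping the script.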

How to treat null values:

Only 1 column has null values: Series. I decided to assign ‘No series specified’ to these nulls, because not every game belongs to a series, and a null here most likely means that the game has no series.
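As a small sketch of this null handling, on invented rows (only the `Series` column name and the replacement label come from the text above):

```python
import pandas as pd

# Toy data; the game and series names are invented.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Series": ["Series X", None, "Series Y"],
})

# Count nulls per column to see which columns need attention.
print(df.isna().sum())

# Replace missing series with an explicit label.
df["Series"] = df["Series"].fillna("No series specified")
```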

With the cleaning finished, I moved on to exploring the data. First of all, I wanted to know how many games were released each year. Using the ‘Year’ column that I extracted from the release date, I counted the number of games released per year.

Then I made the chart. I chose a bar chart, using the matplotlib library.
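One way to count games per year and draw the bar chart is sketched below. The years are toy values, and `games_per_year` is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; the real dataset has one row per game.
df = pd.DataFrame({"Year": [2010, 2012, 2012, 2014, 2014, 2014]})

# Count rows per year, then sort by year so the bars run left to right in time.
games_per_year = df["Year"].value_counts().sort_index()

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(games_per_year.index, games_per_year.values)
ax.set_xlabel("Year")
ax.set_ylabel("Number of games released")
ax.set_title("Games released per year")
fig.tight_layout()
```

`value_counts()` sorts by count by default, so the `sort_index()` call is what puts the years in chronological order.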

From the chart, we can see that the number of games released increased from 1984 onward until it reached its maximum in 2014.

After that, I took a look at the genres. I made a pie chart of the top 10 genres to see which genre was published most often.

But when I checked the dataset, I found that the genre column can hold multiple genres separated by commas. To keep things simple, I took the first genre listed and used it as the main genre.

Then I generated the pie chart.
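The main-genre step and the pie chart might be sketched like this. The column names `Genre` and `Main genre` are assumptions, and the rows are toy values chosen only to show the comma splitting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; some rows hold several genres separated by commas.
df = pd.DataFrame({"Genre": [
    "Real-time strategy",
    "Action role-playing, Hack and slash",
    "Real-time strategy, Simulation",
    "First-person shooter",
]})

# Keep only the first listed genre and trim stray spaces.
df["Main genre"] = df["Genre"].str.split(",").str[0].str.strip()

# Most common main genres, limited to the top 10.
top_genres = df["Main genre"].value_counts().head(10)

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    top_genres.values, labels=top_genres.index, autopct="%1.1f%%"
)
ax.set_title("Top genres by number of games")
```

Taking the first listed genre is a simplification: a game tagged with several genres is counted only once, under whichever genre happens to come first.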

From the chart, we can see that Real-time strategy is the most published genre, followed by first-person shooter and action role-playing. Later, I use this information to suggest which genres developers and publishers could make to get more sales:

First, I want to see the 10 developers with the most games.

And then I compare that with the sales.
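Both rankings can be computed with a count and a grouped sum. This sketch assumes columns named `Developer` and `Sales`; the developers and figures are invented.

```python
import pandas as pd

# Toy data; developer names and sales figures are invented.
df = pd.DataFrame({
    "Developer": ["Dev A", "Dev A", "Dev B", "Dev B", "Dev B", "Dev C"],
    "Sales": [5.0, 4.0, 1.0, 1.0, 1.0, 2.0],
})

# Top 10 developers by number of games (one row per game).
top_by_games = df["Developer"].value_counts().head(10)

# Top 10 developers by total sales.
top_by_sales = (
    df.groupby("Developer")["Sales"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)

print(top_by_games)
print(top_by_sales)
```

In this toy data the two rankings disagree (Dev B has the most games, Dev A the most sales), which is the kind of gap the comparison is meant to expose.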

As we can see from the chart, Blizzard Entertainment was in first place both for most games produced and for most sales. In contrast, Maxis was second for games produced but only fourth for sales.

So I tried to relate this to the genres of the games made by Maxis.
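Looking at one developer's genres comes down to a filter followed by a count. A sketch, assuming `Developer` and `Main genre` columns, with placeholder names instead of real developers and genres:

```python
import pandas as pd

# Toy data; developer and genre names are placeholders.
df = pd.DataFrame({
    "Developer": ["Dev A", "Dev A", "Dev A", "Dev B"],
    "Main genre": ["Genre X", "Genre X", "Genre Y", "Genre Z"],
})

# Select the rows of one developer, then count games per main genre.
dev_genres = df.loc[df["Developer"] == "Dev A", "Main genre"].value_counts()
print(dev_genres)
```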

We can see that Maxis produced 3 Life simulation games. Compared with the chart of all game genres, Maxis could produce more Real-time strategy games to push its sales.

I repeated the same steps for the publishers:

First, see the top 10 publishers.

And then compare them with the top 10 publishers by sales:
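Since the same comparison is run for more than one column, it could be wrapped in a small helper. `rank_comparison` is a hypothetical function written for this sketch, not part of the original script, and the `Publisher` and `Sales` column names and the data are assumptions.

```python
import pandas as pd

def rank_comparison(df, group_col, sales_col="Sales", n=10):
    """Return games, total sales, and both ranks per group (hypothetical helper)."""
    # Count of games and total sales per group; "count" assumes no missing sales.
    summary = df.groupby(group_col)[sales_col].agg(games="count", sales="sum")
    # Rank 1 is the highest; ties share the smallest rank.
    summary["games_rank"] = summary["games"].rank(ascending=False, method="min").astype(int)
    summary["sales_rank"] = summary["sales"].rank(ascending=False, method="min").astype(int)
    return summary.sort_values("games", ascending=False).head(n)

# Toy data; publisher names and sales figures are invented.
df = pd.DataFrame({
    "Publisher": ["Pub A", "Pub A", "Pub A", "Pub B", "Pub B", "Pub C"],
    "Sales": [1.0, 1.0, 1.0, 5.0, 4.0, 2.0],
})

table = rank_comparison(df, "Publisher")
print(table)
```

Putting both ranks side by side in one table makes it easy to spot a publisher that ranks high for number of games but lower for sales.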

Electronic Arts was first in number of games published (19 games), while for sales it was second, after Blizzard Entertainment. As before, I looked at the genres of the games published by Electronic Arts:

First-person shooter is the genre Electronic Arts published most. So, based on the most published genre in this period, they could try publishing more Real-time strategy games to boost sales.

This analysis may be very rough and has many shortcomings. If you have any suggestions, feel free to comment.

You can also access the dataset and the Python script I used in my GitHub account here: https://github.com/aprilialiaa/DAMC-Oct_Games-Analysis
