Analysis of Amazon Prime Movies: From Cleaning to Dashboarding

Aprilia
4 min read · Dec 7, 2022


Amazon Prime is Amazon's paid subscription membership. It offers customers premium services for a yearly or monthly fee. The movies and TV shows on Amazon Prime always make an interesting subject for analysis.

The dataset was downloaded from Kaggle. This time, I explore the data in a spreadsheet and build the visualization in Looker Studio (formerly Google Data Studio).

After downloading the data, the first thing to do is take a quick look at it. There are 12 columns, with show_id as the unique identifier. The type column gives the type of each record: Movie or TV Show. The remaining columns are what their names suggest: the title of the show, its director, its cast, the country of origin, the date it was added to Amazon Prime, the release year, the rating, the duration (minutes for movies and seasons for TV shows), listed_in (the genres), and a description of the show.

I decided to drop the last column (description) because I do not use it in the analysis. Then I checked for blank values with pivot tables. Knowing the proportion of blank values in each column lets me decide later how to treat them.
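The analysis itself was done with spreadsheet pivot tables, but the same blank-value check can be sketched in a few lines of Python. This is only an illustration: the sample rows are made up, and the helper name blank_share is hypothetical rather than part of the original workflow.

```python
# Made-up sample rows for illustration; the real data is the Kaggle CSV.
rows = [
    {"show_id": "s1", "director": "A", "country": "", "date_added": ""},
    {"show_id": "s2", "director": "", "country": "India", "date_added": ""},
    {"show_id": "s3", "director": "B", "country": "", "date_added": ""},
    {"show_id": "s4", "director": "C", "country": "", "date_added": "March 30, 2021"},
]


def blank_share(rows, column):
    """Return the percentage of rows whose value in `column` is blank."""
    if not rows:
        return 0.0
    blanks = sum(1 for row in rows if not (row.get(column) or "").strip())
    return 100.0 * blanks / len(rows)


# Print the blank percentage for each column of interest.
for column in ("director", "country", "date_added"):
    print(f"{column}: {blank_share(rows, column):.1f}% blank")
```

A column with a very high blank share is a candidate for dropping, while a column with a low share can be filled instead.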

I built one pivot table per column, with COUNTA of show_id as the value, shown as a percentage of the column total, which gives the percentage of blank values. Two columns, country and date_added, are more than 90% blank. I decided to drop both columns entirely, since filling them with arbitrary values would introduce bias. Meanwhile, director, cast, and rating each have less than 25% missing values, so I filled the blanks with a value and kept these columns in the analysis. Lastly, I took the first genre listed in the listed_in column as the main genre. The other columns are fine as they are.
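The cleaning decisions described here can be sketched as a small Python routine. Several things in it are assumptions made for illustration: the 50% cut-off (any value between the two groups of columns would behave the same), the "Unknown" fill value (the article does not name the value used), the function names, and the sample rows.

```python
DROP_THRESHOLD = 50.0   # assumption: hypothetical cut-off between "drop" and "fill"
FILL_VALUE = "Unknown"  # assumption: the actual fill value is not stated


def blank_share(rows, column):
    """Return the percentage of rows whose value in `column` is blank."""
    if not rows:
        return 0.0
    blanks = sum(1 for row in rows if not (row.get(column) or "").strip())
    return 100.0 * blanks / len(rows)


def main_genre(listed_in):
    """Take the first genre from a comma-separated genre string."""
    first = listed_in.split(",")[0].strip()
    return first or FILL_VALUE


def clean(rows, columns):
    """Drop mostly-blank columns, fill remaining blanks, add main_genre."""
    kept = [c for c in columns if blank_share(rows, c) <= DROP_THRESHOLD]
    cleaned = []
    for row in rows:
        new_row = {c: (row.get(c) or "").strip() or FILL_VALUE for c in kept}
        new_row["main_genre"] = main_genre(row.get("listed_in") or "")
        cleaned.append(new_row)
    return cleaned


# Made-up sample rows for illustration only.
rows = [
    {"show_id": "s1", "director": "", "country": "", "listed_in": "Drama, Suspense"},
    {"show_id": "s2", "director": "Jane Doe", "country": "", "listed_in": "Comedy"},
    {"show_id": "s3", "director": "John Roe", "country": "", "listed_in": "Action, Drama"},
    {"show_id": "s4", "director": "Jane Doe", "country": "India", "listed_in": "Kids"},
]

cleaned = clean(rows, ["show_id", "director", "country", "listed_in"])
for row in cleaned:
    print(row)
```

In this sample, country is blank in three of four rows and is dropped, while the single blank director is filled and the column is kept.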

Now we are ready for visualization in Looker Studio. Visualization is one of the most important parts of presenting data. Visualized data is easier to understand because humans are visual creatures: we process information more easily as images than as text arranged in rows and columns.

I made two pages: the first has the general information, while the second lets us explore the data by type of show, movies or TV shows. The dashboard can be accessed here:

https://datastudio.google.com/reporting/8eca9e87-afad-4c2e-8b17-4b26c8b988d1

The first chart is a pie chart showing the composition of the catalog: about 80% are movies and the rest are TV shows. Next, a line chart shows the number of shows released in the past 10 years; the trendline shows that the number of movies is increasing significantly. A horizontal bar chart shows that the most common rating is ‘13+’ and the most common genre is Drama.

On the second page, we can see the number of shows released on Amazon Prime by type of show. We can also see the average duration of the shows released each year, the top 5 directors by number of shows, and finally the most popular genres.
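The dashboard computes these figures in Looker Studio, but the aggregations behind the second page can be sketched in Python. The sample records, the function names, and the assumption that duration is stored as text such as "110 min" or "2 Seasons" are illustrative and are not taken from the dashboard itself.

```python
from collections import Counter, defaultdict

# Made-up sample records for illustration only.
shows = [
    {"type": "Movie", "release_year": 2020, "duration": "110 min", "director": "Jane Doe"},
    {"type": "Movie", "release_year": 2020, "duration": "90 min", "director": "John Roe"},
    {"type": "Movie", "release_year": 2021, "duration": "120 min", "director": "Jane Doe"},
    {"type": "TV Show", "release_year": 2021, "duration": "2 Seasons", "director": "Unknown"},
]


def duration_value(text):
    """Parse the leading number from text such as '110 min' or '2 Seasons'."""
    return int(text.split()[0])


def average_duration_by_year(shows, show_type):
    """Average duration per release year for one type of show."""
    by_year = defaultdict(list)
    for show in shows:
        if show["type"] == show_type:
            by_year[show["release_year"]].append(duration_value(show["duration"]))
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}


def top_directors(shows, show_type, n=5, placeholder="Unknown"):
    """Most frequent directors for one type of show, skipping the fill value."""
    counts = Counter(
        show["director"]
        for show in shows
        if show["type"] == show_type and show["director"] != placeholder
    )
    return counts.most_common(n)


print(average_duration_by_year(shows, "Movie"))
print(top_directors(shows, "Movie"))
```

Movies and TV shows are aggregated separately because their durations use different units (minutes versus seasons), so averaging them together would not be meaningful.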

From analyzing the Amazon Prime movies and TV shows, we can draw some conclusions: movies still outnumber TV shows, and the most common rating among the published shows is ‘13+’, followed by ‘16+’. Drama and Action are the most common genres, with Suspense ranking last.

With more information about the most-watched movies and TV shows, we could gain some insight into what to improve when releasing shows, perhaps by boosting TV shows or by adding more titles in Comedy or other genres.


Portfolio about Data Analytics by Aprilia. Any comments or suggestions are welcome.