
Vocaloid Popularity:
An Investigation
DATA MANIPULATION
PREPARATION
Survey
A copy of the original data file was saved for manipulation, and the original was marked as final to prevent accidental editing.
The working copy was then checked for null values in the relevant fields (none were found), and duplicate entries were searched for manually and deleted. The number of responses for each answer choice was then counted for every question using the Excel function COUNTIF(FormResponses![Starting Cell]:[Ending Cell],"*[Value counted]*"), and the counts were placed in a separate sheet for each question. A numeric ID was also assigned to each record so that the sheets could be merged in Tableau.

Figure 1. An example of the COUNTIF function, used to count the number of "Yes" responses to the question "Do you think the popularity of Vocaloid as a culture have decreased?"
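For readers working outside Excel, the counting and ID steps described above could be reproduced roughly as follows. This is a minimal Python sketch, not part of the original workflow: the function name countif_contains and the sample responses are hypothetical, and the match is made case-insensitive to mirror how COUNTIF treats text.

```python
def countif_contains(cells, value):
    """Count cells that contain `value`, like COUNTIF(range, "*value*")."""
    needle = value.lower()  # COUNTIF ignores case, so compare in lower case
    return sum(1 for cell in cells if needle in str(cell).lower())


# Hypothetical responses to one survey question (not the real data).
responses = ["Yes", "No", "Yes", "Not sure", "yes"]

yes_count = countif_contains(responses, "Yes")

# Assign a sequential numeric ID to each record, starting at 1,
# so that separate sheets can later be joined on that ID.
records = [{"id": i, "answer": answer} for i, answer in enumerate(responses, start=1)]
```

Because the wildcard pattern matches substrings, a short value can also match longer answers: in this sample, counting "No" would also match "Not sure". Checking each counted value against the list of answer options guards against such overcounts.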
Three questions in the survey (“If yes, why?”, “When was the last time you found a good Vocaloid song that you liked?” and “Do you have any other opinion on this topic that might be helpful for the research?”) had answer options that allowed qualitative data to be entered (either an “Other” option that accepted additional text in a multiple-choice question, or a text field). This data was coded using the following procedures:
-
If yes, why? In addition to the 5 options provided, substantive answers entered in the “Other” field were coded into 2 extra reasons: “Fanbase’s boredom and lack of creativity due to outdatedness” and “Too many low quality Vocaloids”. These answers were also counted and included in the final analysis. Additionally, answers to the “Opinion” question that were relevant to this question were counted.
Figure 2. Coded and counted answers to the "If yes, why?" question in the survey

-
When was the last time you found a good Vocaloid song that you liked? Answers in the “Other” field for this question were coded either into a suitable existing option or into a new option, “Not within 1 year”, and included in the final analysis.
-
Do you have any other opinion on this topic that might be helpful for the research? Answers to this question were either coded as additions to the “If yes, why?” question or used to provide extra insight during the analysis process.
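The tallying that follows manual coding can be sketched as below. This is only an illustration under stated assumptions: the coding itself was a human judgment, so the sketch starts from an already completed codebook. The record IDs, the placeholder option names and all counts are hypothetical; only the two extra reason labels come from the procedure above.

```python
from collections import Counter

# Hypothetical codebook: record ID -> reason assigned by the researcher
# after reading the free-text "Other" answer.
manual_codes = {
    4: "Too many low quality Vocaloids",
    9: "Fanbase’s boredom and lack of creativity due to outdatedness",
    12: "Too many low quality Vocaloids",
}

# Hypothetical counts for two of the options offered in the survey.
option_counts = Counter({"Provided option A": 10, "Provided option B": 7})

# Tally the coded reasons and merge them with the provided options.
coded_counts = Counter(manual_codes.values())
final_counts = option_counts + coded_counts
```

Keeping the codebook keyed by record ID makes each coding decision traceable back to the response it came from.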
Google Trends
Data collected for each search term was saved in its own file, resulting in 24 data files. These were compiled into 2 main datasets: “CompiledWeb” for Web search data and “CompiledYouTube” for YouTube search data. In the process, Japanese search terms were translated into English and marked with an asterisk (*) at the end.


Figure 3. Raw Google Trends Web search data files
Figure 4. Example of raw Web search data for search term "Vocaloid"

Figure 5. Compiled Google Trends Web Search data set
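The compilation of per-term files into one data set could be automated along the following lines. This is a hedged Python sketch rather than the procedure actually used: it assumes each file holds a header row followed by "month,interest" rows with integer values, and the function names, sample terms and sample figures are all hypothetical.

```python
import csv
import io


def read_trend_file(text):
    """Parse one per-term export into {month: interest}.

    Assumes a single header row followed by 'month,interest' rows.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {month: int(value) for month, value in reader}


def compile_terms(files, translations):
    """Merge per-term series into one table: a row per month, a column per term."""
    series = {}
    for term, text in files.items():
        # Translated Japanese terms are marked with a trailing asterisk.
        label = (translations[term] + "*") if term in translations else term
        series[label] = read_trend_file(text)
    months = sorted({month for data in series.values() for month in data})
    header = ["Month"] + list(series)
    rows = [[m] + [series[label].get(m, "") for label in series] for m in months]
    return [header] + rows


# Hypothetical file contents standing in for two of the raw data files.
files = {
    "Vocaloid": "Month,Interest\n2012-06,90\n2012-07,100\n",
    "ボーカロイド": "Month,Interest\n2012-06,80\n2012-07,75\n",
}
translations = {"ボーカロイド": "Vocaloid"}

compiled = compile_terms(files, translations)
```

The asterisk keeps a translated Japanese term distinct from an English term with the same spelling, so both can sit side by side as separate columns.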
Graphs - Fritz
Graphs were downloaded from the source as image (.gif) files, then copied and saved for translation. Translations were provided by Ichika Matsumoto, a Japanese former student of Peninsula Grammar and personal acquaintance, then cross-checked against Google Translate and a personal lexicon of specific terms to produce the final translation. The necessary translations were edited into the graphs using Photoshop.
SURVEY
During preparation, the survey responses dataset was processed with the COUNTIF function to count the occurrences of each value in every field.
The data was then imported into Tableau and used to create the color-coded graphs in the following dashboard:

GOOGLE TRENDS
The compiled data sets were imported into Tableau and manipulated as follows:
-
Time series graphs depicting the data sets were created for each language of search term (Japanese and English) and for each platform (YouTube and Web). Link to all time series graphs.
-
Because each graph contains multiple lines, which makes it difficult to fit a single trend line, and because these lines are nearly identical (with the exception of the Japanese Vocaloid lines), 2 calculated fields were created, each computing the average interest over time across all search terms for one language, on each platform. The resulting averages were plotted as time series graphs, each fitted with a trend line. The trend lines were configured to recalculate for highlighted points.
-
Using the peak of the English average on each platform as a dividing point (July 2012 for Web and September 2011 for YouTube), the points before, and then the points after, each dividing point were highlighted to display the trend line and R-squared value (coefficient of determination) for each period of time.
-
The average R-squared value of all decreasing trend periods across the 4 time series plots is:
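The averaging, splitting and trend-fitting steps in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Tableau configuration itself: it fits an ordinary least-squares line against the point index, includes the peak in both periods (the text does not say which period it belongs to), and uses hypothetical series values and function names.

```python
def mean_series(series_list):
    """Point-by-point average of several equally long series."""
    return [sum(values) / len(values) for values in zip(*series_list)]


def split_at_peak(ys):
    """Split a series at its highest point; the peak is kept in both halves."""
    peak = ys.index(max(ys))
    return ys[:peak + 1], ys[peak:]


def linear_fit(ys):
    """Least-squares line through (0, y0), (1, y1), ...

    Returns (slope, intercept, r_squared). Needs at least 2 points.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("at least 2 points are needed to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # For a straight-line fit, R-squared equals the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Hypothetical interest values for two search terms in one language.
term_a = [10, 30, 50, 40, 20]
term_b = [20, 40, 70, 40, 20]

average = mean_series([term_a, term_b])
rising, falling = split_at_peak(average)
fits = [linear_fit(period) for period in (rising, falling)]

# Average R-squared over the periods whose trend line slopes downward.
decreasing_r2 = [r2 for slope, _, r2 in fits if slope < 0]
average_r2 = sum(decreasing_r2) / len(decreasing_r2)
```

With real data, the same three functions would be applied to each of the 4 averaged series, and the R-squared values of all decreasing periods would be pooled before averaging.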


