Since I was recently researching different diffusion models, I thought it would be nice to see whether they are applicable to the popularity of information searched for online.

I could have chosen any topic, but I instantly remembered the time when Candy Crush Saga was very popular in Poland. I wondered whether it was similar in other countries.

For those not familiar with the topic: Candy Crush Saga (CCS) is a mobile game based on the match-three principle.

https://en.wikipedia.org/wiki/Candy_Crush_Saga

I used the gtrendsR package to retrieve the relative number of searches for CCS over time.

Below is an example of how to retrieve term popularity for a given keyword and country:

library(gtrendsR)

# Google account credentials (placeholders)
usr <- "user@gmail.com"
psw <- "pwd"
gconnect(usr, psw)

# "/m/0rytj3p" is a hash code for the CCS mobile game
ccs_trend <- gtrends(query = "/m/0rytj3p",
                     start_date = "2012-04-01",
                     geo = c("US", "PL"))


There are some important points to keep in mind when retrieving data via this Google Trends API:

• there is a user quota of roughly 500 requests a day
• you can issue only 10 requests per minute
• the data show relative keyword popularity within a country
• requested data is always normalized to max = 100 (I kept US as a reference category)
• if there is not enough data in any of the selected countries, the request returns an error
• if you want to use a more complex search term, like the CCS mobile game, you need to manually select it in Google Trends and copy the hash code of the term from the embedding script - details
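
Because every request is normalized to its own maximum of 100, series from different requests are not directly comparable. One possible way to put them on a common scale, assuming the reference country (US here) is included in every request, is to rescale each batch so that its US series matches the US series of a baseline batch. The sketch below illustrates the idea on made-up numbers; `rescale_to_reference` and both data frames are hypothetical and not part of gtrendsR.

```r
# Hypothetical helper: put one batch of Google Trends series on the scale of a
# baseline batch, using a reference column that appears in both.
rescale_to_reference <- function(batch, baseline, ref = "US") {
  stopifnot(ref %in% names(batch), ref %in% names(baseline))
  # The same underlying series differs between requests only by a constant
  # factor (up to rounding), so the ratio of totals estimates that factor.
  scale_factor <- sum(baseline[[ref]]) / sum(batch[[ref]])
  scaled <- batch
  for (col in names(batch)) {
    # Leave non-numeric columns (e.g. dates) untouched
    if (is.numeric(batch[[col]])) {
      scaled[[col]] <- batch[[col]] * scale_factor
    }
  }
  scaled
}

# Made-up weekly values; each data frame mimics one request normalized to max = 100
baseline <- data.frame(US = c(50, 100, 25), PL = c(10, 20, 5))
batch    <- data.frame(US = c(25, 50, 12.5), SE = c(100, 40, 20))

rescaled <- rescale_to_reference(batch, baseline)
print(rescaled)
```

After rescaling, values may exceed 100, which simply means that the country is relatively more popular than the peak of the baseline batch.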

I must admit that overall this was more challenging than expected, but as long as you keep all those limitations in mind, everything should be fine. In the end I had weekly data from 61 countries.

Below I plotted a few randomly selected countries to show the different popularity patterns we can observe:

Diffusion Theory

Diffusion theory explains how information expands through connected systems via three parameters:

• p - appeal of innovation
• q - propensity to imitate
• m - ultimate potential

Specific models depend on these parameters in different ways or assume that they vary across the population under study. Here I will use the shifted Gompertz (SG) distribution, but other choices are possible (Bass, Gamma Shifted Gompertz, Weibull). Examples of the evolution curves based on the SG distribution may be found here.

Shifted Gompertz diffusion model

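We assume that adoption of the app follows the shifted Gompertz distribution with pdf:

$$f(t; b, \eta) = b e^{-bt} e^{-\eta e^{-bt}} \left[1 + \eta \left(1 - e^{-bt}\right)\right], \quad t \ge 0,$$

where $b > 0$ and $\eta > 0$.

Let $F(t) = \left(1 - e^{-bt}\right) e^{-\eta e^{-bt}}$ denote the cumulative distribution function of SG, and since we have weekly aggregated data we will assume that $t$ is measured in weeks. Then term popularity in week $t$ is equal to:

$$N_t = m \left[F(t) - F(t-1)\right]$$

Usually diffusion models are estimated using the non-linear least squares technique [1], which is equivalent to

$$S_t \sim \mathcal{N}\left(N_t, \sigma^2\right)$$

where $S_t$ denotes observed term popularity in week $t$.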
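
To make the weekly popularity curve and the least-squares criterion concrete, here is a minimal R sketch. It assumes the standard shifted Gompertz CDF, F(t) = (1 - exp(-b t)) exp(-eta exp(-b t)), and weekly popularity m (F(t) - F(t - 1)). The R function `sg_weekly_share` is my own illustration, named after the function the Stan model calls; the parameter values and starting values are made up, and the data are synthetic.

```r
# Shifted Gompertz CDF: F(t) = (1 - exp(-b t)) * exp(-eta * exp(-b t))
sg_cdf <- function(t, b, eta) {
  (1 - exp(-b * t)) * exp(-eta * exp(-b * t))
}

# Expected popularity in weeks 1..n_weeks: N_t = m * (F(t) - F(t - 1))
sg_weekly_share <- function(n_weeks, m, b, eta) {
  m * diff(sg_cdf(0:n_weeks, b, eta))
}

# Sum of squared errors; parameters live on the log scale to keep them positive
sse <- function(log_par, S) {
  par <- exp(log_par)
  sum((S - sg_weekly_share(length(S), par[[1]], par[[2]], par[[3]]))^2)
}

# Synthetic, noise-free data generated from made-up parameter values
S_obs <- sg_weekly_share(150, m = 1000, b = 0.05, eta = 5)

# Least-squares fit from arbitrary starting values (Nelder-Mead is optim's default)
start <- log(c(m = 500, b = 0.1, eta = 2))
fit <- optim(start, sse, S = S_obs)

# Estimates back on the original scale
print(exp(fit$par))
```

Minimizing this sum of squares gives the same point estimates as maximizing a normal likelihood with constant variance, which is why the model can also be written with normally distributed errors.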

Below is my Stan implementation of this model:


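functions {
  // Assumed helper (its definition is not shown in the original post):
  // expected weekly popularity N_t = m * (F(t) - F(t - 1)),
  // where F is the shifted Gompertz CDF.
  vector sg_weekly_share(int T, real m, real b, real eta) {
    vector[T] share;
    real F_prev = 0;                        // F(0) = 0
    for (t in 1:T) {
      real F_t = (1 - exp(-b * t)) * exp(-eta * exp(-b * t));
      share[t] = m * (F_t - F_prev);
      F_prev = F_t;
    }
    return share;
  }
}

data {
  int<lower=2> T;                           // number of observed periods
  vector<lower=0>[T] S;                     // observed weekly term popularity
}

parameters {
  real<lower=0> b;                          // appeal of innovation (SG scale)
  real<lower=0> eta;                        // rate of adoption (SG shape)
  real<lower=0> sigma;                      // residual standard deviation
  real<lower=0> m;                          // ultimate potential
}

transformed parameters {
  vector[T] share;
  share = sg_weekly_share(T, m, b, eta);    // N_t
}

model {
  S ~ normal(share, sigma);
}

generated quantities {
  vector[T] pred_s;
  real p;
  real q;

  pred_s = share;                           // fitted curve
  p = b * exp(-eta);
  q = b - p;
}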
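
The last two lines of the generated quantities block map the SG parameters to Bass-style coefficients. Here is a short sketch of where these expressions can come from, assuming the usual Bass reading of $p$ as the adoption hazard at $t = 0$ and of $p + q$ as the exponential rate, which plays the role of $b$ here (this interpretation is my assumption and is not spelled out in the code):

$$
\begin{aligned}
f(t) &= b\,e^{-bt}\,e^{-\eta e^{-bt}}\left[1 + \eta\left(1 - e^{-bt}\right)\right], \qquad F(t) = \left(1 - e^{-bt}\right)e^{-\eta e^{-bt}},\\
f(0) &= b\,e^{-\eta}, \qquad F(0) = 0,\\
p &= \frac{f(0)}{1 - F(0)} = b\,e^{-\eta},\\
q &= b - p = b\left(1 - e^{-\eta}\right).
\end{aligned}
$$

Under this reading both coefficients are non-negative for $\eta \ge 0$, and a larger $\eta$ shifts weight from $p$ to $q$.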


Cross-country comparison

Below I plotted all the countries with respect to the estimated $p$ and $q$. The size of each label is proportional to the log of keyword popularity $m$.

I grouped countries with respect to speed and dynamics of adoption. Below I plotted selected countries from the respective groups. The red line is the theoretical diffusion curve of the fitted SG model. I additionally plotted the 95% confidence interval.

Sweden had the highest parameter value, which means that it adopted CCS fastest, along with Great Britain, the Netherlands and Greece.