Advanced recipes
advanced_recipes.Rmd
This vignette gives some examples of more advanced/involved uses of STbayes.
Multi-network models
To create a multi-network model, you can just add more columns to the
networks
data-frame. Below is a toy data-set of two trials
each with 3 individuals, and we define a kin and inverse distance
network.
event_data <- data.frame(
trial = rep(1:2, each = 3),
id = LETTERS[1:6],
time = c(2, 1, 3, 4, 2, 1),
t_end = c(3, 3, 3, 4, 4, 4)
)
networks <- data.frame(
trial = rep(1:2, each = 3),
from = c("A", "A", "B", "D", "D", "E"),
to = c("B", "C", "C", "E", "F", "F"),
kin = c(1, 0, 1, 0, 1, 1), # first network
inverse_distance = c(0, 1, .5, .25, .1, 0) #second network
)
data_list <- import_user_STb(
event_data = event_data,
networks = networks)
Individual-level variables (ILVs)
Individual-level variables can be either constant (e.g. age, sex) or time-varying (e.g. body condition). Below I define constant ILVs age, sex and weight to add to our toy dataset above.
ILV_c <- data.frame(
id = LETTERS[1:6],
age = c(-1, -2, 0, 1, 2, 3), # continuous variables should be normalized
sex = c(0, 1, 1, 0, 1, 0), # Factor ILVs must be input as numeric
weight = c(0.5, .25, .3, 0, -.2, -.4)
)
The data-frame ILV_c holds the values for each individual. The
formatting of id
should match the event_data and networks
data-frames. At the moment, ILVs can be continuous or binary. You should
normalize continuous variables, and binary variables should be coded as
0, 1. I plan to expand this to multi-category factors, but for the
moment you must directly edit the Stan code to use these.
We need to explicitly tell STbayes which variables are additive (acting independently on intrinsic or social rates) and multiplicative (same effect estimated for intrinsic and social rates). Below, I have specified age as acting independently on the intrinsic and social rate, sex as acting only on the social rate, and weight as a multiplicative effect. Two betas will be estimated for age, and a single beta will be estimated for sex and weight.
data_list <- import_user_STb(
event_data = event_data,
networks = networks,
ILV_c = ILV_c,
ILVi = c("age"),
ILVs = c( "age", "sex"),
ILVm = c("weight")
)
Time-varying ILVs
Below we add distance from resource as a time-varying ILV. Similar to
dynamic networks, these need to be summarized per inter-event interval.
For example, the value of dist_from_resource
at
time=1
should reflect the average distance of the
individual to the resource from the start of the observation period to
the first event.
ILV_tv <- data.frame(
trial = c(rep(1, each = 9),rep(2, each = 9)),
id = c(rep(LETTERS[1:3], each=3), rep(LETTERS[4:6], each=3)),
time = c(rep(1:3, times = 3), rep(1:3, times=3)),
dist_from_resource = rnorm(18)
)
data_list <- import_user_STb(
event_data = event_data,
networks = networks,
ILV_c = ILV_c,
ILV_tv = ILV_tv,
ILVi = c("age", "dist_from_resource"),
ILVs = c("sex"),
ILVm = c("weight")
)
Here, we will include it as an additive effect on the intrinsic rate along with age.
Transmission weights
Transmission weights are vital information that can dramatically improve how the model fits the data. NBDA models have supported static transmission weights (), which are usually some rate of behavioural usage. In the context of cultural transmission, individuals who rarely use a novel behaviour have a much lower chance of serving as demonstrators for others compared with individuals who frequently use it. In a study of disease transmission, this might be something like shedding rate. Transmission weights must be 0 or positive: , where we divide the number of behaviours produced by how the duration that individual is knowledgeable. We just need to define a dataframe with this information. For static transmission weights, we define:
t_weights_static <- data.frame(
id = LETTERS[1:6],
t_weight = rbeta(6, 1, 5)
)
STbayes introduces the possibility of using dynamic transmission
weights, which can change over time. Dynamic transmission weights
are defined with an additional time
column. Similar to
dynamic networks, these integer times correspond to the inter-event
intervals.
are the number of behaviours produced during interval / duration of
interval:
t_weights_dynamic <- data.frame(
trial = c(rep(1, each = 9),rep(2, each = 9)),
id = c(rep(LETTERS[1:3], each=3), rep(LETTERS[4:6], each=3)),
time = c(rep(1:3, times = 3), rep(1:3, times=3)),
t_weight = rbeta(18, 1, 5)
)
These can be included in the model by:
data_list <- import_user_STb(
event_data = event_data,
networks = networks,
t_weights = t_weights_dynamic
)
You can mix and match static/dynamic networks with static/dynamic transmission weights.
Varying effects by individual
You may apply varying effects for each individual for the intrinsic
rate (lambda_0), social learning rate (s) and any of the ILVs. You do
this by specifying specific parameter names using the argument
veff_ID
in the call to generate_STb_model()
or
generate_STb_asocial_model
:
model = generate_STb_model(data_list, veff_ID = c("lambda_0", "s"))
For lambda_0 and s, varying effects are added onto the main effect on
the log scale. For example, the model would calculate a vector of
lambda_0 values for each individual in the
transformed parameters
block:
and use those values when calculating
the likelihood in the model
block.
Other model types: OADA and dTADA
So far we have been building continuous time of acquisition models—we
have measured when individuals experienced an event. You might be less
certain about the timing of events. If you have coarser data where you
know that individuals learned within a given time period, but are
uncertain of exactly when, you can create a discrete TADA (dTADA) model.
If you have no time information, but only the order of acquisition, you
can also create an OADA model. The data_type
argument
allows you to choose between these options:
model_cTADA = generate_STb_model(data_list, data_type = "continuous_time") #default
model_dTADA = generate_STb_model(data_list, data_type = "discrete_time")
model_OADA = generate_STb_model(data_list, data_type = "order")
A flow chart for deciding between models can be found in Hasenjager et al. 2021.
Edge weight uncertainty
Rather than using point estimates for edge weights, it is possible to
import posterior distributions of edge weights from bayesian network
models, such as those fit by the bisonr package or
the STRAND
package. This is done by providing single fits (or a list of fits)
to import_user_STb()
.
bisonr_fit <- STbayes::bisonr_fit
# network has 10 individuals, create mock diffusion data
event_data <- data.frame(
trial = 1,
#bison stores ids as characters, ids should match across dataframes
id = as.character(c(1:10)),
time = sample(1:101, 10, replace = FALSE),
t_end = 100
)
data_list <- import_user_STb(event_data, networks = bisonr_fit)
model <- generate_STb_model(data_list)
# it's also possible to do multi-network models.
# here i add two of the same just to illustrate
data_list <- import_user_STb(event_data, networks = list(bisonr_fit, bisonr_fit))
model <- generate_STb_model(data_list)
Complex transmission
STbayes can be used to fit and create models of complex transmission.
You can create a log-likelihood that includes a frequency-dependent
transmission rules by using the transmission_func
argument
of generate_STb_model
:
data_list = import_user_STb(STbayes::event_data, STbayes::edge_list)
model_standard = generate_STb_model(data_list, transmission_func="standard") # default
model_f = generate_STb_model(data_list, transmission_func="freqdep_f")
model_k = generate_STb_model(data_list, transmission_func="freqdep_k")
The package includes a “freqdep_f” model of frequency-dependent bias as:
STbayes also includes an alternative parameterization “freqdep_k” based on a scaled version of Dino Dini’s normalized tunable sigmoid function. In the first case, f<1 would be evidence of an anti-conformist transmission bias, f=1 would be proportional, and f>1 would be conformist. In the second case, the shape parameter k < 0 is conformist, k=0 proportional, and k>0 anti-conformist. Both parameterizations create a similar relationship between the proportion of informed neighbours and the weight of that information on the rate of an event happening for a given individual:

You might find that one or the other has convergence issues, so we provide both. Complex transmission can be implemented with varying effects, ilvs, oada and ctada type models, etc.
Setting priors
You might want to customize the priors of your model. Here are the default priors:
default_priors <- list(
log_lambda0 = "normal(-4, 2)", # intrinsic rate
log_sprime = "normal(-4, 2)", # social transmission rate
beta_ILV = "normal(0,1)", # beta coefficients for ILVs
log_f = "normal(0,1)", # exponent parameter under freqdep_f models
k_raw = "normal(0,3)", # k parameter under freqdep_k models
z_ID = "normal(0,1)", # varying effects
sigma_ID = "normal(0,1)", # SD of varying effects
rho_ID = "lkj_corr_cholesky(3)", # LKJ prior for veff corr matrix
gamma = "normal(0,1)" # gamma parameter for Weibull intrinsic hazard model
)
To adjust any one of them, call generate_STb_model and provide a named list of those you wish to change in the priors argument. The distributions should be written with the usual Stan convention. You only need to supply those which you wish to change. For example, you might want to tighten your priors on intrinsic rate, social rate and beta params for ILVs.
data_list = import_user_STb(STbayes::event_data, STbayes::edge_list)
model <- generate_STb_model(data_list, priors = list(
log_lambda0 = "normal(-4, 1)",
log_sprime = "uniform(-4, 1)",
beta_ILV = "normal(0, .5)"
))
At the moment, the function will always print all priors that are possible to specify, regardless of the type of model you are creating. At least this makes it easy to reference the names when you need them.
High-resolution data mode
If you have very high-frequency sampling of behaviour (e.g. from a motion capture system), you can take advantage of it to create a model with dynamic networks and dynamic transmission weights that can change in value frequently between events. Sampling intervals should be regular (e.g. 30 fps). Below is a dataset where networks and transmission weights are recorded every time-step, rather than summarized per inter-event interval. If fitting a model with standard transmission, STbayes preprocesses the data for efficient fitting.
data_hires <- import_user_STb(
event_data = STbayes::events_hires,
networks = STbayes::networks_hires,
t_weights = STbayes::t_weights_hires,
high_res = T
)
model_hires <- generate_STb_model(data_hires, transmission_func = "standard")
If you plan to fit a model with complex transmission, it’s not
possible to pre-process the data because of the non-linearity of the
transmission rule. You should use import_user_STb2()
.
High-res complex transmission models will take significantly longer to
fit as they loop through each time-step, rather than inter-event
intervals.
data_hires <- import_user_STb2(
event_data = STbayes::events_hires,
networks = STbayes::networks_hires,
t_weights = STbayes::t_weights_hires,
high_res = T
)
model_hires <- generate_STb_model(data_hires, transmission_func = "freqdep_k")
Import NBDA data objects
STbayes is compatible with most NBDA objects. The function will try
to figure out what was meant to be done with ILVs. However, some
features of STbayes require using import_user_STb
. Below we
read in an NBDA object (taken from Tutorial 4.1 from Hasenjager et
al. 2021):
nbdaData_cTADA <- STbayes::tutorial4_1
data_list = import_NBDA_STb(nbdaData_cTADA)
str(data_list)