Reshaping Network Meta-Analysis Datasets: netmeta to gemtc



To conduct a network meta-analysis of contrast-based effect size data (i.e. pre-calculated effect sizes of treatment comparisons along with their standard errors), {netmeta} and {gemtc} require different data entry formats.

In {netmeta}, each treatment comparison/effect size corresponds with one line in the data set. The treat1 and treat2 columns are used to encode the two treatments that are being compared. This can be seen as a “wider” data format.

In {gemtc}, relative effect data has to be provided in a “longer” format. Each treatment comparison consists of two rows. The first row contains the calculated effect (e.g. SMD, MD, logOR) and its standard error. The second row contains the name of the reference treatment (i.e. the treatment to which the treatment in the first row was compared), and NA in the effect size and standard error columns (provided the comparison is not part of a multi-arm study; see below).
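As a minimal sketch of the two layouts, the first comparison of the example data used later in this vignette (Ausbun, 1997) can be typed by hand in both formats. The object names wide and long are illustrative only; the column names study, diff, std.err and treatment are the ones used for {gemtc} further below.

```r
# One comparison (ind vs. grp) in the "wider" {netmeta} layout: one row
wide <- data.frame(author = "Ausbun, 1997",
                   TE = 0.092, seTE = 0.195,
                   treat1 = "ind", treat2 = "grp",
                   stringsAsFactors = FALSE)

# The same comparison in the "longer" {gemtc} layout: two rows, with NA
# for the effect and its standard error in the reference-treatment row
long <- data.frame(study     = c("Ausbun, 1997", "Ausbun, 1997"),
                   diff      = c(0.092, NA),
                   std.err   = c(0.195, NA),
                   treatment = c("ind", "grp"),
                   stringsAsFactors = FALSE)

wide
long
```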

In our assessment, the data format in {netmeta} is closer to how network meta-analysis data is usually collected, for example in Excel sheets. In this vignette, we will therefore show how to reshape network meta-analysis data from the “wider” {netmeta} to the “longer” {gemtc} format.

More information on {netmeta}, {gemtc} and network meta-analysis can be found in the “Doing Meta-Analysis in R” guide.


In this vignette, we will reshape the TherapyFormats data set. This data set is part of {dmetar}, but can also be downloaded as an .rda file here.


# Load `TherapyFormats` data set
library(dmetar)
data("TherapyFormats")
# Show the first five columns
head(TherapyFormats[1:5])
##          author     TE  seTE treat1 treat2
## 1  Ausbun, 1997  0.092 0.195    ind    grp
## 2  Crable, 1986 -0.675 0.350    ind    grp
## 3  Thiede, 2011 -0.107 0.198    ind    grp
## 4 Bonertz, 2015 -0.090 0.324    ind    grp
## 5     Joy, 2002 -0.135 0.453    ind    grp
## 6   Jones, 2013 -0.217 0.289    ind    grp

This data set can be used in {netmeta} as is. To conduct a network meta-analysis on the same data in {gemtc}, however, we have to bring it into a “longer” format. This can be achieved using the pivot_longer function in the {tidyr} package. For this example, we also have to load the {dplyr} and {magrittr} packages.

It may take some time to get accustomed to the logic of pivot_longer, but the function is generally very intuitive and powerful. If you want to learn more about pivot_longer, and its counterpart pivot_wider, you can have a look at this vignette.

To pivot the data, we (1) select the first five columns in TherapyFormats, (2) forward the result in a pipe to pivot_longer, and then (3) set the column names that {gemtc} expects for relative effect size data (the data.re argument of mtc.network):


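library(tidyr)
library(dplyr)
library(magrittr)
TherapyFormats %>%
  dplyr::select(1:5) %>%
  # Stack columns that share their first two characters (treat1, treat2)
  pivot_longer(-author,
               names_to = c(".value"),
               names_pattern = "(..)") %>% 
  set_colnames(c("study", "diff", 
                 "std.err", "treatment")) -> data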

Now, let us have a look at the new data format.

head(data, 10)
## # A tibble: 10 × 4
##    study           diff std.err treatment
##    <chr>          <dbl>   <dbl> <chr>    
##  1 Ausbun, 1997   0.092   0.195 ind      
##  2 Ausbun, 1997  NA      NA     grp      
##  3 Crable, 1986  -0.675   0.35  ind      
##  4 Crable, 1986  NA      NA     grp      
##  5 Thiede, 2011  -0.107   0.198 ind      
##  6 Thiede, 2011  NA      NA     grp      
##  7 Bonertz, 2015 -0.09    0.324 ind      
##  8 Bonertz, 2015 NA      NA     grp      
##  9 Joy, 2002     -0.135   0.453 ind      
## 10 Joy, 2002     NA      NA     grp

We see that the data now has the desired “longer” format, where the reference group values for diff and std.err are NA in each trial.
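The role of names_pattern = "(..)" may not be obvious at first. The following is a small sketch, assuming {tidyr} is installed and using a two-row toy table (the object names toy, prefixes and toy_long are made up for illustration). The pattern captures the first two characters of each column name, so TE, seTE, treat1 and treat2 are mapped to the output columns TE, se and tr. Because treat1 and treat2 share the prefix tr, they are stacked into two rows per study, and the single TE and seTE values are padded with NA in the second row.

```r
library(tidyr)

# Toy table with two comparisons taken from the example data
toy <- data.frame(author = c("Ausbun, 1997", "Crable, 1986"),
                  TE     = c(0.092, -0.675),
                  seTE   = c(0.195, 0.350),
                  treat1 = c("ind", "ind"),
                  treat2 = c("grp", "grp"),
                  stringsAsFactors = FALSE)

# The first two characters of each column name decide the output column
prefixes <- substr(names(toy)[-1], 1, 2)
prefixes

# Same kind of pivot as in the vignette, applied to the toy table
toy_long <- pivot_longer(toy, -author,
                         names_to = ".value",
                         names_pattern = "(..)")
toy_long
```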

It is also useful to create another data frame in which the full names of the treatments in our network are stored. This information can be passed to the treatments argument of mtc.network, and makes it easier to create network plots further down the line, among other things. There are certainly more elegant approaches, but this is one way to create such a table:


# Show all treatments
unique(data$treatment)
## [1] "ind" "grp" "gsh" "tel" "wlc" "cau" "ush"
# Store the full treatment names in a data frame
c("ind" = "Individual", 
  "grp" = "Group",
  "gsh" = "Guided Self-Help",
  "tel" = "Telephone",
  "wlc" = "Waitlist",
  "cau" = "Care As Usual",
  "ush" = "Unguided Self-Help") %>% 
  data.frame() %>% 
  set_colnames("description") %>% 
  tibble::rownames_to_column("id") -> treat.codes
treat.codes
##    id        description
## 1 ind         Individual
## 2 grp              Group
## 3 gsh   Guided Self-Help
## 4 tel          Telephone
## 5 wlc           Waitlist
## 6 cau      Care As Usual
## 7 ush Unguided Self-Help
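One possible alternative, sketched here in base R only, is to build the table directly from a named character vector. The names labels and treat_codes_alt are hypothetical and not part of the original workflow.

```r
# Named vector: names are the treatment ids, values the full descriptions
labels <- c(ind = "Individual",
            grp = "Group",
            gsh = "Guided Self-Help",
            tel = "Telephone",
            wlc = "Waitlist",
            cau = "Care As Usual",
            ush = "Unguided Self-Help")

# One row per treatment, with the id and description columns side by side
treat_codes_alt <- data.frame(id = names(labels),
                              description = unname(labels),
                              stringsAsFactors = FALSE)
treat_codes_alt
```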

We can then put data and treat.codes into a list so we have all the information in one place.

TherapyFormatsGeMTC <- list(data = data, 
                            treat.codes = treat.codes)

Multi-Arm Trials

Since our data contains a multi-arm trial, we cannot yet use the generated data set in {gemtc} as is. For multi-arm trials, more than one effect size is calculated, and these effect sizes are usually correlated. Suppose a trial contains \(J=3\) conditions: A, B and C. Using A as the reference group, we can calculate two effect sizes, one for the A-B and another for the A-C comparison. Since A is used twice, the effect sizes are assumed to be correlated (see e.g. Borenstein et al., chapter 25). To account for this non-independence in multi-arm trials, we also need to specify the standard error of the reference arm before we can model our data in {gemtc}.

Franchini and colleagues (2012) have shown that the standard error of the base arm is equal to the square root of the covariance between the treatment contrasts. The problem is that this covariance between treatment comparisons is hardly ever reported in published articles. This means that the value has to be imputed. Franchini et al. mention several, partly quite sophisticated approaches to estimate the covariance between contrast-level data in multi-arm trials. When dealing with log-odds ratios as the summary measure, the simplest way is to calculate the \(\text{SE}\) of the log-odds in the reference arm.
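The reasoning behind this result can be sketched in one line. This is a simplified illustration, under the assumption that the arm-level estimates \(y_A\), \(y_B\) and \(y_C\) of a three-arm trial are independent and that both contrasts against arm A are plain differences of arm-level estimates; for standardized measures the relationship holds only approximately.

```latex
\[
\text{Cov}(y_B - y_A,\; y_C - y_A)
  = \text{Cov}(y_B, y_C) - \text{Cov}(y_B, y_A) - \text{Cov}(y_A, y_C) + \text{Var}(y_A)
  = \text{Var}(y_A)
\]
```

All covariances between different arms vanish under independence, so only the variance of the shared arm A remains, and its square root is the base-arm standard error.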

Since our data includes only one three-arm study and two-arm studies otherwise, we will use a rather simple approach in this example. As mentioned, the required standard error in the base/reference arm can be calculated as the square root of the covariance between the two calculated effects in our multi-arm study. Since \(\text{Cov}(x,y) = \text{Cor}(x,y)\sqrt{\text{Var}(x)\text{Var}(y)}\), the following formula can be applied to estimate the reference arm standard error (Schwarzer et al., 2015, chapter 7.1):


\[\text{SE}_{i,j=1} = (\text{Cov}[y^{\text{con}}_{t_{i,1},t_{i,2}},y^{\text{con}}_{t_{i,1},t_{i,3}}])^{0.5} ~\hat{=}~ (\text{SE}^{\text{con}}_{t_{i,1},t_{i,2}}\text{SE}^{\text{con}}_{t_{i,1},t_{i,3}} \hat\rho_{\delta_{i,12}\delta_{i,13}})^{0.5}\]


Here, \(\text{SE}_{i,j=1}\) is the standard error in arm 1 (the base arm) of trial \(i\), and \(\hat\rho_{\delta_{i,12}\delta_{i,13}}\) is our estimate of the true correlation between the two effect sizes calculated for trial \(i\). This correlation depends on the design of the study, but will usually hover around 0.5 (see e.g. Borenstein et al., chapter 25). Using the formula above, we can also conduct sensitivity analyses for varying values of \(\hat\rho_{\delta_{i,12}\delta_{i,13}}\) and check how they affect our final results.

In our example data set, the study by Breiman (2001) is a multi-arm trial:

# Show studies with more than two conditions
TherapyFormatsGeMTC$data %>% 
  pull(study) %>% 
  table() %>% 
  {.[. > 2]}
## Breiman, 2001 
##             6
# Select Breiman, 2001 study
TherapyFormatsGeMTC$data %>% 
  filter(study == "Breiman, 2001")
## # A tibble: 6 × 4
##   study           diff std.err treatment
##   <chr>          <dbl>   <dbl> <chr>    
## 1 Breiman, 2001 -0.085   0.516 ind      
## 2 Breiman, 2001 NA      NA     gsh      
## 3 Breiman, 2001 -0.75    0.513 ind      
## 4 Breiman, 2001 NA      NA     wlc      
## 5 Breiman, 2001 -0.664   0.514 gsh      
## 6 Breiman, 2001 NA      NA     wlc

There are two problems concerning the data format of this trial. First, there is one “redundant” effect size. To use {gemtc}, we have to make sure that effect sizes in multi-arm studies are always based on the same reference group. In our example, this is wlc. We can remove the first two lines, corresponding to the ind vs. gsh comparison, since this effect can be derived from the other two contrasts anyway (effect sizes in multi-arm trials are consistent by design). The fourth line can also be removed because we do not want to specify the base arm twice.

TherapyFormatsGeMTC$data <- TherapyFormatsGeMTC$data[-c(15,16,32),]
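Hard-coded row numbers depend on the row order of the data. As a hedged alternative, the rows to keep can be derived from the data itself. The sketch below uses base R and a hand-typed copy of the six Breiman rows shown above; the names breiman, base, ref_of and multi are illustrative, and the logic assumes that every effect row is immediately followed by its reference row.

```r
# Hand-typed copy of the six rows of the multi-arm trial
breiman <- data.frame(
  study     = "Breiman, 2001",
  diff      = c(-0.085, NA, -0.75, NA, -0.664, NA),
  std.err   = c(0.516, NA, 0.513, NA, 0.514, NA),
  treatment = c("ind", "gsh", "ind", "wlc", "gsh", "wlc"),
  stringsAsFactors = FALSE
)

base      <- "wlc"                           # common reference arm
is_effect <- !is.na(breiman$diff)            # rows carrying an effect size
ref_of    <- c(breiman$treatment[-1], NA)    # treatment in the following row

# Keep the effect rows that are compared against the base arm ...
keep_effect <- which(is_effect & ref_of == base)
# ... plus exactly one row for the base arm itself
keep_base <- tail(which(!is_effect & breiman$treatment == base), 1)

multi <- breiman[c(keep_effect, keep_base), ]
multi
```

The resulting three rows (ind, gsh and a single wlc row) still carry NA as the base-arm standard error, which has to be filled in separately.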

Second, in the std.err column, we have to replace NA with the standard error of the base arm (wlc). Assuming, for example, \(\hat\rho=0.5\) for the correlation of the treatment comparisons, we can estimate the covariance, and thus the standard error of the base arm wlc.

# Calculate and show base arm SE
se.base <- sqrt(0.513*0.514*0.5)
se.base
## [1] 0.3630992
# The wlc arm is in row 365
TherapyFormatsGeMTC$data[365, "std.err"] <- se.base

Let us check what the final data structure looks like for our multi-arm trial:

TherapyFormatsGeMTC$data %>% 
  filter(study == "Breiman, 2001")
## # A tibble: 3 × 4
##   study           diff std.err treatment
##   <chr>          <dbl>   <dbl> <chr>    
## 1 Breiman, 2001 -0.75    0.513 ind      
## 2 Breiman, 2001 -0.664   0.514 gsh      
## 3 Breiman, 2001 NA       0.363 wlc

As mentioned above, when it is unclear what the true value of \(\hat\rho\) is, one may assume different values of \(0<\hat\rho<1\), and check to what extent these different assumptions affect the overall results of the network meta-analysis using {gemtc}.
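Such a sensitivity check can be prepared by wrapping the formula in a small helper. This is only a sketch: the function name base_arm_se and the grid of correlations are assumptions made for illustration, while the two standard errors are those of the Breiman (2001) contrasts against wlc.

```r
# Base-arm standard error implied by an assumed correlation rho
# between the two contrasts: sqrt(rho * SE_12 * SE_13)
base_arm_se <- function(se1, se2, rho) {
  stopifnot(all(rho > 0), all(rho < 1))
  sqrt(rho * se1 * se2)
}

# Hypothetical grid of correlations to try
rho_grid <- c(0.3, 0.5, 0.7)
se_grid  <- base_arm_se(0.513, 0.514, rho_grid)

data.frame(rho = rho_grid, se.base = round(se_grid, 3))
```

Each value can then be written into the std.err cell of the base arm before the model is refitted.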


References

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2011). Introduction to meta-analysis. John Wiley & Sons.

Franchini, A. J., Dias, S., Ades, A. E., Jansen, J. P., & Welton, N. J. (2012). Accounting for correlation in network meta‐analysis with multi‐arm trials. Research Synthesis Methods, 3(2), 142-160.

Schwarzer, G., Carpenter, J. R., & Rücker, G. (2015). Meta-analysis with R. New York: Springer.