This is a past event. Registration is closed.

An Introduction to Spatial Data Science with R


Edzer Pebesma

Director of the Institute for Geoinformatics, University of Münster


This workshop will give an introduction to handling and analysing spatial vector and raster data with R, and will illustrate a number of spatial statistical methods, including point pattern analysis, geostatistical analysis, and lattice data analysis. The workshop will focus on the R packages sf and stars, and on a number of analysis packages that are compatible with these. Some prior experience with R is strongly recommended.


Monday, 28 November 2022

13h30-17h30

In-person (Hybrid)

Models for Health Outcomes using Data from Population Registries and Surveys


Ruth Etzioni

Professor, Fred Hutchinson Cancer Research Center


This workshop will present methods for analyzing non-normal outcomes in health data studies with a focus on counts and health care costs. The workshop will cover different regression modeling frameworks tailored to the distributional properties of these outcomes with examples drawn from two major data sources in the US – a national cancer registry and a national health survey that includes annual health expenditure information. Additionally, the G-computation method for marginal effect estimation in non-linear regression models, propensity score analysis with inverse probability weighting for causal effect estimation in observational studies, and methods for accommodating complex survey designs will be covered. The properties, strengths and weaknesses of registries and surveys as sources for health outcomes models will also be discussed. All analyses will be programmed in R and code will be provided to all workshop participants. This workshop will draw on material from the text, "Statistics for Health Data Science: An Organic Approach," co-authored by Dr Etzioni. 


Tuesday, 29 November 2022

08h30-12h00

In-person (Hybrid)

A Practical Introduction To Gaussian Process Regression and Bayesian Optimization


Robert Gramacy

Professor, Virginia Polytechnic Institute and State University (Virginia Tech)


Gaussian process regression is ubiquitous in spatial statistics, machine learning, and the surrogate modeling of computer simulation experiments. Fortunately, the prowess of Gaussian processes as accurate predictors, along with their appropriate quantification of uncertainty, does not derive from difficult-to-understand methodology or cumbersome implementation. We will cover the basics and provide a practical tool-set ready to be put to work in diverse applications. As one example of an application where Gaussian processes play a fundamental role, we will introduce Bayesian optimization. The presentation will involve accessible slides authored in R Markdown, with reproducible examples spanning bespoke implementation to add-on packages.
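To give a flavour of how little machinery the basic method needs, here is a minimal sketch of Gaussian process prediction in plain Python. It is not taken from the workshop materials (which use R). The squared-exponential kernel, the fixed hyperparameters, the zero prior mean, and the toy sine data are all assumptions chosen for illustration.

```python
import math

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def forward_solve(l, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    return z

def backward_solve(l, z):
    """Solve L^T w = z for lower-triangular L."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(l[k][i] * w[k] for k in range(i + 1, n))) / l[i][i]
    return w

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one new input."""
    n = len(x_train)
    # Covariance of the training inputs, with a small noise term on the diagonal.
    k = [[rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    l = cholesky(k)
    alpha = backward_solve(l, forward_solve(l, y_train))  # K^{-1} y
    k_star = [rbf_kernel(x_new, xi) for xi in x_train]
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = forward_solve(l, k_star)                           # L^{-1} k_*
    var = rbf_kernel(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

# Toy data (hypothetical): noiseless observations of sin(x).
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [math.sin(x) for x in x_train]

mean_at_data, var_at_data = gp_predict(x_train, y_train, 2.0)   # at a training input
mean_far, var_far = gp_predict(x_train, y_train, 10.0)          # far from all data
```

At a training input the predictive variance collapses towards the noise level, while far from the data the prediction reverts to the prior mean with close to the full prior variance. This uncertainty behaviour is what Bayesian optimization exploits when deciding where to evaluate next.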


Tuesday, 29 November 2022

12h30-16h00

In-person (Hybrid)

New Developments in Data Science and Data Science in Africa


Berthold Lausen

Professor, University of Essex


The workshop invites presentations by participants on research or education in Data Science. If you are interested in presenting in the first part of the workshop (1.30 pm to 3.30 pm), or if you are planning to attend the workshop, please contact the organisers via Berthold Lausen, past president of the International Federation of Classification Societies (IFCS), at blausen@essex.ac.uk. Presenters should suggest a title and brief abstract of their presentation by 30 September 2022.

 

IFCS has two member societies/groups from Africa (see http://ifcs.info/): the Multivariate Data Analysis Group of the South African Statistical Association (SASA-MDAG) and the Moroccan Classification Society. In the second part of the workshop (4.00 pm to 5.30 pm) we plan a discussion on the interest in, added value of, and aims for further regional Data Science groups from Africa joining the IFCS.


Tuesday, 29 November 2022

12h30-16h00

In-person (Hybrid)

Special Interest Group Workshops

MDAG Workshops


Visual methods for multivariate data - a journey beyond 3D

Dianne Cook

Professor, Department of Econometrics and Business Statistics

Monash University, Australia

Morning Session


This workshop will explain how to use dynamic plots constructed from low-dimensional linear projections, called tours, to examine multivariate data spaces. The tour projections are read similarly to a biplot. You will learn about several useful tour types: grand, guided, manual, local, and slice. Tours can be helpful in analyses involving non-linear dimension reduction, such as t-SNE, and machine learning models for both supervised and unsupervised classification. We will include working with high-dimensional, low-sample-size data.


Bring your laptop, loaded with R, RStudio and the R package tourr, to follow along with me.


Independent Components Analysis

David Hofmeyr

Department of Statistics and Actuarial Science

Stellenbosch University

Afternoon Session


Components analysis, broadly speaking, refers to the problem of modelling the primary sources of information contained in a set of data. In the linear context, which forms the framework for the present discussion, these sources are represented by linear combinations of the variables on which the observations in the data have been measured. By far the most well known is the Principal Components Analysis (PCA) model, where the objective is to retain as much variation from the data as possible, by minimising the squared residuals between the original observations and their projection onto the principal components subspace.
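The equivalence between retaining variation and minimising squared residuals can be made explicit. The notation here is an assumption for illustration and does not come from the talk: centred observations $x_1, \dots, x_n \in \mathbb{R}^p$ and a matrix $W \in \mathbb{R}^{p \times k}$ with orthonormal columns spanning the candidate subspace.

```latex
\sum_{i=1}^{n} \left\| x_i - W W^\top x_i \right\|^2
  \;=\; \sum_{i=1}^{n} \| x_i \|^2 \;-\; \sum_{i=1}^{n} \left\| W^\top x_i \right\|^2,
  \qquad W^\top W = I_k .
```

The first term on the right does not depend on $W$, so minimising the residual sum of squares on the left is the same as maximising the projected sum of squares $\sum_i \| W^\top x_i \|^2$, which is the variation retained by the subspace.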


Although already decades old, and very popular in many fields, the Independent Components Analysis (ICA) model is less well known among the "traditional statistics" community. The problem of ICA is to identify the (statistically) independent sources of information in the data. Perhaps surprisingly, given the seemingly very difficult objective in the abstract, when the data generating distribution can be described through a linear "mixing" of statistically independent source variables, even fairly simple objectives lead to consistent estimation, with the only proviso being that the sources are non-Gaussian.
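In symbols, the linear mixing model is usually written as follows. The notation is an assumption for illustration: $x$ is the observed vector, $s$ the vector of sources, $A$ the unknown mixing matrix (taken here to be square and invertible), and $W$ an estimated unmixing matrix.

```latex
x = A s, \qquad s = (s_1, \dots, s_p)^\top \ \text{with } s_1, \dots, s_p \ \text{mutually independent and non-Gaussian},
\qquad \hat{s} = W x \ \text{with } W \approx A^{-1} .
```

The sources can only be recovered up to their order and scale, since permuting or rescaling the components of $s$ can be absorbed into $A$.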


In this talk I will introduce the ICA problem in greater detail, and discuss some important applications where it has had tremendous impact. I will then introduce some of the more common methods for the estimation of independent components (ICs), as well as some work of my own on enhancing the computational aspects of a more direct approach based on non-parametric pseudo-likelihood maximisation. I will also touch briefly on the problem of estimating ICs in the online context, where estimation needs to be conducted in real time as a stream of observations arrives one at a time.


Monday, 28 November 2022

08h30-17h00

In-person (Hybrid)

Bayes Group Workshop:

Use of informative priors in confirmatory studies, along with a hands-on session in R, followed by an address by the ISBA President


Rajat Mukherjee

Alira Health


The workshop in Bayesian statistics aims to provide industry researchers (statisticians as well as domain experts), academics and students working in medicine and healthcare with an introduction to the topic, along with some specific examples and use cases from the pharmaceutical industry. The workshop will focus on translating historical data, for example from previously conducted randomized clinical trials, into informative priors for the parameters of interest, which can then be used in the design and analysis of future trials. This approach, known as Bayesian borrowing, is gaining interest, particularly for investigations in rare diseases and for medical devices. We will discuss the common problem of prior-data conflict in this setting, and methodologies to control borrowing from historical data in the presence of a conflict. We will also discuss a recent COVID vaccine trial that was conducted in the Bayesian framework. The workshop will conclude with a hands-on session implementing a Bayesian design using the open source R software. Participants are encouraged to install R and the following packages on their laptops prior to attending the workshop.

  • ggplot2
  • RBesT
  • parallel
  • mcmc
  • mvtnorm
  • rstan
  • rstanarm


Workshop Outline

  • Introduction to Bayesian Thinking
  • Constructing informative prior for different endpoint types: continuous, binary, survival
  • Problem of Prior-Data Conflict and how it can be accounted for in a
  • Bayesian dynamic-borrowing methodologies
  • Establishing the frequentist operating characteristics (type-I error and power) for a Bayesian design - why and how.
  • Case studies in medical devices, rare diseases and COVID vaccine development
  • Practical session using R
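
As a rough illustration of turning historical trial data into an informative prior for a binary endpoint, the sketch below uses a fixed-weight power prior with a conjugate Beta-binomial model. It is not taken from the workshop materials, which use R and packages such as RBesT. The function names, the Beta(1, 1) starting prior, the discount weight a0, and all trial counts are hypothetical.

```python
def power_prior_beta(y_hist, n_hist, a0, a_init=1.0, b_init=1.0):
    """Beta prior from historical data, discounted by a fixed weight a0 in [0, 1].

    a0 = 0 ignores the historical trial; a0 = 1 pools it fully.
    """
    return a_init + a0 * y_hist, b_init + a0 * (n_hist - y_hist)

def update_beta(a, b, y, n):
    """Conjugate update of a Beta(a, b) prior with y responders out of n."""
    return a + y, b + (n - y)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical numbers: a historical trial and a new trial that disagree somewhat.
y_hist, n_hist = 30, 100   # historical response rate 0.30
y_new, n_new = 20, 40      # new-trial response rate 0.50

posterior_means = {}
for a0 in (0.0, 0.5, 1.0):
    prior = power_prior_beta(y_hist, n_hist, a0)
    posterior = update_beta(*prior, y_new, n_new)
    posterior_means[a0] = beta_mean(*posterior)
    print(f"a0={a0:.1f}  prior=Beta{prior}  posterior mean={posterior_means[a0]:.4f}")
```

The more weight the historical trial receives, the further the posterior mean is pulled from the new trial's observed rate towards the historical one. With a fixed a0 this pull happens even when the two sources disagree, which is the prior-data conflict problem that dynamic-borrowing methods are designed to address.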


CPD Information

  • Attendance of this workshop will equate to 1 CPD (SACNASP) point.


The workshop will conclude with an address from the ISBA president, Prof Sudipto Banerjee, who will join remotely.


Monday, 28 November 2022

08h30-17h00

In-person (Hybrid)