Risk-Based Monitoring of Survival Data

  • Zhang Zhizhuo


In clinical trials, on-site monitoring is traditionally used to validate data quality, reveal abnormal data and identify risk factors. However, there is little evidence that it reduces bias or improves precision. Central monitoring is an alternative to on-site monitoring that can identify, remotely and efficiently, sites at higher risk of bias, errors and deviations.

Time to event is commonly used as an endpoint, especially in tumor therapy trials. Any factor that reduces the accuracy and precision of survival data can bias the trial result, which makes survival data a natural target for central risk-based monitoring. By revealing unusual patterns or inaccuracies in survival data at the site level, sites at risk can be identified.

This study aims to establish an algorithm and a risk model for monitoring survival data and identifying risk sites, and to generate a reusable SAS program for future application of the risk model.

The count and the proportion of abnormal events at each site will serve as the monitoring metrics. For proportion data, a test for the difference between proportions will compare each site with all other sites. For rare events, Poisson loglinear regression will be used to estimate the relative risk of abnormal events at each site compared with the other sites. A risk flag will be reported for a site when the result is significant.

Table of Contents


1. Background

3. Objectives

4. Study Design

5. Methodology

5.1 Restructure datasets according to CDISC

5.2 Algorithm

5.3 Model validation and generalization

5.4 SAS Programming

5.5 Dataset

6. Expected outcomes


Appendix A

Appendix B


1. Background

In clinical trials, quality assurance, including site performance and data validity, is the essential foundation for maximizing the precision of trial results. Various types of error may occur in every aspect of a clinical trial: design errors, procedural errors, recording errors, fraud and analytical errors [1]. Any factor associated with these errors is considered a risk. Different monitoring methods can be deployed to detect specific kinds of risk: trial oversight committees, on-site monitoring and central monitoring.

Traditionally, the data quality of clinical trials is validated by on-site monitoring. On-site visits are an expensive monitoring approach, accounting for approximately 30% of total trial costs in the pharmaceutical industry [2]. Nevertheless, 84% of the pharmaceutical companies and 89% of the Contract Research Organizations (CROs) surveyed still rely heavily on on-site visits [3]. Yet there is little evidence that on-site monitoring has a significant positive effect on reducing bias or improving precision in clinical trials.

Recently, the Food and Drug Administration (FDA) published “Guidance for Industry: Oversight of Clinical Investigations—A Risk-Based Approach to Monitoring” [4]. In this guidance, the FDA encourages greater use of centralized monitoring practices, with which sites at higher risk of bias, errors and deviations can be identified remotely. By visiting only sites of concern instead of performing 100% source data verification, costs and time can be reduced substantially. Many statistical methods have been developed for centralized monitoring and have been shown to be efficient and reliable [5-9]. These statistical methods form the cornerstone of risk-based monitoring.

In clinical trials, time to event is commonly used as an endpoint to evaluate treatment efficacy. In cancer therapy trials in particular, time to progression serves as a tumor-assessment endpoint (when the majority of deaths are unrelated to the disease) [10], or even as the primary endpoint. Any factor that reduces the accuracy and precision of this kind of data (survival data) can bias the trial result, making its interpretation inaccurate or even worthless. In a multicenter trial, it is therefore vital to check the validity of the data as they are updated at intervals, to identify sites of concern and to take corrective action. Factors affecting the survival outcome, including missing, illogical and abnormal data, are potential targets for risk-based monitoring of survival data.

The Clinical Data Interchange Standards Consortium (CDISC) [11] provides “standards to support the acquisition, exchange, submission and archive of clinical research data and metadata.” Taking advantage of the CDISC normative data structures, especially the Study Data Tabulation Model (SDTM) and the Analysis Data Model (ADaM), a data template can be established while the multicenter trial is ongoing. All data generated in the trial can then be updated and restructured on the basis of this template. Such a standardized structure greatly facilitates routine data monitoring and validation.

Moreover, once an algorithm for risk-based monitoring has been developed, the statistical model built and the corresponding SAS program coded, they can be applied to other trials and datasets that share the same monitoring targets.


3. Objectives

  1. To establish an algorithm and a risk model for monitoring survival data that can identify trial centers with risk factors by revealing abnormal data;
  2. To generalize the algorithm and the risk model for application on clinical trials;
  3. To generate a reusable SAS program for application of the risk model.

4. Study Design

  1. Choose adequate metrics based on conventional monitoring targets, establish the algorithm and the risk model, and set appropriate criteria for the risk flag.
  2. Apply the risk model to a real clinical trial dataset and identify risk sites. Compare the sites identified by the model with the sites known in advance to be high-risk, and calculate the sensitivity and specificity of the risk model.
  3. Generalize the risk model according to the validation results, and generate a reusable SAS program for it.


5. Methodology

5.1 Restructure datasets according to CDISC

By implementing the Study Data Tabulation Model (SDTM), raw data will be organized into standardized tabulations of observations on individual subjects. The metadata attributes (name, label, type, length, description, etc.) of every variable will be reset to meet SDTM conventions, and variables will be classified into the corresponding domains.

By implementing the Analysis Data Model (ADaM), data will first be structured into the subject-level analysis dataset (ADSL) format, with subject-level variables specified so that they are ready for analysis. Specific variables will then be derived and formatted into the Basic Data Structure (BDS) for site-level data analysis.

The CDISC template for establishing the risk model is listed in Appendix A. All original data will be structured in standardized formats according to this template, which will also be reusable in future applications.
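To make the site-level derivation concrete, the following is a minimal Python sketch (the project itself will be implemented in SAS). It takes subject-level records carrying the Appendix A variables SITEID, SCNDCT and RANDDT and counts, per site, subjects with a missing randomization date, a missing screening date, or an illogical date order. The record layout (ISO 8601 strings or None instead of numeric SAS dates), the flag names MISS_RAND, MISS_SCRN and ILLOG_DT, and the single illogical-date rule (randomization before screening) are hypothetical choices for illustration, not the template's actual derivation rules.

```python
from datetime import date

# Hypothetical subject-level records. Appendix A stores RANDDT and SCNDCT as
# numeric (SAS) dates; here they are assumed to be ISO 8601 strings or None.
subjects = [
    {"USUBJID": "S01-001", "SITEID": "01", "SCNDCT": "2013-01-05", "RANDDT": "2013-01-10"},
    {"USUBJID": "S01-002", "SITEID": "01", "SCNDCT": None, "RANDDT": "2013-01-12"},
    {"USUBJID": "S02-001", "SITEID": "02", "SCNDCT": "2013-02-01", "RANDDT": None},
    {"USUBJID": "S02-002", "SITEID": "02", "SCNDCT": "2013-02-20", "RANDDT": "2013-02-15"},
]

FLAGS = ("MISS_RAND", "MISS_SCRN", "ILLOG_DT")  # hypothetical flag names


def subject_flags(rec):
    """Return 0/1 abnormal-event flags for one subject."""
    scr, rnd = rec["SCNDCT"], rec["RANDDT"]
    # Assumed illogical-date rule: randomization recorded before screening.
    illogical = (
        scr is not None
        and rnd is not None
        and date.fromisoformat(rnd) < date.fromisoformat(scr)
    )
    return {
        "MISS_RAND": int(rnd is None),
        "MISS_SCRN": int(scr is None),
        "ILLOG_DT": int(illogical),
    }


def site_summary(records):
    """Aggregate subject-level flags into per-site subject and event counts."""
    summary = {}
    for rec in records:
        row = summary.setdefault(rec["SITEID"], dict.fromkeys(("N",) + FLAGS, 0))
        row["N"] += 1
        for name, flag in subject_flags(rec).items():
            row[name] += flag
    return summary


summary = site_summary(subjects)
for site, row in sorted(summary.items()):
    proportions = {name: row[name] / row["N"] for name in FLAGS}
    print(site, row, proportions)
```

Each per-site row then holds both metrics used later: the event count (for the Poisson model) and, divided by N, the proportion (for the test of proportions).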

5.2 Algorithm

The metrics and the statistical methods used to raise a risk flag for each of them are summarized in Table 1.

  1. Metrics: The monitoring targets for the risk model are chosen according to conventional monitoring practice. They are missing randomization date, missing screening date, illogical date, censoring, death and tumor response. These data relate to data integrity and accuracy and may affect the survival data. For each target, abnormal events at every site will be counted and the corresponding proportion calculated.
  2. Test for difference between proportions: The proportion metric of each site will be compared with that of all other sites combined by calculating a z statistic (normal approximation) and the corresponding p-value. Sites with a two-tailed p-value < 0.05 will be marked with a risk flag.
  3. Poisson loglinear regression: For rare events (when the proportion metrics at the sites are generally very low), Poisson loglinear regression will be applied to obtain the point estimate and confidence interval (CI) of the risk ratio (RR) for each site. A site whose CI for the RR does not contain 1 will be considered at risk and marked with a risk flag.
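The site-versus-rest comparison of proportions in item 2 can be sketched in Python as follows (illustration only; the project uses SAS). This assumes the usual pooled two-proportion z test, in which each site is compared with all other sites combined and the two-tailed p-value comes from the standard normal distribution; the function names and the example counts are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def site_vs_rest_z(counts, totals, alpha=0.05):
    """Compare each site's proportion with all other sites pooled.

    counts[site]: number of subjects with the abnormal event at that site
    totals[site]: number of subjects at that site (at least two sites needed)
    Returns {site: (p_site, p_rest, z, p_value, risk_flag)}.
    """
    all_x, all_n = sum(counts.values()), sum(totals.values())
    results = {}
    for site in counts:
        x1, n1 = counts[site], totals[site]
        x0, n0 = all_x - x1, all_n - n1
        p1, p0 = x1 / n1, x0 / n0
        pooled = all_x / all_n  # common proportion under H0: p_site == p_rest
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
        z = (p1 - p0) / se if se > 0 else 0.0
        p_value = two_sided_p(z)
        results[site] = (p1, p0, z, p_value, p_value < alpha)
    return results


# Hypothetical example: censoring counts at four sites of 100 subjects each.
counts = {"A": 30, "B": 10, "C": 12, "D": 8}
totals = {"A": 100, "B": 100, "C": 100, "D": 100}
res = site_vs_rest_z(counts, totals)
for site, (p1, p0, z, p, flag) in res.items():
    print(f"site {site}: {p1:.2f} vs {p0:.3f}, z = {z:.2f}, p = {p:.4f}, flag = {flag}")
```

In this example both site A (unusually high) and site D (unusually low) are flagged, which is the intended behavior of a two-tailed criterion: an unexpectedly low rate, such as few recorded deaths, can signal under-reporting. The squared z statistic equals the Pearson chi-square for the site-versus-rest 2×2 table (as produced, for example, by the CHISQ option of SAS PROC FREQ); when expected counts are small, Fisher's exact test is the safer choice.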
Table 1. Metrics and Statistical Methods
Item | Metric | Method | Risk flag
1 | Missing randomization date: proportion | Test for proportions | p-value < 0.05 (two-tailed)
1 | Missing randomization date: count per site | Poisson loglinear regression | CI of RR does not contain 1
2 | Missing screening date: proportion | Test for proportions | p-value < 0.05 (two-tailed)
2 | Missing screening date: count per site | Poisson loglinear regression | CI of RR does not contain 1
3 | Illogical date: proportion | Test for proportions | p-value < 0.05 (two-tailed)
3 | Illogical date: count per site | Poisson loglinear regression | CI of RR does not contain 1
4 | Censoring: proportion | Test for proportions | p-value < 0.05 (two-tailed)
4 | Censoring: count per site | Poisson loglinear regression | CI of RR does not contain 1
5 | Death: proportion | Test for proportions | p-value < 0.05 (two-tailed)
5 | Death: count per site | Poisson loglinear regression | CI of RR does not contain 1
6 | Tumor response: rate | Test for proportions | p-value < 0.05 (two-tailed)
6 | Tumor response: count per site | Poisson loglinear regression | CI of RR does not contain 1
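For the count metrics in Table 1, a Poisson loglinear model with a single indicator for "this site" (all other sites pooled as reference) and the logarithm of the site's exposure as an offset has a closed-form fit: the estimated RR is the ratio of the two event rates, and the Wald standard error of log(RR) is sqrt(1/x1 + 1/x0). The Python sketch below uses that closed form (illustration only; the project uses SAS). The choice of exposure (number of subjects here, person-time would also work), the 95% default level and the function name are assumptions.

```python
import math
from statistics import NormalDist


def poisson_rr_site_vs_rest(events, exposure, level=0.95):
    """Rate ratio (RR) of abnormal events at each site vs. all other sites pooled.

    Closed-form fit of the Poisson loglinear model
        log E[count] = log(exposure) + b0 + b1 * I(site)
    with all other sites as the reference group: exp(b1) = (x1/t1) / (x0/t0),
    and the Wald standard error of b1 is sqrt(1/x1 + 1/x0).
    Returns {site: (rr, lower, upper, risk_flag)}, or None for a site when
    either group has zero events (the MLE is then infinite).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    total_x, total_t = sum(events.values()), sum(exposure.values())
    results = {}
    for site in events:
        x1, t1 = events[site], exposure[site]
        x0, t0 = total_x - x1, total_t - t1
        if x1 == 0 or x0 == 0:
            results[site] = None  # Wald CI undefined on the boundary
            continue
        log_rr = math.log((x1 / t1) / (x0 / t0))
        se = math.sqrt(1 / x1 + 1 / x0)
        lower, upper = math.exp(log_rr - z * se), math.exp(log_rr + z * se)
        # Risk flag: the CI of RR does not contain 1.
        results[site] = (math.exp(log_rr), lower, upper, not (lower <= 1 <= upper))
    return results


# Hypothetical example: rare illogical-date events, exposure = subjects per site.
events = {"A": 8, "B": 2, "C": 1, "D": 1}
exposure = {"A": 50, "B": 60, "C": 55, "D": 45}
rr = poisson_rr_site_vs_rest(events, exposure)
for site, r in rr.items():
    if r is None:
        print(f"site {site}: RR not estimable (zero events in one group)")
    else:
        print(f"site {site}: RR = {r[0]:.2f} (CI {r[1]:.2f} to {r[2]:.2f}), flag = {r[3]}")
```

In SAS, the same model would typically be fitted with PROC GENMOD using DIST=POISSON, LINK=LOG and an OFFSET= variable holding the log exposure, which also extends naturally to covariate adjustment.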

5.3 Model validation and generalization

The monitoring model will be applied to a real clinical trial dataset whose risks are already known. Sites at risk are expected to be marked with a risk flag, and sites without risks are expected not to be. The accuracy of the model will be assessed by calculating its sensitivity and specificity.
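At the site level, sensitivity is the share of known risk sites that the model flags, and specificity is the share of sites without risk that it leaves unflagged. A minimal Python sketch (illustration only; the site IDs are hypothetical, and treating a site as flagged when any Table 1 metric flags it is an assumption):

```python
def sensitivity_specificity(flagged, known_risk, all_sites):
    """Site-level sensitivity and specificity of the risk flag.

    flagged: sites the model marked with a risk flag
    known_risk: sites known in advance to be at risk (reference standard)
    all_sites: every site in the trial
    """
    flagged, known_risk, all_sites = set(flagged), set(known_risk), set(all_sites)
    no_risk = all_sites - known_risk
    tp = len(flagged & known_risk)  # risk sites correctly flagged
    fn = len(known_risk - flagged)  # risk sites missed
    tn = len(no_risk - flagged)     # no-risk sites correctly left unflagged
    fp = len(no_risk & flagged)     # no-risk sites wrongly flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


all_sites = [f"{i:02d}" for i in range(1, 11)]
sens, spec = sensitivity_specificity({"02", "05", "09"}, {"02", "05", "07"}, all_sites)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Here two of the three known risk sites are flagged (sensitivity 2/3) and one of the seven sites without risk is wrongly flagged (specificity 6/7).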

To generalize the risk model for application to clinical trial data, appropriate metrics and corresponding statistical methods will be chosen to achieve higher accuracy and to balance sensitivity and specificity. For example, if missing-data proportions at the sites are generally high, the test for the difference between proportions will be used to identify risk sites; if the missing-data proportion at each site is generally low, the missing-data count will be used as the metric and Poisson loglinear regression will be applied instead.
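The document does not fix a cut-off between "generally high" and "generally low" proportions. One possible rule, sketched in Python below purely as an assumption, borrows the common rule of thumb for the normal approximation: use the test for proportions only if, at the overall proportion, even the smallest site expects at least 5 events and 5 non-events; otherwise treat the events as rare and use Poisson loglinear regression. The function name and the `min_expected` parameter are hypothetical.

```python
def choose_method(counts, totals, min_expected=5):
    """Pick the Table 1 method for one metric (assumed rule of thumb).

    Uses the test for proportions only if every site's expected numbers of
    events and non-events under the overall proportion reach `min_expected`;
    otherwise the events are treated as rare.
    """
    overall = sum(counts.values()) / sum(totals.values())
    smallest = min(totals.values())  # the smallest site has the smallest expected counts
    if smallest * overall >= min_expected and smallest * (1 - overall) >= min_expected:
        return "test for proportions"
    return "Poisson loglinear regression"


sizes = {"A": 100, "B": 100, "C": 100}
print(choose_method({"A": 30, "B": 25, "C": 28}, sizes))  # common events
print(choose_method({"A": 2, "B": 0, "C": 1}, sizes))     # rare events
```

The threshold would in practice be tuned during validation against the known risk sites, since it directly trades sensitivity against specificity.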

5.4 SAS Programming

The statistical software used in this project will be SAS, version 9.3. All procedures will be implemented as SAS programs, and macros will be used to make the program reusable. Flow charts of the SAS programming logic are provided in Appendix B.

5.5 Dataset

The dataset comes from a real clinical trial, and its risk information is already known. It will be used for external validation of the model.

6. Expected outcomes

  1. Establish a risk model for central statistical monitoring of survival data in clinical trials.
  2. Generate a reusable SAS program applicable in the pharmaceutical industry and in CROs.
  3. Write an article for graduation.


References

  1. Baigent C, Harrell FE, Buyse M, Emberson JR, Altman DG. Ensuring trial validity by data quality assurance and diversification of monitoring methods. Clinical Trials 2008 February 01;5(1):49-55.
  2. Eisenstein EL, Collins R, Cracknell BS, Podesta O, Reid ED, Sandercock P, et al. Sensible approaches for reducing clinical trial costs. Clinical Trials 2008 February 01;5(1):75-84.
  3. Morrison BW, Cochran CJ, White JG, Harley J, Kleppinger CF, Liu A, et al. Monitoring the quality of conduct of clinical trials: a survey of current practices. Clinical Trials 2011 June 01;8(3):342-349.
  4. FDA. Guidance for Industry: Oversight of Clinical Investigations—A Risk-Based Approach to Monitoring. 2013 August.
  5. Venet D, Doffagne E, Burzykowski T, Beckers F, Tellier Y, Genevois-Marlin E, et al. A statistical approach to central monitoring of data quality in clinical trials. Clinical Trials 2012 December 01;9(6):705-713.
  6. Pogue JM, Devereaux P, Thorlund K, Yusuf S. Central statistical monitoring: Detecting fraud in clinical trials. Clinical Trials 2013 April 01;10(2):225-235.
  7. Buyse M, George SL, Evans S, Geller NL, Ranstam J, Scherrer B, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Stat Med 1999 Dec 30;18(24):3435-3451.
  8. Bakobaki JM, Rauchenberger M, Joffe N, McCormack S, Stenning S, Meredith S. The potential for central monitoring techniques to replace on-site monitoring: findings from an international multi-centre clinical trial. Clinical Trials 2012 April 01;9(2):257-264.
  9. Kirkwood AA, Cox T, Hackshaw A. Application of methods for central statistical monitoring in clinical trials. Clinical Trials 2013 October 01;10(5):783-806.
  10. FDA. Guidance for Industry: Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics. 2007 May.
  11. CDISC. Available at: //www.cdisc.org/CDISC-Vision-and-Mission.

Appendix A

Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format | Role | CDISC Notes | Core
STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req
SITEID | Study Site Identifier | Char | | Identifier | Unique identifier for a site within a study. | Req
USUBJID | Unique Subject Identifier | Char | | Identifier | Identifier used to uniquely identify a subject across all studies for all applications or submissions involving the product. | Req
RANDDT | Randomization Date | Num | | Record | Required in randomized trials. | Cond
SCNDCT | Screening Date | Num | | Record | |
STARTDT | Time to Event Origin Date for Subject | Num | | Record | The original date of risk for the time-to-event analysis. This is generally the time at which a subject is first at risk of the event of interest (as defined in the protocol or Statistical Analysis Plan). For example, this may be the randomization date or the date of first study therapy exposure. | Perm
SVDTC | Date/Time of Visit | Char | ISO 8601 | Timing | Date/time of a subject visit. | Exp
RFPENDTC | Date/Time of End of Participation | Char | ISO 8601 | Record | Date/time when subject ended participation or follow-up in a trial, as defined in the protocol, in ISO 8601 character format. Should correspond to the last known date of contact. Examples include completion date, withdrawal date, last follow-up, date recorded for lost to follow up, or death date. | Exp
CNSR | Censor | Num | | Record | Defines whether the event was censored (period of observation truncated prior to event being observed). It is strongly recommended to use 0 as an event indicator and positive integers as censoring indicators. It is also recommended that unique positive integers be used to indicate coded descriptions of censoring reasons. CNSR is required for time-to-event parameters. | Cond
TUORRES | Tumor Identification Result | Char | | Result | Result of the tumor identification, i.e., a classification of the identified tumor. Examples: when TUTESTCD=TUMIDENT (Tumor Identification), values of TUORRES might be TARGET, NON-TARGET, NEW or BENIGN ABNORMALITY. |
TRORRES | Result or Finding in Original Units | Char | | Result | Result of the tumor measurement/assessment as originally received or collected. | Exp
TRORRESU | Original Units | Char | (UNIT) | Variable | Original units in which the data were collected; the unit for TRORRES. Example: mm. |
DTHDTC | Date/Time of Death | Char | ISO 8601 | Record | Date/time of death for any subject who died, in ISO 8601 format. Should represent the date/time that is captured in the clinical-trial database. | Exp
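The CNSR convention above (0 for the event, positive integers for censoring) and the note that RFPENDTC should correspond to the last known date of contact together support a simple consistency check for the death and illogical-date metrics. The Python sketch below (illustration only) assumes overall survival with death as the event, a single censoring code of 1, and complete dates, using only the first 10 characters (YYYY-MM-DD) of each ISO 8601 value; the function name is hypothetical.

```python
from datetime import date


def derive_cnsr(dthdtc, rfpendtc):
    """Derive (CNSR, analysis date, illogical-date flag) for overall survival.

    Assumptions: death is the event of interest; CNSR = 0 marks the event and
    CNSR = 1 marks censoring at the end of participation (RFPENDTC); only the
    date part (YYYY-MM-DD) of each ISO 8601 value is used, so partial dates
    are not supported. A missing RFPENDTC without death yields date None.
    """
    end = date.fromisoformat(rfpendtc[:10]) if rfpendtc else None
    if dthdtc:
        death = date.fromisoformat(dthdtc[:10])
        # RFPENDTC should be the last known contact, so a later death date is illogical.
        illogical = end is not None and death > end
        return 0, death, illogical
    return 1, end, False


print(derive_cnsr("2013-05-02", "2013-05-02"))
print(derive_cnsr(None, "2013-06-30T09:00"))
print(derive_cnsr("2013-07-10", "2013-06-30"))
```

Counting CNSR = 0 and CNSR = 1 records per site gives the death and censoring counts in Table 1, and the third element feeds the illogical-date count.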

Appendix B


