Project Lead

Professor Douglas Boyle


Data-Driven Healthcare Improvement

Lead partner

The University of Melbourne


Australia has fragmented data holdings across all clinical domains, extending to terminologies, data models, and the quality assurance mechanisms employed. Limited interoperability between research datasets, no widely agreed terminology standards, and inconsistencies in dataset quality assessment severely restrict the use of clinical data for research nationally. This program of work is delivering unique tools and methods to advance national data harmonisation and is supporting national data initiatives to deliver a consistent strategy in advancing Health Data Science. Accelerating data integration will result in increased quantity and quality of datasets available for research purposes, and ultimately improve the health of Australians.


  • To provide open availability of common medical terminologies and data mappings for our research community (in Australia and internationally).
  • To leverage these mappings in providing resources that help researchers uplift their data to a common data model – the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model, focusing on:
    • Primary care data
    • Hospital EMR data
    • Administrative data
  • To underpin the above with mechanisms for standards-based assessment of the quality of data in repositories, helping to ensure that:
    • Researchers understand whether specific datasets are able to answer the questions they pose
    • The resulting research stands up to peer review


This program of work is led by Professor Douglas Boyle, University of Melbourne, on behalf of MACH. Three additional translation centres – Health Translation Queensland, Health Translation SA, and Maridulu Budyari Gumal NSW – plus state and national agencies including AIHW, ARDC, CSIRO, and Queensland Health are core collaborators. This is an open, inclusive collaboration and additional stakeholders will join as the program progresses.


Melbourne Law School recently hosted the Leadership Forum on Trusted Research Environments for Health & Medical Data, bringing together leading experts in healthcare data and research. Participants explored the potential of Australia’s healthcare data and anticipated significant advancements in the field. The forum emphasised the need for secure platforms that allow researchers to explore this data with confidence while protecting it against external threats. With insights from leaders like Professor Dougie Boyle and representatives from various sectors, the discussion covered both the challenges and the opportunities of these data platforms. The event, part of the 2023 Leadership Forums by the Australian Research Data Commons (ARDC), addressed data-related challenges faced by Australian researchers and featured expert-led panels and Q&A sessions. Watch a recording of the event here.

Current initiatives within the Transformational Data Collaboration (TDC) program of work:

Development of Research Terminologies Australia (RosetTA) with Australian Research Data Commons (ARDC)

The TDC and ARDC are together investigating the feasibility, demand and benefits of tools to support the open sharing and curation of custom medical terminologies, terminology mappings and phenotypes. The aim is to build a service that allows non-technical communities of practice and research groups to build, curate and deploy such mappings and phenotypes, promoting transparency and consensus in definitions and facilitating their re-use at a national level.

This ‘Ontoserver’ service, building on existing advanced national terminology infrastructure, is provisionally called Research Terminologies Australia (RosetTA).

Hospital Electronic Medical Record Data (EPIC and Cerner) to OMOP

In 2020 the TDC received funding from the ARDC for Hospital EMR data as a National Data Asset for Research (10.47486/PS014). This project aims to establish a national, research-ready hospital electronic medical record (EMR) Data Asset that will enhance data accessibility for rapid interrogation and evidence generation. Data will be transformed to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), an international gold standard.

Tools, mappings and experience gained will be made openly available, and a community of practice and a roadmap for continued national implementation will be built. The Data Asset will significantly improve the quality, accessibility and feasibility of EMR data warehouses by streamlining data governance, consent and ethics. The project’s outcome is a CDM platform built by converting three Cerner-based data warehouses: Queensland Health, Austin Health and Western Health. Learnings from the Cerner conversions will then be applied to the conversion of other data warehouse platforms, including the EPIC EMR system, which will be investigated in collaboration with the Peter MacCallum Cancer Centre.
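As a rough illustration of what transforming source EMR data to the OMOP CDM involves, the sketch below maps one hypothetical source patient record to a row shaped like a subset of the OMOP person table. The source field names (patient_key, dob, sex) and the to_person helper are assumptions made for illustration and are not taken from the project's actual pipeline. 8507 and 8532 are the standard OMOP concept IDs for male and female, and 0 marks a value with no matching concept.

```python
from dataclasses import dataclass
from datetime import date

# Standard OMOP gender concept IDs; anything else falls back to 0 (no match).
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


@dataclass
class PersonRow:
    """A subset of the columns in the OMOP CDM person table."""
    person_id: int
    gender_concept_id: int
    year_of_birth: int
    month_of_birth: int
    day_of_birth: int
    gender_source_value: str


def to_person(source: dict) -> PersonRow:
    """Convert one hypothetical source patient record to a person row."""
    dob = date.fromisoformat(source["dob"])  # assumes an ISO date string
    sex = source["sex"].strip().upper()
    return PersonRow(
        person_id=source["patient_key"],
        gender_concept_id=GENDER_CONCEPTS.get(sex, 0),
        year_of_birth=dob.year,
        month_of_birth=dob.month,
        day_of_birth=dob.day,
        gender_source_value=source["sex"],  # keep the raw value for audit
    )


if __name__ == "__main__":
    print(to_person({"patient_key": 42, "dob": "1980-05-17", "sex": "F"}))
```

Keeping the raw source value alongside the standard concept ID means a mapping decision can be reviewed later without going back to the source system.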

Further details can be found on the project blog.

Competency-based online training in the OMOP common data model is provided by the EHDEN Academy.

Patron conversion to OMOP

The Patron primary care dataset has been mapped to the OMOP common data model. Test use cases are currently being used to validate the dataset. This work is funded via University of Melbourne Department of General Practice and Primary Care in close collaboration with the TDC.

White Bandicoot data quality assessment tool

One of the TDC’s core objectives is to raise awareness of quality issues in EMR data. To this end, a team has been working with OHDSI to develop a new data quality tool, “White Bandicoot”. White Bandicoot can connect to EMR data sources, run internationally recognised data quality metrics on them and visually highlight any quality issues found.

Importantly, White Bandicoot can connect to source EMR data prior to any conversion to a common data model. White Bandicoot is currently being assessed for use by other research groups in Australia and internationally.
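To give a sense of the kind of check a data quality tool might run directly on source records, here is a minimal sketch of two common metric types: completeness (how often a field is populated) and plausibility (whether values are believable). This is not White Bandicoot's implementation; the function names, the record layout and the 120-year age threshold are all assumptions made for illustration.

```python
from datetime import date


def completeness(rows, field):
    """Fraction of rows in which the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def implausible_birth_dates(rows, today, max_age_years=120):
    """Rows whose date of birth is in the future or implies an extreme age."""
    flagged = []
    for r in rows:
        raw = r.get("dob")
        if not raw:
            continue  # missing values are counted by the completeness check
        dob = date.fromisoformat(raw)
        if dob > today or today.year - dob.year > max_age_years:
            flagged.append(r)
    return flagged


if __name__ == "__main__":
    records = [
        {"id": 1, "dob": "1980-01-01"},
        {"id": 2, "dob": ""},
        {"id": 3, "dob": "2999-01-01"},
    ]
    print(completeness(records, "dob"))
    print(implausible_birth_dates(records, date(2024, 1, 1)))
```

Because both checks work on plain source records, they can run before any conversion to a common data model.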

OHDSI Australia

The TDC has been actively involved in the establishment of OHDSI Australia, a key resource in educating researchers on common data models. OHDSI Australia has developed an active membership promoting the use of the OMOP common data model. Webinars from leading experts in the field are held regularly, and video recordings are available on the OHDSI Australia website. Members of OHDSI Australia are also actively involved in the OHDSI Asia Pacific meetings. Training has been made available at no cost to participants. In August 2021, OHDSI provided a two-day interactive data conversion course, which was well attended by researchers collaborating with the TDC.

A webinar series is available on the OHDSI Australia website.

Register for OHDSI Australia teams here.

FHIR Terminology Services for OMOP

In conjunction with the TDC, Health Translation Queensland (HTQ) has partnered with CSIRO to demonstrate how implementing HL7 Fast Healthcare Interoperability Resources (FHIR) Terminology Services in OMOP can make better use of structured data. To achieve this, HTQ and CSIRO have sought to better understand how the Observational Medical Outcomes Partnership (OMOP) model and its tooling support the use of standard terminologies, and how FHIR terminology can be used within OMOP tooling, including through prototype integration.


A project sub-committee reports to the AHRA Data Driven Healthcare Improvement (DDHCI) Committee. An Advisory Committee and the AHRA DDHCI Committee advise the sub-committee as required.

For further details see the Terms of Reference.

The ongoing development of these unique tools and methods to advance national data integration and harmonisation will stimulate collaborative research and dramatically increase the ability to improve the health of Australians.

The TDC has forged valuable local and national collaborations and has successfully promoted its goal of building awareness of national data collaboration. The group is committed to advancing knowledge in this field to increase awareness of the issues and drive potential solutions. TDC leadership have been actively involved in establishing the Australian chapter of OHDSI and participate in OHDSI Asia Pacific, a key resource in educating researchers on common data models. The group has contributed to training and education activities relevant to all aspects of data uplift for undertaking quality research.

Common data model impacts:

The conversion of the Patron General Practice dataset (population = 1.5 million) to the OMOP common data model has demonstrated the suitability of the model and increased the accessibility of this data for research. Together with the ongoing or pending conversion of all EMR data from Queensland Health Public Hospitals, Austin Health and Western Health to the model, this will significantly advance data uplift to support informed decision-making through research.

The research into common data models has led to a close working relationship between the TDC leadership group and the Australian Research Data Commons (ARDC), and to further research funding from the ARDC in recognition of the national importance of this work. TDC leadership are also represented on the ARDC Impact Reporting Working Group, contributing to improved reporting of research outcomes across Australia.

Data quality importance and impact:

Consistently high-quality data is essential if health data is to make sense and the evidence generated from it is to be reliable.

Ongoing impact of quality common health data:

Quality, harmonised health data provides the ability to answer important health questions cost-effectively and potentially reduces the burden of disease on individuals and society. Until this project, this data has been fragmented and locked away in individual health systems around the country, limiting its research use.

The Impact of Sharing Mappings:

The Transformational Data Collaboration is co-operating with the ARDC and CSIRO on a new research terminologies service, Research Terminologies Australia (RosetTA). The ARDC is planning to offer this as a national service for sharing medical terminology mappings and codes between centres. This service will be built over the coming years, with a predicted life span of at least 10 years.

Common data models require medical terms to be linked or “mapped” to codes. This mapping is required for all datasets that use common data models. Rather than each health district doing this individually, it makes more sense to share these mappings. Sharing saves time and money, and improves the quality and consistency of the data.
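The value of sharing can be seen in a small sketch in which two sites record the same condition under different local codes and one shared mapping table resolves both to the same standard concept. Everything here is hypothetical: the site names, the local codes, the map_records helper and the concept IDs (900001 and 900002 are placeholders, not real vocabulary codes).

```python
# Shared mapping: (site, local code) -> standard concept ID.
# All keys and IDs below are placeholders for illustration only.
SHARED_MAP = {
    ("site_a", "DM2"): 900001,
    ("site_b", "T2DM"): 900001,  # same concept, different local code
    ("site_a", "HTN"): 900002,
}


def map_records(records, shared_map):
    """Attach a standard concept ID to each record; collect the unmapped ones."""
    mapped, unmapped = [], []
    for rec in records:
        concept_id = shared_map.get((rec["site"], rec["local_code"]))
        if concept_id is None:
            unmapped.append(rec)  # needs a new mapping to be curated
        else:
            mapped.append({**rec, "concept_id": concept_id})
    return mapped, unmapped


if __name__ == "__main__":
    records = [
        {"site": "site_a", "local_code": "DM2"},
        {"site": "site_b", "local_code": "T2DM"},
        {"site": "site_b", "local_code": "ASTH"},
    ]
    mapped, unmapped = map_records(records, SHARED_MAP)
    print(mapped)
    print(unmapped)
```

Records that fall through to the unmapped list show where a new mapping is needed, and once it is curated and shared, every site benefits from it.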

The OMOP Common Data Model in Australian Primary Care Data; Building a Quality Research Ready Harmonised Dataset. Roger Ward, David Ormiston-Smith, Christine Chidgey, Christine Hallinan, Dougie Boyle (in preparation).

Boyle, Douglas R., Lawley, Michael, & Brownlee, Rowan. (2023, April 21). Research Terminologies Australia (Rosetta). Vocabulary Symposium 2022, Onsite (Canberra, ANU) and online. Zenodo.

Canaway, R., Boyle, D., Manski-Nankervis, JA. et al. Identifying primary care datasets and perspectives on their secondary use: a survey of Australian data users and custodians. BMC Med Inform Decis Mak 22, 94 (2022).

See the December 2020 Report:

AHRA Primary Care Data and Linkage – Australian dataset identification & capacity building