New Electron Microscopy Data Processing Portal
The new Electron Microscopy Data Processing Portal will help researchers process large datasets generated by CryoEM instruments.
November 30, 2022
In recent years the number of electron microscopy (EM) instruments, especially in cryogenic electron microscopy (CryoEM), has increased in Australia and internationally. With more EM instruments now in use, scientists are producing more large datasets that need to be processed and analysed. This has created a bottleneck, with research delayed as scientists need more resources to work with the deluge of data they have created.
To help process the large datasets generated by CryoEM instruments, the EM Data Processing Portal was created through the ARDC co-investment project Australian Characterisation Commons at Scale (ACCS). The EM Data Processing Portal is a tool for processing CryoEM datasets using software such as CryoSPARC. To support the wider electron microscopy community, the software LiberTEM is also available in the Portal.
The two software packages available in the new EM Data Processing Portal will accelerate CryoEM image processing. For example, the software CryoSPARC can process tens of thousands of structures from a micrograph, classify them, and align them to reconstruct a 3D image of a structure.
The EM Data Processing Portal supports the growing demand from the EM community nationwide. The tool is deployed at Monash University and QCIF, and is accessible to researchers across Australia via their AAF login.
Dr David Poger, Research Data Manager at Microscopy Australia and leader of the work package that oversaw the setup of the portal at ACCS, said, “Data processing is becoming more and more challenging due to the sheer amount of data being generated. The EM Data Processing Portal provides a new service to help uplift the capability of the community.”
Opening Up Electron Microscopy to the Next Generation of Scientists
Dr Farrah Blades, Postdoctoral Research Fellow at the Institute for Molecular Bioscience at The University of Queensland, uses CryoEM instruments to characterise and create strains of algae that can produce biofuels and other sustainable products such as bioplastics.
According to Dr Blades, the new EM Data Processing Portal solves major difficulties in using data generated by EM instruments and will accelerate research.
“There are problems with processing data from EM instruments. Regular computers don’t have the GPUs we need to process our data. Supercomputers alone aren’t suitable because we need to visualise our data.”
Workstations are also not always suitable. “Our data can be too large to process on workstations. They are expensive, quickly become outdated and are overbooked. In addition, users still need to learn command line to do heavy jobs. It’s not a great solution, causing huge bottlenecks and barriers to entry into CryoEM.”
“We need an easy-to-use solution that’s command line free.”
“CryoSPARC is really easy to use, with drag and drop, click and play. It’s ideal for the next generation of scientists coming into this space.”
However, Dr Blades’ lab has been unable to use CryoSPARC as it is complicated to install and requires special permissions which do not suit current virtual machines. The new EM Data Processing Portal completely removes this barrier for Dr Blades and colleagues, as it’s a cloud-hosted solution, which gives them instant access to the powerful software.
“With this portal, the field of imaging becomes more attractive and easier to understand for the next generation of scientists when there’s no need to rely on command line programming.”
The CryoEM instruments that Dr Blades and colleagues use at The University of Queensland have CryoSPARC Live installed, so with the new portal they will be able to work with the data from CryoSPARC Live independently of workstations.
Moving Data Quickly Thanks to Globus
To ensure the large amounts of data produced by EM instruments can be analysed quickly on the EM Data Processing Portal, Globus has been deployed within the portal.
Jay van Schyndel, Research DevOps Engineer from Monash University, said, “Globus is great for moving data at high speed, particularly when [researchers] are geographically distant from the EM Data Processing Portal nodes at QCIF and Monash University. For example, we’ve had researchers push data from Western Australia to QCIF quickly thanks to the service.”
“As a researcher, you can use Globus for free. It’s an excellent tool to transfer large amounts of data quickly and reliably.”
Globus is available for free to researchers at institutions that subscribe to eduGAIN. For more information, see the EM Data Processing Portal information on ImagingTools.
The EM Data Processing Portal is hosted on the ARDC Nectar Research Cloud, Australia’s research cloud designed for research computing.
Creating an Ecosystem of Tools Connected to Instruments
The EM Data Processing Portal is part of the Australian Characterisation Commons at Scale (ACCS), a $5.2 million data infrastructure project creating a rich ecosystem of computing systems, data repositories, workflows and services, all connected with powerful imaging instruments.
Capabilities developed and leveraged by the ACCS project will be used by thousands of researchers who use characterisation techniques, facility scientists who run instruments, and researchers using imaging collections, all across Australia. It will also uplift the research capability offered to industry.
Learn more about the EM Data Processing Portal and how to access it on ImagingTools.
The Australian Characterisation Commons at Scale (ACCS) project (doi.org/10.47486/PL101) is supported by the ARDC and is a partnership between Monash University, AARNet, MASSIVE, Microscopy Australia, National Imaging Facility, Pawsey Supercomputing Centre, QCIF, The University of Melbourne, The University of New South Wales, The University of Queensland, The University of Sydney, The University of Western Australia, University of Wollongong, Flinders University, RMIT University, and Swinburne University.
The ARDC, Microscopy Australia, National Imaging Facility, and Pawsey Supercomputing Centre are funded through the National Collaborative Research Infrastructure Strategy (NCRIS).
Article written by the Australian Research Data Commons (ARDC) and originally posted on their website.
Reviewed by:
Dr David Poger (Microscopy Australia), Dr Farrah Blades (University of Queensland), Juliana Villa (Monash University).