GlobusWorld 2024 Program

Pre-conference Online Sessions (free and open to all)

Wednesday, April 24, 2024, 10:00—13:30 CDT
Presenters: Lev Gorenstein, Greg Nawrocki, Brigitte Raumann

If you are new to Globus we encourage you to join this free online session in advance of the conference. We will present introductory material that provides an overview of Globus and prepares you for the in-person sessions.

Topics to be covered include:

  • Introduction to Globus for Researchers and New Users: This is a high-level survey of the extensive research capabilities available on the Globus platform, aimed at researchers. We will describe common use cases and demonstrate how to get started with data transfer and sharing, using Globus Connect Personal on your laptop.
  • Introduction to Globus for System Administrators: We will provide an overview of the process for installing and configuring Globus Connect Server to make your storage system(s) accessible via Globus. This is aimed at system administrators who will be responsible for their institution's Globus deployment.
  • Enabling Data FAIRness with Globus: We will review best practices for sharing active data among collaborators and describe how the Globus platform can facilitate data description, publication, and discovery.

We look forward to seeing you in Chicago!

Tuesday, May 7, 2024
Sessions will be held in TBD
8:00—17:00 registration desk open
TBD
8:00—9:00 breakfast
TBD
9:00—10:30 Rachana Ananthakrishnan, Vas Vasiliadis, Globus
Ken Miller, ESnet, EPOC

We will explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This session is targeted at system administrators who are familiar with GCS and currently operate, or are planning to operate, broader deployments at their institution. We will also present an overview of optimizing performance and troubleshooting network issues on Globus data transfer nodes, in conjunction with experts from ESnet.

10:30—10:45 break
TBD
10:45—12:00 Ada Nikolaidis, Globus

We will present an overview of Globus services for automating research computing and data management tasks, to accelerate research process throughput. This session is aimed at system administrators and researchers who wish to automate repetitive data management tasks (such as data distribution to collaborators), as well as those working with instruments (cryoEM, next-gen sequencers, fMRI, etc.) who wish to streamline data egress, downstream analysis, and sharing at scale. The material in this session will serve as an introduction to more advanced topics that will be covered in the deep dive session tomorrow.

12:00—13:00 lunch
TBD
13:00—14:15 Ian Foster, Globus Co-Founder
Rachana Ananthakrishnan, Executive Director, Globus

TBD

14:15—14:45 break
TBD
14:45—16:00 Ben Brown, Director, Facilities Division, Advanced Scientific Computing Research, U.S. Department of Energy

Abstract coming soon

Jonathan Ozik, Senior Computational Scientist, Argonne National Laboratory

COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response to it from the scientific community have forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.

16:00—17:00 Globus Office Hours TBD
17:00—18:30 welcome reception TBD
Wednesday, May 8, 2024
Sessions will be held in TBD
8:00—17:00 registration desk open
TBD
8:00—9:00 breakfast
TBD
9:00—10:30 Greg Gunther, Science Data Management Branch Chief, U.S. Geological Survey

The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy-driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivery, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership is the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of Globus platform offerings, including Globus Flows, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed, and report on project progress.

Matt Pritchard, JASMIN User Service Manager, STFC/Rutherford Appleton Laboratory Space/CEDA

JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.

Tibor Auer, Senior Research Software Engineer, Rosalind Franklin Institute

The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording, rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and tape backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough detail about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science. 1. https://doi.org/10.1038/sdata.2016.18 2. https://www.go-fair.org/fair-principles

Sandra Gesing, Senior Researcher, San Diego Supercomputer Center

Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, the Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks, each with its own strengths and focus, have evolved and been taken up by a larger community, such as the Globus Data Portal, HUBzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and of meeting the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that any gateway needs a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation details ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.

Mark Beno, Executive Director, Cleveland Institute for Computational Biology, Case Western Reserve University

The Stimulating Peripheral Activity to Relieve Conditions (SPARC) program at NIH is a mechanism to promote research that leads to the development of improved devices that function to modulate electrical activity in peripheral nerves. This therapeutic approach has been proposed for disorders such as hypertension, heart failure, epilepsy, etc. The SPARC REVA (Reconstructing Vagal Anatomy) project at Case Western Reserve University and Duke University aims to map the vagus nerve in 50 human cadavers; definition of the organization of the axons and fascicles can impact the success of neuromodulation therapeutics and potentially reduce unwanted side effects. The CWRU/Duke SPARC REVA team uses multiple imaging modalities to provide definition of the vagus nerve along its entire length, including MRI, 3D tracing of the entire nerve, microCT, histology and MUSE (Microscopy with UV Surface Excitation) of important branch points, and electron microscopy of selected regions. Each cadaver in this study will have approximately 5-10 TB of raw image files (500 TB total). One goal of the SPARC program is the establishment of a data analysis and visualization center and coordination with the SPARC data and resource center (DRC) to implement connectivity maps for landmarks along the vagus nerve and create interactive 2D/3D visualizations on the SPARC Portal (https://sparc.science). The CWRU/Duke SPARC REVA team stores data at CWRU in the High Performance Computing environment and the Globus interface is used to upload raw and processed image files to the DRC, as the Globus interface provides fast and reliable file transfers.

Jennifer Schopf, Director, TACC/University of Texas Austin

NetSage is an open, privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several example questions that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to that of Globus users?

10:30—10:45 break
TBD
10:45—12:00 Josh Bryan, Globus

We will describe the deployment and use of Globus Compute for remote computation. This session is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.

12:00—13:00 lunch
TBD
13:00—15:00 Rachana Ananthakrishnan, Vas Vasiliadis, Globus

We will dive deeper into the Globus automation and remote computation services, demonstrating how they may be combined to streamline common instrument-based data management tasks, from data capture through publication. This session is aimed at system administrators and research software engineers building solutions to enable large-scale data- and computation-intensive research.

15:00—15:30 break
TBD
15:30—17:00

TBD

Thursday, May 9, 2024
Meeting will be held at TBD
8:00—9:00 continental breakfast
TBD
09:00—12:00

The Customer Forum is an opportunity for Globus subscribers to discuss their experiences with the service, to learn about our product development plans, and to provide input on future product directions. Attendance at the Customer Forum is by invitation only. If you would like to represent your institution or community, please contact us for an invitation.

12:00—13:00 lunch
TBD

Platinum Sponsors

Amazon Web Services, StarFish

Gold Sponsors

Internet2, Spectra Logic, Wasabi, Seagate

Silver Sponsors

SGX3, iRODS, Omnibond
