GlobusWorld 2024 Program

Pre-conference Online Sessions (free and open to all)

Wednesday, April 24, 2024

This is a high-level survey of the extensive research capabilities available on the Globus platform, aimed at researchers. We will describe common use cases and demonstrate how to get started with data transfer and sharing, using Globus Connect Personal on your laptop.

11:00—11:15 break

We will provide an overview of the process for installing and configuring Globus Connect Server to make your storage system(s) accessible via Globus. This is aimed at system administrators who will be responsible for their institution's Globus deployment.

12:30—12:45 break

We will review best practices for sharing active data among collaborators and describe how the Globus platform can facilitate data description, publication, and discovery.

We look forward to seeing you in Chicago!

Tuesday, May 7, 2024
Sessions will be held in the Winter Garden
8:00—17:00 registration desk open
8:00—9:00 breakfast
9:00—10:30 Rachana Ananthakrishnan, Vas Vasiliadis, Globus

We will explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This session is targeted at system administrators who are familiar with GCS and currently operate (or are planning to operate) broader deployments at their institution. We will also present an overview of optimizing performance and troubleshooting network issues on Globus data transfer nodes, in conjunction with experts from ESnet.

Ken Miller, ESnet, EPOC

ESnet has led the way in helping national facilities, and many other institutions in the research community, configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.

10:30—10:45 break
10:45—12:00 Ada Nikolaidis, Globus

We will present an overview of Globus services for automating research computing and data management tasks, to accelerate research process throughput. This session is aimed at system administrators and researchers who wish to automate repetitive data management tasks (such as data distribution to collaborators), as well as those working with instruments (cryoEM, next-gen sequencers, fMRI, etc.) who wish to streamline data egress, downstream analysis, and sharing at scale. The material in this session will serve as an introduction to more advanced topics that will be covered in the deep dive session tomorrow.

12:00—13:00 lunch Pre-Function
13:00—14:15 Ian Foster, Globus Co-Founder
Rachana Ananthakrishnan, Executive Director, Globus


14:15—14:45 break
14:45—16:00 Ben Brown, Director, Facilities Division, Advanced Scientific Computing Research, U.S. Department of Energy

We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.

Rachana Ananthakrishnan, Executive Director, Globus

We will review the role of Globus services in the DOE's Integrated Research Infrastructure.

Jonathan Ozik, Senior Computational Scientist, Argonne National Laboratory

COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response to it from the scientific community have forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.

Office Hours
Drafting Rooms 2 & 3

The Globus development team will be available to answer all your questions about the Globus service. Table topics include data transfer and sharing, Globus Connect and Premium Connectors, the web app, CLI and SDKs, Automate, Compute, and Authentication.

17:00—18:30 welcome reception Pre-Function
Wednesday, May 8, 2024
Sessions will be held in the Winter Garden
8:00—17:00 registration desk open
8:00—9:00 breakfast
9:00—10:30 Greg Gunther, Science Data Management Branch Chief, U.S. Geological Survey

The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy-driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivery, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership is the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of Globus platform offerings, including Globus Flows, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed, and report on project progress.

Lee Liming, Director, Professional Services, University of Chicago - Globus

We will review how Globus services are enabling data publication at the USGS.

Matt Pritchard, JASMIN User Service Manager, STFC/Rutherford Appleton Laboratory Space/CEDA

JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.

Tibor Auer, Senior Research Software Engineer, Rosalind Franklin Institute

The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structure at acquisition, which is optimised for fast recording rather than for efficient storage and processing. Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and tape backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models. Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough detail about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.

Sandra Gesing, Senior Researcher, San Diego Supercomputer Center

Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, the Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks, each with its own strengths and focus, have evolved and been taken up by a larger community, such as the Globus Data Portal, HUBzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and of meeting the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that any gateway needs a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation details ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.

Doug Southworth, Engineering Scientist, TACC/University of Texas Austin

NetSage is an open, privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several example questions that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?

10:30—10:45 break
10:45—12:00 Josh Bryan, Globus

We will describe the deployment and use of Globus Compute for remote computation. This session is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.

Chris Scott, Research Software Engineer, NeSI

In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Among the challenges we encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint and that the workloads had varying resource requirements (CPUs, memory, and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges, and we will share an update on our progress here.

12:00—13:00 lunch Pre-Function
13:00—13:30 Aditya Tanikanti, Computer Scientist, Argonne National Laboratory

Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.

Maxwell Grover, Data Scientist, Argonne National Laboratory

The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on demand, capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.

Nicholas Tyler, Scientific Data Architect, NERSC, Lawrence Berkeley National Laboratory

As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of this work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks. I will give a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.

Tyler Skluzacek, Research Scientist, Oak Ridge National Laboratory

Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.

13:30—15:30 Rachana Ananthakrishnan, Vas Vasiliadis, Globus

We will dive deeper into the Globus automation and remote computation services, demonstrating how they may be combined to streamline common instrument-based data management tasks, from data capture through publication. This session is aimed at system administrators and research software engineers building solutions to enable large-scale data- and computation-intensive research.

Gus Ellerm, PhD Student, University of Canterbury

As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-Crate and interfaced with a programmatic document, creating a seamless pipeline from instruments, to computation, to publication.

15:30—16:00 break
Office Hours
Drafting Rooms 2 & 3

The Globus development team will be available to answer all your questions about the Globus service. Table topics include data transfer and sharing, Globus Connect and Premium Connectors, the web app, CLI and SDKs, Automate, Compute, and Authentication.

Thursday, May 9, 2024
19th flr. conference room at 401 N. Michigan Ave.
8:00—9:00 continental breakfast

The Customer Forum is an opportunity for Globus subscribers to discuss their experiences with the service, to learn about our product development plans, and to provide input on future product directions. Attendance at the customer forum is by invitation only. If you would like to represent your institution/community please contact us for an invitation.

12:00—13:00 lunch

Platinum Sponsors

Amazon Web Services StarFish

Gold Sponsors

Internet2 Seagate Spectra Logic Wasabi Storj iRODS

Silver Sponsors

Omnibond SGX3
