Past Event – 2023 Program

Most sessions will be accessible online. Online attendees will be able to participate in the live discussion, but sessions are optimized for in-person attendees. We hope to see you in Chicago!

Tuesday, April 25, 2023
Sessions will be held in the King Arthur Court
8:00—17:00 registration desk open
3rd floor
8:00—9:00 breakfast
Camelot Ballroom
9:00—10:00 Led by: Brigitte Raumann, Globus

We will review core Globus features for data management. We will demonstrate how to transfer and share data, install a Globus Connect Personal endpoint on your laptop, use the Globus Command Line Interface and automate data management tasks using the Globus web app. If you are new to Globus, this introduction will provide important context for subsequent sessions.

10:00—10:15 break
Camelot Ballroom
10:15—11:00 Led by: Greg Nawrocki, Globus

We will review the Globus Connect Server v5 (GCSv5) architecture and deployment model, and describe the process for creating a Globus endpoint on a POSIX storage system. You will experiment with installing Globus Connect Server and configuring basic options on the endpoint.

11:00—11:15 break
Camelot Ballroom
11:15—11:45 Led by: Josh Bryan, Globus

This is an introductory session for those planning to use Globus services in their applications. We will describe the foundational elements of the Globus platform and demonstrate how to work with Globus APIs and the Globus Python SDK. This session is intended to provide developers new to Globus with a starting point for working with the various platform services such as Globus Auth and Globus Transfer.

11:45—13:00 lunch
Camelot Ballroom
13:00—14:30 Ian Foster, Globus Co-Founder
Rachana Ananthakrishnan, Executive Director, Globus

We will review notable events in the evolution of the Globus service over the past year, and provide an update on future product direction and sustainability.

14:30—15:00 break
Camelot Ballroom
15:00—17:00 Dan Stanzione, Associate Vice President For Research, University of Texas at Austin, Executive Director, TACC

This talk will cover the developments at the Texas Advanced Computing Center, and how Globus moves terabytes daily into and out of the center. We will also discuss how Globus is being integrated with the NetSage platform, and the role we expect this to play in the new NSF Leadership Class Computing Facility that starts construction next year.

Emre Keskin, University Research Data Officer, Scott Yockel, University Research Computing Officer, Hellen Zziwa, Director of Strategy and Engineering, Harvard University

Details coming soon.

Hannah Parraga, Software Engineer, Advanced Photon Source, Argonne National Laboratory

The Advanced Photon Source (APS) at Argonne National Laboratory is a scientific user facility funded by the US Department of Energy. It is an X-ray light source which serves thousands of users per year from academia, government, and industry. It enables a diverse range of research areas such as materials research, biology, geosciences, life sciences, security, and many more. This year the APS is undergoing a major construction project aimed at increasing the brightness of its X-ray beams by up to 500 times. This will be achieved by replacing the storage ring, and the project will also include construction of new beamlines. The upgrade will bring new challenges for data management and computing due to the increase in data rates and volumes as well as new scientific techniques. It is estimated that the APS will generate on the order of 100 PB of raw data annually and require dozens of petaflops of on-demand compute.

The APS has developed a suite of tools called the Data Management System for processing, cataloging, transferring, and managing user permissions for experimental data. In this presentation, we will highlight how the Data Management System is leveraging tools such as Globus Compute, Globus Transfer, and other Globus components to effectively solve the data management and computing challenges faced by the APS. This presentation will showcase the innovative solutions implemented by the APS, which use Globus services to ensure efficient and secure management of its growing data needs, and our essential ongoing partnership with the Globus Professional Services and Globus Labs teams.

Todd Trann, Senior Software Engineer, University of Saskatchewan

Launching in late March 2023, Lunaris is a new service for discovering Canadian research data. It is built on years of experience in running the FRDR discovery service, currently powered by Globus Search, coupled with the Geodisy project and the new geographic extensions to Globus Search.

17:00—19:00 reception Tower Lounge – 32nd floor
Wednesday, April 26, 2023
Sessions will be held in the King Arthur Court
8:00—17:00 registration desk open
3rd floor
8:00—9:00 breakfast
Camelot Ballroom
Reflections on Distributed Workflows: Evolution, Advances, Challenges, and Opportunities
Ilkay Altintas, Chief Data Science Officer, San Diego Supercomputer Center, UC San Diego

Details coming soon.

A Data Deluge at the Advanced Photon Source
Laurent Chapon, Associate Laboratory Director - Photon Sciences, Argonne National Laboratory

Details coming soon.

10:40—11:00 break
Camelot Ballroom
Lightning Talks and Open Mic
Brock Palen, Director Advanced Research Computing, University of Michigan

Archivetar is a tool intended to ease the use of low-cost storage archives such as Glacier, HPSS/tape, and object storage. It does this by smoothing over many of the common trip points when using tar or zip: specifically, the 80/20 rule of file size distribution, tars that are too large, the latency behavior of networked filesystems, compression performance on multi-core systems, and maximum file size management. In addition, archivetar leverages the Globus SDK to allow single-line, multi-core tar, compress, and Globus upload, simplifying the archiving of massive data sets in manageable formats. The project is available on GitHub.

Silvia Ramos, Senior Research Software Engineer, Rosalind Franklin Institute

The Rosalind Franklin Institute (Franklin) is a UK research institute that officially opened in September 2021 with the goal of transforming life science through interdisciplinary research and technology development. Since the opening, various instruments ranging from chemistry laboratory equipment to electron microscopes have been commissioned. The Franklin has very limited compute available locally and relies on the cloud computing infrastructure provided by the Science and Technology Facilities Council (STFC) and the Baskerville Tier 2 HPC system (Birmingham) for data processing during and after the experiments. As a result, moving large amounts of data, on the order of terabytes, between the Franklin and the remote compute systems is crucial. Furthermore, Globus has also facilitated access to public data archives such as EMPIAR, used for data deposition before publication.

In this lightning talk, we will describe our journey since adopting Globus in July 2022. A significant effort has been made to install Globus Connect Server as well as Globus Connect Personal on all the instrument PCs (21 instruments and a few more coming online in the next few weeks) to transfer data to either Ceph File System or Ceph Object Store (using the S3 interface). Moreover, the Guest Collections functionality has been of great importance for sharing data with both internal and external collaborators. During this process, engagement with our users through documentation and training courses was essential. At the end of the talk, we will describe the future integration of Globus with our metadata catalogue and automated data processing pipelines.

David Deepwell, Software Developer, University of Calgary

Research is a highly collaborative environment requiring the gathering and sharing of data across institutions. In particular, medical imaging data makes up a significant proportion of the shared research data by volume. One medical imaging study coordinated at the University of Calgary requires the collection, aggregation, curation and subsequent distribution of images and associated data with various external sources and collaborators. The volume of data necessitated an automated process to ensure efficiency in all these tasks.

We designed a workflow that consists of two stages: a preparatory stage for input data validation, distribution logic and backups, and a transfer stage. A Globus Flow is used in the transfer stage to transport the accepted files to their respective target sites before removing the files from the transfer source location.

The use of Globus, and especially Globus Flows, ensures robust transfers within the workflow and a provenance trail of the data between different sites. Furthermore, the trust afforded to Globus by most research institutions made it the natural choice for data sharing.

Swapnil Bhatkar, Cloud Engineer, National Renewable Energy Laboratory

The Advanced Computing Operations (ACO) group at NREL provides HPC, machine learning and cloud computing services for researchers and scientists to support the research and development of renewable energy systems enabling NREL’s clean energy mission. Currently, research teams at NREL transfer TBs of data to the cloud using Globus for advanced data processing but most of this data was moved to the cloud using object storage APIs over a site-to-site VPN connection. Single threaded API operations over a limited bandwidth VPN connection can cause significant network congestion and slow down the performance of the network. Additionally, this process can be time-consuming especially when you have to write utility scripts for monitoring data transfers for multiple large datasets.

To improve performance and automate data transfers and processing in a hybrid/multi-cloud environment, we are building an end-to-end cloud bursting workflow for moving huge volumes of data from on-premises HPC clusters to cloud vendors using event-driven systems. This talk will focus on the architecture of event-driven systems built using a combination of Globus SDKs, serverless functions, message queues and Infrastructure as Code (IaC) tools such as Terraform.

Researchers can perform complex data analysis, allowing them to get insights from their data by leveraging analytical tools such as Amazon Athena and Google BigQuery. Event-driven transfers with Globus also provide several benefits, including the ability to move data in real time, trigger transfer jobs based on events, retry failed transfers, and verify data integrity through checksumming. This solution also allows IT administrators to replicate critical datasets to the cloud in a cheaper storage class by triggering backup jobs on demand for disaster recovery purposes.

Jason Zurawski, Science Engagement Engineer, Energy Sciences Network

Over 10 years ago, ESnet and our collaborators around the world introduced the concept of the Science DMZ and Data Transfer Nodes: a strategy to develop data architectures to support scientific use of networks that reduced the barrier to data mobility. To date, hundreds of universities, laboratories, and facilities have adopted this approach and have successfully implemented efficient data movement using Globus; yet many sites still struggle to reach even a 10% efficiency in sending data, which results in loss of productivity.

ESnet is revisiting these complications community-wide in the form of the "Fasterdata DTN Framework", a revitalized effort to help define a set of Best Common Practices for installing, configuring, and operating a data architecture. Through testing and consultation, our goal is to have a core set of major facilities able to reach efficiencies of 2PB of data transfer a day (e.g. sustained performance of 200Gbps), as well as working with end users to reach an efficiency of 4PB of data transfer a week (e.g. sustained performance of 50Gbps). This multi-year effort will seek volunteers from the Globus user community to participate and increase the R&E community's ability to be productive with data mobility.
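
As a rough check on the transfer targets quoted in this abstract, the short sketch below converts a data volume and a time window into the average sustained rate required. It assumes decimal units (1 PB = 10^15 bytes) and a perfectly steady transfer with no protocol overhead; the function name `sustained_gbps` is illustrative and does not come from ESnet or Globus.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def sustained_gbps(petabytes: float, seconds: float) -> float:
    """Average rate in gigabits per second needed to move `petabytes` in `seconds`.

    Assumes decimal petabytes (10**15 bytes) and ignores protocol overhead.
    """
    bits = petabytes * 1e15 * 8  # bytes -> bits
    return bits / seconds / 1e9  # bits per second -> gigabits per second


facility_rate = sustained_gbps(2, SECONDS_PER_DAY)   # 2 PB per day
end_user_rate = sustained_gbps(4, SECONDS_PER_WEEK)  # 4 PB per week

print(f"2 PB/day  needs about {facility_rate:.0f} Gbps sustained")  # about 185 Gbps
print(f"4 PB/week needs about {end_user_rate:.0f} Gbps sustained")  # about 53 Gbps
```

Under these assumptions the results are close to the rounded 200Gbps and 50Gbps figures given in the abstract.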

John Hammonds, Principal Application Developer, Advanced Photon Source

The Advanced Photon Source (APS) Data Management System has utilized Globus Connect Server (GCS) 4 since 2015 to distribute data to users. APS Data Management has grown to provide support for about 60 experiment stations and more than 4PB of data shared with users. The current implementation allows access to data with per-experiment permissions. These permissions are tied to the APS user database and require registration as an APS user.

This talk will describe the transition from GCS 4 to 5, changes to account for user mapping, and work done to integrate new GCS features with APS Data Management. Use of GCS 5 features such as collections and Globus groups will provide better ease of use for users. The use of mapped collections allows users to link ANL domain accounts to home accounts for easy access to data. Features such as guest collections will provide added support, like granting access to users outside the APS user system. The switch to GCS 5 also allows integration with new Globus products such as Compute and Gladier to provide enhanced data processing solutions.

Jason Banfelder, Director, HPC Systems & Applications, The Rockefeller University

Some opinionated advice for getting started with Globus Flows.

Mitch Griffith, Archival Storage Software Developer, Oak Ridge National Laboratory

The National Center for Computational Sciences at Oak Ridge National Laboratory provides a central digital object identifier (DOI) repository for preserving datasets called Constellation. Part of the DOI process required us to retain the list of files and file sizes for each file in each fileset. We were able to make improvements to archive time by removing the file size requirement.

12:15—13:15 lunch
Camelot Ballroom
13:15—15:00 Led by: Vas Vasiliadis, Globus

We will experiment with the Globus Flows and Compute services, building and running progressively more complex flows that demonstrate how to automate common research tasks.

15:00—15:30 break
Camelot Ballroom
Office Hours and Deep Dives (in person only)
Camelot Ballroom

The Globus team will be available to discuss and provide guidance on your use case. We will also offer "deep dives" into a variety of topics including:

  • Globus Connect Server Migration and Advanced Configuration
  • Using the Globus Command Line Interface
  • Building and Deploying Globus Flows
  • Working with Globus Search service and the Globus Portal Framework
  • Working with the Globus Compute service
  • Open Q&A with the Globus support team

Thursday, April 27, 2023
Meeting will be held at the Globus office – 401 N Michigan Ave
8:00—9:00 continental breakfast
19th Floor Conference Center

The Customer Forum is an opportunity for Globus subscribers to discuss their experiences with the service, to learn about our product development plans, and to provide input on future product directions. Attendance at the Customer Forum is by invitation only. If you would like to represent your institution/community, please contact us for an invitation.

12:00—13:00 lunch
19th Floor Conference Center

Why Attend?

  • See how to easily provide Globus services on existing storage systems
  • Hear how others are using Globus
  • Learn from the experts about the best ways to apply Globus technologies
  • Connect with other researchers, admins and developers
