GlobusWorld Program

Wednesday, May 1, 2019
Sessions will be held in the Chagall Ballroom
7:30—17:00 registration desk open
Chagall Foyer
7:30—8:30 breakfast
Van Gogh
8:30—10:00 Steve Tuecke and Ian Foster Globus Co-founders
Rachana Ananthakrishnan Head of Products
Vas Vasiliadis Chief Customer Officer  | slides

We will review notable events in the evolution of the Globus service over the past year, and provide an update on future product direction and sustainability.

10:00—10:30 beverage break
Chagall Foyer
10:30—11:30 Stephan Peinkofer Lead Developer Storage Architectures, Leibniz Supercomputing Centre  | slides

LRZ's Data Science Storage (DSS) is a novel approach to meeting the demands and requirements of data-intensive science. DSS implements a data-centric management approach that gives our researchers the ability to store vast amounts of data for as long as the data is important to them or the science community, access this data from the whole LRZ computing ecosystem, share this data between arbitrary users of the LRZ computing ecosystem, and access, transfer, and share this data worldwide via Globus.

This talk will give an overview of LRZ's Data Science Storage Service and will outline how we integrated Globus into our own Management Portal using the Globus REST API.

Brian Mohr Sr. Systems Engineer, Johns Hopkins University  | slides

The goal of the Open Storage Network (OSN) project is to create a robust national storage substrate that can impact 80% of the NSF research community, and offer a way to build a common basis for the Cyberinfrastructure of the Major Research Equipment and Facilities Construction (MREFC) projects. The program will create a storage appliance which should read and write from disks at speed, with a capacity of about 1.5PB at a price point of $130K. In conjunction with the National Data Service and Johns Hopkins University, Globus is helping to build a distributed storage platform for OSN, based on object storage with Globus Auth federated identity authorization to promote cross-institutional data sharing for OSN users.

Doug Jennewein Director of Research Computing, University of South Dakota  | slides

Funded by the National Science Foundation and the South Dakota Board of Regents, the South Dakota Data Store (SDDS) was deployed in 2018 and currently provides over 1.2PB of capacity across two service tiers. The Sharing Tier provides high-reliability, high-availability, network-accessible storage for research requiring persistent access to large quantities of data. The Archival Tier is hosted on a magnetic tape library for long-term offsite archival-grade storage. SDDS will serve all faculty, staff, postdocs, students, and graduate students in South Dakota. Globus provides the necessary authentication, data sharing, and transfer capabilities to make SDDS a truly statewide resource.

Ben Galewsky Research Programmer, National Center for Supercomputing Applications  | slides

In this talk, we will describe how we used the Materials Data Facility (MDF) and its associated Globus Tooling to implement a global, community-curated database of graphene growth recipes.

The University of Illinois' Nano-manufacturing hub has labs that are attempting to scale up growth of graphene using chemical vapor deposition. So far this effort has involved a great deal of trial and error. There are more than 1,000 research groups around the world involved with this same exploration. We created an application inside a HubZero instance which allows users to capture their recipes along with SEM images and spectrographic analysis of the sample. These samples are submitted to MDF where additional metadata is extracted and the dataset indexed with Globus Search. Researchers can mine recipes using Forge to find areas for their own exploration. Analysis based on this data can also be submitted to MDF where related datasets can be linked, published, and assigned a common DOI.

Katrin Heitmann Physicist and Computational Scientist, Argonne National Laboratory  | slides

In this talk we will discuss a public data release of cosmological simulations carried out with HACC, the Hardware/Hybrid Accelerated Cosmology Code. Our release platform uses Petrel, a research data service, located at the Argonne Leadership Computing Facility. Petrel offers fast data transfer mechanisms and authentication via Globus, enabling simple and efficient access to stored datasets. Easy browsing of the available data products is provided via a web portal that allows the user to navigate simulation products efficiently. The data hub will be extended by adding more types of data products and by enabling computational capabilities to allow direct interactions with simulation results.

11:30—12:15 Tom Barton, Sr Consultant for Cybersecurity and Data Privacy, University of Chicago and Internet2  | slides

Security is sometimes seen as an inhibitor rather than an enabler of science. I hope to convince you that this is not inherently true. We'll consider two scientific contexts, a simpler one and a far more complex one, in which substantial risk is associated with the research and how at least some of the risk can be addressed. Along the way we'll see that real risk reduction can, and sometimes must, happen by means quite unlike applying security controls from a catalog, and that it can take the form of a service rather than a constraint. I'll leave attendees with a few services they may wish to follow up with to help them address risk in their own circumstances.

12:15—13:30

Please join a table for informal conversation on a topic of interest. Globus staff will be spread across tables to participate in discussions.

13:30—14:15 Bobby Kasthuri, Neuroscience Researcher, Argonne National Laboratory, Assistant Professor in Neurobiology, University of Chicago
Rafael Vescovi, Postdoctoral Scholar, Argonne National Laboratory

The Kasthuri lab at the University of Chicago and Argonne National Laboratory is pioneering new techniques for brain mapping of the fine structure of the nervous system – 'connectomics' and 'projectomics'. I will describe these developments including: large volume automated electron microscopy for mapping neuronal connections, synchrotron source X-ray microscopy for mapping the cellular composition of entire brains, and combining both with cell type specific labeling for multi-scale, multi-modal brain maps. We have applied these tools to brains from octopuses and squids, to primates and mice, to the enteric nervous system and how stem cells integrate into the brain. We hope to help answer questions like: how do brains learn as they grow up? And how do brains differ across individuals and across species? And how can we reverse engineer brain function in our own computers and robots?

14:15—15:00 Susan Chacko Senior Staff Scientist, National Institutes of Health  | slides

The HPC facility at the National Institutes of Health's Intramural campus has had a Globus endpoint since 2012. Globus is used routinely for data transfer along with other methods, but the Globus data-sharing capabilities in particular have been enthusiastically embraced by the NIH HPC users. Statistics and discussion of data sharing from the user and administrative point of view will be presented.

Jonathan Silverstein Chief Research Informatics Officer, Health Sciences and Institute for Precision Medicine, University of Pittsburgh
Michael Davis Software Integration Architect, Department of Biomedical Informatics, University of Pittsburgh  | slides

The Research Informatics Office (rio.pitt.edu) is responsible for UPMC's clinical data extraction, transformation, honest brokering, and provisioning for research. Neptune Research Data Warehouse and Health Record Research Request (R3) are RIO data and policy resources serving hundreds of large and small research projects across discrete data, text, imaging, and limited -omics. In partnership with the Pittsburgh Supercomputing Center, RIO is also responsible for the infrastructure for the HuBMAP consortium. To efficiently support these many activities, particularly including protected data sharing with many investigators across multiple institutions, RIO has adopted, as a key strategy, the Federated Identity, Data Movement, Search and Group Management features of Globus.

In this talk we will use multiple examples to describe how Globus underpins these projects in usable and re-usable ways to achieve Secure Data Liquidity. We will also highlight a specific project, the Cancer Registry Records for Research (CR3), whose goal is to advance cancer research by providing tools that facilitate appropriate governance and dissemination of cancer-specific data to the research community. Using Globus, we have developed a portal providing cloud-based services for authentication, searching and data transfer. Using data extracted from the UPMC Network Cancer Registry, which is based on the North American Association of Central Cancer Registries (NAACCR) data standards for cancer registration, we constructed a set of "limited" data (i.e., site, stage, grade/morphology, outcomes) to store in Globus Search. This enables researchers to query and visualize aggregate cancer data for preparatory-to-research purposes.

Mohamad Qayoom IT Consultant, LSU Health Sciences

Due to a schedule conflict, this talk was not presented.

Steven Newhouse Head of Technical Services, European Bioinformatics Institute  | slides

The European Bioinformatics Institute (EMBL-EBI) is part of the European Molecular Biology Laboratory and one of the world's leading providers of life-science data to a global community. A key aspect of our work is for individual users to deposit their data, for their data to be processed, and made available to the global community alongside added value knowledge derived from that data.

Moving data from where it is generated to EMBL-EBI and from our archives to where the user wishes to analyze the data is a key part of our contribution to ELIXIR - a European Research Infrastructure for life-science. The integration of the ELIXIR AAI within Globus and the exposure of our data through Globus endpoints will enable researchers within ELIXIR to seamlessly move data across Europe.

15:00—15:30 beverage break
Chagall Foyer
15:30—17:00 Giri Prakash ARM Data Center Director and Research Staff, Environmental Sciences Division, Oak Ridge National Laboratory  | slides

The Atmospheric Radiation Measurement Data Center (ADC) is a long-term archive and distribution facility for various ground-based, aerial and model data products in support of atmospheric and climate research. The ADC Archive currently holds over 11,000 data products, totaling over 1.7 petabytes of data dating back to 1992; these include data from instruments, value-added products, model outputs, and field campaign and PI-contributed data. ADC’s data discovery and delivery use a modern and scalable architecture, with data access and delivery options that include THREDDS/OpenDAP, Globus, a near real-time data access API, automated data access via web services, advanced visualizations, and a big data analysis platform. In this talk, we will discuss how users are using Globus to transfer terabytes of data from ADC to their home institutions and how ADC is using Globus for its operations, including transferring data between clusters.

Riley Conroy Software Engineer, National Center for Atmospheric Research  | slides

Since late 2014, the Research Data Archive (RDA; https://rda.ucar.edu) at the National Center for Atmospheric Research (NCAR) has used the Globus data management and publication services to support its online research data portal. During this time, users have transferred more than 1.1 petabytes of data from the RDA collections, and the services developed and enabled by the Globus platform have been a tremendous benefit to the RDA user community. This presentation will provide a retrospective look at our experiences building these services into the RDA portal and how we use them to simplify our data management strategies and enhance the user experience.

Using the Globus Python SDK, the RDA portal improves data access and enables scalable workflows for its large community of users. Shared endpoint access to the full RDA data catalog is user-driven and fully automated, and user-delegated transfers of curated file lists and custom delayed mode data products are initiated directly from the RDA portal and facilitated by the Globus Auth and Transfer APIs. Additional highlights include the NCAR RDA alternate identity service, which allows users to log into Globus with their RDA credentials, and a complete history of Globus data transfer usage is harvested via the Transfer API. The key ingredient in this integration is the Science DMZ network in place at NCAR, which allows Globus to deliver scalable, efficient, and reliable data transfers out of the RDA.

Ben Blaiszik Research Scientist - University of Chicago, Globus and Argonne National Laboratory Data Science and Learning Division  | slides

In this talk, we describe two related materials data infrastructure systems built on the Globus platform that work to build an ecosystem to support machine learning in materials science: the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub). MDF serves as an automated facilitator and interconnection point for materials data producers and consumers. Its services allow data to flow in from many sources, be enriched via a variety of tools (e.g., via automated metadata extraction, quality control), and flow onwards to many destinations, including not only MDF-operated services (e.g., the MDF repository, for storage of data with no other home, and the MDF search engine, for integration navigation and search of any and all data known to MDF) but also to the growing number of other materials-related data infrastructure components. DLHub provides similar functions for ML models and associated data transformation and analysis tools, allowing researchers to describe and publish such tools in ways that support discovery and reuse; run published tools over the network (with tools executed on a scalable hosted infrastructure); and link models, other tools, and data sources into complete ML/AI pipelines that can themselves be published, discovered, and run.

Sandra Gesing Research Assistant Professor, Notre Dame  | slides

SGCI (Science Gateways Community Institute) provides services to the academic community to achieve sustainable science gateways—end-to-end solutions that allow researchers and educators to solve research questions within easy-to-use user interfaces hiding underlying complex research infrastructures. In this talk, I will present examples of the successful use of Globus technologies with gateways including the QUBES science gateway, the COSMIC2 science gateway, PlantingScience and CitSciBio. As diverse as the projects are—from modeling and simulation integration to citizen science technologies—so is the use of Globus technologies, from authentication features to data transfer to fully applied Globus data management features.

Karl Kornel System Administrator, Research Computing, Stanford University  | slides

At Stanford University, we have a Standard Globus subscription covering all of campus. That includes over 18,000 faculty and students, and over 1,000 IT providers. We want everyone at Stanford to use Globus, yet our team is fewer than 20 people, with only three people well-versed in Globus. Our response to this need will be the “Globus @ Stanford” web site. In this talk, we will explain why we decided to create an entire site for Globus. We will also talk about the decision to use GitHub Pages as the platform, and describe some of the major sections of the site. Although the site is not public, it is published, and the audience will be invited to check out the site after the talk.

Lee Liming Technical Communications Manager, University of Chicago - Globus  | slides

Jetstream (www.jetstream-cloud.org) is a self-service OpenStack cloud for researchers, available via NSF's research allocation process and XSEDE. Jetstream offers object storage that’s compatible with Amazon Web Services’ Simple Storage Service (S3). Although the research community is learning to use object storage—and important research applications have been adapted to use the S3 API—it's cumbersome to move large research datasets into and out of object storage services using the API. This lightning talk shows how researchers can easily and reliably move research datasets into and out of Jetstream’s object storage using Globus Connect Server with the AWS S3 storage connector. This illustrates the value of the S3 storage connector to campuses that have AWS S3 or OpenStack object storage.

Kyle Chard Research Fellow, University of Chicago  | slides

Abstract will be posted shortly.

17:00—19:00
Reception
Van Gogh

Enjoy refreshments and light appetizers before heading out to explore some of the amazing dining options that Chicago has to offer.

 
Thursday, May 2, 2019 - TUTORIALS
Tutorials will be held in the Chagall Ballroom
07:30—17:00 registration desk open
Chagall Foyer
7:30—8:30 breakfast
Van Gogh
9:00—16:00 Led by: Globus Team

Need more personal attention? Do you have a particularly thorny issue that you're unable to resolve? Stop by during "office hours" where Globus developers will be on hand to answer your toughest questions.

8:30—12:15 Led by: Greg Nawrocki, Rachana Ananthakrishnan   | slides

We will demonstrate new and updated Globus capabilities from the perspective of a researcher, systems administrator, and application developer. This is a high-level introduction to all aspects of the Globus service, including the recently refreshed web application, command line interface, and new terminology introduced by the launch of protected data management features.

Led by: Vas Vasiliadis   | slides

We will demonstrate how to install and configure a Globus endpoint. We will also review deployment configurations such as multi-server data transfer nodes, using the management console with pause/resume rules, and integrating campus identity systems for streamlined user authentication. You will get to experiment with server endpoint installation using a virtual machine.

Led by: Rachana Ananthakrishnan, Greg Nawrocki   | slides

We will provide a detailed walkthrough of installing and configuring Globus Connect Server v5 for high assurance endpoints. We will demonstrate the key aspects of managing protected data on such endpoints, including how to configure additional authentication assurance, enforcing data encryption, and accessing audit logs.

Led by: Greg Nawrocki   | slides

We will use a Jupyter notebook to demonstrate how you can incorporate Globus capabilities into your own data portals, science gateways, and other web applications to easily manage large datasets in diverse research use cases.

12:15—13:00 lunch
Van Gogh
13:00—17:00 Led by: Jason Zurawski, ESnet   | slides

We will provide an introduction to the Science DMZ concept and illustrate best practices for leveraging modern, high-speed networks in data-intensive research.

Led by: Rachana Ananthakrishnan, Greg Nawrocki   | slides

We will present various use cases that illustrate the power of Globus data sharing capabilities, and provide hands-on experience with the Globus file sharing APIs.

Led by: Greg Nawrocki, Rachana Ananthakrishnan   | slides

We will review common use cases and demonstrate how the Globus command line interface (CLI) and API may be used to automate repetitive data management tasks.

Led by: Vas Vasiliadis   | slides

We will demonstrate how Globus integrates with interactive platforms such as JupyterHub and web frameworks such as Django. We will also describe how to leverage the Globus platform, and Globus Auth in particular, to secure APIs when building your own web services.

17:00 conference adjourns
 
Friday, May 3, 2019 – Customer Forum (by invitation only)
08:30—9:00 check-in and breakfast
Globus Office
9:00—12:00 Led by: Globus Team

The Customer Forum is an opportunity for Globus subscribers to discuss their experiences with the service, to learn about our product development plans, and to provide input on future product directions. Attendance at the customer forum is by invitation only. If you would like to represent your institution/community please contact us for an invitation.

Speakers include:

  • NIH
  • Internet2
  • Leibniz Supercomputing Centre
  • European Bioinformatics Institute
  • ESnet
  • NCAR
  • NCSA
  • University of Pittsburgh
  • Johns Hopkins University
  • LSU Health Sciences
  • Notre Dame / SGCI
  • University of South Dakota
  • Materials Data Facility
  • Oak Ridge National Lab
  • Open Storage Network
  • University of Chicago
  • Argonne National Lab

Why Attend?

  • See how to easily provide Globus services on existing storage systems
  • Hear how others are using Globus
  • Learn from the experts about the best ways to apply Globus technologies
  • Connect with other researchers, admins and developers
