Program – 2016


This year's program incorporates a day-and-a-half developer workshop on Building the Modern Research Data Portal that partly overlaps with the other Globus tutorials. As the title suggests, you should attend the developer workshop if you intend to build applications that incorporate Globus services. If you are an HPC or campus computing administrator, you should attend the introductory and advanced administration tutorials.

Wednesday, April 20
07:30—17:00 registration desk open
Walnut Gallery
7:30—8:30 breakfast
Savoy
8:30—10:00 Led by: Steve Tuecke | workshop materials

We will introduce the Modern Research Data Portal and set the context for how Globus and the Science DMZ combine to deliver unique data management capabilities. This will include:

  • Overview of use cases: Common patterns like data publication/distribution, orchestration of data flows, etc.
  • Overview of the Globus platform: Architecture and brief overview of available services
  • Introduction to the Globus Auth API: Authenticating and authorizing a client
  • Introduction to the Globus Transfer API: Make your first call and move data with Globus
  • Introduction to the Python SDK for using Globus Auth and Transfer

10:00—10:30 beverage break
Imperial Foyer
10:30—12:00 Led by: Steve Tuecke | workshop materials

You will create your first simple web application using the Globus Auth and Globus Transfer APIs. You may bring your own portal/gateway/web app code, or use a sample application we provide. You will learn how to register the application with Globus Auth, authenticate using Globus Auth's OpenID Connect API, and call the Globus Transfer API to do a directory listing.
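As a rough illustration of the authentication step described above, the sketch below builds the URL a web app would redirect a browser to in order to start an OAuth2/OpenID Connect authorization code flow with Globus Auth. The client ID and redirect URI are placeholders (real values come from registering the application), and the helper name `build_authorize_url` is hypothetical; only the Python standard library is used, and nothing is sent over the network.

```python
import secrets
from urllib.parse import urlencode

# Globus Auth authorization endpoint (OAuth2 / OpenID Connect).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"

# OpenID Connect identity scopes plus access to the Globus Transfer API.
DEFAULT_SCOPES = (
    "openid",
    "profile",
    "email",
    "urn:globus:auth:scope:transfer.api.globus.org:all",
)


def build_authorize_url(client_id, redirect_uri, scopes=DEFAULT_SCOPES, state=None):
    """Return (url, state) for the start of an authorization code flow.

    `state` is a random value the app should store in the user's session and
    compare against the value returned to the redirect URI.
    """
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,          # placeholder: issued at registration
        "redirect_uri": redirect_uri,    # placeholder: must match registration
        "response_type": "code",
        "scope": " ".join(scopes),
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


if __name__ == "__main__":
    url, state = build_authorize_url(
        "YOUR-CLIENT-ID", "https://portal.example.org/callback"
    )
    print(url)
```

After the user logs in, the service redirects back to the redirect URI with a `code` parameter that the app exchanges for tokens; that exchange and the subsequent directory listing call are left out of this sketch.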

12:00—13:30 lunch
Savoy
13:30—15:00 Led by: Steve Tuecke | workshop materials

We will explore the Globus Transfer API in depth and demonstrate how to directly access files from an endpoint using the Globus Connect HTTPS Endpoint Server.

Led by: Vas Vasiliadis | tutorial materials

You will learn how to access the Globus service, use Globus Connect Personal to install a Globus endpoint on your laptop, and use it to move data between your laptop and other Globus endpoints. You will also experiment with Globus file sharing and group management features, allowing you to easily share data with collaborators, and describe/curate data for more formal publication and discovery.

15:00—15:30 beverage break
Imperial Foyer
15:30—17:00 Led by: Steve Tuecke | workshop materials

Extend your web app to use additional capabilities of the Globus Transfer API including initiating asynchronous file transfers, checking transfer status, managing shared endpoint ACLs, and using direct HTTPS file access.
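The asynchronous pattern mentioned above (submit a transfer, receive a task ID, then check status until the task finishes) can be sketched as follows. This is a minimal illustration, not workshop code: `FakeTransferService` is a hypothetical stand-in for the real service so that the example runs offline, and `wait_for_task` is a hypothetical helper. The status strings follow the Transfer API's task states, of which `SUCCEEDED` and `FAILED` are treated here as terminal.

```python
import time

# Task states treated as final; "ACTIVE" and "INACTIVE" tasks are still pending.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_task(get_status, task_id, poll_interval=5.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll get_status(task_id) until a terminal state or until timeout.

    Returns the last status seen, which may be non-terminal on timeout.
    """
    waited = 0.0
    status = get_status(task_id)
    while status not in TERMINAL_STATES and waited < timeout:
        sleep(poll_interval)
        waited += poll_interval
        status = get_status(task_id)
    return status


class FakeTransferService:
    """Hypothetical stand-in: a task succeeds after a fixed number of checks."""

    def __init__(self, checks_until_done=3):
        self._checks = checks_until_done
        self._remaining = {}

    def submit(self, source, destination):
        # A real service would queue the transfer and return immediately.
        task_id = f"task-{len(self._remaining) + 1}"
        self._remaining[task_id] = self._checks
        return task_id

    def status(self, task_id):
        self._remaining[task_id] -= 1
        return "SUCCEEDED" if self._remaining[task_id] <= 0 else "ACTIVE"


if __name__ == "__main__":
    service = FakeTransferService()
    task_id = service.submit("laptop:/data/run1/", "cluster:/project/run1/")
    print(task_id, wait_for_task(service.status, task_id, poll_interval=0.01))
```

In a web app the polling would usually happen in a background job or on page refresh rather than in a blocking loop, since large transfers can run for hours.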

GlobusWorld Tutorials: Introductory Globus tutorial (continued)
Palladium
Led by: Vas Vasiliadis | tutorial materials
 
Thursday, April 21
7:30—17:00 registration desk open
Walnut Gallery
7:30—8:30 breakfast
Imperial
8:30—10:00 Led by: Steve Tuecke | workshop materials

We will explore Globus Auth in depth. You will learn how to extend the Globus platform with your own services, using Globus Auth to authorize calls to your REST API.
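One way a REST service might authorize an incoming call is sketched below, under stated assumptions: the service reads a bearer token from the `Authorization` header, obtains an OAuth2 token introspection response for it (the introspection request itself is not shown), and then checks the standard `active`, `exp`, and `scope` fields. The function names and the example scope string are hypothetical.

```python
import time


def extract_bearer_token(authorization_header):
    """Return the token from a 'Bearer <token>' header value, or None."""
    if not authorization_header:
        return None
    parts = authorization_header.split()
    if len(parts) == 2 and parts[0].lower() == "bearer":
        return parts[1]
    return None


def is_authorized(introspection, required_scope, now=None):
    """Decide whether an introspection response permits the call.

    `introspection` is the decoded JSON from a token introspection request:
    `active` is a boolean, `exp` a Unix timestamp, and `scope` a
    space-separated string of granted scopes.
    """
    now = time.time() if now is None else now
    if not introspection.get("active"):
        return False
    if introspection.get("exp", 0) <= now:
        return False
    return required_scope in introspection.get("scope", "").split()


if __name__ == "__main__":
    # Hypothetical response for a token granted access to an example API scope.
    response = {
        "active": True,
        "scope": "openid https://example.org/scopes/myapi",
        "exp": time.time() + 3600,
    }
    token = extract_bearer_token("Bearer abc123")
    print(token, is_authorized(response, "https://example.org/scopes/myapi"))
```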

Led by: Vas Vasiliadis, Rachana Ananthakrishnan | tutorial materials

This session is designed to address your specific Globus deployment issues. We will provide more detailed reviews of common deployment configurations such as multi-server data transfer nodes, using the management console with pause/resume rules, and integrating campus identity systems for streamlined user authentication.

10:00—10:30 beverage break
Walnut Gallery
10:30—12:00 Led by: Steve Tuecke | workshop materials

Extend your web app with a REST API and grant access to resources using linked identities. We will also hold an open discussion and gather feedback on the Globus platform to inform future development and product direction.

Led by: Vas Vasiliadis, Rachana Ananthakrishnan | tutorial materials

Following our presentation of advanced deployment scenarios, we will look to attendees to drive the discussion by describing specific use cases and leveraging the group's collective expertise to develop a solution. Examples might include troubleshooting performance issues, exposing multiple filesystems via a single Globus endpoint, or dealing with unusual network/firewall configurations. We will also discuss our product roadmap and new features requested by Globus users.

12:00—13:30 lunch
Imperial
13:30—15:00 Ian Foster, Globus Co-founder  | slides

Ian will review notable events in the evolution of the Globus service over the past year, and provide an update on product direction.

15:00—15:30 beverage break
Walnut Gallery
15:30—17:00
Lightning Talks
Walnut Ballroom

Building a Scaleable National Repository Infrastructure for Canada
Todd Trann, University of Saskatchewan, Compute Canada  | slides

Compute Canada worked closely with the Canadian Association of Research Libraries (CARL) and Globus on a Research Data Management (RDM) pilot in 2015. The pilot was very valuable in bringing different communities together to discuss the challenges of RDM in a Canadian context. The pilot identified a candidate solution, leveraging Globus data publication, for a scalable national repository infrastructure and preservation pipeline. CC, CARL, and Globus began a two-year project on January 1, 2016, to develop that software infrastructure. In this lightning talk, we will present the plan to build a national-scale Canadian data repository software framework with a scalable, federated storage model, a preservation service, and national research data discovery for data stored in many other Canadian repositories.


NCAR Global File System and Data Transfer Services Integration
Pamela Hill, Manager, Global File Systems and Data Transfer Services, NCAR  | slides

The GLobally Accessible Data Environment (GLADE) provides centralized high-performance file systems spanning supercomputing, data post-processing, data analysis, visualization, NCAR science gateways and HPC-based data transfer services. Additional services like high-performance data transfer protocols, including a Globus-based data-sharing service, enhance NCAR’s ability to bring data from other sites to NCAR for post-processing, analysis, and visualization and to share data easily with external collaborators. GLADE also hosts data from NCAR’s Research Data Archive (RDA), NCAR’s Community Data Portal, and the Earth System Grid, which curates CMIP5/AR5 data. GLADE’s architecture, now entering its third incarnation, shifts user workflows from a design that centers on serving the supercomputer to a more scientifically efficient design that facilitates the flow of data. Through a globally accessible storage infrastructure, users now arrange their workflows to use stored data directly without first needing to move or copy it. We will discuss the evolution of these services, including the growing integration of Globus in the GLADE environment.


Paving the Road from Instruments to the RCC's Midway HPC Cluster: XDM, the XROMM Data Management Platform
H. Birali Runesha, Assistant Vice President for Research Computing, The University of Chicago  | slides

Researchers and collaborators are generating data from instruments and observations at accelerating rates, resulting in extreme challenges in data management and computation. These instruments can generate terabytes of data per experiment or per day, which often must be transferred from remote locations, field stations, or core facilities to the user’s system for storage and analysis. This talk will discuss one example of how the University of Chicago Research Computing Center (RCC) uses Globus to address this challenge. As part of a data management system for the X-ray Reconstruction of Moving Morphology (XROMM) core facility at the University of Chicago, we have developed a client interface (RCCclient) for the synchronous transfer of experimental data and metadata from source machines at the XROMM facility to destinations on RCC’s Midway high-performance computing cluster. The data and metadata associated with these experiments are catalogued by a locally hosted X-ray Motion Analysis (XMA) Portal user interface and then the system updates the national XMA repository at Brown University.


Globus and a Multi-site, Multi-Petabyte Workflow
John D. Maloney, Storage Engineer, National Center for Supercomputing Applications  | slides

The Terraref project, funded by the DOE, will be using NCSA to store and process plant image data generated at multiple sites across the United States. All sites combined will generate more than 5TB/day of data that needs to be automatically transferred to NCSA systems for processing, archival, and serving of the processed information. Globus will be used to push data in from all the sites, to send a copy of selected data to a Nearline tape archive system, and as an option remote researchers can use to retrieve large processed data sets for their local use. The sites sending data include the University of Arizona Maricopa Agriculture Center, which will have a large Lemnatec Gantry system that alone will produce 4-5TB/day; the Danforth Plant Science Center at Washington University, which will use a different Lemnatec system and will be running multiple experiments, each producing over 1TB of data; and Kansas State University, which will use aerial craft to gather large amounts of image data across field plots. Globus will give us a robust transfer mechanism to ensure maximum throughput between sites, especially the University of Arizona, and to ensure data integrity as data travels across the country for processing and back out to researchers who need it. (visit the project home page)


Globus and X-ray Imaging at the Advanced Photon Source: From Data Intensive to Data Driven Science
Francesco De Carlo, Advanced Photon Source, Argonne National Laboratory  | slides

Full-field X-ray imaging is an extremely versatile technique that is broadly applicable to almost all scientific and engineering disciplines. In many cases, full-field imaging is the keystone linking a sample to other X-ray techniques such as ptychography, uXRF, uXANES, and uXRD. The Advanced Photon Source (APS) allows for hierarchical 3D imaging of dynamic systems and materials with spatial resolution up to 20 nm, without a major sacrifice in time resolution. These generate extremely large amounts of data. In this talk we will present how Globus has been integrated with the Transmission Microscope Instrument at the APS beamline 32-ID to manage and distribute the nano tomography instrument data.


Overview of the LANL Globus Endpoint
Giovanni "Vann" Cone, HPC Consultant, Los Alamos National Laboratory  | slides

LANL's High Performance Computing division deployed a dedicated Globus endpoint for "Institutional Computing" open-science users during the summer of 2015. In this presentation I will cover the history leading up to the deployment along with the networking and security challenges involved in constructing the endpoint. The underlying usage model of our Globus endpoint with regard to authenticating and migrating data to and from the endpoint staging file-system will also be explained. I will conclude with brief usage statistics over the last several months.


17:00—19:00
Reception/Poster Viewing
Palladium (Ground Floor)
 
Friday, April 22
7:30—14:30 registration desk open
Walnut Gallery
7:30—8:30 breakfast
Imperial
8:30—9:00 Bob Cone, Spectra Logic  | slides

If we believe that the survival of a species in nature is dependent upon the passing on of that species' genome, what can be said about the survival of society as a whole? The human species is the only animal which takes information from previous generations, builds upon it and then passes it on to future generations. Spectra CEO, Nathan Thompson, proposes that society too has a genome, and it is comprised of the data and digital content we as a society create. In his book, "Society's Genome", Thompson investigates the threats to society’s genome and suggests ways in which those threats can be mitigated if not eliminated. Join co-author Bob Cone for an insightful and interesting exploration of how nature's example can be used in data centers throughout the world to create strategies for digitally preserving the world's treasury of information. Attendees will be mailed a copy of Thompson's book when the first edition is released in May 2016.

9:00—10:00
Lightning Talks
Walnut Ballroom

Using Globus Transfer for Camera Data on NSTX-U
Gretchen Zimmer, Lead Software Engineer, Princeton Plasma Physics Laboratory  | slides

The major experimental device at Princeton University’s Plasma Physics Laboratory (PPPL) is the National Spherical Torus Experiment Upgrade (NSTX-U). Among the 55 diagnostics on NSTX-U are fast-framing cameras that record the plasma light at different locations during experiments. Traditionally, transferring the camera data to project storage has been accomplished using the Samba protocol. A new method has been implemented using Globus Connect Personal and the Globus Command Line Interface (CLI). Camera data files are transferred to the Globus server endpoint after each experimental pulse throughout the run day with no user intervention. File size is typically in the 1 to 2 GB range, but can be up to 10 GB. The task that manages the file transfers uses the Globus CLI transfer command to initiate the transfer and return the task ID; the CLI wait command, with the task ID as input, to signal transfer completion; and the CLI events command to return transfer details such as bytes transferred and effective MBits/second. The task uses PuTTY to handle SSH; it re-activates the Globus destination endpoint if the endpoint has expired. Currently, the Globus Transfer method has been implemented on four NSTX-U diagnostic cameras. Transfer rates on PPPL’s 10GB network are about 3 times faster than with the Samba protocol, enabling researchers to view vessel videos after each pulse, which aids in setting parameters for the next pulse.
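For readers unfamiliar with the "effective MBits/second" figure mentioned above, the arithmetic is simply bits moved divided by elapsed time. The sketch below shows that calculation; the function name and the sample numbers (a 2 GB file moved in 40 seconds) are illustrative assumptions, not measurements from NSTX-U.

```python
def effective_mbits_per_second(bytes_transferred, elapsed_seconds):
    """Return the effective transfer rate in megabits per second.

    Uses decimal units: 1 megabit = 1,000,000 bits.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    bits = bytes_transferred * 8
    return bits / 1_000_000 / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical example: a 2 GB camera file transferred in 40 seconds.
    rate = effective_mbits_per_second(2_000_000_000, 40)
    print(f"{rate:.1f} Mbit/s")
```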


Scripted Deployment of Globus Server using Ansible
Eric Coulter, XSEDE Campus Bridging Engineer, Indiana University  | slides

Significant challenges exist for institutions that aim to support high-quality computational research and education. Even beyond the expense of buying and supporting computer hardware, there are large barriers to implementing local computing resources in the forms of poor documentation, unsupported or incomplete software projects, and general lack of time on the part of support staff. The XSEDE (eXtreme Science and Engineering Discovery Environment) Campus Bridging team has produced a few sets of tools to make cluster implementation easier for campuses without the resources for large computing environments. The XSEDE-Compatible Basic Cluster (XCBC) and XSEDE National Integration Toolkit (XNIT) enable campuses to construct usable cluster resources with a minimum of effort, using well-supported projects, and well-documented tools. These tools have been used at multiple campuses across the US, and are demonstrably helpful in providing students and researchers with computational resources with a minimum of effort from support staff. The primary tool we provide for data handling is the Globus Connect Server and Globus toolkit. In order to allow the quick and easy deployment of a Globus Connect Server on an XSEDE-like cluster, the Campus Bridging team uses the Ansible configuration management software. In this talk, Campus Bridging staff will briefly describe the Ansible software, illustrate the swift deployment of a Globus Server (this can be done in less than 1 minute on a freshly built cluster), and touch on the benefits of Globus Connect Server for small institutions.


Using Globus to Improve Access and Delivery of Scientific Data
Thomas Cram, National Center for Atmospheric Research  | slides

The Research Data Archive (RDA; http://rda.ucar.edu) at the National Center for Atmospheric Research (NCAR) contains a large and diverse collection of meteorological and oceanographic observations, operational and climate reanalysis outputs, and remote sensing datasets to support weather and climate research. The RDA contains more than 700 dataset collections, which support the varying needs of a diverse user community. The number of RDA users is increasing annually, and the most popular method used to access the RDA data holdings is through web-based protocols, such as wget and cURL-based download scripts. In FY 2015, 11,000 unique users downloaded more than 1.6 petabytes of data products from the RDA, and customized data products were prepared for more than 4,500 user-driven requests. In order to further support this increase in web download usage, the RDA has integrated the Globus data transfer service (www.globus.org) into its services to provide a GridFTP data transfer option for the user community. The Globus service is broadly scalable, has an easy-to-install client, is sustainably supported, and provides a robust, efficient, and reliable data transfer option for RDA users. This presentation will highlight the technical functionality, challenges, and usefulness of the Globus data transfer service for providing user access to the RDA holdings.


Lessons Learned From Running an HPSS Globus Endpoint
Lisa Gerhardt PhD, Big Data Architect, NERSC, Lawrence Berkeley National Laboratory  | slides

The High Performance Storage System (HPSS) archival system has been running at NERSC at Lawrence Berkeley National Laboratory since 1998. The archive is extremely popular, with roughly 100 TB of I/O every day from the ~6,000 scientists who use the NERSC facility. We also maintain a Globus-HPSS endpoint that transfers an average of 630 TB/month of data into and out of HPSS. Getting Globus and HPSS to mesh well can be challenging. We will give an overview of some of the lessons learned.


10:00—10:30 beverage break
Walnut Gallery
10:30—12:00
Lightning Talks
Walnut Ballroom

Leveraging Globus Identity for the Grid
Suchandra Thapa, Computation Institute  | slides

The Open Science Grid (OSG) is an open national cyberinfrastructure of research computing centers comprised of 125 institutions. High-throughput computing resources are shared between users and institutions according to local and virtual organization policies. Excess capacity is shared on an opportunistic basis using grid computing tools and services. To connect eligible users to all these resources quickly, we created a federated login and job submission service, OSG Connect, which heavily leverages Globus Identity services.


Science DMZ Implementation at the Advanced Photon Source
Jason Zurawski, Science Engagement, ESnet  | slides

The Science DMZ is a network paradigm that facilitates a "friction free" network path for bulk data movement. Components of this design include fast network hardware, targeted information security policies, and intelligent data movement tools such as Globus. This talk will present a case study in designing a Science DMZ for the Advanced Photon Source at Argonne National Laboratory, and present results to encourage the adoption of this technology for facilities of all sizes.


FACE-IT: Earth science workflows made easy with Globus and Galaxy technologies
Raffaele Montella, University of Naples Parthenope, University of Chicago  | slides

Framework to Advance Climate, Economic, and Impact Investigations with Information Technology (FACE-IT) is the result of our effort in designing, implementing, and testing in real-world scenarios a Globus Galaxies incarnation with the primary goal of crop and climate impact assessments. The FACE-IT platform enables the capture of both workflows and simulation outputs in well-defined, reusable, and easily comparable forms. It leverages the capabilities of Globus Galaxies, adopting the Globus identification system and implementing Globus-based data movement. This research initiative can be placed in the context of operational science data portals and simulation desktops, fitting the current and, we hope, future vision of science as a service. Globus Galaxies learned to speak the earth science data language thanks to the implementation of more than 20 new datatypes, including a notable dataset aggregation type, and the development of tools and enabling technologies such as advanced NetCDF support, powerful map integration, and many new data source interfaces. The latest developments include a completely novel NetCDF scavenging, ingestion, and query system, still under development, that will enable the next generation of earth scientists to search their data across a Globus-supported cloud and on-premise environment.


Introducing the new Globus Command Line Interface
Stephen Rosen, Globus DevOps, University of Chicago  | slides

We are reaching the limits of the Globus command line interface (CLI) available at `cli.globusonline.org`. With the advent of Globus Auth, and a growing ecosystem of applications and institutions using Globus, we've taken a step back to re-evaluate everything and anything about the Globus CLI. Join us for a preview of the new CLI, which is still under active development in Open Alpha, with particular emphasis on critical design considerations like error handling and output parsing.


Enabling the Creation of Dynamic Globus Endpoints on AWS via CloudyCluster
Brandon Posey, Omnibond  | slides

The Cloud offers a great opportunity to extend the reach of High Performance Computing to a broader audience, and with Globus and CloudyCluster the barriers to entry for configuring the HPC environment and seamlessly transferring data in and out of the cloud have been removed. In this talk we will discuss the combined Globus and CloudyCluster solution and how the two work together to dynamically configure an HPC environment and seamlessly transfer data in and out of the Cloud. This allows researchers to quickly get their research data transferred to the Cloud and their analysis running there, all while using the Globus features they are used to on their local clusters. An example setup and transfer using Globus to a previously provisioned cluster will be demonstrated, to showcase the value of the automated and dynamic Globus configuration.


Data Transfer Using Globus at the University of Wyoming
Dylan Perkins, Advanced Research Computing Center, University of Wyoming  | slides

Contemporary science has become a data-intensive practice, collecting more data at ever-increasing speeds. This development in research has prompted universities across the country to adapt their cyberinfrastructure and computational resources in order to keep up with their researchers' demands. The Advanced Research Computing Center (ARCC) at the University of Wyoming has developed six data transfer nodes (DTNs) to provide researchers with the means to connect to several research-oriented computational resources, including a long-term data storage solution (capable of storing multiple petabytes), short-term scratch storage, and the University supercomputer, with the fast data transfer speeds of the Science DMZ. These DTNs are equipped with 40 and 100 gigabit line cards, with plans to expand all of them to match the Science DMZ’s 100 gigabit transfer speeds. The university systems are also equipped with 40 and 100 gigabit line cards. This provides researchers with excellent connectivity from and to the Science DMZ. The primary data transfer tool used by these DTNs is the Globus software, which provides researchers with the best transfer speeds possible, given system limitations, as well as an easy-to-use interface for sharing and collaborating with researchers across the country and the globe. This talk will cover the purpose, details, functionality and a few use cases of these DTNs.


Research Computing with Globus in the Jetstream Cloud
Steve Tuecke, Globus Co-founder  | slides

NSF's Jetstream is a new cloud computing service, similar to Amazon EC2 and Microsoft Azure, designed specifically for research computing needs. Jetstream allows NSF researchers to quickly and easily provision and configure their own virtual machines (VMs) on hardware scaled for research computing, based on a large library of system images. Jetstream uses the Atmosphere web interface from CyVerse (formerly iPlant) to provide a beautiful, browser-based interface for VM setup and status monitoring. Jetstream uses Globus Auth for Atmosphere logins, allowing NSF users to log in using their XSEDE or InCommon campus credentials, and to use Globus for moving and sharing data on Jetstream-hosted VMs.


12:00—13:00 lunch
Imperial
13:00—14:30 Moderated by: Steve Tuecke, University of Chicago, Globus team

We invite all attendees to an open roundtable discussion to answer your most pressing Globus questions, or just that one little thing that you've always wondered about. Take a minute to submit your question ahead of time so we can have a more productive discussion. You can send questions to outreach@globus.org. We will also use this session to gather feedback on current usage and planned features. This will be a unique opportunity to meet and engage with many Globus team members in an informal setting. Bring your toughest questions!


14:30 conference adjourns
 


Why Attend?

  • See how to easily provide Globus services on existing storage systems
  • Hear how other institutions are using Globus
  • Learn from the experts on the best ways to apply Globus technologies
  • Connect with colleagues and Globus developers

Gold Sponsors

OrangeFS Spectra Logic
