Program – 2017

This year's program incorporates a day of developer and administrator tutorials. You should attend the developer tutorial if you intend to build research applications that incorporate Globus services. If you are an HPC or campus computing administrator managing Globus endpoints, you should attend the introductory and advanced administration tutorials.

Tuesday, April 11
7:30—17:00 registration desk open
Walnut Gallery
7:30—8:30 breakfast
8:30—10:00 Steve Tuecke, Globus Co-founder  | slides

Steve will review notable events in the evolution of the Globus service over the past year, and provide an update on future product direction.

10:00—10:30 beverage break
Walnut Gallery
Lightning Talks
Walnut Ballroom

Compute Canada: Experiences with Globus as National Research Infrastructure
Jason Hlady, Manager, Research Computing, University of Saskatchewan  | slides

Compute Canada entered into a partnership with Globus in 2014, and has deployed Globus file transfer and sharing tools at over twenty computational and storage-intensive sites across Canada, comprising a national research data transfer service. This talk will briefly describe Compute Canada’s data service, highlight how Globus was used to migrate more than 1.5 PB of data from aging legacy systems to new national storage systems, and explain how Compute Canada and the Canadian Association of Research Libraries have partnered to leverage Globus’s new search functionality and data publication software to build a national-scale research data repository for Canada.

Compute Canada’s national advanced computing platform integrates high performance computing systems, research expertise and tools, data storage, and resources with academic research facilities across the country. Compute Canada works to ensure that Canadian researchers have the advanced research computing facilities and expert services required to remain globally competitive.

Enabling the Minnesota Supercomputing Institute's Large Data Archive System
Jeffrey McDonald, PhD, Assistant Director for HPC Operations, Minnesota Supercomputing Institute  | slides

In 2009, the Minnesota Supercomputing Institute (MSI), an organization that serves the high-performance computing and large-scale storage needs of research groups at the University of Minnesota, applied for a National Institutes of Health (NIH) research grant. In order to apply for the grant, MSI was required to install a large data archive system, prompting it to seek out a new storage solution. The organization required a system that was reliable, dense, and highly scalable, leading it to purchase a Spectra tape library. Five years later, MSI upgraded its tape technology and incorporated a Spectra BlackPearl appliance into its environment. MSI’s most recent modification to its data center was the addition of Globus client software. The Globus client software allows faculty to easily move data files between computers, servers, and MSI’s supercomputing facility using a simple browser. This prevents groundbreaking research efforts from being stalled when IT issues arise. MSI’s current configuration enables the university to archive and share petabytes of information conveniently.

File Transfer Feedback from Globus Tools
Andrew Lee, Associate Director of International Networks, Indiana University  | slides

We’ve all had the experience of file transfers having very different performance from day to day. How well a file transfer goes is generally out of the control of the end user – sometimes it’s speedy, and other times, not so much. Our group spends a lot of time understanding the behavior of networks, and trying to make sure end users get the performance they expect from them. What if there was a way to let us, and the folks at Globus, know that your transfer didn’t go well? And what if someone could look into what was happening behind the scenes, on the backbone networks that most users don’t even see?

This talk will outline our proposal to the Globus team to incorporate a Feedback Screen, similar to what Skype uses, to get feedback on file transfers, and to give us an opportunity to jointly look into improving your file transfer performance or provide end user education. This opt-in approach will enable end users to have more control over understanding how to speed up their data sharing.

The International Networks group at Indiana University is funded by the NSF, builds high speed networks, and engages with end users to support research and education. Poor performance over long distances is a common problem encountered by researchers supported by our group.

A Collaborative Platform for Integrating AgroInformatics Data Using Globus
Andrew Gustafson, Scientific Computing Consultant, University of Minnesota  | slides

The field of agricultural informatics (AgroInformatics) is of growing interest to researchers in academia, private industry and governmental non-profits. Much of this interest is driven by an acute need to develop sustainable agricultural practices that optimize food production. Furthermore, new data sources and new high performance computational resources open up the opportunity for researchers to ask big questions concerning the role that genotype, the environment, management practices, and socioeconomic factors have in agricultural successes. Lastly, AgroInformatics research is becoming more collaborative, with organizations showing an increased interest in selectively sharing data in order to foster an environment in which questions related to such things as trait identification to improve crop yields can be asked on a global scale.

The International Agroinformatics Alliance (IAA) is a strategic partnership among public and private entities catalyzed by the College of Food, Agricultural and Natural Resource Sciences and the Minnesota Supercomputing Institute at the University of Minnesota. IAA has created a Globus-linked collaborative AgroInformatics research data platform. The platform uses the core data transfer capabilities of Globus to provide efficient informatics data transfer to and from the platform, as well as Globus OAuth for authentication and data sharing, so that researchers at different institutions can easily participate. The platform also integrates JupyterHub interactive Python and R web notebooks to give users multiple interfaces to perform simple and complex analyses of geospatially referenced crop-related data.

Globus at the University of Michigan and the Advanced Research Computing Organization
Todd Raeker, Advanced Research Computing, University of Michigan  | slides

In this talk I will present how Globus is currently being deployed and used on campus in various research entities. Since research data storage is distributed across many different units on campus, management of Globus endpoints poses unique challenges. Advanced Research Computing (ARC) is the biggest Globus user, with more than 80% of usage flowing through one endpoint system despite ARC hosting only 5-10% of all storage. Most storage and endpoints are thus not under ARC control or direct management. I will describe some of the issues this decentralized environment presents as well as some of the solutions. Thoughts on future directions will be presented as well.

Login with XSEDE and Jetstream: Our Experiences with Globus Auth
Lee Liming, Technical Communications Manager, University of Chicago  | slides

The XSEDE community and the Jetstream cloud service provider are using Globus Auth to simplify and streamline user authentication and add support for identity linking, especially of campus credentials. We will share what we've done and what we've learned.

Topic Modeling in the Cloud with Globus and CloudyCluster
Boyd Wilson (in collaboration with the DICE lab, Clemson University)  | slides

In this lightning talk, we will present a case study on how topic modeling in the cloud can leverage Globus data transfers to simplify and facilitate computation for PLDA+ in AWS with CloudyCluster in under an hour.

The case study will cover how the Data Intensive Computing Ecosystems (DICE) lab at Clemson University, under the direction of Dr. Amy Apon, Professor and Chair of the Division of Computer Science in the School of Computing, is utilizing CloudyCluster for scientific computations in the area of scalable machine learning, and in particular, in topic modeling.

Topic modeling is based on Latent Dirichlet Allocation (LDA), published by Blei, Ng, and Jordan in 2003. There have been many implementations of this method. One recent implementation by Google, PLDA+, uses message passing in a distributed cluster environment to speed up the calculation of topics in a very large corpus.

Transferring a large corpus can involve up to hundreds of gigabytes of data, requiring state-of-the-art Globus data transfer technology.


Please join a table for informal conversation on a topic of interest. Globus staff will be spread across tables to participate in discussions.


Introducing Globus Labs
Ian Foster, University of Chicago  | slides

An Ensemble-based Recommendation Engine for Scientific Data Transfers using Recurrent Neural Networks
Kyle Chard, Senior Researcher, University of Chicago  | slides

Big data scientists face the challenge of locating valuable datasets across a network of distributed storage locations. We explore methods for recommending storage locations (“endpoints”) for users based on a range of prediction models including collaborative filtering and heuristics that consider available information such as user, institution, access history, endpoint ownership, and endpoint usage. We combine the strengths of these models by training a deep recurrent neural network on their predictions. Collectively we show, via analysis of historical usage from the Globus research data management service, that our approach can predict the next storage location accessed by users with 80.3% and 95.3% accuracy for top-1 and top-3 recommendations, respectively. Additionally, our heuristics can predict the endpoints that users will use in the future with over 75% precision and recall.
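
For readers unfamiliar with the metric, top-k accuracy counts a recommendation as correct when the endpoint a user actually accessed next appears among the first k ranked suggestions. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name and the endpoint labels are hypothetical and the sample data is made up.

```python
def top_k_accuracy(ranked_predictions, actual, k):
    """Fraction of cases where the true item is among the first k predictions."""
    if len(ranked_predictions) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, actual)
        if truth in ranked[:k]
    )
    return hits / len(actual)


# Hypothetical ranked endpoint recommendations for four accesses.
ranked = [
    ["ep-a", "ep-b", "ep-c"],
    ["ep-b", "ep-a", "ep-c"],
    ["ep-c", "ep-b", "ep-a"],
    ["ep-a", "ep-c", "ep-b"],
]
# Endpoint each user actually accessed next (made up for this sketch).
accessed = ["ep-a", "ep-a", "ep-a", "ep-d"]

top1 = top_k_accuracy(ranked, accessed, 1)
top3 = top_k_accuracy(ranked, accessed, 3)
```

Widening k can only keep or raise the score, which is why the top-3 figure in the abstract is higher than the top-1 figure.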

Draining the Data Swamp
Tyler Skluzacek, PhD Student, University of Chicago  | slides

Scientists’ capacity to make use of existing data is predicated on their ability to find and understand those data. While significant progress has been made with respect to data publication, and indeed one can point to a number of well-organized and highly utilized data repositories, there remain many such repositories in which archived data are poorly described and thus impossible to use. We present Skluma—an automated system designed to process vast amounts of data and extract deeply embedded metadata, latent topics, relationships between data, and contextual metadata derived from related documents. We show that Skluma can be used to organize and index a large climate data collection that totals more than 500GB of data in over a half-million files.

Responsive Storage: Home Automation for Research Data Management
Ryan Chard, Postdoctoral Fellow, University of Chicago  | slides

Exploding data volumes, coupled with the rapidly increasing rate of data acquisition and the need for yet more complex research processes, have placed a significant strain on researchers’ data management processes. It is not uncommon now for research data to flow through pipelines composed of dozens of different management, organization, and analysis processes, while simultaneously being distributed across a number of different storage systems. To alleviate these issues we propose adopting a home automation approach to managing data throughout its lifecycle. To do so, we have developed RIPPLE, a responsive storage architecture that allows users to express data management tasks using high level rules. RIPPLE monitors storage systems for events, evaluates rules, and uses serverless computing techniques to execute actions in response to these events. We evaluate our approach by examining two real-world projects and demonstrate that RIPPLE can automate many mundane and cumbersome data management processes.
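
The monitor, evaluate, act cycle described in this abstract can be illustrated with a small rule engine. This is only a sketch of the general "if this event, then that action" idea and is not RIPPLE's actual rule format or API: the `Event` and `Rule` classes, the file patterns, and the action strings are all hypothetical, and actions merely return a description where a real system would invoke a serverless function.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, List


@dataclass
class Event:
    kind: str  # e.g. "created" or "modified"
    path: str  # file the storage system reported on


@dataclass
class Rule:
    kind: str                      # event kind that triggers the rule
    pattern: str                   # glob pattern the path must match
    action: Callable[[Event], str]  # stand-in for a serverless function


def evaluate(rules: List[Rule], event: Event) -> List[str]:
    """Run the action of every rule whose trigger matches the event."""
    return [
        rule.action(event)
        for rule in rules
        if rule.kind == event.kind and fnmatchcase(event.path, rule.pattern)
    ]


# Hypothetical rules: archive new HDF5 files, clean up temporary files.
rules = [
    Rule("created", "*.h5", lambda e: f"transfer {e.path} to archive"),
    Rule("created", "*.tmp", lambda e: f"delete {e.path}"),
]

actions = evaluate(rules, Event("created", "/data/run42.h5"))
ignored = evaluate(rules, Event("modified", "/data/run42.h5"))
```

Keeping rules as plain data, separate from the code that watches storage, is what lets users express tasks at a high level without touching the monitoring layer.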

Explaining Wide Area Data Transfer Performance
Zhengchun Liu, Postdoctoral Appointee, MCS, Argonne National Laboratory  | slides

Increasing scientific data and worldwide science discovery collaboration require moving large amounts of data over wide area networks (WANs). End-to-end file transfers over WAN involve many subsystems and tunable application parameters that pose significant challenges for performance optimization. Performance models make it possible to evaluate resource configurations efficiently, allowing systems to identify an optimal or near-optimal parameter set for a given transfer requirement. Armed with log data for millions of Globus transfers involving billions of files and 100s of petabytes, we develop models that can be used to determine bottlenecks and predict transfer rates based on a combination of historical transfer data and current endpoint activity, without the need for online experiments on individual endpoints. Our work broadens understanding of factors that influence file transfer rate by clarifying relationships between achieved transfer rates, transfer characteristics, and various measures of endpoint load. We create profiles for endpoint CPU load, network interface card load, and transfer characteristics via extensive feature engineering, and show that these profiles can be used to explain large fractions of transfer performance. For 27,130 transfers over 30 heavily used source-destination pairs (“edges”), totaling 5191TB in 254 million files, we obtained median absolute percentage prediction errors (MdAPE) of 7.0% and 4.6% when using distinct linear and nonlinear models per edge, respectively. When using a single model for all edges, we obtain MdAPEs of 19% and 6.8%, respectively. These profiles are useful not only for this particular prediction task but also for optimization and explanation, providing new understanding of the impact of competing load on transfer rate. Their prediction can be used for distributed workflow scheduling and optimization.
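
The MdAPE figures quoted in this abstract can be read as the median, over all transfers, of |observed rate − predicted rate| / observed rate, expressed in percent. The sketch below assumes that standard definition, which the abstract does not spell out; the function name and the sample rates are illustrative only and are not data from the study.

```python
from statistics import median


def mdape(actual, predicted):
    """Median absolute percentage error, in percent.

    Assumes the standard definition; every value in `actual` must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(a - p) * 100.0 / abs(a) for a, p in zip(actual, predicted)]
    return median(errors)


# Illustrative transfer rates in MB/s (made up for this sketch).
observed = [100.0, 200.0, 400.0, 800.0]
forecast = [110.0, 190.0, 400.0, 600.0]

result = mdape(observed, forecast)  # per-transfer errors: 10%, 5%, 0%, 25%
```

Because it uses the median rather than the mean, the metric is not dominated by a handful of badly mispredicted transfers, such as the 25% miss in this sample.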

The Materials Data Facility
Ben Blaiszik, Research Scientist, & Logan Ward, Postdoctoral Scholar, University of Chicago  | slides

The Materials Data Facility (MDF) operates two cloud-hosted services, data publication and data discovery, built on Globus services. These MDF services are built to promote open data sharing, self-service data publication and curation, and data reuse, layered with powerful data discovery tools. The data publication service simplifies the process of copying data to a secure storage location, assigning data a citable persistent identifier, and recording custom (e.g., material, technique, or instrument specific) and automatically extracted metadata in a registry, while the data discovery service will provide advanced search capabilities (e.g., faceting, free text range querying, and full text search) against the registered data and metadata. The MDF services empower individual researchers, research projects, and institutions to publish research datasets, regardless of size, from distributed storage, and to interact with and discover published and indexed data and metadata via REST APIs to facilitate automation and analysis.

This talk will include live demonstrations showcasing the search interface (Web UI and API) to discover materials data indexed from over 15 sources and combine disparate materials datasets from distributed locations to train a state-of-the-art machine learning model on Jetstream.

Integrating Globus with Jupyter
Rick Wagner, Globus Professional Services Lead, University of Chicago  | slides

Project Jupyter supports interactive data science and scientific computing across several programming languages, providing a tool that enables the rapid sharing of computational science tools and methods. We see several areas where Globus can contribute to Jupyter to broaden the access and distribution of notebooks, along with enabling streamlined data discovery and access within the notebook environment. To begin, we’ve extended the current suite of Jupyter OAuth2 authenticators with one using the Globus Auth platform. Building on this we’re continuing to incorporate Globus Transfer and other services, and we’ll describe and demonstrate our progress.

15:00—15:30 beverage break
Walnut Gallery
Lightning Talks
Walnut Ballroom

Bridging Compute and Storage Infrastructures
Ryan Prout, Oak Ridge National Laboratory  | slides

The Compute and Data Environment for Science at Oak Ridge National Lab is providing compute and data infrastructure resources, coupled with experts, to create a new environment for scientific discovery. The CADES goal is to continually develop an environment that allows researchers to share data, among local and distributed resources, easily and in a performant manner. Through the Science DMZ architecture we can start to connect and abstract different infrastructures by deploying workflows that utilize data portal tools, allowing us to achieve that cohesive and performant environment across the lab. This talk will give a preview of the CADES environment, the Science DMZ architecture, and the workflows we are helping develop which utilize the Science DMZ and Globus.

Science DMZ Patterns for the Modern Research Data Portal
Eli Dart, Network Engineer, ESnet, Lawrence Berkeley National Laboratory  | slides

This talk will describe the modern research data portal, and how it can be built using Globus and the Science DMZ. The architectural enhancements over the legacy data portal model will be discussed, as well as current scalability of Globus endpoints at large-scale HPC facilities.

Globus Automation
Preston Smith, Director of Research Services and Support, Purdue University  | slides

Shaping the User Experience
Vytas Cuplinskas, Globus User Experience Lead, University of Chicago  | slides

We interact with a wide range of objects and services every day. Some help us achieve our goals and even delight us while others impede our progress and raise our ire. We'll take a peek at how Globus approaches the process of improving the user experience.

Globus Professional Service Engagements
Rick Wagner, Globus Professional Services Lead, University of Chicago  | slides

Last year, Globus created a dedicated professional services team: engineers that can aid organizations and projects in leveraging the Globus platform-as-a-service in building and supporting custom integrations. Rick will discuss current engagements and the roles that the professional services team could play in your projects.

The Globus Python SDK
Stephen Rosen, Globus Software Engineer, University of Chicago  | slides

The Globus Python SDK provides a powerful suite of tools for interacting with Globus Services and, in particular, handling authentication and authorization via Globus Auth. However, these tools are pluggable, and can be used to handle the use of Globus Auth with your own APIs. The core abstractions of the SDK are a simple object model of Authorizers and Clients. Authorizers handle authorization and recovery from “Unauthorized” API responses, and Clients use Authorizers to authorize their requests to a service. Learn how to extend the SDK with Custom Clients, and see how Globus is using this capability to add support for new services.
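
The Authorizer and Client object model described in this abstract can be sketched in a few lines. This is a self-contained illustration of the pattern only and deliberately does not use the real SDK: every class, method, and token name below is hypothetical, and `fake_service` stands in for an HTTP API.

```python
class StaticTokenAuthorizer:
    """Hypothetical authorizer that attaches a fixed bearer token."""

    def __init__(self, token):
        self.token = token

    def authorization_header(self):
        return f"Bearer {self.token}"

    def handle_unauthorized(self):
        # A static token cannot be renewed, so there is nothing to retry with.
        return False


class RefreshingAuthorizer:
    """Hypothetical authorizer that fetches a new token after a 401."""

    def __init__(self, fetch_token):
        self._fetch_token = fetch_token
        self.token = fetch_token()

    def authorization_header(self):
        return f"Bearer {self.token}"

    def handle_unauthorized(self):
        self.token = self._fetch_token()
        return True  # signal that a retry is worthwhile


class SketchClient:
    """Hypothetical client: delegates all authorization to its authorizer."""

    def __init__(self, authorizer, send):
        self.authorizer = authorizer
        self.send = send  # callable(path, headers) -> (status, body)

    def get(self, path):
        headers = {"Authorization": self.authorizer.authorization_header()}
        status, body = self.send(path, headers)
        # On "Unauthorized", let the authorizer try to recover, then retry once.
        if status == 401 and self.authorizer.handle_unauthorized():
            headers["Authorization"] = self.authorizer.authorization_header()
            status, body = self.send(path, headers)
        return status, body


def fake_service(path, headers):
    """Stand-in for a remote API that accepts only one token."""
    if headers.get("Authorization") == "Bearer token-2":
        return 200, f"ok {path}"
    return 401, "unauthorized"


tokens = iter(["token-1", "token-2"])
refreshing = SketchClient(RefreshingAuthorizer(lambda: next(tokens)), fake_service)
static = SketchClient(StaticTokenAuthorizer("token-1"), fake_service)

recovered = refreshing.get("/items")
rejected = static.get("/items")
```

Because the client only depends on the two authorizer methods, the same client code can sit in front of a custom API simply by supplying a different authorizer, which is the pluggability the abstract refers to.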

Ask What We Can Do for YOU!
Greg Nawrocki, Globus Director of Customer Engagement and Support, University of Chicago  | slides

We will describe our customer engagement strategy and the programs we are developing for research computing centers to better educate their users about Globus capabilities.

Palladium (Ground Floor)
Wednesday, April 12
07:30—17:00 registration desk open
Walnut Gallery
7:30—8:30 breakfast
8:30—10:00 Led by: Vas Vasiliadis | slides

You will learn how to install and configure a Globus endpoint using Globus Connect Server. This session is targeted at system administrators, storage managers, and anyone who is tasked with maintaining Globus endpoints at their institution. The content will include a mix of presentation and hands-on exercises.

10:00—10:30 beverage break
Walnut Gallery
10:30—12:00 Led by: Vas Vasiliadis

This session is designed to address your specific Globus deployment issues. We will provide more detailed reviews of common deployment configurations such as multi-server data transfer nodes, using the management console with pause/resume rules, and integrating campus identity systems for streamlined user authentication.

12:00—13:30 lunch
13:30—15:00 Led by: Rachana Ananthakrishnan | slides

We will introduce the Globus platform and describe how you can use Globus services to deliver unique data management capabilities in your applications. This will include:

  • Overview of use cases: Common patterns like data publication/distribution, orchestration of data flows, etc.
  • Overview of the Globus platform: Architecture and brief overview of available services
  • Introduction to the Globus Transfer API: Make your first call and move data with Globus
  • Introduction to the Python SDK for using Globus Auth and Transfer

You will use a Jupyter notebook to experiment with the Globus Transfer API, using it to manage endpoints, transfer and share files. We will also demonstrate a simple, yet fully-functional, application that leverages the Globus platform for data distribution and analysis.

15:00—15:30 beverage break
Walnut Gallery
15:30—17:00 Led by: Rachana Ananthakrishnan

We will introduce the Globus Auth API and demonstrate how it is used in a sample data portal. You will learn how to register an application with Globus Auth, authenticate using Globus Auth's OpenID Connect API, and access various authentication and authorization functions via sample Python scripts. We will also demonstrate how to directly access files from an endpoint using the Globus Connect HTTPS Endpoint Server.

17:00 conference adjourns
18:00—19:30 poster session — hosted by Globus and NDS
reception with hors d'oeuvres and cash bar

NDS Workshop - April 13-14, 2017

The 7th National Data Service Consortium Workshop will be held at the Hotel Allegro immediately following GlobusWorld 2017. For more information please visit the NDS workshop site.

We are soliciting poster submissions to be presented at a joint session of the co-located NDS/GlobusWorld workshops, Chicago, April 12, 2017.

Gold Sponsors

OrangeFS Spectra Logic


Why Attend?

  • See how to easily provide Globus services on existing storage systems
  • Hear how other institutions are using Globus
  • Learn from the experts on the best ways to apply Globus technologies
  • Connect with colleagues and Globus developers
