Wednesday, April 25, 2018
Sessions will be held in the Walnut Ballroom
7:30–17:00 Registration desk open
Steve Tuecke and Ian Foster, Globus Co-founders
Steve will review notable events in the evolution of the Globus service over the past year, and provide an update on future product direction.
Lightning Talks: Innovations in Research Data Management
by Stanford, Mass General, UMichigan, NCSA, NCAR, and Harvard
Working with Data on The Farm: Challenges of Aligning Systems, Services and Expectations
In this presentation, I will discuss some of the challenges we at the Stanford Research Computing Center have faced in our effort to provide services, systems and capabilities to campus research teams whose appetite for and interest in acquiring, using, and analyzing vast amounts of data seem insatiable. Data management tools and platforms like Globus play a valuable role in helping research teams (and us) be successful. But they aren't magic. Moving data around still takes time. Helping users set realistic expectations about performance, timing and potential bottlenecks associated with data flows is key, yet it's something we often neglect to communicate. Come share your ideas about how we can do better at aligning systems, services and expectations.
Serving Neuroscientific Studies Using Globus at the Massachusetts General Hospital
Neuroscientific studies are increasingly using multiple acquisition sites to leverage patients, expertise and technology that may not reside locally. In addition, public sharing of the data generated in these studies has become more prevalent, propelled by new rules from funding agencies (e.g., the National Institutes of Health) and, increasingly, by journal publication requirements. The curation, storage and sharing of multi-modal neuroscientific data present unique challenges for a data management system serving a multisite project. I will present use cases showing how Globus facilitates intra-project data movement and eventual data publication here at the Massachusetts General Hospital. I will also discuss lessons learned from these use cases.
Data Sharing Workflow for Large Datasets (>1TB) with Globus
In the Deep Blue Data (DBD) repository at the University of Michigan, built on Fedora/Samvera, we have datasets ranging in size from KBs to TBs. Like many institutions, we are struggling with growing file sizes (total data stored in DBD increased from 2.94TB in October to over 6TB by the end of January 2018). Until recently, we were unable to offer downloads of files larger than 20GB from DBD, which significantly hampered our ability to serve the researchers at our university. In early November, however, we began testing the integration of Globus (globus.org) with the Deep Blue Data interface as a means of downloading these larger files. My lightning talk will illustrate how the University of Michigan Library is using Globus, integrated with our instance of Fedora/Samvera, to work with large datasets. We will walk through an actual case study of the workflow, from researcher network storage to the final Globus download endpoint. The code for this Globus integration is available on GitHub.
Adding Cloud Based Interactive Compute Capabilities to Globus Endpoints
The growing size and complexity of high-value scientific datasets are pushing the boundaries of traditional models of data access and discovery. Many large datasets are only accessible through the systems on which they were created or require specialized software or computational resources for re-use. In response to this growing need, the National Data Service (NDS) consortium is developing the Labs Workbench platform, a scalable, web-based system intended to support turnkey deployment of encapsulated data management and analysis tools, enabling exploratory analysis and development on cloud resources that are physically "near" the data and associated high-performance computing (HPC) systems.
Labs Workbench complements the Globus architecture by bringing computation capabilities to the data and by eliminating the setup steps for running a Globus Connect Personal endpoint, whether for a development environment or for importing a remote dataset for integration with local data.
We will show how NDS Labs Workbench can be used to:
Globus Integration in the NCAR RDA Data Portal: Recent Enhancements
The Research Data Archive (RDA; rda.ucar.edu) at the National Center for Atmospheric Research (NCAR) contains a large and diverse collection of meteorological and oceanographic observations, operational and climate reanalysis outputs, and remote sensing datasets to support weather and climate research. The RDA contains more than 700 dataset collections, which support the varying needs of a large and continually growing user community. In FY 2017, 13,000 unique users downloaded more than 2 petabytes of data products from the RDA, and customized data products were prepared for 50,000 user-driven requests.
The RDA portal leverages the Globus Python SDK to simplify data management and enable scalable workflows for its large community of users. Shared endpoint access to the full RDA data catalog is user-driven and fully automated, and Globus “push” transfers for custom data products are facilitated by the Globus helper APIs. This presentation will provide an overview of the RDA Globus integration and highlight recently added features that enhance user access to the RDA holdings, including integration of Globus Auth and the Browse Endpoint helper API to facilitate transfer of user-curated file lists and automated transfers of delayed mode products.
Modernizing Data Workflows from the Research Lab to the Data Center
For years, scientists have been using the latest technology to create and analyze data. Over the past decade, scientific instruments have become sophisticated enough to easily overwhelm the storage and processing power of the computers attached to them. Thus, a need arose for clusters of computers to process the data. The result is a data velocity that is both compute- and data-intensive, and a rate of research productivity that must be sustained from both a storage and a compute standpoint. In this talk, I'll present success stories where researchers have incorporated Globus shared endpoints to move data from the research lab to Harvard Research Computing for bulk storage and analysis.
Integrating Globus into a Science Gateway for Cryo-EM
Recent advances in cryo-electron microscopy (cryo-EM) have led to widespread adoption of the technique for determining atomic protein structures. To use this powerful technique, however, scientists need high-performance computing (HPC) resources to calculate protein structures from terabytes of image data. We have built the COSMIC2 science gateway to provide researchers with access to the computing resources available on the Comet supercomputer at the San Diego Supercomputer Center. To handle user authentication, terabyte-sized data uploads and data management, we have integrated Globus into the gateway. In this session I will discuss the challenges and lessons learned from that effort.
Please join a table for informal conversation on a topic of interest. Globus staff will be spread across tables to participate in discussions.
Alex Szalay, Bloomberg Distinguished Professor of Physics, Astronomy and Computer Science, Johns Hopkins University
The goal of the Open Storage Network (OSN) project is to create a robust, industrial-strength national storage substrate that can impact 80% of the NSF research community, and to offer a common basis for the cyberinfrastructure of the MREFC projects. The challenge is more a social engineering one than a technical one: how can one convince universities to embrace a more homogeneous, standardized data storage solution? The idea of the OSN is to have a very simple, standardized software stack with an object store on the bottom and Globus on top. The standardized hardware will help to eliminate current bottlenecks due to configuration differences. The proposed model appliance should read from and write to disk at 100G speeds, and have a capacity of about 1.5PB at a price point of $100K.
Lightning Talks: Research Storage Trends and Tools
by UMichigan, Oak Ridge Lab, KAUST, and ESnet
A Tiered Approach to Research Storage Services
This talk will discuss the University of Michigan’s approach to building a catalog of central storage services for its researchers. We approach the problem from the perspective of the user: we need to provide storage that competes well in performance, usability, features and cost with the options researchers can provide for themselves (USB drives and desk-side NAS devices). We will also talk about how each of our services fits into a traditional research data management lifecycle.
Using Globus' Shared Endpoints for Data Publication
The Oak Ridge Leadership Computing Facility developed a Digital Object Identifier (DOI) service that allows researchers to publish through the DOE Office of Scientific and Technical Information (OSTI) Data ID Service. The data resides at ORNL, but we use Globus to let users assemble their data, workflow, and additional information before submitting it for publication.
We have strict requirements for integrating with OSTI's Data ID Service, and we need to collect additional metadata for each DOI request. To meet these requirements, we are using multiple shared Globus endpoints to construct and retrieve DOIs.
DOIs generated at ORNL are searchable in multiple ways, including through DataCite and our internal service. Published data lives on the archival storage system, HPSS. We have ongoing work to make the datasets directly available from an HPSS shared Globus endpoint to reduce data movement.
Accelerating Science at KAUST with a Science DMZ and Globus
The King Abdullah University of Science and Technology has started a project to install a Science DMZ network enclave on its main campus network. Early work has focused on removing network friction, understanding wide area connectivity limitations, and identifying scientific users. This talk will discuss some of the technology and policy steps the University will take to help scientific users accelerate their research.
Lightning Talks: Best Practices for Research Data Management Infrastructure
by USaskatchewan, NCSA, Argonne, UMinnesota, UChicago, and Booth School of Business
Globus in Canada: National and University Perspectives
Compute Canada has deployed Globus file transfer and sharing tools at over twenty computational and storage-intensive sites across Canada, comprising a national research data transfer service. This talk will briefly describe the service and how Globus is leveraged there, with details on how the University of Saskatchewan and the Canadian Light Source are adopting Globus as part of their research data workflow. It will also briefly touch on the growth of Compute Canada's Federated Research Data Repository over the last year.
Compute Canada's national advanced computing platform integrates high performance computing systems, research expertise and tools, data storage, and resources with academic research facilities across the country. Compute Canada works to ensure that Canadian researchers have the advanced research computing facilities and expert services required to remain globally competitive.
Monitoring, Metrics, Dashboards, and Alerting for GCS - Experience from Blue Waters GCS Clusters
We will present the metrics, monitoring, and alerting system that NCSA implemented for the Blue Waters GCS clusters. The system gathers real-time metrics across the 28-node and 50-node GCS clusters for ncsa#BlueWaters and ncsa#Nearline, provides system dashboards and real-time monitoring tools with drill-downs by individual system, user, and transfer, and implements alerting based on metric values. We will give an overview of the tool stack, the integration with GCS, and the customization of metrics and alerting. A short live demonstration will show the dashboards, live tools, and some of the valuable features of the system. The public repository with tooling and configurations will be published on GitHub for community use.
This will be the first publication of this work.
Advanced Photon Source (APS) Data Management
As the capabilities of modern X-ray detectors and data acquisition technologies increase, so do the data rates and volumes produced at synchrotron beamlines. This brings into focus a number of challenges related to managing data at such facilities, including data transfer, near real-time data processing, automated processing pipelines, data storage, handling metadata, and remote user access to data. In this talk I will describe the Advanced Photon Source (APS) Data Management and Distribution System, which is designed to help APS beamlines deal with some of those issues. I will discuss in more detail how various Globus components fit into the APS system infrastructure, list some of the challenges that APS is facing, and outline plans for future work and system enhancements.
Globus: Helping Make Marketing the New Finance
The University of Chicago’s Booth School of Business is known for being strong in finance, but the school’s foundation in economics is what prepared our faculty to be the pioneers of quantitative marketing. Core to the mission of the Kilts Center for Marketing, one of twelve research and learning centers at Booth, is to elevate the school’s excellence in marketing. One way we do that is by facilitating research in quantitative marketing and economics around the world. Globus helps make that possible. In partnership with the Nielsen Company, the Kilts Center distributes several marketing datasets to hundreds of researchers securely and efficiently. Learn more about how we use Globus.
Globus and Spectra Logic BlackPearl at the University of Minnesota
In 2017, the University of Minnesota's Minnesota Supercomputing Institute launched a new tape backup service utilizing Globus for Spectra BlackPearl. This talk will update users on the Globus deployment at the University of Minnesota, discuss how users have been using the BlackPearl connector, and provide a look at the cost model for the system.
SSH with Globus Auth
Integrating SSH into the Globus universe lets users access remote services using any linked account while allowing providers to specify authentication and authorization standards required for their environment. Fully integrated with Globus Auth and built around the latest OpenSSH client—but with support for all clients—this new product performs token management on the user's behalf, passing the token as necessary during the authentication process. The product requires no SSH client or server modifications, and client installation can be performed without administrative privilege. This talk will cover product status and provide a preview of the feature roadmap.
Globus Customer Engagement: Ask What We Can Do For You!
You have to go beyond "build it and they will come" in order to realize the full value of Globus. Greg will provide an overview of the diverse programs we've developed for connecting with your researchers and system administrators and educating them on what's possible with the Globus service.
Palladium (Ground Floor)
Thursday, April 26, 2018: Tutorials
Tutorials will be held in the Walnut Ballroom
7:30–17:00 Registration desk open
Led by: Greg Nawrocki, Globus Customer Engagement
We will demonstrate Globus capabilities from the perspective of a researcher, systems administrator, and application developer. This is a high-level introduction to all aspects of the Globus service, including the web application, command line interface, and platform-as-a-service. You will learn how to share and publish data, and how to set up personal endpoints.
Office Hours
Led by: Globus Team
Need more personal attention? Do you have a particularly thorny issue that you're unable to resolve? Stop by during "office hours" where Globus developers will be on hand to answer your toughest questions.
Led by: Vas Vasiliadis, Globus customer engagement
We will provide a detailed walkthrough of installing and configuring a Globus endpoint. We will also review deployment configurations such as multi-server data transfer nodes, using the management console with pause/resume rules, and integrating campus identity systems for streamlined user authentication. You will get to experiment with server endpoint installation using a virtual machine.
Led by: Greg Nawrocki, Globus customer engagement
We will demonstrate how you can incorporate Globus capabilities into your own data portals, science gateways, and other web applications that support data intensive research.
Led by: Rachana Ananthakrishnan, Globus product management
We will present various use cases that illustrate the power of Globus data sharing capabilities, and provide hands-on experience with the Globus file sharing APIs.
Led by: Rachana Ananthakrishnan, Globus product management
We will review common use cases and demonstrate how the Globus command line interface (CLI) and API may be used to automate repetitive data management tasks.
Led by: Rick Wagner, Globus professional services
Learn how Globus may be used in conjunction with the Jupyter platform to open up new avenues in interactive data science.
Led by: Kyle Chard, Globus Labs
Explore the various capabilities provided by Globus for assembling, describing, publishing, searching and discovering data sets, and learn how to integrate Globus services into your institutional repository and data publication workflows.
Led by: Steve Tuecke, Globus Co-founder
Learn how to use Globus Auth to provide authentication and fine-grained authorization for accessing your own services.
Friday, April 27, 2018: Customer Forum (by invitation only)
8:30–9:00 Check-in and breakfast
Led by: Globus Team
The Customer Forum is an opportunity for Globus subscribers to discuss their experiences with the service, to learn about our product development plans, and to provide input on future product directions. Attendance at the Customer Forum is by invitation only. If you would like to represent your institution or community, please contact us for an invitation.