Past Events – 2013

GlobusWorld 2013 was held April 16-18 at Argonne (near Chicago, Illinois, USA). The agenda with links to presentations can be found below.

Tuesday, April 16 – Tutorials
11:00–17:00 registration desk open
11:30–13:00 lunch
TCS Foyer
13:00—14:15 Led by: Steve Tuecke, Raj Kettimuthu, Dan Morgan  | slides

We will provide an overview of Globus Online and the Globus Toolkit, with demonstrations of key functions such as file transfer, dataset sharing, and endpoint setup.
14:15–14:30 beverage break
TCS Foyer
14:30—15:45
  • Led by: Steve Tuecke, Karl Pickett  | file

    We will explore advanced usage of Globus Online via the Command Line Interface (CLI), and how to use scripts for controlling Globus Online operations.

  • Led by: Raj Kettimuthu, Lukasz Lacinski  | slides

    We will walk system and network administrators through the process of using Globus Connect Multiuser to easily configure a lab server, campus computing cluster, or other shared resource for file transfer and sharing.
15:45–16:00 beverage break
TCS Foyer
16:00—17:15
  • Led by: Bryce Allen, Dan Morgan, Mattias Lidman  | tutorial

    We will teach developers how to use the Transfer REST API for programmatic interaction with Globus Online. Examples will demonstrate using the Transfer REST API to integrate Globus Online with Java and Python clients and web-based portals.

  • Led by: Raj Kettimuthu, Lukasz Lacinski  | slides

    We will teach system and network administrators advanced procedures for configuring Globus Online endpoints.
 
Wednesday, April 17
7:30–17:00 registration desk open
7:30–8:30 continental breakfast
Main Hall
8:30—10:00 Ian Foster, Globus co-founder  | slides

The Globus ecosystem has grown and matured substantially during the past year. Ian Foster will review notable deployments and provide an update on the most significant changes, including the launch of new services for data sharing and the introduction of a new model for sustainability.
10:00–10:30 beverage break
Foyer
10:30—12:00
  • Globus Online Experiences
    Main Hall

    Led by: Steve Tuecke
  • Lee Taylor, Infrastructure Systems Team Leader, University of Exeter  | slides

    You know that your digital research data is, by the very nature of your organization, distributed across many computers on many sites in many geographic locations, and that some of it is what is now often referred to as big data. Your institution has recognized the need, driven by funding body mandates and general good practice, to secure completed research data in a central archive. It has also invested heavily in a leading enterprise storage solution, and you have long been using industry-standard repositories for storing research papers and other relatively small, discrete data sets. So how do you go about connecting the dots and enabling your users to deposit any data they have, in any location and irrespective of file size or type, to your repository?

    Here we will take you through why we decided to turn to Globus Online to help us solve this problem and how we achieved a working solution. We will demonstrate our solution to you and talk a little bit about how we have further plans to use Globus as a more general file transfer and sharing service across Exeter.
  • David Skinner, OSP Group Leader, NERSC  | slides

    The transformation of science toward large distributed teams and facility-level scientific devices has required new thinking to coordinate the many people and resources required for large-scale projects. For the first cyclotrons this meant embracing machine shops, innovative engineering, and business-level planning as fundamental to successful large-scale projects. Big Science is an enterprise, in the sense well known in corporate business, but with the goal of maximizing the value realized from scientific discovery. While in the business world the internet has transformed how companies interact, collaborate, and leverage their resources, too much of the scientific enterprise remains hidden behind interfaces which are very 20th century. In consequential ways, science is lagging behind the progress made in the business world in building infrastructure which makes the internet deliver on its value propositions. Building reliable pipes to connect scientific resources and disseminate data is often still manual and lacking a programmatic interface. Science-focused web Application Programming Interfaces (APIs) provide a path forward that has broad potential benefits to science teams both large and small, their management stakeholders, and the public.
  • Brock Palen, Senior HPC Systems Administrator, University of Michigan  | slides

    Space, Speed, Security: the three S's for approaching centralized research storage. Centralized resource providers at research institutions are faced with handling larger quantities of research data, forcing new approaches in management of petascale filesystems and backups. These larger data requirements necessitate the creation/adoption/support of user-friendly transport methods that perform better than traditional tools, while maintaining higher levels of security (for authentication and client connections) to manage growing quantities of restricted data.
  • Matthias Hofmann, Robotics Research Institute, TU Dortmund  | slides

    We will report on two scientific scenarios in which we employed the Globus Toolkit and Globus Online: First, we have designed an emerging platform for searching for cancer biomarkers. The platform is to be used for clinical research, for example. The second scenario is related to computational neurosciences. We have been involved in the development of a framework for dynamics analysis of simulations of large biological neural microcircuits.

    The two scenarios share a few common points: the core development team, the choice of computing technology (Globus), and the choice of data movement solution (Globus Online).
12:00—13:30
Main Hall
lunch
a word from our sponsor — EMC Isilon  | slides
13:30—15:00
  • Globus Product Sneak Previews
    Main Hall
  • Carl Kesselman, Globus co-founder  | slides
    Kyle Chard, Researcher, University of Chicago

    Datasets provide an abstraction to shield researchers from the complexities of dealing with many files, directories, and records. A Dataset represents a logical collection of data elements that may be stored in files, database tables or metadata. It provides an intuitive unit for thinking about the many interactions that occur during the data lifecycle: operations such as copying data, applying analysis programs, managing permissions, and versioning can be performed on logical datasets rather than individual files. In this session we describe the need for Dataset management and demonstrate an early version of Dataset management and metadata capabilities in Globus Online. These capabilities enable researchers to create, manage, describe, and share Datasets, metadata about these Datasets, and the files and directories that make up a Dataset. We will also demonstrate the ability to create Datasets automatically based upon metadata contained in common data formats, and highlight integration with other Globus services.
  • Paul Davé, Director, User Services, Computation Institute  | slides
    Ravi Madduri, Senior Fellow, Computation Institute
    Dina Sulakhe, Researcher, Computation Institute

    In this talk, we will present Globus Genomics, a robust, scalable, and flexible solution that provides end-to-end research data management for Next Generation Sequencing (NGS) analysis using Galaxy, Globus Online and Amazon Web Services. Globus Genomics combines state-of-the-art algorithms with sophisticated data management tools, a powerful graphical workflow environment, and a cloud-based elastic computational infrastructure. The emphasis is on providing researchers with a high degree of flexibility to inspect, customize, and configure NGS analysis tools and workflows, and share findings with collaborators spanning multiple organizations. We will describe our experience addressing the critical requirements for sequencing analysis in exploratory research that are not effectively addressed by current approaches. We will discuss use cases from the University of Washington, the University of Chicago, and Washington University in St. Louis, and highlight their specific challenges, the solution that was developed, and the impact on their research.
15:00—15:30
TCS Foyer
beverage break
15:30—17:00
  • Globus Online Tech Deep Dives
    Main Hall

    Led by: Raj Kettimuthu
  • Eli Dart, Network Engineer, ESnet  | slides

    ESnet is a nationwide network that provides high-bandwidth, reliable connectivity linking tens of thousands of scientists at more than 40 DOE labs and facilities with collaborators worldwide. ESnet has begun working with scientists at the Advanced Light Source at Berkeley Lab who are seeing massive increases in the data output of their experiments and who now require HPC and network resources to support their research.

    This talk details ESnet's work with an ALS scientist whose beamline instrument upgrade led to the production of data at a rate of 300 megabytes per second - 50 times faster than the previous instrument. Keeping up with this torrent of data required new methods for moving, storing, and analyzing data. ESnet staff worked with the beamline scientist to deploy new infrastructure based on the 'Science DMZ' architecture where data-intensive science applications are run on dedicated infrastructure specifically configured for high performance.

    Leveraging Globus Online as the primary data transfer tool on the Science DMZ, the beamline scientist now seamlessly transmits data to DOE's National Energy Research Scientific Computing Center (NERSC) in Oakland, where the data is stored, managed and shared with other researchers.
  • Jim Basney, Senior Research Scientist, NCSA  | slides

    CILogon supports authentication from over 75 InCommon identity providers to Globus Online and other research services, enabling users to access these services using their existing campus credentials through the InCommon Federation. Additionally, the open source OAuth for MyProxy software enables authentication to Globus Online from Argonne LCF, Exeter, NCSA Blue Waters, and XSEDE, via the same OAuth protocol used by CILogon. OAuth for MyProxy provides an OAuth protocol front-end to the Globus Toolkit's MyProxy software, so users can delegate credentials from MyProxy to web applications (such as Globus Online or other science gateways and portals) without exposing their MyProxy passwords to those applications. This presentation will describe the current capabilities of CILogon and OAuth for MyProxy, discuss how these technologies are currently being used, and solicit input on future plans.
  • Parag Mhashilkar, Technical Lead, Fermilab  | slides

    The High Energy Physics (HEP) community is a pioneer in Big Data. Throughout the years, to support its mission, the community has relied on the Fermi National Accelerator Laboratory to be at the forefront of data movement and storage technologies. To give a sense of the scale, the Fermilab experiments have written more than 94 PB of data to storage. In terms of data movement, to support state-of-the-art experiments such as the Compact Muon Solenoid, Fermilab regularly reaches peak data transfer rates of 30 Gbit/s over the WAN in and out of the laboratory and 160 Gbit/s on the LAN between storage and computational farms. To address these ever-increasing needs, as of this year Fermilab is connected to the Energy Sciences Network (ESnet) through a 100 Gbit/s link.

    For the past two years, the High Throughput Data Program has been using the ESnet 100G Testbed to identify gaps in data movement middleware typically used by the HEP community when transferring data at these high speeds. The work is conducted as a collaboration with internal and external organizations, including the Illinois Institute of Technology and IBM Research. These evaluations include technologies such as GridFTP / Globus Online, XrootD, Squid, and NFSv4. The resulting outcome is provided as input to the middleware developers and ultimately influences the adoption of the technologies at the laboratory. This talk presents the highlights of these evaluations, compares and contrasts the technologies studied, and focuses on the efficiency of GridFTP and Globus Online for data transfers over a 100G network for the full range of data sizes.

    The presentation shows the latest results from studies of different data movement middleware, particularly GridFTP and NFS, on high-speed networks. The outcome of the tests is provided as input to the middleware developers and ultimately influences the adoption of the technologies by different communities.
  • Shreyas Cholia, Computer Systems Engineer, NERSC/KBase  | slides

    The Department of Energy Systems Biology Knowledgebase (KBase) is a software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions; perform large-scale analyses on our scalable computing infrastructure; and model interactions in microbes, plants, and their communities. It permits secure sharing of data, tools, and scientific conclusions in a unified and extensible framework that does not require users to learn separate systems.

    Although the integration of different data types will itself be a major offering to users, the project is about much more than data unification. KBase is distinguished from a database or existing biological tools by its focus on interpreting missing information necessary for predictive modeling, on aiding experimental design to test model‐based hypotheses, and by delivering quality‐controlled data. The project leverages the power of cloud computing and high‐performance computing resources across the DOE system of labs to handle the anticipated rapid growth in data volumes and computing requirements of the KBase.

    KBase and Globus Online have formed a collaboration that simplifies KBase's account management by providing simple, reliable mechanisms for creation and ongoing management of user accounts and groups. The Globus identity platform, which includes Authentication and Authorization components, enables KBase to handle linked and multiple accounts, as well as both self-organized and administrator-controlled groups. Globus allows KBase users to transparently log into multiple KBase resources, all using a single identity.

    Future plans for the collaboration include the creation of a Globus Online plugin for KBase technology (Shock) to facilitate high-speed, large data transfers.
17:15
TCS Foyer
depart for party
18:00–21:00 Dinner With the Stars
Adler Planetarium

Join us as we watch a sky show and peek through one of the many telescopes at the Adler Planetarium—a unique venue for this year's dinner party.
 
Thursday, April 18
7:30–12:00 registration desk open
7:30–8:30 continental breakfast
Main Hall
8:30—9:30 David Lifka, Director, Center for Advanced Computing, Cornell University  | slides

The Cornell Center for Advanced Computing (CAC) evolved from the former NSF-funded Center for Theory and Simulation in Science and Engineering. After the Theory Center's NSF mission ended, the Center went through various phases of trying to identify new sources of funding and right-sizing its staff and services. In 2006 David Lifka, working with the University Provost, Vice Provosts, Deans and key faculty members, came up with a new operational model—the Center became a core service facility with an 80% cost recovery requirement.

Cost recovery for a center that had been providing advanced computing and consulting services to Cornell researchers for free for nearly 20 years was no easy undertaking. The Center had to quickly launch new services and work closely with faculty to ensure that those services were truly in demand. Providing services under a cost recovery model requires an ongoing needs analysis in order to project utilization levels and offer services at a fee users are willing to pay. While services are often based on leading edge technologies, at times, commodity-based solutions provide just the right value that researchers are looking for. By definition, all CAC services are right-sized to the needs of the Cornell community. Services that can be provided more efficiently by central IT or external providers are leveraged and recommended. Identifying economies of scale and scope that provide enterprise class computational and data analysis services at extremely competitive pricing is essential to the continued success of the Center. Emerging and, in some cases, disruptive technologies such as cloud computing are always on our radar.

Lifka will discuss the transition of CAC to a cost recovery model, lessons learned, and how new services such as Red Cloud and Globus Online Sharing are providing Cornell researchers with the resources they need at prices they can afford.
9:30—10:00 Steve Tuecke, Globus co-founder  | slides

The Globus Online product roadmap continues to evolve, driven by the needs of our growing user community. Steve Tuecke will review the features and services planned for release over the next 12-18 months.
10:00–10:30 beverage break
Foyer
10:30—12:00
  • Resource Provider Spotlight
    Main Hall

    Led by: Rachana Ananthakrishnan
  • Birali Runesha, Director of Research Computing, University of Chicago  | slides

    Science is increasingly data-driven, computational, and collaborative. A consequence is that researchers need ever more sophisticated infrastructure to manage research data throughout its lifecycle. Providing an effective campus infrastructure to support these researchers poses a considerable challenge for individual campuses. In the latter half of 2012 the Research Computing Center (RCC) introduced a new computing environment and storage platform to the University of Chicago research community. In the months since its introduction, 250+ users have transferred or generated 400 terabytes of data in the form of more than 50 million files. The Globus Online transfer service played a key role in permitting the RCC to cope with the substantial ingest associated with introducing a new storage service to satisfy the pent-up demand of a computing-heavy academic institution.

    We will discuss how Globus Online is implemented at RCC using UChicago authentication services, present case studies, and describe the methods used to enable seamless bridging between local and remote storage and computing facilities. The solution implemented within RCC makes heavy use of Globus Online, which provides research data management functions using software-as-a-service approaches. We will also discuss testing results of the new Globus Online sharing features.
  • Carina Lansing, Software Architect, Pacific Northwest National Laboratory  | slides

    The Chemical Imaging Initiative at Pacific Northwest National Laboratory (PNNL) is working to provide high performance analytical capabilities for molecular-scale imaging. One particular challenge has been the analysis of terabyte-sized images which are generated remotely, such as at Argonne National Lab (ANL)’s Advanced Photon Source (APS). The multi-phase analysis cannot be performed at the experimental facilities as it requires customized high performance computing facilities. In the past, scientists were forced to transfer data via hard drives that were shipped from the source, resulting in the full analysis taking weeks to complete. On site, scientists were only able to process a handful of images from each sample, resulting in limited information that could be used to shape the settings and processing of subsequent samples. Our scientists needed a solution that would allow them to analyze sample images remotely in near-real-time. With the help of the Globus Online API, we created a streaming data transfer tool that pipes data directly from APS to PNNL’s Institutional Computing (PIC) facility, enabling scientists to analyze the full sample at PNNL within minutes of completing the instrument run. In this talk we will describe our streaming transfer tool and data analysis architecture, our initial user experiment to test this design, and the results, performance, and lessons learned from this experiment.

    This presentation will describe how the Globus Online API can be incorporated into custom tools to enable near-real-time analysis of remote data. It will also describe our experiences with Globus Online and its REST-based API, how it performed over a 3-day user visit to APS where over one terabyte of data were streamed to PNNL, and suggested improvements for future development.
  • Steve Tuecke, Globus co-founder  | slides

    We will review and compare the features of our recently announced Provider Plans and Plus sharing plans.
12:00—13:30
Main Hall
lunch
13:30—15:00
  • Globus Community Updates
    Main Hall

    Led by: Stuart Martin
  • Matthias Hofmann, Research Associate, TU Dortmund University  | slides

    The European Globus Community Forum (EGCF) is the organizational body of the Globus community in Europe. Its goal is to boost an integrated approach to collaboration on Globus development and to provide an organizational platform to foster cooperation within Europe and beyond. Its members are users, administrators, and developers who are applying the Globus Toolkit as their middleware or are interested in doing so.
  • Alan Sill, Vice President of Standards, Open Grid Forum  | slides

    The Open Grid Forum (OGF) serves the needs of advanced high performance distributed computing developers, providers and users to provide open-access venues to discuss the most current available methods in the field. It also provides a forum and methods to document, share, publish and support implementations of these methods with completely open processes.

    This talk will highlight recent successes in advanced distributed grid, cloud and other federated computing projects and highlight new methods, giving specific examples of ways in which OGF standards and OGF work are being carried out in partnership with other standards development organizations and with the community at large. This work has led to real, measurable progress that can easily be adopted by others, and will provide a roadmap and a template for ways in which this progress can be replicated in the future.
  • Amit Chourasia, Visualization Services Group Lead, SDSC

    Computational simulations have become an indispensable tool in a wide variety of science and engineering investigations. With the rise in complexity and size of computational simulations, it has become necessary to continually and rapidly assess simulation output. Visualization plays a critical role in the qualitative assessment of raw data. The result of many visualization processes is a set of image sequences, which can be encoded as a movie and distributed within and beyond the research groups doing the simulations. The movie encoding process is computationally intensive, manual, serial, cumbersome and complicated, and each research group must undertake it separately. Furthermore, sharing visualizations within and outside the research groups requires additional effort. On the other hand, the ubiquity of portable wireless devices has made it possible and oftentimes desirable to access information anywhere and at any time, yet the application of this capability in computational research and outreach has been negligible. We are building a cyberinfrastructure to fill these gaps by using a combination of hardware and software. SeedMe will enable seamless sharing and streaming of visualization content on a variety of platforms, from mobile devices to workstations, making it possible to conveniently view results and providing an essential yet missing component in computational research and current High Performance Computing (HPC) infrastructure.

    Data movement is one of the core areas for the SeedMe project. We anticipate including Globus tools in our infrastructure for data transfer to provide robust and reliable data delivery. Accessible visualization is a key necessity for many large projects that is currently missing in HPC. We are aiming to fill this critical gap. (Project site: www.seedme.org)
  • Ezra Kissel, Researcher, Indiana University  | slides

    The Globus XIO framework provides an ideal environment for extending the capabilities of Globus applications, and in particular the GridFTP implementation. We have developed an XIO-XSP driver based on our eXtensible Session Protocol to provide detailed instrumentation of GridFTP transfers. In addition, the driver enables on-demand configuration of dynamic network environments, including those using technologies such as OSCARS and OpenFlow, to support high-demand GridFTP flows.

    This talk will describe the results of our XIO-XSP driver development and the integration of the NL-calipers library to enable scalable, high-frequency event logging within GridFTP. Details about our ongoing integration with Globus Online, DYNES, and experiences from our Science Research Sandbox testing at SC12 will also be presented.
15:00 adjourn
 
