NSF Workshop on Data-Centric Workflows
Arlington, Virginia
May 11-12, 2009
Supported by NSF Grant IIS-0842993

Preliminary Findings of NSF Workshop on Data-Centric Workflow

May 3, 2009

        Workflow technology is central to the efficient and flexible management of business and government processes, including management of information, resources, personnel, manufacturing, transportation, healthcare delivery, and the analysis of scientific information. In the past few years, a data-centric approach to workflow has emerged, which has the potential of significantly enhancing, and in some cases supplanting, the traditional process-centric approach.

        This shift towards data-centricity of workflow is occurring along two broad dimensions. First, conceptual models for data-aware workflow are emerging in the business process management arena, and also arise more implicitly in healthcare delivery and digital government. These models elevate the data being manipulated by the workflows to a level of prominence essentially equivalent to the level given to process flow in conventional models. It appears that these models permit a unification of the conceptual models for workflow requirements, policies, activity flows, data management, and monitoring, which have hitherto been rather disparate. Second, the area of workflow as data has been growing in recent years, primarily through the lens of scientific workflow, and also in recent research aimed at business applications. A key focus here is to be able to easily represent, store, and query both workflow schemas and enactments (i.e., "runs") of those schemas. This is useful for understanding the provenance or history of how data is produced or updated, for discovering and re-using workflow schemas, and for monitoring compliance of workflows with government or other regulations and policies.

        The workshop is centered on articulating the underlying tenets of data-centric workflow, identifying key advances already made in the area, and identifying research challenges that should be addressed in the coming years in order to maximize the value that can be gained from these new perspectives on workflow. The workshop will address research in data-centric workflow both from a general perspective and from the perspectives of applications in business, healthcare, digital government, and science.

        A synopsis of some discussions held by the workshop participants in advance of the workshop is presented here, in order to provide a preliminary indication of the kinds of results anticipated from the workshop.

Data-aware Workflow

        Current practices in the design, deployment, and maintenance of workflows, especially in the realm of business applications, are fundamentally disjointed, because different conceptual models are used for the different aspects of managing business operations. Four key dimensions in business process management are:

  1. Day-to-day management of operations;
  2. Attempting to align operations with business goals, policies, rules, and requirements;
  3. Measuring business performance and driving improvements to operations; and
  4. Ensuring that operations are complying with government and other regulations.

        In most cases, a different conceptual model is used for each of these aspects. For example, a model based on activity flows (e.g., BPMN) is used for (1); a separate vocabulary and a language based on requirements and rules (e.g., SBVR) is used for (2); a relational database perspective, along with data mining and analytics, is used for (3); and ad hoc techniques for linking the government regulations to the activity flows are used for (4). It appears that the continual mapping between different conceptual models leads to massive inefficiencies in the management of businesses and other organizations. Further, the use of activity flows as the basis often makes it challenging or impossible for business executives to gain a good understanding of how their business is operating, and for managers of different parts of a business (e.g., from different regions, or working with different aspects of the overall operations) to communicate effectively with each other and come to consensus about common interfaces or the larger-granularity common goals that they are working towards. Similar disparities between conceptual models arise in the application areas of healthcare and digital government.

        A promising approach to overcome these issues is the development of new workflow models, which emphasize the data aspects of organizational processes and operations at a level comparable to the current emphasis on activity flows. Several examples of such data-aware workflow models have already emerged and shown value, e.g., document engineering, data services as an enhancement of the Service Oriented Architecture, proclets, and business objects. Of particular interest is IBM's notion of business artifact, perhaps the first to introduce as the basis for workflow management a tight coupling of a data structure, which corresponds to a key business entity, along with a lifecycle specification, which describes the possible sequencings of tasks that might occur to the business entity as it passes through the workflow. Similar constructs are emerging in application domains, including frameworks in healthcare that are centered around patients, around patient "cases", and around patient visits into hospitals. Indeed, several innovative companies have begun to release products and services based on data-aware workflow models, including FlowConnect, BizAgi, Pallas Athena, and IBM in the general services infrastructure arena and Lifecom in the healthcare arena.
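
        The coupling of a data structure with a lifecycle specification can be illustrated with a minimal sketch. The purchase-order entity, its fields, and its states are hypothetical choices made only for illustration; they are not taken from IBM's business artifact work or from any of the products named above.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle specification for an illustrative "purchase order":
# each state maps to the set of states that may directly follow it.
LIFECYCLE = {
    "created": {"approved", "cancelled"},
    "approved": {"shipped", "cancelled"},
    "shipped": {"paid"},
    "paid": set(),
    "cancelled": set(),
}


@dataclass
class PurchaseOrder:
    """Data for one business entity, coupled with its lifecycle state."""
    order_id: str
    customer: str
    amount: float
    state: str = "created"
    history: list = field(default_factory=lambda: ["created"])

    def advance(self, new_state: str) -> None:
        """Move to new_state only if the lifecycle specification allows it."""
        if new_state not in LIFECYCLE[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self.history.append(new_state)


order = PurchaseOrder("PO-1", "Acme", 250.0)
order.advance("approved")
order.advance("shipped")
print(order.state, order.history)
```

        In this sketch the entity's data and the record of its progress through the workflow live in one object, so a question such as "which orders were approved but never shipped?" becomes a query over the artifacts themselves rather than over a separate activity-flow log.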

Workflow as Data

        In a variety of contexts and application areas, it is useful to understand how a workflow is performing its activities, and even to manipulate the ways in which it performs them. For example, the following needs arise.

  1. Query large collections of workflow schemas and large numbers of enactments of workflows. This was first studied in the context of scientific workflow, and more recently in the business context. It appears to be extremely important in healthcare, and also in digital government.
  2. Understand and work with the provenance (or history) of how and why a workflow enactment progressed. This is related to topic (1), but has a somewhat different focus that centers around the history of data sets being produced, and also individual data items that are produced or updates that are made.
  3. Design new workflows from existing ones. Early work here was again in the scientific workflow area, but this capability is now seen as relevant in healthcare delivery, business, and digital government.
  4. Understand relationships between workflows, e.g., does a new version do things that the previous version did?
  5. Process mining. Develop (semi-)automated tools to help to understand large sets of enactments, by characterizing a workflow schema that could have generated them.
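
        Two of the needs above, querying enactments (1) and process mining (5), can be sketched once enactments are stored as plain data. The log below and its task names are hypothetical. Counting which task directly follows which is a common first step in process discovery; it is shown here only as an illustration, not as a complete mining algorithm.

```python
from collections import Counter

# Hypothetical enactment log: each run is the ordered list of tasks performed.
enactments = {
    "run-1": ["receive", "check", "approve", "ship"],
    "run-2": ["receive", "check", "reject"],
    "run-3": ["receive", "check", "approve", "ship"],
}


def runs_containing(log, task):
    """Query: identifiers of all runs in which the given task occurred."""
    return sorted(run for run, tasks in log.items() if task in tasks)


def directly_follows(log):
    """Count, over all runs, how often one task is immediately followed by another."""
    counts = Counter()
    for tasks in log.values():
        counts.update(zip(tasks, tasks[1:]))
    return counts


print(runs_containing(enactments, "approve"))
for (first, second), n in sorted(directly_follows(enactments).items()):
    print(f"{first} -> {second}: {n}")
```

        The resulting counts suggest a schema in which "check" branches to either "approve" or "reject", which is the kind of characterization that process mining tools aim to produce from much larger sets of enactments.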

        The field of scientific workflow has been developing techniques for understanding provenance, and for querying and manipulating scientific workflow schemas, for several years. This has enabled significant advances in how scientists are able to document the techniques used in computationally-intensive experiments, and to vary the ways that data is manipulated in order to best show and understand its hidden meanings.
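
        As a rough illustration of coarse-grained provenance, the sketch below records, for each derived data item, the step that produced it and the inputs it consumed, and then traces everything upstream of a result. The file names and step names are hypothetical, and real scientific workflow systems record considerably more detail.

```python
# Hypothetical provenance record: derived item -> (producing step, input items).
provenance = {
    "aligned.fasta": ("align", ["raw_a.fasta", "raw_b.fasta"]),
    "tree.nwk": ("build_tree", ["aligned.fasta"]),
    "figure.png": ("plot", ["tree.nwk"]),
}


def lineage(item, prov):
    """Return (upstream data items, steps) that contributed to the given item."""
    items, steps = set(), set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current not in prov:
            continue  # a raw input: nothing recorded upstream of it
        step, inputs = prov[current]
        steps.add(step)
        for source in inputs:
            if source not in items:
                items.add(source)
                stack.append(source)
    return items, steps


upstream_items, upstream_steps = lineage("figure.png", provenance)
print(sorted(upstream_items))
print(sorted(upstream_steps))
```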

Research Challenges in Data-Centric Workflow, Considered at a General Level

        While some progress has been made in data-centric workflow, this field is still quite young, and many research questions need further study in order to maximize the potential value of this new perspective. Some of the main research themes are now listed.

        The first themes listed relate primarily to data-aware workflow.

W1: Conceptual Models. The field should continue to invent, extend, and refine data-aware/data-centric workflow models, especially those based on a tight coupling of data schema and lifecycle specification.

W2: Foundations. Study fundamental properties of these models (e.g., analysis, synthesis, expressive power, views, interaction, and perhaps something analogous to database normal forms).

W3: Systems Issues. Study approaches to architecture and implementation of data-aware workflow, including optimization, distribution, security, monitoring and reporting.

W4: Enabling Richer Semantics. Understand the implications of using ontologies rather than simple data schemas in data-aware workflow models; incorporate techniques from semantic web services for auto-discovery, auto-composition, and auto-monitoring; develop a much deeper understanding of the semantics underlying OMG's Semantics of Business Vocabulary and Business Rules (SBVR).

W5: Ecosystem Enablers. In the context of data-aware workflow: enable workflow schema design, evolution, and variations; manage workflow evolution in the context of "in-flight" enactments; manage complex events; incorporate people and performers in the spirit of BPEL4People; incorporate security; and explore whether data-aware workflow enables new approaches to managing exceptions.

        Some themes relevant primarily to workflow as data are now listed.

W6: Querying of workflow schemas and enactments. Continue with paradigms/frameworks/techniques for querying and manipulating both workflow schemas and workflow enactments. Is there an "algebra" or "calculus" for building workflow schemas from other workflow schemas? Extend results from scientific workflow to other application areas, where workflows generally have side-effects.

W7: Provenance, both coarse- and fine-grained. Find ways to combine the coarse-grained provenance research in scientific workflow, and the fine-grained provenance work from the database community.

W8: Temporal aspects of workflow. Workflow schemas describe how data and processes are to occur over time, and workflow enactments include a history of what actions were taken through time. The common approaches to temporal databases do not appear well-suited for the kinds of querying, discovery, and manipulations that are needed in the context of workflow as data. Approaches such as PatternSQL, which enable queries over sequential patterns, appear to have promise.
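
        To make the idea of a query over sequential patterns concrete, the sketch below selects the enactments in which a given sequence of tasks occurs in order, with at most a fixed time gap between consecutive matched tasks. It is written in plain Python and does not reproduce PatternSQL syntax or semantics; the event log, the task names, and the time units are hypothetical.

```python
def matches_sequence(events, pattern, max_gap):
    """True if the tasks in pattern occur in order (not necessarily adjacent),
    each at most max_gap time units after the previously matched task.

    events is a list of (timestamp, task) pairs sorted by timestamp.
    """
    if not pattern:
        return True
    # latest[k] is the most recent time at which pattern[:k + 1] was matched.
    latest = [None] * len(pattern)
    for timestamp, task in events:
        # Go from the last position down so one event cannot fill two positions.
        for k in range(len(pattern) - 1, -1, -1):
            if task != pattern[k]:
                continue
            if k == 0:
                latest[0] = timestamp
            elif latest[k - 1] is not None and timestamp - latest[k - 1] <= max_gap:
                latest[k] = timestamp
    return latest[-1] is not None


# Hypothetical enactment histories with timestamps in days.
runs = {
    "run-1": [(0, "submit"), (3, "review"), (4, "approve")],
    "run-2": [(0, "submit"), (9, "review"), (10, "approve")],
}
pattern = ["submit", "review", "approve"]
fast_runs = [run for run, events in runs.items()
             if matches_sequence(events, pattern, max_gap=5)]
print(fast_runs)
```

        Keeping only the most recent match time for each prefix of the pattern is sufficient here, because a later match of a prefix always leaves a smaller gap to any subsequent event.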

Research themes in Science

        While research in scientific workflow is somewhat advanced, and has already found its way into practice, this has served to stimulate the need for more research and a deeper understanding of these workflows. Some key research themes are now listed.

  • S1: Querying and storing provenance.
  • S2: Integrating coarse- and fine-grain provenance.
  • S3: Integrating provenance derived from multiple tools.
  • S4: Provenance analytics.

Research themes in Business

        Some key research themes in the business application area include the following:

  • B1: Unifying the currently disparate approaches to different functionalities needed around business processes.
  • B2: Business agility.
  • B3: Towards a next-generation architecture for service management.

Research themes in Digital Government

        Some of the unique aspects of digital government are (a) the government must serve everybody; (b) the government is made up of numerous jurisdictions, many of which overlap; and (c) the government has highly sensitive information and is obligated to keep it secure and honor. There is also the issue of scale: taken in aggregate, the governments of larger nations arguably hold much more data and perform much more processing than any other kind of organization.

        Some key research themes in this area include:

  • G1: Making government information widely accessible
  • G2: Regulations and Compliance
  • G3: Case support across jurisdictions

Research themes in Healthcare Delivery

        Some key research themes here include the following.

  • H1: Large volumes of highly varying protocols and outcomes
  • H2: Generic vs. specialized workflows
  • H3: Enabling rich decision support
  • H4: Providing a framework understandable to clinicians

High-level recommendations of the workshop

        The workshop will also make some high-level recommendations for the field of research into workflow taken as a whole. Some tentative recommendations are as follows.

  1. In society, workflows are everywhere but seldom well supported. Our society at large, and many application-specific areas within it, would benefit greatly if a technology emerged for managing personal or organizational processes and workflows that could become as well understood and pervasive as spreadsheets are today.
  2. The emerging approaches to make workflow more data-centric, in terms of both data-aware workflow models and workflow as data, are providing fundamentally new approaches to understand and automate organizational processes, and hold the potential of a disruptive improvement in our ability to understand, design, and manage organizational processes.
  3. There are substantial opportunities for cross-fertilization of workflow management techniques between different application areas, especially in the area of bringing techniques from scientific workflow to business, healthcare, and digital government, and bringing techniques from business workflow and services infrastructure to healthcare and digital government.

Page maintained by su (at) cs.ucsb.edu