What is SOA?


Introduction

This is a white paper on the definition of SOA (Service Oriented Architecture), the ecosystem it fits into, and the issues that matter to people at various stages of adopting this paradigm. To learn more about SOA, you can read this book on Safari Books Online.

Web services != SOA

Just because an organization has web services doesn’t mean that it has adopted a Service Oriented approach to building software. In addition to technical service implementations, which may take the form of web services, JMS collaborations, or CICS transactions hosted on a mainframe, there has to be a higher, business-level abstraction layer. At a minimum, this layer consists of reusable business services with coarse-grained interfaces; more elaborately, it consists of business processes that orchestrate various technical and business services. A more mature SOA implementation would also have a registry and repository, along with other components in the SOA ecosystem that govern the various aspects of a service (design, implementation, testing, security, etc.) as it moves through its lifecycle stages (inception, design, implementation, sunset, etc.). So there really are two moving targets: as people adopt SOA, their maturity in this methodology evolves, and the services they build change along the way.

Abstraction and Reuse (what they mean in an SOA context)

For people who are used to the traditional SDLC (develop, build, test, deploy), the term reuse can mean something quite different from what it means in an SOA context. In an SOA context, reuse has nothing to do with reusing code or artifacts; it has everything to do with leveraging the same business service interface or business process interface. In SOA, reuse doesn’t mean copying code; it means reusing a service that’s already deployed, and adjusting exactly what it does through security and runtime policies.

This is a very important distinction, and it leads me to why most SOA implementations are doomed to fail before they even start: most people don’t take the time and effort to create business process/service interfaces that:

  1. are at a high, abstract level, so that implementation details and artifacts don’t show up in the interface itself (you don’t want to pass database IDs in a business service/process interface)
  2. reflect the business function that the service/process represents. Instead of thinking about moving data from one place to another and applying a transformation to it, think in terms of the business function or value the service provides.

So, a really important rule for getting started is this: make sure you’ve put enough thought into what your business does, and what services should naturally follow from that. Only then think about the ecosystem, and which tools and platforms will come into play when implementing this design. Most people spend too much time and money implementing flawed SOA designs, and then blame SOA for the result.
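To make the two rules concrete, here is a minimal Python sketch contrasting a fine-grained interface that leaks implementation details with a coarse-grained business interface. Every name here (`CustomerTableService`, `TaxQuoteService`, `TaxQuote`) and the rate table are hypothetical illustrations, not taken from any product.

```python
from dataclasses import dataclass
from decimal import Decimal


# Leaky, fine-grained interface: callers must know table and row IDs,
# so any change to the database schema breaks every consumer.
class CustomerTableService:
    def update_column(self, table_id: int, row_id: int, column: str, value: str) -> None:
        raise NotImplementedError("shown only as an anti-pattern")


@dataclass(frozen=True)
class TaxQuote:
    zip_code: str
    taxable_amount: Decimal
    tax: Decimal


# Coarse-grained business interface: it names a business function
# ("quote the state tax on a purchase") and exposes no storage details.
class TaxQuoteService:
    # Illustrative rates keyed by ZIP prefix; how rates are really looked up
    # stays hidden behind the interface.
    _RATES = {"94": Decimal("0.0725"), "10": Decimal("0.04")}

    def quote_state_tax(self, zip_code: str, amount: Decimal) -> TaxQuote:
        rate = self._RATES.get(zip_code[:2], Decimal("0"))
        tax = (amount * rate).quantize(Decimal("0.01"))
        return TaxQuote(zip_code, amount, tax)
```

A consumer of `TaxQuoteService` only needs a ZIP code and an amount, so the rate storage can change without breaking anyone.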

SOA Ecosystem

Here is a list of components that exist in the SOA ecosystem:

ESB

Enterprise Service Buses are a new take on the old idea of an app server. They typically include a web services stack (like BEA’s ALSB). There are older implementations, from Progress and others, that treat the ESB as a non-hub-and-spoke container for some EAI and app server functions. In an SOA context, ESBs support deploy-and-publish: when a service is deployed, its metadata is published to a registry. Additionally, some ESBs allow security settings and routing rules (runtime policy) to be defined as policies without affecting the implementation of a service. This functionality overlaps with Runtime Intermediaries (like Actional and Amberpoint).
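Deploy-and-publish and policy-based routing can be sketched as a toy, in-process bus. This assumes a plain dictionary stands in for the registry; `ToyEsb`, `deploy`, `add_route`, and the rule format are hypothetical and do not reflect any vendor’s API.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[dict], dict]
RouteRule = Callable[[dict], Optional[str]]


class ToyEsb:
    """Hypothetical in-process bus; not modeled on any vendor's product."""

    def __init__(self, registry: Dict[str, dict]) -> None:
        self.registry = registry  # a plain dict standing in for the registry
        self.handlers: Dict[str, Handler] = {}
        self.routes: List[RouteRule] = []

    def deploy(self, name: str, handler: Handler, wsdl_url: str) -> None:
        # Deploy-and-publish: registering the handler also publishes
        # the service's metadata to the registry.
        self.handlers[name] = handler
        self.registry[name] = {"wsdl_url": wsdl_url}

    def add_route(self, rule: RouteRule) -> None:
        # Routing rules are runtime policy held by the bus; the service
        # implementations never see them.
        self.routes.append(rule)

    def dispatch(self, target: str, message: dict) -> dict:
        # A rule may redirect the message by returning a service name;
        # returning None keeps the current target. If several rules
        # redirect, the last one wins.
        for rule in self.routes:
            target = rule(message) or target
        return self.handlers[target](message)
```

Because routing lives in `add_route`, changing where high-priority orders go does not touch either handler.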

Runtime Intermediary

Amberpoint and Actional are examples of runtime intermediaries. They make a service implementation reusable by keeping authentication, authorization, security, and routing rules out of the service’s implementation (and its interface) and enforcing them at a higher level instead. Most intermediaries use a registry either to discover existing services in an environment, or to publish metadata about the services they’ve discovered. They can also pass parts of their runtime policy to a registry using a Policy-to-UDDI mapping (some parts of the policy are copied into tModels). However, there are no standards for this: a policy can contain vendor-specific assertions, and registry vendors have not agreed on what these Policy-to-UDDI mappings should look like. Infravio has SOALink, and Systinet has GIF (Governance Interoperability Framework).
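The idea of enforcing policy outside the implementation can be sketched as a wrapper. This is a minimal illustration assuming a caller identity is already known; `quote_service` and `PolicyIntermediary` are hypothetical names, and real intermediaries work at the network or message level rather than as in-process objects.

```python
from typing import Callable, Set

Service = Callable[[dict], dict]


def quote_service(request: dict) -> dict:
    # The implementation knows nothing about security or routing.
    return {"quote": request["quantity"] * 10}


class PolicyIntermediary:
    """Hypothetical intermediary that enforces policy in front of a service."""

    def __init__(self, service: Service, allowed_callers: Set[str]) -> None:
        self.service = service
        self.allowed_callers = allowed_callers

    def __call__(self, caller: str, request: dict) -> dict:
        # Authorization is applied here, outside the implementation.
        if caller not in self.allowed_callers:
            raise PermissionError(f"{caller} is not authorized")
        return self.service(request)


# The same deployed implementation, reused under two different policies.
partner_endpoint = PolicyIntermediary(quote_service, {"partner-a"})
internal_endpoint = PolicyIntermediary(quote_service, {"billing", "sales"})
```

This is reuse in the SOA sense described earlier: one implementation, two policies, no copied code.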

Registry, Repository

UDDI was an early attempt to catalog and categorize all the services that exist in your environment. It has a few failings. You can’t upload anything to the registry, and you can only put information into it as tModels, which make it very hard to store any kind of complex information. To address these shortcomings, registry vendors started creating repository products, and then kept adding features and functionality to them. However, the repositories don’t conform to any standard, the way a UDDI registry must. Some repositories expose UDDI interfaces so that you can view them as if they were registries, but if you want to use their repository-only features, you have to work with their native APIs. The Registry/Repository vendors are pushing customers away from using the Registry to store all information about their services, and toward relegating it to basic interoperability with other SOA Ecosystem components. Also, for governance, these vendors are positioning their SOA Repository products as a platform that other Ecosystem components can report information into (such as testing, management and monitoring, runtime, and security data).

Security appliances and products

Companies like Datapower have created dedicated security appliances that enforce the same kinds of security policies as ESBs and Runtime Intermediaries, except that they accelerate the work in hardware. These security appliances interact with Registries much as Runtime Intermediaries do. Some vendors support WS-Security-to-tModel mappings, but those aren’t widely adopted.

Maturity of SOA Adoption

Most people implementing SOA right now are not yet at a level of maturity where they expect vendors to provide tight integration into their ecosystem; in fact, their expectations on this front are quite low. However, some use cases can appeal both to people who are just getting started and to people who are well past the first stage of adoption. Here’s how I would generalize these phases:

Phase 1

What technical services do I have? How do I catalog them?

The answer is to use a runtime intermediary that discovers services, brings them under management, and publishes this information to a registry. A successful outcome of this phase is that people know what services exist in their environment.
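The Phase 1 discovery step can be sketched as follows, assuming the intermediary hands us a stream of observed (endpoint, operation) pairs and a dictionary stands in for the registry. `build_catalog` and `publish_new` are hypothetical helpers, not any product’s API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Observation = Tuple[str, str]  # (endpoint URL, operation name)


def build_catalog(observed: Iterable[Observation]) -> Dict[str, Set[str]]:
    """Group observed operations by endpoint: a first inventory of services."""
    catalog: Dict[str, Set[str]] = defaultdict(set)
    for endpoint, operation in observed:
        catalog[endpoint].add(operation)
    return dict(catalog)


def publish_new(catalog: Dict[str, Set[str]], registry: Dict[str, dict]) -> List[str]:
    """Publish only endpoints the registry doesn't know yet; return them."""
    published = []
    for endpoint, operations in sorted(catalog.items()):
        if endpoint not in registry:
            registry[endpoint] = {"operations": sorted(operations)}
            published.append(endpoint)
    return published
```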

Phase 2

Who should be able to read/write to/from the registry? How do I integrate this with the existing SDLC? How do I attach additional metadata to the services?

The answer is to use a Registry and Repository, and to ensure that all the products in the SOA ecosystem have some basic UDDI interoperability capabilities. The Repository comes into play when additional metadata needs to be captured about a service. This is rudimentary SOA governance. If, at this phase, people routinely upload new services as they are created, then a culture of collaboration has taken hold, and the next phases are attainable.
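The "who can read/write" question amounts to role checks in front of the registry. A minimal sketch, assuming hypothetical users, roles, and a `GovernedRegistry` class; a real registry would take identities from a directory and enforce this server-side.

```python
from typing import Dict, Set

# Hypothetical role assignments; a real deployment would pull these from a directory.
ROLES: Dict[str, Set[str]] = {
    "alice": {"publisher", "reader"},
    "bob": {"reader"},
}


class GovernedRegistry:
    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def _require(self, user: str, role: str) -> None:
        if role not in ROLES.get(user, set()):
            raise PermissionError(f"{user} lacks the {role!r} role")

    def publish(self, user: str, name: str, metadata: dict) -> None:
        # Only publishers may write; the publisher is recorded as extra metadata.
        self._require(user, "publisher")
        self._entries[name] = {**metadata, "published_by": user}

    def read(self, user: str, name: str) -> dict:
        self._require(user, "reader")
        return self._entries[name]
```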

Phase 3

Now that I know who is reading/writing to/from the registry and I have metadata, how do I get a better understanding of how the various services are being affected by runtime intermediaries, security appliances, and ESBs that I have on my network?

This is a somewhat more evolved form of SOA governance than in Phase 2, because every product in the SOA ecosystem is required to at least report what it is doing to a service, with a copy of this information available in one place. In this phase, a testing product would need to interface with the registry to find out which new services have been added and should be tested, and then publish the test results to the registry once they are available. For tighter integration, it’s possible to interface with the SOA Repository directly, in addition to using the basic UDDI interface (UDDI may still be necessary to communicate with other products that don’t know how to interface with the Repository). Products aren’t yet able to implement this phase of adoption.

Phase 4

How do I tie in design-time policy, so that certain things are automated as services move through their lifecycle stages? For example, validating that a service has documentation and test cases associated with it before it can move from Inception to Testing, and then, when test results come back, letting the service move to its next phase.

There is a custom workflow involved here (which will change depending on the organization that’s implementing this). Products aren’t really ready to implement this Phase as of today.
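The kind of gating Phase 4 calls for can be sketched as a small state machine. The stage names follow the example above, but the transition table, the `Service` fields, and `advance` are hypothetical; in practice each organization’s workflow would define its own rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Service:
    name: str
    stage: str = "Inception"
    documentation: str = ""
    test_cases: List[str] = field(default_factory=list)
    tests_passed: bool = False


Rule = Callable[[Service], bool]

# Hypothetical design-time policy: which transitions exist and what each requires.
TRANSITIONS: Dict[Tuple[str, str], Rule] = {
    ("Inception", "Testing"): lambda s: bool(s.documentation) and bool(s.test_cases),
    ("Testing", "Production"): lambda s: s.tests_passed,
    ("Production", "Sunset"): lambda s: True,
}


def advance(service: Service, target: str) -> None:
    """Move the service to `target` only if the policy for that transition holds."""
    rule = TRANSITIONS.get((service.stage, target))
    if rule is None:
        raise ValueError(f"no transition {service.stage} -> {target}")
    if not rule(service):
        raise ValueError(f"{service.name} does not satisfy the policy for {target}")
    service.stage = target
```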

Most products today can implement only Phases 1 and 2. Even so, not all of them do, and there are issues in how they do it. Most customers are at Phase 1, some are slowly migrating to Phase 2, and very few are even close to Phase 3. Phase 4 is what the future holds.

Introduction to UDDI Registry and Repository

The following section provides a brief overview of what a UDDI registry is and of the limitations of the integration options available with it. A registry is a piece of software that stores metadata about services. An example of this in the Java environment is the RMI Registry. SOA registries are conceptually very similar: they provide a way to record the SOAP endpoint of a service, and they allow various key-value pairs to be associated with a service entry (similar to the name you can bind to an RMI stub in the RMI Registry).
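The RMI analogy can be made concrete with a sketch that binds a name to an endpoint plus key-value properties. This is a conceptual model only, not the UDDI API; `ServiceRegistry` and `ServiceEntry` are hypothetical, and `bind`/`lookup` are named after the RMI Registry’s methods purely for the comparison.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ServiceEntry:
    soap_endpoint: str
    properties: Dict[str, str] = field(default_factory=dict)


class ServiceRegistry:
    """Name -> endpoint plus metadata, in the spirit of an RMI registry."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def bind(self, name: str, entry: ServiceEntry) -> None:
        # Refuse to overwrite, much as RMI's bind refuses an already-bound name.
        if name in self._entries:
            raise ValueError(f"{name} is already bound")
        self._entries[name] = entry

    def lookup(self, name: str) -> ServiceEntry:
        return self._entries[name]  # KeyError if the name was never bound
```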

How do people use the Registry?

Design time environment

The primary use of a registry is in the design time environment, where service metadata is cataloged during the inception, design, and development process.

Producers

People who create services that they want to share simply upload the service’s metadata to the appropriate registry, and other people can then access it.

Consumers

People who are looking to use services in their composite applications, business processes, and even other services simply connect to a registry and determine what services are available for sharing. To find a WSDL, consumers can use the registry’s web app, which lets you search for services by name or by various attributes, using a simple textual search of the registry’s contents. Programmatically, IDEs can issue UDDI queries to find what services are available for reuse. Regardless of how the service is found, the UDDI query results in a set of WSDL locations; the consumer then downloads the WSDLs from their original location (only the URL to the WSDL is stored in the registry).

Run time environment

Very few people actually use the registry in the run time environment. A few companies always query the registry for a WSDL and then resolve it into an endpoint in order to dispatch requests from their applications and services. Putting a registry in the critical path of a transaction at run time can seriously slow things down, because registries were not designed to be used this way. For this reason, people who take this approach tend to cache the data provided by registries, to minimize the performance hit of going to a registry every time they need a WSDL.
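The caching approach can be sketched as a resolver with a time-to-live. It assumes `query_registry` is some function that performs the slow registry round trip and returns an endpoint; `CachingResolver` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Resolve service names to endpoints, querying the registry only on a
    cache miss or when the cached entry is older than `ttl_seconds`."""

    def __init__(self, query_registry: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query_registry
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (endpoint, fetched_at)

    def resolve(self, service_name: str) -> str:
        now = self._clock()
        hit = self._cache.get(service_name)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]
        endpoint = self._query(service_name)  # the slow registry round trip
        self._cache[service_name] = (endpoint, now)
        return endpoint
```

The trade-off is staleness: if a service moves, callers may keep using the old endpoint for up to `ttl_seconds`.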

UDDI and Registry

SOA registry implementations have been tied to UDDI since they first appeared. UDDI is itself a set of SOAP-based web services that provides two major operations:

  1. publish: this operation is used by various components in the SOA ecosystem (such as runtime intermediaries, security appliances, and IDEs) to populate service metadata into a registry.
  2. discover: this operation is used by various components in the SOA ecosystem to query the service metadata that exists in a registry.
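On the discover side, a query is just a SOAP message. The sketch below builds a `find_service` request as defined by the UDDI v2 inquiry API, using only the standard library; sending it (the HTTP POST to a registry’s inquiry URL) is left out, and the element and attribute choices should be checked against the UDDI v2 specification before relying on them.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
UDDI_V2_NS = "urn:uddi-org:api_v2"

ET.register_namespace("soap", SOAP_ENV_NS)
ET.register_namespace("uddi", UDDI_V2_NS)


def find_service_request(name: str, max_rows: int = 10) -> bytes:
    """Build a SOAP envelope carrying a UDDI v2 find_service inquiry."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # generic="2.0" identifies the UDDI API version of the request.
    query = ET.SubElement(body, f"{{{UDDI_V2_NS}}}find_service",
                          {"generic": "2.0", "maxRows": str(max_rows)})
    ET.SubElement(query, f"{{{UDDI_V2_NS}}}name").text = name
    return ET.tostring(envelope, encoding="utf-8")
```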

UDDI is currently at version 3, which most popular registries on the market implement. Many free implementations stop at UDDI v2, and many free clients support only v2 as well.

tModels

When speaking about UDDI, tModels come into play. tModels are hierarchical data structures used to represent key-value pairs in the registry. Any metadata that you want to store in a UDDI registry has to be represented or encoded as tModels, which severely limits what you can and cannot do with a registry. Anyone can make up a tModel structure and encode whatever information they want in it, but it will only make sense to its author. In the spirit of interoperability, a few standard encodings for things like WSDL, XSD, and other key pieces of metadata were produced; these are called tModel mappings. However, these standard mappings are few and far between, and most vendors can’t agree on a standard tModel representation for the metadata that is native to their environment (e.g., runtime policy for runtime intermediary vendors, or security policy for security intermediary vendors).
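Why private encodings only make sense to their author can be shown with a small model of key-value references. `KeyedReference` mirrors the `tModelKey`, `keyName`, and `keyValue` attributes of UDDI’s keyedReference element; the tModel keys, values, and the `understood_by` helper are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class KeyedReference:
    tmodel_key: str  # which tModel gives this pair its meaning
    key_name: str
    key_value: str


# Hypothetical keys: one for a mapping every reader has agreed on,
# one invented privately by a single vendor.
SHARED_WSDL_TYPES = "uuid:example-shared-wsdl-types"
VENDOR_A_POLICY = "uuid:example-vendor-a-runtime-policy"

service_bag: List[KeyedReference] = [
    KeyedReference(SHARED_WSDL_TYPES, "types", "wsdlSpec"),
    KeyedReference(VENDOR_A_POLICY, "throttle", "100/s"),
]


def understood_by(known_keys: Set[str], bag: List[KeyedReference]) -> Dict[str, str]:
    """Return only the pairs whose tModel the reader knows how to interpret."""
    return {r.key_name: r.key_value for r in bag if r.tmodel_key in known_keys}
```

Another vendor that only knows the shared key sees the WSDL classification but silently drops the throttle policy.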

In addition to mapping certain information from WSDLs and similar documents, tModels are used for categorization. These categorizations are called taxonomies in the UDDI world. You can create a catalog of tModels arranged in a tree structure, and then associate services in a registry with any node in this catalog. The whole idea is that a service might be categorized as more than one thing: a service that looks up the state tax for a given zip code may be categorized as part of the billing application as well as the price quoting application. This flexibility is one of UDDI’s strengths. Many organizations create a taxonomy that reflects how they organize services in their environment, and then tie all service metadata to this taxonomy.
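The state tax example can be sketched as a taxonomy tree with multi-node classification. The node names and services are hypothetical, and whether a category query also returns services filed under descendant nodes is an assumption made here for illustration, not a statement about UDDI query semantics.

```python
from typing import Dict, Optional, Set

# Hypothetical taxonomy: child -> parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "Applications": None,
    "Billing": "Applications",
    "Price Quoting": "Applications",
    "Invoicing": "Billing",
}

# A service may be attached to several taxonomy nodes at once.
CLASSIFICATION: Dict[str, Set[str]] = {
    "StateTaxLookup": {"Billing", "Price Quoting"},
    "InvoicePrinter": {"Invoicing"},
}


def is_under(node: str, ancestor: str) -> bool:
    """True if `node` equals `ancestor` or sits somewhere beneath it."""
    current: Optional[str] = node
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def services_in(category: str) -> Set[str]:
    return {svc for svc, nodes in CLASSIFICATION.items()
            if any(is_under(n, category) for n in nodes)}
```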

Registry – what can’t it do?

One of the most severe limitations of UDDI registries is that they have no concept of a document persistence layer. There is no way to take a byte array or blob and save it in a registry; only tModels can be persisted in the registry and then queried. So you cannot upload a WSDL and save it in the registry. Only a hyperlink to the source of the WSDL is stored, along with key pieces of metadata that are extracted from the WSDL and mapped to tModels (using WSDL-to-tModel mappings).
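What a registry can keep for a WSDL can be sketched by extracting a few fields from a WSDL 1.1 document and pairing them with its URL. The `registry_entry` helper and the output field names are hypothetical; a real WSDL-to-tModel mapping extracts more (port types, bindings) and encodes it as tModels rather than a dictionary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
WSDL_SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"


def registry_entry(wsdl_url: str, wsdl_text: str) -> Dict[str, object]:
    """Build what a registry can hold: a link to the WSDL plus extracted
    metadata. The WSDL document itself is not part of the entry."""
    root = ET.fromstring(wsdl_text)
    services: List[Dict[str, object]] = []
    for svc in root.findall(f"{{{WSDL_NS}}}service"):
        endpoints = [addr.get("location")
                     for addr in svc.iter(f"{{{WSDL_SOAP_NS}}}address")]
        services.append({"name": svc.get("name"), "endpoints": endpoints})
    return {
        "wsdl_url": wsdl_url,  # only a hyperlink to the original document
        "target_namespace": root.get("targetNamespace"),
        "services": services,
    }
```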

Also, because tModels limit how richly registry entries can be described, software in the SOA ecosystem can add only a limited amount of information that makes sense to anyone other than its author or publisher.

Registry – what can it do?

There are a few basic things that registries are good for:

  1. You can publish services described by WSDL documents to the registry (as noted above, the registry keeps the WSDL’s URL and the metadata mapped from it, not the document itself). You can also associate various key-value pairs with the service when you publish it. When you publish a service, you get a UUID that you can use to reference this service instance in any future transaction; this is what uniquely identifies a service instance in the registry.
  2. You can create your own taxonomy and publish it to the registry. Then you can associate new or existing services with nodes in this taxonomy. This helps describe what these services do, and it helps other people query for the right service that they need.
  3. You can publish or associate more than just the basic set of metadata (tModel mappings for WSDL, or taxonomies). For example, a runtime intermediary can associate a runtime policy with a service entry, or a security intermediary can associate a security policy with a service. However, this additional metadata will only make sense to its author. This is why there have been attempts to form partnerships between companies, so that they can encode their tModels in formats that other vendors can understand. A good example of this is Systinet’s GIF (Governance Interoperability Framework), which defines a set of use cases and tModel encodings for information that is meaningful to runtime and security intermediary vendors.

UDDI v3 – what’s new?

UDDI v3 introduced a few key features that were missing from v2. It has a stronger, more fine-grained security model, which v3 implementations are required to support. It also supports subscriptions and notifications: when you register an expression of interest with a registry, it notifies you, either by email or through a web service call, whenever the artifact you are interested in changes in any way. The change itself (the delta) is not reported; instead, the entire artifact is embedded in XML format in the notification every time a change occurs. This is only marginally useful, since the onus is on the recipient to determine exactly what changed.
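Since the notification carries the whole artifact, the recipient has to compute the delta itself. A minimal sketch, assuming each notified artifact has already been parsed from XML into a flat dictionary of fields (a simplification); `delta` and `SubscriptionListener` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

Artifact = Dict[str, str]
Changes = Dict[str, Tuple[Optional[str], Optional[str]]]


def delta(previous: Artifact, current: Artifact) -> Changes:
    """Return {field: (old, new)} for every field added, removed, or changed."""
    changes: Changes = {}
    for key in previous.keys() | current.keys():
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


class SubscriptionListener:
    """Hypothetical recipient of notifications that carry full artifacts."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, Artifact] = {}

    def on_notification(self, key: str, artifact: Artifact) -> Changes:
        # Compare against the last copy we saw, then remember the new one.
        changes = delta(self._last_seen.get(key, {}), artifact)
        self._last_seen[key] = dict(artifact)
        return changes
```

The listener has to keep its own copy of every artifact it subscribes to, which is exactly the burden the registry leaves with the recipient.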

Repository – the future

To address the severe limitations of UDDI and registries, most registry vendors have created repository products. These SOA repositories are more like document or content management systems. They offer a great deal of flexibility in what kind of metadata can be associated with a service, and they can store any kind of file or binary data. Their searches can be far more complex and richer than UDDI queries. They also support creating design-time and runtime policies that apply to a service in the repository. However, these advanced features are still quite limited and are mostly used for reporting right now: passive rules that report on the current state of certain artifacts in the repository. In the future, these vendors plan to integrate their repositories with BPM engines to support workflow and automated control of a service’s lifecycle as it transitions from one phase to another, according to the rules that define its run time, design time, and change time policies. This is not a reality yet, though, and there are no standards for doing this in an interoperable way.
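The difference between storing documents and running passive rules can be sketched as follows. `Asset`, `needs_documentation`, and `report` are hypothetical; the key point is that the rules only produce findings and never block or change anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Asset:
    name: str
    stage: str
    documents: Dict[str, bytes] = field(default_factory=dict)  # any file, stored as-is


# A passive rule inspects an asset and returns a finding, or None if it is fine.
Rule = Callable[[Asset], Optional[str]]


def needs_documentation(asset: Asset) -> Optional[str]:
    if asset.stage != "Inception" and "documentation" not in asset.documents:
        return f"{asset.name}: past Inception without documentation"
    return None


def report(assets: List[Asset], rules: List[Rule]) -> List[str]:
    """Run passive rules: they only report, they never block or change anything."""
    findings = []
    for asset in assets:
        for rule in rules:
            finding = rule(asset)
            if finding is not None:
                findings.append(finding)
    return findings
```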

Right now, UDDI has been relegated to the role of a basic interoperability mechanism between the various vendors in the SOA ecosystem. It’s not where things are stored; rather, it’s an adapter through which certain key pieces of information are transmitted to a place where others can see, share, and publish them as well.