What is an XML Web Service?
By Brian Travis and Mae Ozkan
Brian Travis is founder and Chief Technical Officer of
Architag International Corporation and Managing Editor
of <TAG>
Mae Ozkan is Chief Architect of Architag International
Corporation.
Their most recent book, Web Services Implementation Guide,
offers a practical and technical explanation of what web
services are, and how to make them work. This article
is adapted from the book.
Abstract
"A web service is a service delivered over the Web". If
only it were that simple. A lot has been written about the magic of
web services. What you have read might lead you to believe
that if you just got your data into XML streams, web services would be easy,
even automatic. Not so fast.
This article shows how a standardized representation of information
creates a path leading to the implementation and exposure of web services.
But is that enough?
The World Wide Web was created mainly
to support user interactions, and has been leveraged by end users,
the "web surfers". If an enterprise has valuable information,
such as a database or an application, it may want to expose
that information to its users. This has traditionally been called
the "presentation layer": a user interface for humans
to view the information. These user interfaces are expressed
as HTML-coded streams.
But HTML is really just a typesetting language.
It is a markup language that is optimized for delivering information
to human eyeballs. I guess you could say that "HTML"
stands for "the eyeball markup language."
HTML is parsed by applications called "Web browsers"
and presented in eyeball-pleasing, easy-to-read "pages".
These pages are very similar to something
that we have been accustomed to reading and looking at for years:
paper. Paper is so familiar to people that successful
Web experiences rely on the reader's familiarity with this medium.
We have had magazines, newspapers, and books
to read all of our lives. These paper products have tables,
pictures, headlines, and paragraphs that present data in a two-dimensional
way. HTML pages look just like paper products for
presenting data. A compelling advantage of HTML-delivered pages
is that we can now offer more functionality, such as searching,
persistent storage, and personalization. From humans' point-of-view,
we can read and understand HTML pages using our eyeballs.
The World Wide Web has changed our lives,
because we can reach new information all around the world, with
little or no out-of-pocket cost. We humans can leverage the
information presented on the World Wide Web by interacting with
our browsers. We click on links we would like to visit, we read
the pages we are interested in, and we use search engines to find
the information that we are looking for. We know how
to surf the Internet and get the data we are looking
for, and this has changed our lives and our proximity to data.
You've gotten this far, and are now wondering
why we are talking about something that is obvious to every
six-year-old. The reason is that we wanted to point out that,
while the Web has given us, as humans, a great way to interact,
we are wasting this huge thing called the Web by just sending
HTML around when we can do so much more with this great infrastructure.
Screen Scraping
You can think of a web service the same
way you think of a Web page, except that the information is
expressed in a machine-understandable format, rather than in
a human-readable one. We can use the same tools, the same data,
and the same infrastructure as we used to create our HTML Web
pages.
As an example, let's take the weather.
Sample Web Page is a simple Web page that shows
the current weather conditions and the five-day forecast.
Sample Web Page
This page shows typical weather information.
It was created using a three-tier process, and displayed
in a thin client browser.
The page was created using a traditional
three-tier architecture that is common with Web sites. The
three tiers (data, middle, and client) work together to create
a view of data for a consuming party. The architecture is shown
in
Get Weather Architecture.
Get Weather Architecture
This is a three-tier system that retrieves
data from the data tier, formats it in the middle tier,
and presents it to the client tier.
Since this information was created specifically
for consumption by humans, it is formatted as an HTML
stream, which is interpreted by a Web browser. This gives the
familiar display shown in
Sample Web Page.
The program in the middle tier, getWeatherHTML, reads data
that is managed in the data tier, wraps HTML tags around the
data, and presents it to the user in the client tier.
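To make the middle tier's job concrete, here is a minimal sketch in Python of what a program like getWeatherHTML might do. The article does not show the real program, so the function name get_weather_html, the WEATHER record, and its field names are all assumptions made for illustration. The only point is that the data arrives as plain values and leaves wrapped in presentation markup.

```python
from html import escape

# Hypothetical record standing in for what the data tier would return.
WEATHER = {
    "location": "Palm Desert, CA",
    "current": {"Current temp": "61", "Visibility": "10 miles"},
}


def get_weather_html(record):
    """Wrap HTML tags around the weather data for display in a browser."""
    rows = "".join(
        f"<TR><TD><B>{escape(label)}</B></TD><TD>{escape(value)}</TD></TR>"
        for label, value in record["current"].items()
    )
    return (
        "<HTML><BODY>"
        f"<H3>Current conditions for {escape(record['location'])}</H3>"
        f"<TABLE>{rows}</TABLE>"
        "</BODY></HTML>"
    )


page = get_weather_html(WEATHER)
```

Notice that the labels and the values end up in identical TD elements. Nothing in the output says which cell is a temperature and which is a distance.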
Any human can read this page and interpret
it. Try it yourself: What is the forecast high temperature
for next Monday in Palm Desert, California? 74 degrees
Fahrenheit. How did you know? You scanned the page and noticed
that it was for Palm Desert, California. You saw the heading
"Forecast" at the bottom, with a table arranged
in rows and columns. All your life you have seen such
visual structures, so you know how to interpret them. You saw
a list of days across the top of the table, and three rows,
one for the high temperature, one for the low, and one for
the condition of the skies. You navigated to the intersection
between "Monday" and "High temp", and there you found the answer.
You had to make a lot of assumptions, most of which you probably
were not even aware you were making:
- The string "California" is nowhere on this page.
You translated "CA" into "California".
- The word "temperature" is nowhere on the
page, either. You made the translation from "temp"
to "temperature" because the subject
was weather. If we were talking about temporary housing or
temporary insanity,
you might have made the translation differently.
- You read the string "74°F" and concluded that it meant
"seventy-four degrees Fahrenheit".
Again, you used your experience as a human who has been
reading things like this for a long time.
You made a lot of decisions when answering
this simple question, most of them intuitively.
Humans do that all the time.
Now, suppose we want to go somewhere next
week, somewhere that's warm. We want to look at hundreds of
locations to see which one is the warmest. If we were to use
the interface illustrated here, we would need to access each
page, one at a time, scan the page for the table, and look
for the data. This could take a while, and not be very fun.
Instead, let's write a computer program
to make this repetitive task easier. We will have this program
access each site for us and look to see which city has the
highest temperature.
But first, we need to teach our program
how to read this table. This is easy for you and me, because
we have human cognitive abilities. However, a computer is not
so smart. To teach it, we need to realize what the computer sees.
It does not see the output as we see it in
Sample Web Page.
Rather, the computer sees the screen as
shown in
HTML Representation of Weather Forecast.
<HTML>
<HEAD>
<TITLE>getWeather</TITLE>
</HEAD>
<BODY STYLE="font-family:Verdana;">
<H3>Current conditions for Palm Desert, CA</H3>
<TABLE>
<TR>
<TD><B>Current temp</B></TD>
<TD>61</TD>
</TR>
<TR>
<TD><B>Visibility</B></TD>
<TD>10 miles</TD>
</TR>
<TR>
<TD><B>Barometer</B></TD>
<TD>30.21 inches</TD>
</TR>
<TR>
<TD><B>Dew point</B></TD>
<TD>15</TD>
</TR>
<TR>
<TD><B>Relative Humidity</B></TD>
<TD>16%</TD>
</TR>
<TR>
<TD><B>Sunrise</B></TD>
<TD>6:51 am PST</TD>
</TR>
<TR>
<TD><B>Sunset</B></TD>
<TD>4:54 pm PST</TD>
</TR>
<TR>
<TD><B>Wind</B></TD>
<TD>from the East at 6 mph</TD>
</TR>
<TR>
<TD><B>Wind chill</B></TD>
<TD>47</TD>
</TR>
</TABLE>
<H3>Forecast</H3>
<TABLE BORDER="1">
<TR>
<TD></TD>
<TH ALIGN="CENTER">Saturday</TH>
<TH ALIGN="CENTER">Sunday</TH>
<TH ALIGN="CENTER">Monday</TH>
<TH ALIGN="CENTER">Tuesday</TH>
<TH ALIGN="CENTER">Wednesday</TH>
</TR>
<TR>
<TD><B>High temp</B></TD>
<TD ALIGN="CENTER">71&#x00B0;F</TD>
<TD ALIGN="CENTER">77&#x00B0;F</TD>
<TD ALIGN="CENTER">74&#x00B0;F</TD>
<TD ALIGN="CENTER">76&#x00B0;F</TD>
<TD ALIGN="CENTER">75&#x00B0;F</TD>
</TR>
<TR>
<TD><B>Low temp</B></TD>
<TD ALIGN="CENTER">35&#x00B0;F</TD>
<TD ALIGN="CENTER">32&#x00B0;F</TD>
<TD ALIGN="CENTER">34&#x00B0;F</TD>
<TD ALIGN="CENTER">43&#x00B0;F</TD>
<TD ALIGN="CENTER">45&#x00B0;F</TD>
</TR>
<TR>
<TD><B>Skies</B></TD>
<TD ALIGN="CENTER">Cloudy</TD>
<TD ALIGN="CENTER">Partly Cloudy</TD>
<TD ALIGN="CENTER">Mostly Sunny</TD>
<TD ALIGN="CENTER">Partly Cloudy</TD>
<TD ALIGN="CENTER">Partly Cloudy</TD>
</TR>
</TABLE>
</BODY>
</HTML>
HTML Representation of Weather Forecast
A computer will see the weather forecast
as an HTML document.
The computer sees the raw HTML code that
creates the screen. Our task is to write a program that will
read that HTML code and find the high temperature for next
Monday. If you understand HTML code, you can find it. It
is in the "High temp" row of the forecast table, in the cell
under "Monday": 74&#x00B0;F.
Our program would need to first locate
this page, find the text that has Monday's temperature,
extract it, save it, then get the next file. After retrieving
all of the cities, it is a simple programming task to display
the city with the highest temperature.
The problem is finding the temperature.
In this case, it is in an HTML element called TD,
which stands for "table data". This data
is the fourth cell inside of the second table row (TR)
of the forecast table. The data is in the cell, but it has a
character reference, &#x00B0;, which represents the degree sign.
This is followed by a capital "F", indicating Fahrenheit. Our program would
need to strip these in order to load the temperature itself
into a comparison variable.
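A minimal sketch of such a scraper, written in Python with the standard html.parser module, might look like the following. The names TableScraper, monday_high, and SAMPLE are our own, and SAMPLE is a trimmed copy of the listing above rather than the full page. The scraper finds the cell purely by position: the second table, the High temp row (the second TR of that table in the listing), and the fourth cell.

```python
from html.parser import HTMLParser

# Trimmed, illustrative copy of the weather page: two tables, as in the listing.
SAMPLE = """
<TABLE><TR><TD><B>Current temp</B></TD><TD>61</TD></TR></TABLE>
<TABLE BORDER="1">
<TR><TD></TD><TH>Saturday</TH><TH>Sunday</TH><TH>Monday</TH></TR>
<TR><TD><B>High temp</B></TD><TD>71&#x00B0;F</TD><TD>77&#x00B0;F</TD><TD>74&#x00B0;F</TD></TR>
</TABLE>
"""


class TableScraper(HTMLParser):
    """Collect the text of every cell as tables[table][row][cell]."""

    def __init__(self):
        # convert_charrefs=True turns &#x00B0; into the degree sign for us.
        super().__init__(convert_charrefs=True)
        self.tables = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data


def monday_high(html_text):
    """Return Monday's high as an integer, located purely by position."""
    scraper = TableScraper()
    scraper.feed(html_text)
    scraper.close()
    cell = scraper.tables[1][1][3]  # forecast table, High temp row, fourth cell
    # Strip the trailing "F" and the degree sign before converting.
    return int(cell.strip().rstrip("F").rstrip("\u00b0"))


high = monday_high(SAMPLE)
```

The three index numbers and the two strip calls are the whole of our program's knowledge about the weather. If the page's author adds a column, reorders the rows, or changes the unit, the program either fails or quietly returns the wrong number.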
If this document were constant, that is,
if the creator of the document never changed the formatting
of the data, we could write our program to find the fourth
cell in the second row of the forecast table. However, if the
designer changed anything,
our program would break. Suppose that the creator wanted to
conserve bandwidth by compressing space. Or, suppose all of
the tag names were lowercase. The fragment shown in
Alternate HTML Coding will format identically to the row
we are looking for.
<tr><td><b>High Temp</b></td><td align='center'>71&#x00b0;F
</td><td align='center'>77&#x00b0;F</td><td align='center'>
74&#x00b0;F</td><td align='center'>76&#x00b0;F</td><td align=
'center'>75&#x00b0;F</td></tr>
Alternate HTML Coding
Compressing spaces and making all tag
names lowercase creates the same HTML document.
Or, suppose the temperature were expressed
in Celsius. If that were the case, there would probably
be a "C" instead of "F" in the line. We would need to do a conversion
from Celsius to Fahrenheit in order to compare all cities.
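As a sketch of the extra work this implies, the Python function below accepts a cell's text in either unit and always returns degrees Fahrenheit. The function name to_fahrenheit and the exact pattern it accepts are assumptions made for illustration. A real page could express the unit in other ways that this pattern would not recognize.

```python
import re

# A number, an optional degree sign, then a unit letter: "74°F" or "23.5 °C".
TEMPERATURE = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*°?\s*([CF])\s*")


def to_fahrenheit(cell_text):
    """Parse a scraped cell and return the temperature in degrees Fahrenheit."""
    match = TEMPERATURE.fullmatch(cell_text)
    if match is None:
        raise ValueError(f"unrecognized temperature: {cell_text!r}")
    value = float(match.group(1))
    unit = match.group(2)
    # F = C * 9/5 + 32
    return value * 9 / 5 + 32 if unit == "C" else value
```

With this in place, to_fahrenheit("74°F") and to_fahrenheit("25°C") can be compared directly. The unit, however, is still being guessed from a single letter of display text.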
This technique is called "screen scraping", and has been used for years
to automate certain functions, or to get data from one format
into another. It is effective to a certain extent.
However, the flexibility of HTML coding
is such that screen scraping is difficult to do, especially
if you are depending upon someone else's program to create
the HTML in a stable, consistent, reliable way.
"If we only had XML!"
We have heard the mantra, "If we only had our data in XML, then we
would be able to leverage it." That is, people realize that
HTML has limitations in an environment like the one above,
and if they could just access sites that serve data as XML
rather than HTML, all of their problems would be solved.
Let's load our HTML document into an XML editor
and see how it looks. This is shown in
HTML Document as XML.
HTML Document as XML
This HTML document has been loaded into
an XML editor to check its structure.
Surprise! This document has been loaded
into an XML editor that shows its status. Notice the indicator
in the status line, "well-formed". We will see later that
"well-formed" means that this document adheres
to all of the "well-formedness constraints" defined by the
W3C XML specification. That means that this HTML document is
a well-formed XML document. All of our problems are solved!
Not so fast. Although this is a well-formed XML document,
it is of no more use to us than it was
as an HTML document. The data is still locked into a format
designed for viewing by humans. The high temperature is still
in the fourth cell of the second row of the forecast table.
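We can repeat the editor's check in a few lines of Python, using the standard xml.etree.ElementTree module in place of an XML editor. PAGE below is a trimmed copy of the forecast portion of the HTML listing, and the variable names are our own. The parse succeeds, which confirms that the markup is well-formed, but we still have to reach the temperature by counting rows and cells.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the forecast table from the HTML listing.
PAGE = """<HTML><BODY>
<H3>Forecast</H3>
<TABLE BORDER="1">
<TR><TD></TD><TH>Saturday</TH><TH>Sunday</TH><TH>Monday</TH></TR>
<TR><TD><B>High temp</B></TD><TD>71&#x00B0;F</TD><TD>77&#x00B0;F</TD><TD>74&#x00B0;F</TD></TR>
</TABLE>
</BODY></HTML>"""

# fromstring raises xml.etree.ElementTree.ParseError if the text is not
# well-formed XML, so reaching the next line is the "well-formed" indicator.
root = ET.fromstring(PAGE)

# The element names are still TABLE, TR, and TD. To find Monday's high we
# count positions: the High temp row, then the fourth TD in that row.
rows = root.findall("./BODY/TABLE/TR")
monday_cell = rows[1].findall("TD")[3].text
```

The XML parser has resolved the character reference, so monday_cell holds the digits, a degree sign, and the letter F. Nothing else has improved: the program still depends on the layout of a table.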
Instead of just requiring that the document
be expressed as an XML document, we should have said that
we need the weather document expressed in terms of the weather
data, not in terms of some two-dimensional representation of
the weather data.
Instead of creating a data stream marked
up in the "hypertext markup language", we need one expressed
in the "weather forecast markup language". We will
call this WFML.
Changing to WFML is relatively simple.
We already connect to the database that has the raw weather
data. All we need to do is create a program similar to the
one that creates HTML and change it so it represents the weather.
This architecture is shown in
Get Weather as XML Weather Forecast Format.
Get Weather as XML Weather Forecast Format
The same three-tier architecture can
be used to express data in terms of the weather forecast.
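As a rough sketch of the modified middle-tier program, the Python below builds a WFML stream from the same kind of record that an HTML generator would use. The function name get_weather_xml, the RECORD dictionary, and its keys are assumptions made for illustration. The element and attribute names follow the WFML listing in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical record standing in for what the data tier would return.
RECORD = {
    "zipcode": "92260",
    "location": "Palm Desert, CA",
    "updated": "2002-05-06",
    "current": {"temperature": "61", "visibility": "10 miles"},
    "forecast": [
        {"date": "2002-05-07", "high": "71", "low": "35", "sky": "Cloudy"},
        {"date": "2002-05-09", "high": "74", "low": "34", "sky": "Mostly Sunny"},
    ],
}


def get_weather_xml(record):
    """Express the weather record as WFML instead of HTML."""
    weather = ET.Element("weather", zipcode=record["zipcode"])
    ET.SubElement(weather, "location").text = record["location"]

    current = ET.SubElement(weather, "current")
    for name, value in record["current"].items():
        ET.SubElement(current, "condition", name=name).text = value

    forecast = ET.SubElement(weather, "forecast", updated=record["updated"])
    for day in record["forecast"]:
        # Each key of the day record becomes an attribute of <day>.
        ET.SubElement(forecast, "day", **day)

    return ET.tostring(weather, encoding="unicode")


wfml = get_weather_xml(RECORD)
```

The program is about the same size as one that emits HTML. The difference is that every name it writes describes the weather rather than the page.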
The only differences between this and the
last example are that we modified the program to create WFML,
and we used a program on our client tier to access the data.
The resulting document is shown in
Weather Forecast Markup Language.
<?xml version="1.0"?>
<weather zipcode="92260" >
<location>Palm Desert, CA</location>
<current>
<condition name="temperature">61</condition>
<condition name="visibility">10 miles</condition>
<condition name="wind chill">47</condition>
<condition name="wind">from the East at 6 mph</condition>
<condition name="dewpoint">15</condition>
<condition name="relative humidity">16%</condition>
<condition name="barometer">30.21 inches</condition>
<condition name="sunrise">6:51 am PST</condition>
<condition name="sunset">4:54 pm PST</condition>
</current>
<forecast updated="2002-05-06">
<day date="2002-05-07" high="71" low="35" sky="Cloudy"/>
<day date="2002-05-08" high="77" low="32" sky="Partly Cloudy"/>
<day date="2002-05-09" high="74" low="34" sky="Mostly Sunny"/>
<day date="2002-05-10" high="76" low="43" sky="Partly Cloudy"/>
<day date="2002-05-11" high="75" low="45" sky="Partly Cloudy"/>
</forecast>
</weather>
Weather Forecast Markup Language
This stream expresses the weather in
terms of the weather itself.
Notice, now, that the elements and attributes
reflect the names of the data objects in the database.
It is relatively easy to find the high temperature for the
day where the date attribute is equal to 2002-05-09
(next Monday).
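Here is a minimal sketch in Python of the client-tier program we wanted all along, the one that finds the warmest city. The function names high_on and warmest are our own, PALM_DESERT is a trimmed copy of the WFML listing, and OTHER_CITY is an invented document added only so that there is something to compare against.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the WFML listing.
PALM_DESERT = """<weather zipcode="92260">
  <location>Palm Desert, CA</location>
  <forecast updated="2002-05-06">
    <day date="2002-05-08" high="77" low="32" sky="Partly Cloudy"/>
    <day date="2002-05-09" high="74" low="34" sky="Mostly Sunny"/>
  </forecast>
</weather>"""

# Invented second document, for illustration only.
OTHER_CITY = """<weather zipcode="00000">
  <location>Sampleville, ZZ</location>
  <forecast updated="2002-05-06">
    <day date="2002-05-09" high="68" low="50" sky="Cloudy"/>
  </forecast>
</weather>"""


def high_on(wfml_text, date):
    """Return (location, high temperature) for the given forecast date."""
    root = ET.fromstring(wfml_text)
    day = root.find(f"./forecast/day[@date='{date}']")
    if day is None:
        raise LookupError(f"no forecast for {date}")
    return root.findtext("location"), int(day.get("high"))


def warmest(documents, date):
    """Return the (location, high) pair with the highest temperature."""
    return max((high_on(doc, date) for doc in documents), key=lambda pair: pair[1])


best = warmest([PALM_DESERT, OTHER_CITY], "2002-05-09")
```

Compare this with the screen-scraping approach. We ask for the day by its date and the temperature by its name, so the order of the days and the layout of the stream no longer matter.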
This is the basis of web services. We have
taken an asset that already exists and exposed it in a way
that is meaningful for computer programs. This simple change
in the way data is represented will make it possible for others
to leverage our data in ways that we had never considered,
or bothered to implement.
Data Still Held Hostage
From the point-of-view of our applications
and data, the World Wide Web has not changed a thing. Applications
behind our corporate firewalls are still written in different
languages, running on different platforms, with different architectural
visions and programming models. Important data is held hostage
by some application, be it a proprietary home-grown application,
a proprietary Java bean, or a COM object. Integration and communication
between different applications is not easy.
Integration among applications across different
departments in an organization is an even harder problem to
solve. Technologies that support external communication are
very new. Practices and frameworks that enable external communication
are not fully automated yet. Dialogs and expectations among
external partners are very human-intensive. As a result of
all these factors, and many others that we will discuss later, communication
between different organizations is a difficult problem to solve.
The applications that organizations use do not always interoperate, even if
they are using the same programming models, applications, or
visions.