
Article from July, 2002.


What is an XML Web Service?

By Brian Travis and Mae Ozkan

Brian Travis is founder and Chief Technical Officer of Architag International Corporation and Managing Editor of <TAG>.

Mae Ozkan is Chief Architect of Architag International Corporation.

Their most recent book, Web Services Implementation Guide, offers a practical and technical explanation of what web services are, and how to make them work. This article is adapted from the book.


Abstract

"A web service is a service delivered over the Web " . If only it were that simple. There has been a lot written about the magic of web services. The information that you might have read will make you believe that if you just got your data into XML streams, web services would be easy, even automatic. Not so fast.

This article shows how a standardized representation of information creates a path leading to the implementation and exposure of web services. But is that enough?


The World Wide Web was created mainly to support user interactions and has been leveraged by end users, or "web surfers". If an enterprise has valuable information, such as a database or an application, it may want to expose that information to its users. This has traditionally been called the "presentation layer": a user interface for humans to view the information. These user interfaces are expressed as HTML-coded streams.

But HTML is really just a typesetting language. It is a markup language that is optimized for delivering information to human eyeballs. I guess you could say that "HTML" stands for "the eyeball markup language."

HTML is parsed by applications called "Web browsers" and presented in eyeball-pleasing, easy-to-read "pages". These pages are very similar to something we have been accustomed to reading for years: paper. Paper is so familiar to people that successful Web experiences rely on the reader's familiarity with this medium.

We have had magazines, newspapers, and books to read all of our lives. These paper products have tables, pictures, headlines, and paragraphs that present data in a two-dimensional way. HTML pages look just like paper products for presenting data. A compelling advantage of HTML-delivered pages is that we can now offer more functionality, such as searching, persistent storage, and personalization. From humans' point-of-view, we can read and understand HTML pages using our eyeballs.

The World Wide Web has changed our lives, because we can reach new information all around the world, with little or no out-of-pocket cost. We humans can leverage the information presented on the World Wide Web by interacting with our browsers. We click on links we would like to visit, we read the pages we are interested in, and we search for the information we want using search engines. We know how to surf the Internet and get the data we are looking for, and this has changed our lives and our proximity to data.

You've gotten this far, and are now wondering why we are talking about something that is obvious to every six-year-old. The reason is that we wanted to point out that, while the Web has given us, as humans, a great way to interact, we are wasting this huge thing called the Web by just sending HTML around when we can do so much more with this great infrastructure.

Screen Scraping

You can think of a web service the same way you think of a Web page, except that the information is expressed in a machine-understandable format, rather than in a human-readable one. We can use the same tools, the same data, and the same infrastructure as we used to create our HTML Web pages.

As an example, let's take the weather. The figure Sample Web Page shows a simple Web page with the current weather conditions and the five-day forecast.

Sample Web Page

This page shows typical weather information. It was created using a three-tier process, and displayed in a thin client browser.

The page was created using a traditional three-tier architecture that is common with Web sites. The three tiers (data, middle, and client) work together to create a view of data for a consuming party. The architecture is shown in Get Weather Architecture.

Get Weather Architecture

This is a three-tier system that retrieves data from the data tier, formats it in the middle tier, and presents it to the client tier.

Since this information was created specifically for consumption by humans, it is formatted as an HTML stream, which is interpreted by a Web browser. This gives the familiar display shown in Sample Web Page.

The program in the middle tier, getWeatherHTML, reads data that is managed in the data tier, wraps HTML tags around the data, and presents it to the user in the client tier.
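To make the middle tier concrete, here is a minimal sketch of what a program like getWeatherHTML might do. It is not the actual program: the function name get_weather_html, the WEATHER_DATA dictionary standing in for the data tier, and the abbreviated markup are all assumptions made for illustration.

```python
from html import escape

# Hypothetical stand-in for the data tier: a dictionary instead of a database.
WEATHER_DATA = {
    "location": "Palm Desert, CA",
    "current": [("Current temp", "61"), ("Visibility", "10 miles")],
}

def get_weather_html(data):
    """Middle tier: wrap presentation markup around raw weather values."""
    rows = "".join(
        f"<TR><TD><B>{escape(label)}</B></TD><TD>{escape(value)}</TD></TR>"
        for label, value in data["current"]
    )
    return (
        "<HTML><BODY>"
        f"<H3>Current conditions for {escape(data['location'])}</H3>"
        f"<TABLE>{rows}</TABLE>"
        "</BODY></HTML>"
    )

page = get_weather_html(WEATHER_DATA)
print(page)
```

Notice that the names of the data items ("Current temp", "Visibility") survive only as bold text in a table cell. Nothing in the markup says which cell is a temperature.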

Any human can read this page and interpret it. Try it yourself: What is the forecast high temperature for next Monday in Palm Desert, California? 74 degrees Fahrenheit. How did you know? You scanned the page and noticed that it was for Palm Desert, California. You saw the heading "Forecast" at the bottom, with a table arranged in rows and columns. All your life you have seen such visual structures, so you know how to interpret them. You see a list of days across the top of the table, and three rows, one for the high temperature, one for the low, and one for the condition of the skies. You navigated to the intersection between "Monday" and "High temp", and there you found the answer. You had to make a lot of assumptions, most of which you probably were not even aware you were making:

  • The string "California" is nowhere on this page. You translated "CA" into "California".

  • The word "temperature" is nowhere on the page, either. You made the translation from "temp" to "temperature", because the subject was weather. If we were talking about temporary housing or temporary insanity, you might have made the translation differently.

  • You read the string, "74°F", and concluded that it meant "seventy-four degrees Fahrenheit". Again, you used your experience as a human who has been reading things like this for a long time.

You made a lot of decisions when answering this simple question, most of which you made intuitively. Humans do that all the time.

Now, suppose we want to go somewhere next week, somewhere that's warm. We want to look at hundreds of locations to see which one is the warmest. If we were to use the interface illustrated here, we would need to access each page, one at a time, scan the page for the table, and look for the data. This could take a while, and would not be much fun.

Instead, let's write a computer program to make this repetitive task easier. We will have this program access each site for us and look to see which city has the highest temperature.

But first, we need to teach our program how to read this table. This is easy for you and me, because we have human cognitive abilities. However, a computer is not so smart. We also need to realize what the computer sees. It does not see the output as we see it in Sample Web Page.

Rather, the computer sees the screen as shown in HTML Representation of Weather Forecast.

<HTML>
 <HEAD>
     <TITLE>getWeather</TITLE>
 </HEAD>
 <BODY STYLE="font-family:Verdana;">

     <H3>Current conditions for Palm Desert, CA</H3>
     <TABLE>
         <TR>
             <TD><B>Current temp</B></TD>
             <TD>61</TD>
         </TR>
         <TR>
             <TD><B>Visibility</B></TD>
             <TD>10 miles</TD>
         </TR>
         <TR>
             <TD><B>Barometer</B></TD>
             <TD>30.21 inches</TD>
         </TR>
         <TR>
             <TD><B>Dew point</B></TD>
             <TD>15</TD>
         </TR>
         <TR>
             <TD><B>Relative Humidity</B></TD>
             <TD>16%</TD>
         </TR>
         <TR>
             <TD><B>Sunrise</B></TD>
             <TD>6:51 am PST</TD>
         </TR>
         <TR>
             <TD><B>Sunset</B></TD>
             <TD>4:54 pm PST</TD>
         </TR>
         <TR>
             <TD><B>Wind</B></TD>
             <TD>from the East at 6 mph</TD>
         </TR>
         <TR>
             <TD><B>Wind chill</B></TD>
             <TD>47</TD>
         </TR>
     </TABLE>
 
     <H3>Forecast</H3>
     <TABLE BORDER="1">
         <TR>
             <TD></TD>
             <TH ALIGN="CENTER">Saturday</TH>
             <TH ALIGN="CENTER">Sunday</TH>
             <TH ALIGN="CENTER">Monday</TH>
             <TH ALIGN="CENTER">Tuesday</TH>
             <TH ALIGN="CENTER">Wednesday</TH>
         </TR>
         <TR>
             <TD><B>High temp</B></TD>
             <TD ALIGN="CENTER">71&#x00B0;F</TD>
             <TD ALIGN="CENTER">77&#x00B0;F</TD>
             <TD ALIGN="CENTER">74&#x00B0;F</TD>
             <TD ALIGN="CENTER">76&#x00B0;F</TD>
             <TD ALIGN="CENTER">75&#x00B0;F</TD>
         </TR>
         <TR>
             <TD><B>Low temp</B></TD>
             <TD ALIGN="CENTER">35&#x00B0;F</TD>
             <TD ALIGN="CENTER">32&#x00B0;F</TD>
             <TD ALIGN="CENTER">34&#x00B0;F</TD>
             <TD ALIGN="CENTER">43&#x00B0;F</TD>
             <TD ALIGN="CENTER">45&#x00B0;F</TD>
         </TR>
         <TR>
             <TD><B>Skies</B></TD>
             <TD ALIGN="CENTER">Cloudy</TD>
             <TD ALIGN="CENTER">Partly Cloudy</TD>
             <TD ALIGN="CENTER">Mostly Sunny</TD>
             <TD ALIGN="CENTER">Partly Cloudy</TD>
             <TD ALIGN="CENTER">Partly Cloudy</TD>
         </TR>
     </TABLE>
 </BODY>
</HTML>
HTML Representation of Weather Forecast

A computer will see the weather forecast as an HTML document.

The computer sees the raw HTML code that creates the screen. Our task is to write a program that will read that HTML code and find the high temperature for next Monday. If you understand HTML code, you can find it: it is on line 61 of the listing.

Our program would need to first locate this page, find the text that has Monday's temperature, extract it, save it, then get the next file. After retrieving all of the cities, it is a simple programming task to display the city with the highest temperature.
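The outer loop of such a program can be sketched as follows. This is only an illustration under stated assumptions: the function warmest_city and the fetch_high callback are hypothetical names, and the SAMPLE_HIGHS values for cities other than Palm Desert are made up to stand in for pages that a real program would retrieve and scrape.

```python
def warmest_city(cities, fetch_high):
    """Return (city, high) for the city with the highest forecast high.

    fetch_high is any callable that takes a city name and returns a number.
    """
    best_city, best_high = None, None
    for city in cities:
        high = fetch_high(city)  # locate the page, extract and save one value
        if best_high is None or high > best_high:
            best_city, best_high = city, high
    return best_city, best_high

# Made-up values standing in for scraped pages (assumption for illustration).
SAMPLE_HIGHS = {"Palm Desert, CA": 74, "Denver, CO": 58, "Miami, FL": 83}

result = warmest_city(SAMPLE_HIGHS, SAMPLE_HIGHS.get)
print(result)
```

As the text says, this part is the simple programming task. All of the difficulty hides inside whatever is passed in as fetch_high.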

The problem is finding the temperature. In this case, it is in an HTML element called TD, which stands for "table data". This data is the fourth cell inside the second table row (TR) of the forecast table. The cell also contains a character reference, &#x00B0;, which represents the degree sign. This is followed by a capital "F", indicating Fahrenheit. Our program would need to strip both in order to load the temperature itself into a comparison variable.
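One possible way to do this extraction is sketched below with Python's standard html.parser module. The class CellCollector, the function monday_high, and the abbreviated SAMPLE table are our own illustrative names, not part of the original system, and the sketch deliberately hard-codes the position of the row and the cell within the forecast table.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &#x00B0; into the degree sign
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data

# Abbreviated copy of the forecast table (assumption: only this table is fed in).
SAMPLE = """
<TABLE BORDER="1">
 <TR><TD></TD><TH>Saturday</TH><TH>Sunday</TH><TH>Monday</TH></TR>
 <TR><TD><B>High temp</B></TD><TD>71&#x00B0;F</TD><TD>77&#x00B0;F</TD><TD>74&#x00B0;F</TD></TR>
</TABLE>
"""

def monday_high(html_text):
    parser = CellCollector()
    parser.feed(html_text)
    parser.close()
    cell = parser.rows[1][3].strip()  # hard-coded position: fragile by design
    return int(cell.rstrip("F").rstrip("\u00b0"))  # strip "F", then the degree sign

print(monday_high(SAMPLE))
```

The hard-coded indexes in monday_high are the weak point: they encode an assumption about page layout that the page's author never promised to keep.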

If this document were constant, that is, if the creator of the document never changed the formatting of the data, we could write our program to find the fourth cell in the second row of the forecast table. However, if the designer changed anything, our program would break. Suppose that the creator wanted to conserve bandwidth by compressing space. Or, suppose all of the tag names were lowercase. The fragment shown in Alternate HTML Coding will format identically to the row we are looking for.

<tr><td><b>High Temp</b></td><td align='center'>71&#x00b0;F
</td><td align='center'>77&#x00b0;F</td><td align='center'>
74&#x00b0;F</td><td align='center'>76&#x00b0;F</td><td align=
'center'>75&#x00b0;F</td></tr>
Alternate HTML Coding

Compressing spaces and making all tag names lowercase creates the same HTML document.
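To see how easily a scraper breaks, consider a deliberately naive pattern written against the original uppercase layout. The pattern NAIVE and the two abbreviated fragments below are assumptions made for illustration; a real scraper might be more forgiving, but it would still be guessing at someone else's formatting habits.

```python
import re

# A deliberately naive pattern tied to the uppercase, double-quoted markup.
NAIVE = re.compile(r'<TD ALIGN="CENTER">(\d+)')

ORIGINAL = (
    '<TR>\n'
    '<TD><B>High temp</B></TD>\n'
    '<TD ALIGN="CENTER">71&#x00B0;F</TD>\n'
    '<TD ALIGN="CENTER">77&#x00B0;F</TD>\n'
    '<TD ALIGN="CENTER">74&#x00B0;F</TD>\n'
    '</TR>'
)
ALTERNATE = (
    "<tr><td><b>High Temp</b></td><td align='center'>71&#x00b0;F\n"
    "</td><td align='center'>77&#x00b0;F</td><td align='center'>\n"
    "74&#x00b0;F</td></tr>"
)

original_hits = NAIVE.findall(ORIGINAL)    # finds the three temperatures
alternate_hits = NAIVE.findall(ALTERNATE)  # finds nothing at all
print(original_hits)
print(alternate_hits)
```

Both fragments look the same in a browser, yet the pattern matches only the first one.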

Or, suppose the temperature were expressed in Celsius. If that were the case, there would probably be a "C" instead of "F" in the line. We would need to do a conversion from Celsius to Fahrenheit in order to compare all cities.
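A small sketch of that normalization step follows. The function name to_fahrenheit is hypothetical, and we assume the scraped cell text always ends with a one-letter unit preceded by an optional degree sign. The conversion itself is the standard F = C × 9/5 + 32.

```python
def to_fahrenheit(cell_text):
    """Convert a scraped cell such as '74°F' or '25°C' to degrees Fahrenheit."""
    text = cell_text.strip()
    unit = text[-1].upper()                    # trailing letter names the scale
    value = float(text[:-1].rstrip("\u00b0"))  # drop the degree sign, if any
    if unit == "F":
        return value
    if unit == "C":
        return value * 9 / 5 + 32
    raise ValueError(f"unrecognized unit in {cell_text!r}")

print(to_fahrenheit("74\u00b0F"))
print(to_fahrenheit("25\u00b0C"))
```

Even this helper is guessing: nothing in the page guarantees that the unit will be there, or that it will be the last character in the cell.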

This technique is called "screen-scraping", and has been used for years to automate certain functions, or to get data from one format into another. It is effective to a certain extent.

However, the flexibility of HTML coding is such that screen scraping is difficult to do, especially if you are depending upon someone else's program to create the HTML in a stable, consistent, reliable way.

"If we only had XML! "

We have heard the mantra, "If we only had our data in XML, then we would be able to leverage it." That is, people realize that HTML has limitations in an environment like the one above, and if they could just access sites that serve data as XML rather than HTML, all of their problems would be solved.

Let's load our HTML document into an XML editor and see how it looks. This is shown in HTML Document as XML.

HTML Document as XML

This HTML document has been loaded into an XML editor to check its structure.

Surprise! The XML editor shows the document's status. Notice the indicator in the status line, "well-formed". We will see later that "well-formed" means that this document adheres to all of the "well-formedness constraints" defined by the W3C XML specification. That means that this HTML document is a well-formed XML document. All of our problems are solved!

Not so fast. Even though this is a well-formed XML document, it is of no more use to us than it was as an HTML document. The data is still locked into a format designed for viewing by humans. The high temperature is still in the fourth cell of the second row of the forecast table.
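The point can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on an abbreviated copy of the page (the DOC string is our own shortened version, an assumption for illustration). The parser accepts the document, but we still have to count rows and cells to reach the value.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the weather page; it happens to be well-formed XML.
DOC = """<HTML><BODY>
<H3>Forecast</H3>
<TABLE BORDER="1">
 <TR><TD></TD><TH>Saturday</TH><TH>Sunday</TH><TH>Monday</TH></TR>
 <TR><TD><B>High temp</B></TD><TD>71&#x00B0;F</TD><TD>77&#x00B0;F</TD><TD>74&#x00B0;F</TD></TR>
</TABLE>
</BODY></HTML>"""

root = ET.fromstring(DOC)                # succeeds: the markup is well-formed
rows = root.findall("./BODY/TABLE/TR")   # every table row, in document order
monday_cell = rows[1][3].text            # still "fourth cell of a given row"
print(monday_cell)
```

The XML parser removed none of the guesswork. The element names still describe layout (TABLE, TR, TD), not weather.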

Instead of just requiring that the document be expressed as an XML document, we should have said that we need the weather document expressed in terms of the weather data, not in terms of some two-dimensional representation of the weather data.

Instead of creating a data stream marked up in the "hypertext markup language", we need one expressed in the "weather forecast markup language". We will call this WFML.

Changing to WFML is relatively simple. We already connect to the database that has the raw weather data. All we need to do is create a program similar to the one that creates HTML and change it so it represents the weather. This architecture is shown in Get Weather as XML Weather Forecast Format.

Get Weather as XML Weather Forecast Format

The same three-tier architecture can be used to express data in terms of the weather forecast.

The only differences between this and the last example are that we modified the program to create WFML, and we used a program on our client tier to access the data.
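A minimal sketch of the modified middle-tier program might look like the following. The function name get_weather_xml and the FORECAST list standing in for the data tier are assumptions; the element and attribute names follow the WFML vocabulary used in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for rows read from the data tier.
FORECAST = [
    {"date": "2002-05-07", "high": "71", "low": "35", "sky": "Cloudy"},
    {"date": "2002-05-08", "high": "77", "low": "32", "sky": "Partly Cloudy"},
]

def get_weather_xml(zipcode, location, forecast):
    """Middle tier: express the data in terms of the weather, not the page."""
    weather = ET.Element("weather", zipcode=zipcode)
    ET.SubElement(weather, "location").text = location
    forecast_element = ET.SubElement(weather, "forecast")
    for day in forecast:
        ET.SubElement(forecast_element, "day", day)  # dict becomes attributes
    return ET.tostring(weather, encoding="unicode")

wfml = get_weather_xml("92260", "Palm Desert, CA", FORECAST)
print(wfml)
```

The program reads the same data as before. Only the vocabulary wrapped around that data has changed.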

The resulting document is shown in Weather Forecast Markup Language.

<?xml version="1.0"?>
<weather zipcode="92260" >
 <location>Palm Desert, CA</location>
 <current>
     <condition name="temperature">61</condition>
     <condition name="visibility">10 miles</condition>
     <condition name="wind chill">47</condition>
     <condition name="wind">from the East at 6 mph</condition>
     <condition name="dewpoint">15</condition>
     <condition name="relative humidity">16%</condition>
     <condition name="barometer">30.21 inches</condition>
     <condition name="sunrise">6:51 am PST</condition>
     <condition name="sunset">4:54 pm PST</condition>
 </current>
 <forecast updated="2002-05-06">
     <day date="2002-05-07" high="71" low="35" sky="Cloudy"/> 
     <day date="2002-05-08" high="77" low="32" sky="Partly Cloudy"/> 
     <day date="2002-05-09" high="74" low="34" sky="Mostly Sunny"/> 
     <day date="2002-05-10" high="76" low="43" sky="Partly Cloudy"/> 
     <day date="2002-05-11" high="75" low="45" sky="Partly Cloudy"/> 
 </forecast>
</weather>
Weather Forecast Markup Language

This stream expresses the weather in terms of the weather itself.

Notice, now, that the elements and attributes reflect the names of the data objects in the database. It is relatively easy to find the high temperature for the day where the date attribute is equal to 2002-05-09 (next Monday).
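Here is a short sketch of that lookup on the client tier, using Python's standard xml.etree.ElementTree. The function name high_for is hypothetical, and the WFML string is an abbreviated copy of the document shown earlier.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the WFML document.
WFML = """<?xml version="1.0"?>
<weather zipcode="92260">
 <location>Palm Desert, CA</location>
 <forecast updated="2002-05-06">
     <day date="2002-05-08" high="77" low="32" sky="Partly Cloudy"/>
     <day date="2002-05-09" high="74" low="34" sky="Mostly Sunny"/>
 </forecast>
</weather>"""

def high_for(wfml_text, date):
    """Return the forecast high for the given date, or None if it is absent."""
    root = ET.fromstring(wfml_text)
    day = root.find(f"./forecast/day[@date='{date}']")  # select by name, not position
    return None if day is None else int(day.get("high"))

print(high_for(WFML, "2002-05-09"))
```

There is no counting of rows or cells and nothing to strip. The program asks for the day by its date and reads the attribute named high, so reordering the days or reformatting the whitespace would not break it.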

This is the basis of web services. We have taken an asset that already exists and exposed it in a way that is meaningful for computer programs. This simple change in the way data is represented will make it possible for others to leverage our data in ways that we had never considered, or bothered to implement.

Data Still Held Hostage

From the point-of-view of our applications and data, the World Wide Web has not changed a thing. Applications behind our corporate firewalls are still written in different languages, running on different platforms, with different architectural visions and programming models. Important data is held hostage by some application, be it a proprietary home-grown application, a proprietary Java bean, or a COM object. Integration and communication between different applications is not easy.

Integration among applications across different departments in an organization is an even harder problem to solve. Technologies that support external communication are very new. Practices and frameworks that enable external communication are not fully automated yet. Dialogs and expectations among external partners are very human-intensive. For all of these reasons, and many others that we will discuss later, communication between different organizations is difficult. The applications they use do not always interoperate, even when they share the same programming models, applications, or visions.

All original material on this site is copyright © 1994-2010 by Architag International Corporation. All rights reserved. No part of this information may be reproduced in any form without express permission from Architag International Corporation.