WS-Duck Typing

One of the most popular features of languages such as Smalltalk, Objective-C, and Ruby is duck typing. To quote Wikipedia:

[…] duck typing is a form of dynamic typing in which a variable’s value itself implicitly determines what the variable can do. This implies that an object is interchangeable with any other object that implements the same interface, regardless of whether the objects have a related inheritance hierarchy.[…]

The term is a reference to the duck test — “If it walks like a duck and quacks like a duck, it must be a duck”.

So what does this have to do with Web services? Well, duck typing is a great way to create graceful and interoperable Web services. Web services are about exchanging information, and as long as the information walks like a duck…

Here are my three tips to implement WS-Duck Typing:

Don’t validate incoming messages!

Not only is XSD-based validation slow, it also requires strict schema conformance from the other party, thus creating a strictly typed service. Such a service breaks Postel’s Law: be conservative in what you do; be liberal in what you accept from others.

If you really want to do validation, do it on server-side outgoing messages only. After all, you should adhere to your own schema. Also, Schematron is the exception to the rule, since it is not based on grammars, but on finding tree patterns in the parsed document. Which brings us to:

Use XPath!

XPath is an excellent way to extract information from an XML document. Code written with XML APIs like DOM, SAX, or StAX is typically quite fragile when it comes to element ordering, nesting, or unexpected elements. And XML marshalling isn’t much better: some of these APIs throw exceptions in these cases.
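To make that fragility concrete, here is a minimal Python sketch using the standard library's xml.dom.minidom. The document shape and the helper name last_name_by_position are assumptions made up for illustration; the point is only that code which walks child nodes by position breaks as soon as the sender reorders elements or adds something unexpected.

```python
from xml.dom import minidom


def last_name_by_position(xml_text):
    """Fragile on purpose: assumes <lastName> is always child node 1 of <person>."""
    person = minidom.parseString(xml_text).documentElement
    node = person.childNodes[1]
    # Comments and whitespace are child nodes too, so position 1 may not be
    # the element we hoped for.
    if node.nodeType != node.ELEMENT_NODE or node.tagName != "lastName":
        raise ValueError("unexpected node at position 1")
    return node.firstChild.data


# Hypothetical messages: the layout the code expects, and two harmless variations.
expected = "<person><firstName>John</firstName><lastName>Doe</lastName></person>"
reordered = "<person><lastName>Doe</lastName><firstName>John</firstName></person>"
commented = "<person><!-- v2 --><firstName>John</firstName><lastName>Doe</lastName></person>"
```

Only the first message is accepted. The other two carry exactly the same information, yet the positional lookup rejects them.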

Not with XPath. When using XPath, you don’t care whether the <lastName> element is the first or the second child of <person>; the /person/lastName expression grabs it anyway. And if you really don’t know where to find the last name, you can always resort to //lastName, which finds it anywhere in the document.

In the past, XPath has been dismissed as being too slow. With modern XPath libraries that support XPath pre-compilation, this is less of an issue.
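A minimal sketch of this in Python, using the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The helper name last_name and the sample messages are assumptions for illustration. Because ElementTree paths are evaluated relative to the element you call them on, the relative path "lastName" on the <person> root plays the role of /person/lastName, and ".//lastName" plays the role of //lastName.

```python
import xml.etree.ElementTree as ET


def last_name(xml_text):
    """Return the text of <lastName>, preferring a direct child of the root."""
    root = ET.fromstring(xml_text)  # assumed to be the <person> element
    # Direct child of <person>, at any position among its siblings.
    value = root.findtext("lastName")
    if value is None:
        # Not a direct child: look at any depth below the root.
        value = root.findtext(".//lastName")
    return value


# Hypothetical messages with the same information in different places.
in_order = "<person><firstName>John</firstName><lastName>Doe</lastName></person>"
reordered = "<person><lastName>Doe</lastName><middle/><firstName>John</firstName></person>"
nested = "<person><name><firstName>John</firstName><lastName>Doe</lastName></name></person>"
```

All three messages yield the same last name, and a message without the element yields None instead of an exception, so the caller decides what a missing value means.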

Don’t create stubs or skeletons!

This is perhaps the most controversial tip. If you create client-side stubs or server-side skeletons in a strongly-typed language like Java, you throw away any option of being liberal about the XML messages. Instead, you have created a strongly-typed API that is strongly coupled to the contract, and that passes or expects parameters of a certain kind. If they are of any other kind, or if they are simply not there, your code will never be invoked. Even if you didn’t need the parameter in the first place.

If you treat Web services like XML messaging, rather than RPC, you could have handled the message gracefully: let’s see if I can find the first name under the person element, and if it’s not there, I’ll try and find it anywhere in the document. Still not there? Perhaps it’s an older message: I’ll just apply this stylesheet, and see if I can find the first name then. Et cetera, et cetera.
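That chain of attempts can be sketched in Python with the standard library's xml.etree.ElementTree. Everything named here is an assumption for illustration: the helper names, and in particular the legacy <givenName> element. The standard library has no XSLT support, so a plain function that renames the legacy element stands in for the stylesheet step.

```python
import xml.etree.ElementTree as ET


def upgrade_legacy(root):
    """Stand-in for the stylesheet step: rename the hypothetical legacy
    <givenName> element to <firstName>, in place."""
    for element in list(root.iter("givenName")):
        element.tag = "firstName"
    return root


def first_name(xml_text):
    """Try progressively more liberal lookups; return None if all of them fail."""
    root = ET.fromstring(xml_text)  # assumed to be the <person> element
    # Each attempt runs only if the previous one found nothing.
    attempts = (
        lambda: root.findtext("firstName"),     # directly under <person>
        lambda: root.findtext(".//firstName"),  # anywhere in the document
        lambda: upgrade_legacy(root).findtext(".//firstName"),  # older format
    )
    for attempt in attempts:
        value = attempt()
        if value is not None:
            return value
    return None
```

Whether the "anywhere" step is safe depends on the message: findtext returns the first match in document order, so it is only a reasonable fallback when the element name is unique in the document.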

Hopefully, these tips will help you create flexible, interoperable Web services that gracefully handle XML messages.

Quack!

25 Comments

  1. Peter Storch said,

    March 27, 2007 @ 7:54

    I fully agree. The XPath is a good tip, I’ll try it next.
    I’ve used AXIS for Java but used XMLBeans as JAVA/XML mapping framework. This provided me with the loose coupling I needed. It doesn’t validate and it doesn’t fall over unknown elements.
    On the server side I randomly insert elements into the output XML to force my service consumer not to validate :)

  2. Standard Deviations » Duck Typing for Web Services said,

    March 27, 2007 @ 14:11

    […] Arjen Poutsma is dead-on with his WS-Duck Typing post: […]

  3. Jimmy James said,

    March 27, 2007 @ 16:18

    “Code written with XML API like DOM, SAX, or StAX is typically quite fragile when it comes to element ordering, nesting, or unexpected elements”

    If you are using SAX and your code is sensitive to ordering and unexpected elements, you are most likely not using SAX effectively. That’s just FUD.

    As far as nesting goes, I completely disagree with the premise that you can ignore nesting. It’s an absurd notion. It amounts to guessing what the intent of the caller was. I’d much rather a web service reject a message than take what are essentially random actions on bad input. It’s insane that anyone would think this is a good idea.

  4. Dan North said,

    March 27, 2007 @ 16:56

    I’m not so sure. When I look at a form (a paper form, that is), and see a field labelled “mobile number”, I tend to assume the number in the box is the person’s mobile number. If I see a different version of the same form, still containing a field labelled “mobile number”, I will absurdly assume that its contents still represents a mobile number - even if the field is in a different place!!

    Now I could choose to be suspicious of any new versions of a form that have a field labelled “mobile number”, and either callously reject them out of hand, or throw a tantrum (sorry, exception). But that wouldn’t make me a very nice person. In fact, I’m insane enough to think that looking anywhere on the form for an appropriately-labelled field is the better thing to do.

    If the form contained two mobile numbers, one for business and one for personal, I might look for a “mobile number” field in the Personal Details section, but again I wouldn’t care where in the Personal Details section I found it.

    So how is it different with computers?

  5. Jilles van Gurp said,

    March 27, 2007 @ 18:18

    Nice comments. This is actually pretty much how I do web services when not doing SOAP. I have a nice handcrafted library for Java which compiles and then caches each XPATH expression I use. After that, XPATH is quite fast. Of course you still need to pass a DOM model (memory intensive).

    I also used my library in combination with DOM generated by JTidy from ordinary HTML pages. JTidy is a bit slow but the XPATH bit was pretty damn fast. I used this to parse microformatted HTML content which requires a Duck typing approach since you don’t know the exact tree structure. XPATH is great for this kind of work.

    Not depending more on structure than is strictly necessary is a good concept since it allows for more robust code that continues to work unless it must necessarily break.

    BTW. I agree with Jimmy though, if you use those APIs properly, the order of the elements does not really matter.

  6. Integrate This»Blog Archive » David Baron on versioning said,

    March 27, 2007 @ 19:43

    […] Update; Stefan Tilkov finds that Arjen Poutsma essentially says essentially the same thing. • • • […]

  7. Jimmy James said,

    March 27, 2007 @ 20:15

    “If the form contained two mobile numbers, one for business and one for personal, I might look for a “mobile number” field in the Personal Details section, but again I wouldn’t care where in the Personal Details section I found it.

    So how is it different with computers? “

    When you take the most insignificant pieces of info that might be sent and build your assumptions around them, it does seem fine.

    A given B2B document could have 5-10 addresses on it. You can have a head office location, a contact address, and a shipping address. If you just grab any address and ship 10000 laptops to it, do you think the customer is going to be happy when the laptops show up at their corporate headquarters instead of at their warehouse? Who do you think will pay for the mistake? Who will be blamed? I can tell you that the answer to both questions is not the customer.

    Purchase orders can have line items. Sometimes a partner might want to ship everything to one address and put it at the purchase order level and other times they might want to put it on the line item when they want to ship it to multiple locations. If the customer wants to ship to two locations but one of the addresses is omitted, do you think it’s OK to send both line items to the one provided address? If it’s not what they wanted, they will definitely not be happy that you made this assumption. The items don’t arrive in the proper location on time and good money will have to be spent to ship the goods across the country at express rates.

    Hasn’t anyone ever heard the saying “Don’t assume, it makes an ass out of u and me.”?

    Aside from the problems that could come from doing this, it’s only going to address situations where the message has the same element name in the wrong place. If it’s named slightly differently you aren’t going to find it.

    There’s nothing saying you can’t support different messages for different partners. In fact you should assume you will have to support that exactly. Trying to support many different messages with the same code is unworkable and has been known to be so for decades. If the same partner is sending different messages for the same services, then good luck with that.

  8. Stefan Tilkov said,

    March 27, 2007 @ 21:06

    @Jimmy: If your service consumes real XML documents and you use e.g. DOM naïvely, your code will break if someone inserts a comment, or adds an optional element you didn’t expect. Using XPath is good advice; check e.g. Elliotte Rusty Harold’s book Effective XML. The relevant chapter is even online: http://www.cafeconleche.org/books/effectivexml/chapters/35.html

  9. Anders Norås' Blog said,

    March 27, 2007 @ 22:11

    WS-Duck Typing

    Arjen Poutsma, who is the man behind Spring-WS, has written about a more flexible way to do web services

  10. Mark Baker said,

    March 28, 2007 @ 2:00

    Nicely said. FWIW, I’ve talked about this recently too;

    http://www.coactus.com/blog/2006/12/validation-considered-harmful/
    http://www.coactus.com/blog/2007/01/two-more-reasons-why-validation-is-still-harmful/

  11. Jimmy James said,

    March 28, 2007 @ 3:45

    I worked on a system that consumed hundreds of thousands of ‘real’ documents a day for years. The issue you bring up is insignificant in the ‘real’ world. I don’t need to read a book.

    Real issues are when 250,000 orders get held up in the warehouse and have to be manually routed because bad data was processed through the system (that was a bad day). That costs real money. Rejected messages are a minor issue with a simple solution. It costs nothing to reprocess a request. Depending on what you are doing, you might not want to process messages that have unexpected elements either. You might think it’s unimportant but that doesn’t mean the sender does. Just because something is not required doesn’t mean it’s not crucial information. If a sender sends something and you don’t know what it is, you should find out.

    I never said not to use XPath either. I actually advocate XPath as it is much better for working with XML data than Java or other imperative languages. We used it for every service in the system mentioned above. But this is a completely different thing than saying you can skip validation (entirely) and use promiscuous queries. Validation doesn’t have to be w3c schema either. It can be programmatic and have multiple layers. Actually w3c schema doesn’t really validate well enough by itself.

  12. blog.jillesvangurp.com » Blog Archive » links for 2007-03-28 said,

    March 28, 2007 @ 8:21

    […] The Ancient Art of Programming » WS-Duck Typing some good advice for doing webservice programming (tags: xml webservices soa) […]

  13. Stefan Tilkov said,

    March 28, 2007 @ 14:39

    Jimmy, you wrote:

    >If you are using SAX and your code is sensitive to ordering
    >and unexpected elements, you are most likely not using SAX effectively. That’s just FUD.

    This is where I believe you are wrong; that’s why I pointed you to the article, because it makes the point much better than I can.

    In your last post you wrote:

    >I actually advocate XPath as it is much better for working with XML data than Java or other imperative languages.

    This seems to be the exact point I understood you were arguing against — I must be missing something.

    I am tempted to point you to a definition of FUD, too, but it somehow doesn’t seem likely you’d read it.

  14. Jimmy James said,

    March 28, 2007 @ 16:48

    You are mixing a bunch of different points that I have made.

    The author of this article said SAX is fragile with respect to element ordering. This is false. SAX doesn’t care about element ordering. In fact, by itself, SAX doesn’t give you any direct clue what level of nesting an element is on, nor does it give any information about the ordering of elements. A SAX element handler that captures any element with a given name is much easier to write than one that is sensitive to nesting or ordering. You must write (or acquire) code to capture that information on your own. That code might be fragile but it’s not the SAX parser. A lot of people who aren’t comfortable with functional-style APIs struggle with SAX. I worked with SAX-based code that would fall over when an optional element was missing. But it wasn’t SAX. The developer that wrote the handler was just clueless. Saying SAX is fragile with respect to nesting and ordering is FUD. It’s false information that creates Fear, Uncertainty and Doubt.

    The other, separate point that I made was that XPath is a good choice for working with XML. It’s XML specific and provides functionality that does not come in SAX, DOM etc. That has nothing to do with whether SAX is fragile.

    What I am arguing against is:

    1. That validation is unnecessary/problematic. You can validate and use XPath. They are not mutually exclusive. You’d better tell your clients if you don’t plan to validate their messages. This puts a lot more pressure on them to make sure things are perfect. You’ve abdicated your responsibility to verify their message.

    2. That if an element is not where it’s supposed to be, you can just grab an element with the same name from somewhere else. This is just mind-bogglingly dumb. This idea goes completely against the concept of hierarchical structures. Two entities with the same name in different places in a hierarchy are unique and anyone who tells you otherwise is very confused. You might be able to get away with it with a few elements that are unique in the document but as a general solution it’s untenable.

  15. Jimmy James said,

    March 28, 2007 @ 16:56

    BTW - The article you reference doesn’t mention SAX. SAX is not tree based. It doesn’t build an in-memory model. Multiple text nodes are a consideration but are easily joined together. Comments are easily ignored (don’t implement any handling for them).

    It also contains the following quote:

    “If you absolutely insist that there can be only one result, then by all means validate the document against a schema which requires this (Item 38) before processing the document, and reject it if it fails the schema. Do not simply assume that all documents passed to the program meet the constraints. Such assumptions fail far more often than most developers initially expect, and over a time period of several years almost all such assumptions fail.”

  16. Contract Nazi said,

    March 28, 2007 @ 23:36

    If it’s services you are talking about, then it’s all about the service contract which, in an abstract sense says:

    “If you send me a message constructed within these constraints I’ll guarantee to respond in this predictable way.”

    The inference being, that if the service consumer does not abide by the published constraints, that is the contract, then all bets are off. The service provider doesn’t need to validate the request in order to meet his side of the contract.

  17. Arjen Poutsma said,

    March 29, 2007 @ 0:33

    @Jimmy

    In my post, I said that code written with SAX is “typically quite fragile”. This doesn’t mean that all SAX code is fragile. It just means that, as you stated yourself, a lot of people don’t get functional APIs, and tend to write fragile SAX code. This isn’t about what I do, nor is it about what you do, nor is it what I recommend, but it’s about what I’ve seen happen in the field. The same thing applies to StAX or DOM code, by the way: the code in the link that Stefan posted can be made less fragile by not assuming the first element is the only interesting element. It can be done in a better way, but it’s not typical. XPath just conveniently bypasses these issues.

    Also, grabbing an element from somewhere else can be perfectly acceptable in some cases. If a small document only contains one phone number, then who cares where it is located? If it contains multiple phone numbers (like in the address example you describe), then grabbing it from anywhere is probably a bad idea. It all depends on the context.

    @Contract Nazi
    I don’t agree. The question is not whether the message conforms to the schema, but whether it contains the information required to process the request. These two might be the same (i.e. the schema only contains the necessary information), but that might mean that you will end up with a lot of similar types in the schema (i.e. one type per operation). So instead, it might be useful to create one coarse-grained type, and use that in all operations, which eventually results in sending more information than is strictly required. The question is whether you want to validate this non-required information.

  18. Jimmy James said,

    March 29, 2007 @ 14:47

    Arjen-

    That’s reasonable. I suppose a lot of this depends on the context. In my B2B experience, messages were effectively contracts often worth hundreds of thousands of dollars a pop. It’s not sensible to be loose and fast with this kind of thing. Accepting a bad message would even violate our service agreements. But now I am using XPath to glean info from exported ‘code’ from a gui and I couldn’t care less about the schema. I’m /*//ing without a care in the world.

    But to imply validation is an ineffective technique is not correct. It may not be appropriate in all contexts but it’s definitely very useful and important in many others. The problems with different messages still arise but the solution is different when you need strict validation.

  19. Jimmy James said,

    March 29, 2007 @ 14:59

    Arjen-

    You write:

    “So instead, it might be useful to create one coarse-grained type, and use that in all operations, which eventually results in sending more information than is strictly required. The question is whether you want to validate this non-required information.”

    This may seem like a good idea but when you have hundreds of different producers all with their own idiosyncrasies, it’s going to turn into spaghetti code. How is this any different than the monolithic procedural approaches of the past? I prefer an object-oriented-like approach where each message has its own type.

    I think calling this approach duck-typing is a little misleading. In duck-typing, a given object is treated as being a certain type if it has the required operations. That’s part of the technique you are describing. But the other part is not similar to duck-typing. It’s more like pattern matching but I predict that in most cases it will degenerate into something more like the following (sometimes necessary) anti-pattern:

    if type(input) == str:
        pass  # do this
    elif type(input) == dict:
        pass  # do that
    else:
        pass  # do the default

  20. sean said,

    March 29, 2007 @ 18:08

    I can see valid points to both validating and non-validating XML schema with respect to messages, but I’d have to agree more with the validating side when it comes to real-world business-driven scenarios. The notion, “If I can’t find ‘phone number’ where I thought it would be, but it does exist somewhere else in a message, so just use that one and be ‘graceful’” simply seems unrealistic and irresponsible when you’re doing anything more than “hello world, here’s my phone number”.
    I think the thought process behind this blog post about “duck”-typing can lead to more tolerant technologies in the future, but as it stands currently, offers little real world value where contracts are more a tool to describe where one party’s concern ends and another begins.

  21. Jimmy James said,

    March 29, 2007 @ 19:25

    On a side note, RELAX NG appears to offer a lot of improvement over the W3C schema (I haven’t had a chance to use it for real work.) But I know it allows validating an element’s contents without respect to order. w3c doesn’t provide an easy way to do things like this. This kind of improvement might alleviate some of the problems people are seeing with strict validation.

  22. Stefan Tilkov said,

    March 29, 2007 @ 20:50

    @Jimmy: Your points are valid, and you are right about SAX not being mentioned in ERH’s article. In my experience, XPath is a lot easier to get right, and code based on XPath is a lot more stable in face of changes, than equivalent code that uses XML APIs - especially DOM, but also (to a lesser degree) SAX and StAX, although with the latter two, the reason is most likely a lack of experience on the programmer’s side.

    I can understand your POV on validation, but I don’t agree with it. Which I think is OK, and does not imply that either you or Arjen practised “FUD” — just that we have different experiences.

    Glad to see the discussion focusing on technical aspects now.

  23. James Watson (a.k.a Jimmy James) said,

    March 29, 2007 @ 22:02

    I don’t have any disagreement that XPath is easier. I think it’s far superior to custom SAX. SAX is actually better suited for completely generic transformations (pretty printers etc.). XQuery is also something to look into. XPath 2.0 is very nice to work with.

    My opinion on validation comes from experience. The B2B system that I worked on tried to take this “many messages, one transformation process” approach. It ended up being a huge unmaintainable mess where pretty much every change resulted in problems processing stable clients’ messages. The approach that did work was having a canonical form and transforming everything into that form to pass on to a core processing module. This is actually more flexible because you can still use the approaches you are advocating when it makes sense. It also let us process messages using completely different and incompatible standards with the same core functionality.

    This approach takes more work than what you are advocating but for industrial-strength B2B services it’s a best practice. Really, it supersedes that approach. It may be overkill for some situations but we had 100s of consumers and producers to support.

  24. Solution Architecture said,

    March 30, 2007 @ 22:44

    Entity Framework: Disconnected Problems & Solutions

    Andres Aguiar points out an enormous change to the behavior of data access in tiered architectures that…

  25. Anders Norås' Blog said,

    April 2, 2007 @ 15:38

    WS-Duck for Windows Communication Foundation

    Last week I blogged about Arjen’s WS-Duck Typing and how Spring-WS enables “duck typed” web services.
