XML Security Threats

Very Very Rough Draft

I Introduction

XML Web services make business-to-business (B2B) transactions simpler, more seamless and more efficient than ever. XML (eXtensible Markup Language), built on open industry standards that enable Internet-based data transfer across any platform or application, is fast becoming the “common language” of the Web. It is a markup language standardized by the World Wide Web Consortium (W3C). It provides semantics-aware markup without losing the formatting and rendering capabilities of HTML. Because XML tags are self-describing, they are shifting the focus of Web communication from conventional hypertext to data interchange. XML’s extensibility allows users and applications to declare and use their own tags and attributes, so that the logical structure and content of semantically rich information are retained. XML focuses on describing the structure and content of information rather than its presentation.

Where used and why.

The standard markup language represents a breakthrough in one sense: it gives companies a common language for transmitting data over the Internet. This is a huge advantage over EDI, which requires participants to rent bandwidth on a costly private network. Because XML leverages existing infrastructure such as the Internet, it is not expensive to adopt. It also has some technical advantages. In XML, tags define everything, and you can extend a document by adding more tags. Two partners can agree on a specific extension of a purchase order, which gives much more flexibility. You can also print an XML stream on a piece of paper and read it. That's not true of EDI, which is only machine readable. [12]
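The purchase order extension mentioned above can be sketched in a few lines of Python. The element names (purchaseOrder, buyer, item, deliveryWindow) and the read_order helper are hypothetical and chosen only for illustration; they do not come from any real purchase order vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element names are illustrative only.
BASE_ORDER = """
<purchaseOrder number="1001">
  <buyer>Acme Ltd</buyer>
  <item sku="A-17" quantity="3"/>
</purchaseOrder>
"""

# The same order carrying an extension that two partners have agreed on.
EXTENDED_ORDER = """
<purchaseOrder number="1001">
  <buyer>Acme Ltd</buyer>
  <item sku="A-17" quantity="3"/>
  <deliveryWindow earliest="2002-03-01" latest="2002-03-05"/>
</purchaseOrder>
"""


def read_order(xml_text):
    """Return the fields a basic receiver understands, ignoring unknown tags."""
    root = ET.fromstring(xml_text)
    return {
        "number": root.get("number"),
        "buyer": root.findtext("buyer"),
        "items": [(i.get("sku"), int(i.get("quantity"))) for i in root.findall("item")],
    }


if __name__ == "__main__":
    # A receiver that does not know the extension still reads the core fields.
    print(read_order(EXTENDED_ORDER))
    # A partner that knows the extension can read the extra element.
    window = ET.fromstring(EXTENDED_ORDER).find("deliveryWindow")
    print(window.get("earliest"), window.get("latest"))
```

A receiver that looks up only the tags it knows keeps working when a partner adds new ones, which is the flexibility the paragraph describes.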

XML provides an ideal methodology for electronic business because:

XML allows message type creators to clearly identify the role and syntax of each piece of interchanged data using a definition that is both machine processable and human interpretable
XML allows message type creators to identify the source of each shared structure using an Internet Uniform Resource Locator
XML allows message type creators to optionally identify which pieces of information should occur in each interchanged set of data and, where relevant, the order in which individual fields should occur in a particular message stream
XML documents can be given metadata fields that can be used to identify who is responsible for creating, transmitting, receiving and processing each message, and can have built-in facilities for identifying the storage points of programs that should be used to control processes
XML can make use of facilities provided by the latest version of the Internet Hypertext Transfer Protocol (HTTP), which can identify when a message should be moved from one stage of the interchange process to another, and to check that the relevant forms of interchange have taken place. [13]

XML Developments

As more information is made available in the XML format, developers and end-users are raising concerns about XML security problems. Internet applications need security mechanisms to protect sensitive data against unauthorized access. Standardization activities for XML digital signatures and element-wise encryption already exist, but a standardized authorization mechanism for XML data remains an open issue. Security experts say the success of XML Web services makes them a likely target for people with malicious intent. Chris Christiansen, program director of Internet security at analyst group IDC Research, says it’s only a matter of time before a virus, denial-of-service attack or other threat targets XML Web services.

II Security Requirements

Missing -- Encryption and Digital Signature

III Security Issues

Concerns have been raised by developers and end-users about what security problems may exist. XML itself is not directly related to security, because it is just a data format for documents. But the way it is being positioned has caused some to question whether additional measures will be necessary. For example, end-users will be able to control their Web experience directly by pulling the information that most interests them out of an XML document. The same document, though, might contain information that the user should not see, and the server will need to know whether the user should get specific data.

A valuable benefit of XML is that a complete document can be sent in one operation and then held locally, thus reducing network traffic. But this raises the question of how to control authorized viewing of different groups of elements. A merchant may need to know a customer's name and address but doesn't need to know the details of any credit card being used, any more than the bank needs to know the details of the goods bought. A researcher may need to be prevented from seeing personal details on medical records, while an administrator may need exactly those details but should be prevented from viewing medical history; a doctor or nurse, in turn, may need medical details and some, but not all, personal material.

Industry analysts and developers see authentication and encryption playing a large role in this area. With authentication, the server will know what information can be sent to the user based on that user's access level, whereas encryption will let only users with decryption keys see the message. Opinions differ on the granularity at which these measures should work. For example, some people want to block or allow access to an entire XML instance, while others would like to control access at the tag level.
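Tag-level control of the kind described above can be sketched as follows. This is a minimal illustration, not a standard mechanism: the order document, the role names and the VISIBLE policy table are all assumptions made for the example, and the server is assumed to have authenticated the role already.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document mixing data meant for different parties.
ORDER = """<order>
  <customer><name>J. Smith</name><address>1 High St</address></customer>
  <card><number>4111-XXXX</number><expiry>01/05</expiry></card>
  <goods><line>Widget x 3</line></goods>
</order>"""

# Hypothetical policy: the top-level elements each role may receive.
VISIBLE = {
    "merchant": {"customer", "goods"},
    "bank": {"customer", "card"},
}


def view_for(role, xml_text):
    """Return a copy of the document holding only the elements the role may see."""
    root = ET.fromstring(xml_text)
    allowed = VISIBLE.get(role, set())  # unknown roles see nothing
    for child in list(root):
        if child.tag not in allowed:
            root.remove(child)
    return root


if __name__ == "__main__":
    for role in ("merchant", "bank", "guest"):
        print(role, [child.tag for child in view_for(role, ORDER)])
```

The filtering has to happen on the server, before the document is sent. Once the whole instance is held locally, a client-side filter no longer protects anything.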

As with general encryption, there's no problem in digitally signing an XML document as a whole. However, difficulty arises when parts of a document need to be signed, perhaps by different people, and when this needs to be done in conjunction with selective encryption. It may not be possible or desirable to mandate that specified people sign and encrypt sections in a particular sequence, yet successful processing of the different parts of the document will depend on knowing that sequence. Further, because a digital signature asserts that a certain private key has been used to authenticate something, a signer may need to view the item to be signed in plain text, and this may mean decrypting part of something already encrypted for other reasons.
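The idea of different parties signing different parts of one document can be sketched as below. This is not the XML Signature standard. As a stand-in for a real public-key signature, the sketch uses an HMAC with a per-signer secret key from the Python standard library. The contract document, the id attributes, the keys and the sign_section helper are all hypothetical. ET.canonicalize requires Python 3.8 or later.

```python
import copy
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical document with two independently signed sections.
DOC = """<contract>
  <terms id="terms">Deliver 100 units by June.</terms>
  <price id="price">5000</price>
</contract>"""


def sign_section(root, section_id, key):
    """Return a hex authentication tag over one element, found by its id attribute."""
    section = copy.copy(root.find(f".//*[@id='{section_id}']"))
    section.tail = None  # whitespace after the element is not part of the section
    # Canonicalize so that equivalent serializations yield the same bytes.
    data = ET.canonicalize(ET.tostring(section, encoding="unicode"))
    # HMAC is a stand-in here; a real signature would use the signer's private key.
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    terms_tag = sign_section(root, "terms", b"alice-key")
    price_tag = sign_section(root, "price", b"bob-key")

    # Changing the price invalidates only the tag over the price section.
    root.find("price").text = "50"
    print(hmac.compare_digest(terms_tag, sign_section(root, "terms", b"alice-key")))
    print(hmac.compare_digest(price_tag, sign_section(root, "price", b"bob-key")))
```

Each tag covers only its own element, so the sections can be signed in any order. The sketch does not address the harder case raised above, where a section has already been encrypted before it is signed.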

Security can be bypassed simply by exploiting XML’s flexibility and extensibility paired with its semantics and structure. XML fragments can present data from multiple sources. The components of an XML fragment are like baking ingredients: you mix them together in varying amounts based on a recipe. These ingredients can be spread throughout the Internet “kitchen.” This potential dispersion of information introduces validation issues. Without a reliable method of validating the source of the data and the accuracy of the information itself, a hacker could introduce spoofed data into a transaction or transformation of data.

XML instances can use links to resources, making them transient in nature. With all the “ingredients” identified, XML provides you with two options: you can make the cake, by collecting and presenting the XML information in an XML instance, or you can take a picture of the cake, by providing pointers and links to the applicable information. In either case, the end result for users looks the same. A complete XML instance may be presented without any real data in it, just Uniform Resource Identifiers (URIs) that point to particular elements. This transient quality extends the “security is only as strong as its weakest link” metaphor: you may have limited control over some of the data but must still rely on the security controls surrounding it.

XML is transport independent; it doesn't specify a particular transport mechanism. Current implementations use HTTP for transport, a universal skeleton key for virtually any network. Firewalls typically don't stop HTTP, so they won't stop XML carried over it, regardless of your application. Thus, XML can generate security problems that other forms of data do not.

XML instances can carry the same information and yet differ in their underlying text. Even well-formed, syntactically correct XML instances may be structured differently because of tag placement, use of white space, and other style mechanisms. These differences, though they don't affect the quality and content of the information, introduce a level of ambiguity that adds to the need for validation and security.
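The ambiguity between equivalent XML instances matters as soon as a digest or signature is computed over the text. The sketch below shows two hypothetical order documents that carry the same information but hash differently, and uses canonicalization to remove the difference. It relies on ET.canonicalize from the Python standard library (Python 3.8 or later). The strip_text option assumes that whitespace around text content is insignificant, which is an assumption of this example and not true of every document type.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two hypothetical instances with the same information:
# attribute order and whitespace differ, nothing else.
A = '<order id="7" currency="USD"><total>10</total></order>'
B = '<order currency="USD"   id="7">\n  <total>10</total>\n</order>'


def digest(text):
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical(text):
    """Return a canonical serialization, treating surrounding whitespace as insignificant."""
    return ET.canonicalize(text, strip_text=True)


if __name__ == "__main__":
    print(digest(A) == digest(B))                        # raw text differs
    print(digest(canonical(A)) == digest(canonical(B)))  # canonical forms agree
```

Both parties to an exchange have to agree on the canonical form before comparing digests; otherwise a harmless formatting change looks like tampering.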

Bypassing security

XML offers powerful abilities to structure, manage, share, and process data, but it also opens some possibilities for hackers. Just like email and HTML files, XML files can be captured as they travel over the Internet. Do your data files contain sensitive information? It is tempting to mix sensitive and innocuous data together in a single XML data file and then use templates to format the information for appropriate audiences. Even if you control who can run the templates that display the sensitive information, the more public templates may point the way to the XML file. Once hackers know it is there, they could bypass the templates and retrieve the whole XML data file. Unicode, on which XML is based, has a huge character set (more than 65,000 characters), and the many ways of encoding those characters offer new opportunities for attacks that bypass conventional protections. An example is a Microsoft IIS vulnerability in which specially encoded characters in a request allowed access to folders that should have been off limits. [9]
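The general weakness behind encoding attacks is a filter that inspects data before it has been decoded. The sketch below illustrates that class of mistake with percent-encoded URL paths. It is a simplified illustration and does not reproduce the IIS flaw cited above. The function names and the sample path are hypothetical.

```python
from urllib.parse import unquote


def naive_is_safe(raw_path):
    """Flawed check: looks for '..' in the path before it is decoded."""
    return ".." not in raw_path


def decoded_is_safe(raw_path):
    """Decode first, then reject any path that contains a parent-directory segment."""
    decoded = unquote(raw_path).replace("\\", "/")
    return ".." not in decoded.split("/")


if __name__ == "__main__":
    # '%2e' is the percent-encoded form of '.', so this path climbs two directories.
    attack = "/scripts/%2e%2e/%2e%2e/secret/data.xml"
    print(naive_is_safe(attack))    # the naive filter lets it through
    print(decoded_is_safe(attack))  # checking after decoding catches it
```

The sketch decodes only once. A path that has been encoded twice would need to be decoded repeatedly or rejected outright, which is one reason such filters are hard to get right.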

In an XML-based world, firewalls must be capable of dipping into XML streams traveling over Web ports to check their payloads, much as today's email virus checkers dip into email data streams on mail servers. However, if the XML stream is encrypted, then the traditional firewall is of limited use. Because it simply cannot read the data, the firewalling logic must move to the points where the XML document is decrypted and processed. [6] XML-specific security is obscured by the complex way that data enters and leaves computers on Internet connections. Differing ports are used for separate networked applications, like e-mail or chat, and this enables firewalls to recognize and filter network traffic. However, it is now standard practice to send XML data over the ports allotted for Web traffic, effectively disguising the XML as normal Web browsing traffic. Whilst it is an appealing prospect to bypass firewall restrictions in this way, it does mean that security is required at a higher level than the network access layer. [10]
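A check of the kind an XML-aware filter might apply to payloads arriving over a Web port can be sketched as follows. The allowlist of element names, the size limit and the inspect_payload function are assumptions made for this example and do not describe any particular firewall product.

```python
import xml.etree.ElementTree as ET

ALLOWED_TAGS = {"order", "buyer", "item"}  # hypothetical allowlist
MAX_BYTES = 10_000                         # hypothetical size limit


def inspect_payload(body):
    """Return (accepted, reason) for an XML body received over a Web port."""
    if len(body) > MAX_BYTES:
        return False, "payload too large"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Encrypted or otherwise opaque data ends up here: it cannot be inspected.
        return False, "not readable as XML"
    for element in root.iter():
        if element.tag not in ALLOWED_TAGS:
            return False, f"unexpected element: {element.tag}"
    return True, "ok"


if __name__ == "__main__":
    print(inspect_payload(b"<order><buyer>Acme</buyer><item/></order>"))
    print(inspect_payload(b"<order><script>run()</script></order>"))
    print(inspect_payload(b"not xml at all"))
```

The third case shows the limitation described above: if the stream is encrypted, the filter can only accept or reject it blindly, so the real check has to run where the document is decrypted.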

At the moment many XML users are unaware of these problems. They may rely on their systems integrators to address these issues, but failing to address them leaves their companies extremely vulnerable. Take the example of a company with a back-office system used for procurement, ERP or day-to-day administration, using XML to integrate services directly into the server. By its nature, XML integration sits on top of the Web technology that is the focus of so many malicious attacks today. Any company exposed in this way runs the danger of revealing vital confidential data to outsiders; competitors could steal customer and supplier details or highly confidential pricing information. Disclosure of this kind of material is not only commercially damaging; it could also expose the company to data protection penalties.

Openness

Although there have been some questions about the process used to create XML, the standard itself is completely open and freely available on the Web. W3C members have early access to standards, but once a standard is complete the results are public. The XML Working Group and the Working Groups for the supporting standards also release drafts of their work on a regular basis, making it possible to follow work in progress. Several non-W3C XML developments have also been extremely open. XML documents themselves are also considerably more open than their binary counterparts. Anyone can parse a well-formed XML document, and validate it if a DTD is provided. While companies may still create XML that behaves in a specific way bound to their application, the data in the XML document is available to any application. While developers could create DTDs or encrypt their data in a proprietary manner, they would lose most of the benefits of using XML. XML doesn't bar the creation of proprietary formats, but its openness is what many consider its greatest advantage. This openness also leaves it vulnerable to attacks from malicious code and allows people to view data that they would otherwise be unauthorized to view.

What can be done?

One model used to regulate access to XML documents uses the XPath language. XPath is a language for addressing parts of an XML document. It models an XML document as a tree of nodes. An example would be a medical record like the following; the tree represents the XML document.

XPath has absolute and relative location paths used for addressing. In addition to its use for addressing, XPath is also designed so that it has a natural subset that can be used for matching. The location paths that meet certain restrictions can be used as patterns.
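The tree model and location paths can be tried out with the Python standard library, which supports a limited subset of XPath. The medical record below is a hypothetical stand-in written for this sketch, not the record from the original example, and node_paths is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical medical record; element names are illustrative only.
RECORD = """<files>
  <record>
    <patient>franck</patient>
    <diagnosis>
      <item>ulcer</item>
    </diagnosis>
    <comments>follow-up in two weeks</comments>
  </record>
</files>"""


def node_paths(element, prefix=""):
    """Yield an absolute location path for every element node in the tree."""
    path = f"{prefix}/{element.tag}"
    yield path
    for child in element:
        yield from node_paths(child, path)


root = ET.fromstring(RECORD)

# Every element node, addressed by its path from the document root.
paths = list(node_paths(root))

# A relative location path, evaluated with the root element as the context node.
items = [e.text for e in root.findall("record/diagnosis/item")]

# A path that matches any descendant named 'comments'; usable as a pattern.
comments = [e.text for e in root.findall(".//comments")]

if __name__ == "__main__":
    print("\n".join(paths))
    print(items, comments)
```

ElementTree evaluates paths relative to a context element, so the absolute paths here are only printed as strings; a full XPath implementation would also evaluate them.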

The development of the access control system requires definitions of the subjects and objects for which authorization rules are specified and access controls are enforced. A subject is a user with its own identifier, which can be the login name. Each user/subject can be a member of one or several groups. An object can be any node of an XPath tree. An authorization rule consists of a set of subjects, a set of objects, an access value and a priority. The set of objects is expressed with a pattern, the set of subjects is a location path, the value of access is either grant or deny, and the priority is optional and is used to fix the priority of the authorization rule. The authorization rules are written on an XML Authorization Sheet (XAS). When users request to see the XML source document, they are provided with the view of the document that is compatible with their rights. Here is an example that takes all the elements of the XPath language model.

Patricia Franck is a new patient and has cancer. Her life expectancy is limited to two years and the item saying she has an ulcer is a cover story. The cover story is a lie inserted in the source document in order to hide the existence of sensitive information. The attribute coverstory="yes" tells the users who are permitted to see it that the ulcer item is a lie.

Rule 5 says that the Franck family is permitted to see the data of Patricia Franck.

Rule 6 says that the Franck family, including Patricia, is forbidden to see the comments element of her medical record.

Rule 7 says that nurses are forbidden to see the text of the comments element of Patricia’s medical record.

Rule 8 says Patricia is forbidden to see the item that says she has cancer. This is a good example of a content-based authorization rule.

Rule 9 says that Patricia is permitted to see the item which is a cover story, but rule 10 says she is forbidden to know the item is a cover story.

This model allows security policies with a high expressive power, since any node can be independently protected. The semantics and the possibility of defining content-based authorization rules are unique.
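How such rules could produce a per-user view can be sketched as follows. This is not the XAS syntax and it does not reproduce rules 5 to 10. The record, the group names, the three rules and the conflict policy are assumptions made for this sketch: access is denied by default, the matching rule with the highest priority wins, and a denied node is removed together with its subtree. Subjects are simplified to sets of group names instead of location paths. The content-based pattern needs Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, loosely modelled on the scenario above.
RECORD = """<files>
  <record owner="pfranck">
    <name>Patricia Franck</name>
    <diagnosis>
      <item>cancer</item>
      <item coverstory="yes">ulcer</item>
    </diagnosis>
    <comments>life expectancy about two years</comments>
  </record>
</files>"""

# Hypothetical rules: (subjects, object pattern, access, priority).
RULES = [
    ({"franck_family", "doctor"}, ".//*", "grant", 1),
    ({"franck_family"}, ".//comments", "deny", 2),
    ({"pfranck"}, ".//item[.='cancer']", "deny", 3),  # content-based rule
]


def decide(root, node, groups):
    """Return 'grant' or 'deny' for one node; the highest-priority matching rule wins."""
    access, best = "deny", 0  # assumption: closed policy, deny by default
    for subjects, pattern, rule_access, priority in RULES:
        if subjects & groups and priority >= best and node in root.findall(pattern):
            access, best = rule_access, priority
    return access


def view(xml_text, groups):
    """Return the document with every denied node (and its subtree) removed."""
    root = ET.fromstring(xml_text)

    def prune(parent):
        for child in list(parent):
            if decide(root, child, groups) == "deny":
                parent.remove(child)
            else:
                prune(child)

    prune(root)
    return root


if __name__ == "__main__":
    patricia = view(RECORD, {"pfranck", "franck_family"})
    print(ET.tostring(patricia, encoding="unicode"))
```

The sketch protects element nodes only. Hiding an attribute, such as the coverstory marker that rule 10 conceals from Patricia, would need rules that address attribute nodes as well.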

Conclusion

A broad family of initiatives and technologies is growing around XML, not just for desktop display but also for wireless, handheld and voice-based solutions, which extends the security problem to more platforms. The underlying philosophy of XML is to simplify integration of disparate platforms on an open systems basis, which is marvelous for simplicity and usability. Unfortunately, the same thoroughness and imagination have not yet been applied to security standards. The requirements are demanding; for example, elements of a transaction forwarded to a third party may need stronger encryption than the rest of the communication.

Measures are now being taken to address these issues and one of them, a digital signature system for XML, provides a means of certifying transactions and establishing an audit trail. This is particularly important for companies active in the business-to-business arena. While a small proportion of counterfeit transactions may be acceptable in business-to-consumer dealings, in the world of B2B commerce it is not.