SQL Server 2000 makes manipulating XML a breeze. It added the FOR XML [RAW|AUTO|EXPLICIT] clause which allows you to convert result sets into several flavors of XML documents. In the same vein, OpenXML was a great step forward in helping with batch processing. For example, if you import 10,000 users into your database it’s more efficient to put them into an XML document and send it to SQL Server to parse and insert them all at once, rather that shove them one row at a time.
I’ve used XML features of SQL Server for over two years now and loved them. Today I ran into a weird issue which somehow evaded me all this time. I figured I’d research it and pass on a warning.
The Inside Scoop
I have a stored proc which does a simple SELECT and returns it as an XML fragment. The downside of using FOR XML EXPLICIT is that it doesn’t return a well-formed XML document, and Books Online explicitly warn you:
When you specify EXPLICIT mode, you must assume the responsibility for ensuring that the generated XML is well-formed and valid.
Knowing that I’m going to read an XML fragment as a string and wrap an opening and closing tag around it (<Tree>...</Tree>) I settled for ExecuteScalar. I was in for a big surprise when, on one particular live instance of our product, the XML string came back… truncated! Stepping through code with the debugger confirmed that the string was chopped at exactly 2,033 characters!
A quick search in Google groups turned up very little. Nobody had answers. And then I stumbled across a Microsoft’s KB article Q310378: XML Data Is Truncated When You Use SqlDataReader, and all of a sudden it became clear:
When you read Extensible Markup Language (XML) data from Microsoft SQL Server by using the SqlDataReader object, the XML in the first column of the first row is truncated at 2,033 characters. You expect all of the contents of the XML data to be contained in a single row and column.
This behavior occurs because, for XML results greater than 2,033 characters in length, SQL Server returns the XML in multiple rows of 2,033 characters each.
The keyword here is multiple. If you call ExecuteScalar you have no way of getting anything but the very first column of the very first row, so this method is a dead end.
Another KB article, Q316701: Use the ExecuteXmlReader Method of the SqlCommand Class in Visual C# .NET shows how to bypass this problem by reading in a loop until all rows are read. I ended up with this:
connection.Open ();
rdr = command.ExecuteXmlReader ();
rdr.Read ();
while (rdr.ReadState != ReadState.EndOfFile)
{
result += rdr.ReadOuterXml ();
}
rdr.Close ();
For greater efficiency I’d use a StringBuilder here to concatenate chunks of text.
Lessons Learned
- Don’t use
ExecuteScalar to read XML from SQL Server! Just don’t.
- If you read XML by hand (as shown above) read it in loops because it comes sliced 2,033 characters at a time.
Conclusion
Here’s something odd: I looked through Microsoft ADO.NET (Core Reference) and Applied XML Programming for Microsoft .NET and found no mention of this gotcha. This “little” issue is hardly documented in other books or on MSDN at all! Now, c’mon! This is the kind of warning I expect to see in books!