Producing XHTML-Compliant Pages With Response Filters

Posted on February 13, 2004   |   Download sample code

59 comments

Programming with web standards in mind, although largely ignored, is becoming more and more important. It almost seems that just promoting ASP.NET took too long. Now that we're over the hill and "this stuff works", it is about time to start paying attention to web standards. In this article you will learn how to implement a response filter and plug it into the ASP.NET pipeline. The filter will transform outgoing HTML into XHTML 1.0-compliant markup.

A Call To Web Standards

Having read thousands of pages of articles on the web, magazines, MSDN documentation, online forums, etc., I still don't recall seeing a call for producing ASP.NET code compliant with web standards. Wait! I take it back. I saw one: a post in Scott Guthrie's blog. This one is a must-read! According to Scott, the upcoming version of Visual Studio .NET will feature server controls that produce web standard-compliant code, accessibility validation, etc. Now that's very good news!

As of this writing, ASP.NET does not produce code that passes validation in any of the STRICT modes (see Eric Meyer's Picking a Rendering Mode and W3C's List of valid DTDs you can use in your document for more information on DOCTYPEs). Enforcing XHTML-compliant code takes some effort: you have to implement automatic code cleaning (all right, fudging).

The point of this article is two-fold: to reiterate the importance of web standards and to show how to implement response filters.

Anatomy of HTTP Response Filters

Instead of creating an abstract sample for this discussion, I'll refer to a real-world example of a filter application. This very site, www.AspNetResources.com, utilizes this filter to enforce XHTML 1.0 Strict compliancy.

The HttpResponse class has a very useful property:

public Stream Filter {get; set;}

MSDN provides a helpful description of this property: "Gets or sets a wrapping filter object used to modify the HTTP entity body before transmission." Confused? In other words, you can assign your own custom filter to each page response. HttpResponse will send all content through your filter. This filter will be invoked right before the response goes back to the user and you will have a chance to transform it if need be. This could be extremely helpful if you need to transform output from "legacy" code or substitute placeholders (header, footer, navigation, you name it) with proper code. Besides, at times it's simply impossible to ensure that every server control plays by the rules and produces what you expect it to. Enter response filters.

The Filter property is of type System.IO.Stream. To create your own filter you need to derive a class from System.IO.Stream (which is an abstract class) and add implementation to its numerous methods.

using System;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
using System.Web;

namespace AspNetResources.Web
{
/// <summary>
/// PageFilter does all the dirty work of tinkering with the 
/// outgoing HTML stream. This is a good place to
/// enforce some compliance with web standards.
/// </summary>
public class PageFilter : Stream
{
    Stream          responseStream;
    long            position;
    StringBuilder   responseHtml;

public PageFilter (Stream inputStream)
{
    responseStream = inputStream;
    responseHtml = new StringBuilder ();
}

#region Filter overrides
public override bool CanRead
{
    get { return true;}
}

public override bool CanSeek
{
    get { return true; }
}

public override bool CanWrite
{
    get { return true; }
}

public override void Close()
{
    responseStream.Close ();
}

public override void Flush()
{
    responseStream.Flush ();
}

public override long Length
{
    get { return 0; }
}

public override long Position
{
    get { return position; }
    set { position = value; }
}

public override long Seek(long offset, SeekOrigin origin)
{
    return responseStream.Seek (offset, origin);
}

public override void SetLength(long length)
{
    responseStream.SetLength (length);
}

public override int Read(byte[] buffer, int offset, int count)
{
    return responseStream.Read (buffer, offset, count);
}
#endregion

#region Dirty work
public override void Write(byte[] buffer, int offset, int count)
{
    string strBuffer = Encoding.UTF8.GetString (buffer, offset, count);

    // ---------------------------------
    // Wait for the closing </html> tag
    // ---------------------------------
    Regex eof = new Regex ("</html>", RegexOptions.IgnoreCase);

    if (!eof.IsMatch (strBuffer))
    {
        responseHtml.Append (strBuffer);
    }
    else
    {
        responseHtml.Append (strBuffer);
        string  finalHtml = responseHtml.ToString ();

        // Transform the response and write it back out

        byte[] data = Encoding.UTF8.GetBytes (finalHtml);
        
        responseStream.Write (data, 0, data.Length);            
    }
}
#endregion
}
}

As you can see, most methods contain more or less dummy code. The Write method does all the heavy lifting. ASP.NET may call Write several times per response, so before we transform the output we need to wait until the whole page has been collected. Therefore a Regex looks for the closing </html> tag.
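One caveat with this approach: each Write call decodes its own chunk, so a multi-byte UTF-8 character, or the </html> tag itself, could straddle two chunks. Below is a minimal sketch of one way around this. It assumes that holding the whole response in memory is acceptable and that ASP.NET closes the filter once the response is complete. It collects raw bytes and decodes them once in Close. BufferingFilter and its transform parameter are hypothetical names, not part of the sample code.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical alternative to PageFilter: buffer raw bytes, decode once.
public class BufferingFilter : Stream
{
    private readonly Stream inner;                    // the stream we wrap
    private readonly Func<string, string> transform;  // hypothetical cleanup hook
    private readonly MemoryStream buffer = new MemoryStream();
    private bool closed;

    public BufferingFilter(Stream inner, Func<string, string> transform)
    {
        this.inner = inner;
        this.transform = transform;
    }

    // This sketch is write-only, so it says so instead of faking support.
    public override bool CanRead  { get { return false; } }
    public override bool CanSeek  { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length   { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override int Read(byte[] data, int offset, int count) { throw new NotSupportedException(); }

    // Collect bytes only; nothing is decoded until the response is complete.
    public override void Write(byte[] data, int offset, int count)
    {
        buffer.Write(data, offset, count);
    }

    // Nothing is passed on before Close, so there is nothing to flush.
    public override void Flush() { }

    public override void Close()
    {
        if (closed) return;
        closed = true;

        // Decode the whole response at once, transform it, and hand it on.
        string html = Encoding.UTF8.GetString(buffer.ToArray());
        byte[] output = Encoding.UTF8.GetBytes(transform(html));
        inner.Write(output, 0, output.Length);
        inner.Flush();
        inner.Close();
        base.Close();
    }
}
```

The trade-off is that nothing reaches the client until the page has finished rendering, which is also true of the </html> check above.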

Now that we have the entire HTML response string we can transform it. I really liked Julian Roberts' approach as laid out in his Ensuring XHTML compliancy in ASP.NET article, although I chose to redo the regular expressions to my liking.

Forcing XHTML Compliancy

Basically, this particular filter simply tries to fix a few of the known inconsistencies:

  1. Place the __VIEWSTATE hidden input in a <div> to make the validator happy.
  2. Remove the name attribute from the main form. By default your server-side form gets a name and an id attribute. The validator is not happy about the name attribute so we need to get rid of it.

My first take is to wrap the __VIEWSTATE input in a <div>:

// Wrap the __VIEWSTATE tag in a div to pass validation
re = new Regex ("(<input.*?__VIEWSTATE.*?/>)",
                 RegexOptions.IgnoreCase);

finalHtml = re.Replace (finalHtml, 
                        new MatchEvaluator (ViewStateMatch));

The Regex class allows you to wire a match evaluator delegate which kicks in every time a match is found. The ViewStateMatch delegate is implemented as follows:

private static string ViewStateMatch (Match m)
{
  return string.Concat ("<div>", m.Groups[1].Value, "</div>");
}

If you were to implement Step 2 and use this filter as-is right now you'd run into some issues with post-back processing. Why's that? View the page source. Your __doPostBack method will look something like this:

function __doPostBack(eventTarget, eventArgument)
{
 var theform;
 if (window.navigator.appName.toLowerCase().indexOf("netscape") > -1) 
 { theform = document.forms["mainForm"];}
 else 
 { theform = document.mainForm; }

 ...
}

The gotcha here is that the form is referenced by its name, not id. If we get rid of the name attribute it can't handle postbacks. With the name attribute it's not valid XHTML code. Seems to be a catch-22 situation.

The following hack is of my own making. So far it has worked fine on this site and our www.custfeedback.com site, so I can't complain. However, keep in mind this is a hack so use it wisely and test your code well before going to production.

I decided to rewrite the __doPostBack method to use the DOM as opposed to the "old ways". This is to say, "To hell with old and bad browsers". Browser usage stats show that the ones without DOM1 support are almost extinct. Therefore assess your audience and see if this is going to work for you.

// If __doPostBack is registered, replace the whole function
if (finalHtml.IndexOf ("__doPostBack") > -1)
{
 try
 {
  int     pos1 = finalHtml.IndexOf ("var theform;");
  int     pos2 = finalHtml.IndexOf ("theform.__EVENTTARGET", pos1);
  string  methodText = finalHtml.Substring (pos1, pos2-pos1);
  string  formID = Regex.Match (methodText,
          "document.forms\\[\"(.*?)\"\\];",
          RegexOptions.IgnoreCase).
          Groups[1].Value.Replace (":", "_");

  finalHtml = finalHtml.Replace (methodText,
      @"var theform = document.getElementById ('" + formID + "');");

 }
 catch {}
}
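If you'd rather not lean on an empty catch block, the same rewrite can be guarded with explicit checks. This is only a sketch: PostBackRewriter and UseGetElementById are hypothetical names, it replaces just the first occurrence, and it returns the page untouched whenever the markers it expects are missing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the sample code.
public static class PostBackRewriter
{
    public static string UseGetElementById(string html)
    {
        // Locate the start of the browser-sniffing block.
        int start = html.IndexOf("var theform;", StringComparison.Ordinal);
        if (start < 0) return html;

        // Locate the first statement after the block.
        int end = html.IndexOf("theform.__EVENTTARGET", start, StringComparison.Ordinal);
        if (end < 0) return html;

        // Pull the form name out of document.forms["..."].
        string lookup = html.Substring(start, end - start);
        Match m = Regex.Match(lookup, "document\\.forms\\[\"(.*?)\"\\]",
                              RegexOptions.IgnoreCase);
        if (!m.Success) return html;

        // Same ":" to "_" substitution as in the code above.
        string formId = m.Groups[1].Value.Replace(":", "_");
        string replacement =
            "var theform = document.getElementById('" + formId + "');\n ";

        // Splice the replacement in place of the original block.
        return html.Substring(0, start) + replacement + html.Substring(end);
    }
}
```

Because every failure path returns the input unchanged, a change in the generated script leaves the page as ASP.NET rendered it instead of throwing.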

The transformed __doPostBack should look similar to this:

function __doPostBack(eventTarget, eventArgument)
{
 var theform = document.getElementById ('mainForm');

 ...
}

This one will keep the validator happy. And last, but not least, we're supposed to remove the name attribute from the main form.

// Remove the "name" attribute from <form> tag(s)
re = new Regex("<form\\s+(name=.*?\\s)", RegexOptions.IgnoreCase);
finalHtml = re.Replace(finalHtml, new MatchEvaluator(FormNameMatch));

A corresponding match evaluator delegate is implemented like this:

private static string FormNameMatch (Match m)
{
 return m.ToString ().Replace (m.Groups[1].Value, string.Empty);
}
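To see both fixes at work outside of a web application, here is a small sketch that runs the two regular expressions from this section against a made-up form. XhtmlFixDemo, Fix and the sample markup are hypothetical; the patterns and match evaluators are the ones shown above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the two fixes described in this section.
public static class XhtmlFixDemo
{
    private static string ViewStateMatch(Match m)
    {
        return string.Concat("<div>", m.Groups[1].Value, "</div>");
    }

    private static string FormNameMatch(Match m)
    {
        return m.ToString().Replace(m.Groups[1].Value, string.Empty);
    }

    public static string Fix(string finalHtml)
    {
        // Step 1: wrap the __VIEWSTATE input in a <div>
        Regex re = new Regex("(<input.*?__VIEWSTATE.*?/>)",
                             RegexOptions.IgnoreCase);
        finalHtml = re.Replace(finalHtml, new MatchEvaluator(ViewStateMatch));

        // Step 2: drop the name attribute from the <form> tag
        re = new Regex("<form\\s+(name=.*?\\s)", RegexOptions.IgnoreCase);
        finalHtml = re.Replace(finalHtml, new MatchEvaluator(FormNameMatch));

        return finalHtml;
    }
}

// Input (made-up sample):
//   <form name="mainForm" method="post" action="Default.aspx" id="mainForm">
//   <input type="hidden" name="__VIEWSTATE" value="dDw=" />
//   </form>
//
// XhtmlFixDemo.Fix returns:
//   <form method="post" action="Default.aspx" id="mainForm">
//   <div><input type="hidden" name="__VIEWSTATE" value="dDw=" /></div>
//   </form>
```

Note that the form pattern only matches when name is the first attribute after <form, which is how the sample above is laid out.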

Installing the Request Filter

I prefer to wire a request filter in an HttpModule. The nuts and bolts of the HttpModule and HttpApplication classes are outside the scope of this article. You can find a brief overview in my other article, ASP.NET Custom Error Pages.

Below is bare-bones code of an HttpModule:

public class MyHttpModule : IHttpModule
{
 // ---------------------------------------------
 public void Init (HttpApplication app)
 {
  app.ReleaseRequestState += new EventHandler(InstallResponseFilter);
 }

 // ---------------------------------------------
 // IHttpModule also requires Dispose; there is nothing to clean up here
 public void Dispose ()
 {
 }

 // ---------------------------------------------
 private void InstallResponseFilter(object sender, EventArgs e) 
 {
  HttpResponse response = HttpContext.Current.Response;

  if(response.ContentType == "text/html")
        response.Filter = new PageFilter (response.Filter);
 }
}

[Diagram: HttpApplication Lifetime]

The app parameter passed to the Init method is of type System.Web.HttpApplication. You tap into the ASP.NET HTTP pipeline by wiring handlers of the various HttpApplication events. The diagram on the left illustrates the sequence of these events. See how late in the game your page filter is called? In the code sample above I install the response filter in the ReleaseRequestState event handler. To make sure the filter processes only pages I explicitly check for content type:

if (response.ContentType == "text/html") 
       response.Filter = new PageFilter (response.Filter);      

The final step of plugging your HttpModule into the pipeline is listing it in web.config (also explained in my other article):

<system.web>
 <httpModules>
   <add name="MyHttpModule"
        type="MyAssembly.MyHttpModule, MyAssembly" /> 
 </httpModules>
</system.web>

Remember to replace MyHttpModule and MyAssembly with appropriate module and assembly names from your project.

Performance Considerations

Back when we were implementing the Write method I used a string variable, finalHtml. Keep in mind that in .NET strings are immutable, i.e. you cannot change a string's length or modify any of its characters. For example:

finalHtml = re.Replace(finalHtml, new MatchEvaluator (FormNameMatch));

The finalHtml variable holds the entire HTML response. When the line of code above runs, a whole new string will be allocated and assigned to finalHtml. If you manipulate large strings and do it again and again, it may negatively affect performance and breed garbage in memory.
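One way to soften the cost, sketched here under the assumption that the patterns never change at run time, is to build each Regex once in a static field with RegexOptions.Compiled instead of constructing it on every Write call. XhtmlRules and Apply are hypothetical names; the patterns are the ones from the filter, and the replacement strings stand in for the match evaluators.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder for the filter's patterns, built once and reused.
public static class XhtmlRules
{
    private static readonly Regex ViewState = new Regex(
        "(<input.*?__VIEWSTATE.*?/>)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    private static readonly Regex FormName = new Regex(
        "<form\\s+(name=.*?\\s)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Apply(string html)
    {
        // $1 is the matched <input ... /> tag.
        html = ViewState.Replace(html, "<div>$1</div>");

        // Replaces "<form name=... " with "<form ", which also collapses
        // the whitespace after <form to a single space.
        html = FormName.Replace(html, "<form ");

        return html;
    }
}
```

Compiling makes the first use slower and later matches faster, so it tends to pay off only for patterns that run on every request. It does not remove the string allocations themselves; output caching is still the bigger lever.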

When Filters Don't Work At All

One last issue before I wrap up this article. Your filter won't be called at all if you call HttpApplication.CompleteRequest() one way or another. The pipeline will bypass your filter and send an unmodified response. The following methods do call HttpApplication.CompleteRequest():

  • Server.Transfer()
  • Response.End()
  • Response.Redirect() (unless you pass false for endResponse)

By contrast, Server.Execute() doesn't call HttpApplication.CompleteRequest().

"You lie!!!" No, see for yourselves:

// --- HttpServerUtility.Transfer ---
public void Transfer(string path, bool preserveForm)
{ 
 if (this._context == null)
    throw new HttpException(...);

 this.ExecuteInternal(path, null, preserveForm);
 this._context.Response.End();
}
// --- HttpServerUtility.Execute ---
public void Execute(string path)
{ 
 this.ExecuteInternal(path, null, 1);
}
// --- HttpResponse.End ---
public void End()
{ 
 ...
 this.Flush();
 this._ended = true;
 this._context.ApplicationInstance.CompleteRequest();
} 
// --- HttpResponse.Redirect ---
public void Redirect(string url, bool endResponse)
{
 ... 
 if (endResponse)
  this.End();
}

If HttpApplication.CompleteRequest() is called during an event, the ASP.NET HTTP pipeline will interrupt request processing once the event handling completes. If it's any consolation, it will still fire the EndRequest event.

Conclusion

I hope this article was a wake-up call in terms of programming with web standards in mind. We looked at a real-world example of writing a request filter and enforcing XHTML 1.0 compliancy. This is a highly experimental article as you will most likely discover other gotchas when you validate your pages against the W3C MarkUp Validation Service.

As indicated at the beginning of the article, ASP.NET 2.0 is supposed to bring to the table a host of useful features. Why bother with request filters then? Why all this hacking? XHTML compliancy is promised anyway. Well, we can just sit around and drool over the features of Whidbey, Yukon and what have you. We have real jobs, real projects and a real paycheck. Besides, I illustrated only one practical application of a filter. There are many more.

There's a common misconception that ASP.NET is easy to master and that it just takes care of everything for you. Not so. ASP.NET is not easy. It's powerful. It puts you in the driver's seat. Therefore hacking isn't going away any time soon.

59 comments

SomeNewKid
on February 15, 2004

An interesting article, Milan.

I do have a question, however. (As I'm only a newbie developer, it *is* only a question ... not a challenge.)

Isn't running the output of *every* page through THREE RegEx processes extremely poor in terms of performance and scalability?

Why not just use custom Page and Form classes, as per the following article:
http://www.liquid-internet.co.uk/content/dynamic/pages/series1article1.aspx

Thanks again for an interesting article.


Milan
on February 16, 2004

Yep, I've seen this article. It's an interesting approach, but... ASP.NET is a pretty complex framework. There's a lot of plumbing in place. I really wouldn't want to maintain an alternative server-side form, its viewstate, etc. It seems like too much hassle. I find it easier and more efficient to tap into a generated response and tweak it. Also, it's been promised ASP.NET 2.0 would produce web standards compliant code. We'll see about that in due time. If they meet the promise my filter won't be necessary which I'm ok with.

As to manipulating text - yes, it may create overhead. Since strings in .NET are immutable you end up re-allocating chunks of memory each time you need to assign a string a new value. You can compensate for this with page caching or page fragment caching. The benefit of caching is tremendous.


Paul
on February 19, 2004

Nice article, good to see more and more people catching onto standardizations. I really like this as I can apply this filter on a page by page basis or a base page class in template scenarios. I'd like to see an implementation that changes the bytes directly as they are read through...

Anyone interested in standardized markup with asp.net may be interested in a new product on the markup ( not mine, not a shameless plug ): www.xhtmlwebcontrols.net


Bill
on February 29, 2004

Interesting article. I've just started doing C# with XHTML (with some XML thrown in for fun). I'm still a ways off, but I've cut my validation errors way down - by half so far. I'm going to have to dig into your article a bit more to see what I'm missing yet. Thanks for getting this information out there.


Lon Palmer
on August 26, 2004

Oh my.

I have a project to do for a client that is running MS 2003 servers. Naturally I thought "I'll use ASP.NET, it's native to his server!" My next thought was "I'll make it standards compliant. I'll use XTML 1.0 strict!"

Now, after reading this, I want to install Tomcat and go back to Java.

Most of my PC coding, I do in C# WinForms. My server coding was always in Java (J2EE) and I'll probably move back to it.

I don't like alot of Java Script in my pages (any if I can help it). Nor do I like the idea of not being in control of the output of my page. What were the ASP.NET developers thinking when they took so much control? What if their code breaks a browser that I need to target, Like a web enabled phone or a pda? Seems they've painted themselves into a "Web from a PC only" box.

Is ASP.NET really viable in the enterprise? Really?


Lon Palmer
on August 26, 2004

Ok, One kudo to ASP.NET. It seems VERY fast.

nuff said.


Jason
on September 13, 2004

Won't response.Filter overwrite any existing filters in place? Forgive me if this is a stupid question, but I am completely new to the HttpModule framework...


Milan Negovan
on September 13, 2004

You need to make sure you don't just overwrite the old filter, but "chain" it. The code shows it passes the existing response.Filter on to the constructor of my filter and thus preserves it (look up InstallResponseFilter in the HttpModule). Makes sense?


Nick
on October 25, 2004

Milan,

how efficient is that HTTP Filter in terms of performance & scalability, when using it in high loads conditions (hundred of thousands users per months for example..) ?

It seems to be the easiest way to make XHTML compliant my aspx before ASP.Net 2.0 comes out in 2005, well at least easier than XHTMLWebControls which require to modify the core of my webapp


Milan Negovan
on October 25, 2004

I'd be careful with it---under heavy loads it might not perform that well because of all the RegEx processing. Caching cleaned up pages and compressing them will surely mitigate the initial performance hit.

The upside of this approach is its ease of use. You simply plug it into the pipeline.


David
on November 04, 2004

Good article, i've implemented on my site and it worked well until .NET SP1 was installed where the __doPostBack function has both the language and type attribute, your code replaces the language attribute with the type attribute therefore causing duplicate tags.
I also wrapped the __EVENTTARGET and __EVENTARGUMENT hidden values in div tags as this was required.
I added an extra bit to remove whitespace too
// Remove whitespace
if (bool.Parse( ConfigurationSettings.AppSettings["RemoveWhitespace"].ToString() ))
{
finalHtml = Regex.Replace(finalHtml, "\t", string.Empty );
finalHtml = Regex.Replace(finalHtml, "\n", string.Empty );
finalHtml = Regex.Replace(finalHtml, "\r", string.Empty );
finalHtml = Regex.Replace(finalHtml, "", "// --> \n" );
}
based on a web.config setting, knocks about 10% off the page size


Milan Negovan
on November 04, 2004

Duplicates? Hmm.... I'll look into that. As to replacing white space characters---this is what I wanted to work on next as an addition to the filter. ;)


David
on November 10, 2004

I suppose you could remove the addition of the type attribute, I think the .NET Framework 1.1 SP1 adds this now, i'll try it later.

Here's a link to the regex to clean up whitespace, this form seemed to remove a line from my code above.

http://dotnetjunkies.com/WebLog/donnymack/archive/2003/09/08/1468.aspx


David
on November 10, 2004

Yep, remove the regex expression that replaces the language attribute with the type attribute for .NET 1.1 SP1, just make sure any SCRIPT tags have the type attribute on in your code.


Bruce
on November 10, 2004

Hi,

I am getting the following error when compile the HttpModule class : 'MyHttpFilter.xhtmlFilter' does not implement interface member 'System.Web.IHttpModule.Dispose()'.

I have added the function and it works now, should this be done or is my configuration wrong?

Great fix for xhtml BTW.


Thomas
on November 12, 2004

I have the same problem. ('MyHttpFilter.xhtmlFilter' does not implement interface member 'System.Web.IHttpModule.Dispose()'.)
As I am not so familiar with c# it would be great if anyone could tell me how to fix it.

TIA
Thomas


Milan Negovan
on November 12, 2004

Thomas and Bruce, if you've created a new VB.NET project, make sure you clear out the Root namespace.


David Rhodes
on December 08, 2004

Milan, could you point out the code section needed to re-write the action property of the form when using url re-writing, I can't seem to find it in this article


Milan Negovan
on December 11, 2004

David, I took it out of this HttpModule because something wasn't quite working out with rewriting the action attribute. As I indicated in a blog post I moved it into another module. I'm still trying to remember where it went. :) I'll post it here as soon as I find it.


Basic Date Picker
on December 20, 2004

Milan, nice work on your XHTML filter. We too had a problem with the filter rendering double ‘type’ attributes in the script blocks.

Inside the Write() method we changed the following:

This...

// Replace language="javascript" with script type="text/javascript"
re = new Regex ("(?<=script\\s*)(language=\"javascript\")", RegexOptions.IgnoreCase);
finalHtml = re.Replace (finalHtml, new MatchEvaluator (JavaScriptMatch));


Became this...

// Replace language="javascript" with script type="text/javascript"
// This will match language="javascript", language="javascript1.1", etc.
string regexJSLanguage = "(?<=<script[^>]*?)(language=\"javascript[^>]*?\")";
re = new Regex (regexJSLanguage,RegexOptions.Multiline | RegexOptions.IgnoreCase);
finalHtml = re.Replace(finalHtml, new MatchEvaluator (JavaScriptMatch));


// Check for blocks that have double "type="text/javascript"" attributes and strip to only one.
string regexDoubleJSType = "(?<startTag><script[^>]*?)(?<firstTypeMatch>type=\"[^\"]*?\"\\s?)(?<miscAttributes>[^>]*?)?(?<secondTypeMatch>type=\"[^\"]*?\"\\s?)(?<endTag>[^>]*?>)";
re = new Regex (regexDoubleJSType,RegexOptions.Multiline | RegexOptions.IgnoreCase);
finalHtml = re.Replace(finalHtml, new MatchEvaluator(DoubleJSTypeMatch));


We basically pull apart the block into it's parts and glue back together using only one type attribute instead of two. I'm sure there is a way both those javascript replace methods could be combined, but it is what it is at the moment. I'm no regex expert and the above fix does not 'appear' to take any performance hit, although we have not run through any load stressing to confirm.


The following Match was added to handle the doubleJS...

private static string DoubleJSTypeMatch (Match m)
{
return m.Result("${startTag}${miscAttributes}${secondTypeMatch}${endTag}");
}


Keep up the excellent work Milan.

Geoff - http://www.basicdatepicker.com


Nicholas Berardi
on January 01, 2005

This is a wonderful article. I have a simple question as to why you chose to use ReleaseRequestState instead of PreSendRequestContent. Is there a difference in where they execute and that is why you can't use filters in the PreSendRequestContent?


Milan Negovan
on January 01, 2005

PreSendRequestContent is a non-deterministic event. Its timing of invocation is not completely random, but timing is important. Also, I wanted to make sure everyone in the pipeline had a chance to contribute to the response, which is why I wire my filter so far down the pipeline.


Franck Quintana
on February 10, 2005

First of all this article is great! Thank you Milan :)
I think i have a bit optimization :

// The title has an id="..." which we need to get rid of
re = new Regex ("", RegexOptions.IgnoreCase);
finalHtml = re.Replace (finalHtml, new MatchEvaluator (TitleMatch));
-----------------------------
if you replace it by:

re = new Regex ("", RegexOptions.IgnoreCase);
if(re.IsMatch(finalHtml)) {
finalHtml = re.Replace (finalHtml, new MatchEvaluator (TitleMatch));
}


and the same for the others re.Replace...

testing IsMatch on each Replace avoid memory consumption because of immutable strings.

Hope this helps!
Franck.


Pragati
on February 14, 2005

Thanks for the valuable information. The article provides sufficient inputs to start with web accessibility in .net for me.


Vadra Rowley
on April 11, 2005

Thank you for addressing this issue, but I was disappointed after I had read half and scanned the rest of the article. At the very beginning, you stated the article had a two-fold purpose. I was waiting for you to address the first mentioned... the importance of following xhtml standards. Could you or anyone comment on this? I need to convince a few people who don't convince easily.


Milan Negovan
on April 11, 2005

Point them to this Web Standards Primer and The way forward with web standards.


Tim
on April 22, 2005

Many thanks for an overview of Accessibility - this has formed a positive start. I may wait until .NET 2.0 comes out instead of venturing into overcoming some of the accessibility issues associated with .NET 1.0.


Bjorn
on June 27, 2005

I just moved from Java to .NET. I'm shocked by the appaling status of webstandards compliance in ASP.NET. Sigh.

Good to see people like you putting focus on it, though.


Jeremy
on July 16, 2005

Is there a reason for not including the option to compile the regex statements for reuse? This would slow down the first call but speed up the successive ones.

Also, this block of code is flawed:
// If __doPostBack is registered, replace the whole function
if (finalHtml.IndexOf ("__doPostBack") > -1)
{
try
{
int pos1 = finalHtml.IndexOf ("var theform;");
int pos2 = finalHtml.IndexOf ("theform.__EVENTTARGET", pos1);
string methodText = finalHtml.Substring (pos1, pos2-pos1);
string formID = Regex.Match (methodText,
"document.forms\\[\"(.*?)\"\\];",
RegexOptions.IgnoreCase).
Groups[1].Value.Replace (":", "_");

finalHtml = finalHtml.Replace (methodText,
@"var theform = document.getElementById ('" + formID + "');");

}
catch {}
}

as exception handling is expensive, it should NEVER be used to control code flow. Nitpicky? maybe... but we're dealing with code here that potentially runs for every text/html response on a site. Every little bit adds up.


Milan Negovan
on July 18, 2005

Jeremy, good point about the regex. I guess it could benefit from a compilation flag.

The part where the form is replaced is quite touchy in the sense that "it's subject to change without notice" and not handling an exception there would bring down every web page, which would render the site useless.


Dan
on August 30, 2005

Hi All, good article ... :)

Someone have the code in VB.NET?

Thanks


Fordiy
on September 01, 2005

I added these module to my existing C#.net project. I got the same problem when I compiling the codes.

('MyHttpFilter.xhtmlFilter' does not implement interface member 'System.Web.IHttpModule.Dispose()'.)

Can you put the detail procedure here to plugin existing project?

Thanks


vbguy
on September 28, 2005

Is it possible for someone to post this filter using vb .net code?


JfK
on November 15, 2005

I would like to mention one thing that is missing here: naming consistency. Generally speaking, your article, Milan, is ok, but something hides there that causes some confusion: your article's title is "... with response filters" but then in the middle there is a section titled "Installing the Request Filter". Hmmm... Even more interesting - in this section about "request filter" there's a code line "response.Filter = new PageFilter (response.Filter);". Obviously something's mixed up here. I've found your page by google. I was searching for clues in writing _request_ filter. Oopps, it's not this page :-) Everybody writes _response_ filters but plainly _request_ filters are less popular. I had to write one, because of viewstate errors caused by mad&old mobile browsers which can sometimes urlencode viewstate _twice_ (!!!) or forget that '+' sign has to be urlencoded as well. No solution on the web. No solution anywhere. .NET Core classes - that's another story. Try to change some behavior there. Good luck :-) If I could only erase keywords private and internal from brains of M$ core developers. I've lost 8h looking for any way of hooking into viewstate before it is mangled to have an opportunity to fix it. No no no, M$ tells me it's no good. If anybody wants to look into devil's eyes I advise to switch Reflector on and see MobilePage class. There is a private field _requestValueCollection. Now travel to base class Page and... there is another copy of _requestValueCollection! Hooray! Who wrote this I would like to ask, but I don't think anybody can answer. This small shitty thing repels you from any serious viewstate manipulation in MobilePage. Enough of this, sorry for the bloat, but I had to throw it out of myself :-) Concluding: Milan, please fix that section's title because googling asp+request+filtering leads to your article - and you don't filter requests, do you? Regards!


Milan Negovan
on November 26, 2005

The "request" in the context of this article is the whole chain of events: starting with a request from the client, down the ASP.NET pipeline, and the subsequent response. In this sense I do filter requests.

The issue is that we often allude to classes that handle the entire request, such as HttpRequest, HttpResponse, HttpApplication, etc. Their naming might confuse the issue of request processing.

As far as mobile controls go, I've heard from way too many people how raw those controls are. Not good.


DOC Holiday
on December 08, 2005

I'm having trouble using your methods using VB.NET - can anyone post examples on how to do this in VB.NET?


Euan
on December 14, 2005

Yeah vb.net would be nice


Derek
on December 27, 2005

Thank you very much Milan, good job! Below is the VB.NET "translation"

Public Class PageFilter
Inherits Stream
Private responseStream As Stream
Private _position As Long
Private responseHtml As StringBuilder

Public Sub New(ByVal inputStream As Stream)
Me.responseStream = inputStream
Me.responseHtml = New StringBuilder
End Sub

Public Overrides ReadOnly Property CanRead() As Boolean
Get
Return True
End Get
End Property

Public Overrides ReadOnly Property CanSeek() As Boolean
Get
Return True
End Get
End Property

Public Overrides ReadOnly Property CanWrite() As Boolean
Get
Return True
End Get
End Property

Public Overrides Sub Flush()
Me.responseStream.Flush()
End Sub

Public Overrides ReadOnly Property Length() As Long
Get
Return 0
End Get
End Property

Public Overrides Property Position() As Long
Get
Return Me._position
End Get

Set(ByVal Value As Long)
Me._position = Value
End Set
End Property

Public Overrides Function Read(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer) As Integer
Return Me.responseStream.Read(buffer, offset, count)
End Function

Public Overrides Function Seek(ByVal offset As Long, ByVal origin As System.IO.SeekOrigin) As Long
Return Me.responseStream.Seek(offset, origin)
End Function

Public Overrides Sub SetLength(ByVal value As Long)
Me.responseStream.SetLength(value)
End Sub

Public Overrides Sub Write(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer)
Dim strBuffer As String = System.Text.UTF8Encoding.UTF8.GetString(buffer, offset, count)
Dim eof As New Regex("</html>", RegexOptions.IgnoreCase)

If Not eof.IsMatch(strBuffer) Then
responseHtml.Append(strBuffer)

Else
responseHtml.Append(strBuffer)
Dim finalHtml As String = responseHtml.ToString()
Dim data As Byte() = System.Text.UTF8Encoding.UTF8.GetBytes(finalHtml)
Me.responseStream.Write(data, 0, data.Length)
End If
End Sub

Public Overrides Sub Close()
Me.responseStream.Close()
End Sub
End Class


Milan Negovan
on December 28, 2005

Many thanks, Derek!


Kieran
on January 08, 2006

Hi,

I think the error:

"'MyHttpFilter.xhtmlFilter' does not implement interface member 'System.Web.IHttpModule.Dispose()'."

can be fixed with the following:

public void Dispose () {}

K


Sigurd
on February 08, 2006

I used a similar technique to add content to pages produced by a third party. Essentially it injects a "standard" header and footer into the html output.

I ran into some trouble regarding concurrent requests. It appears that the buffer parameter sent to the filter's Write() method includes more than just the output from a single request.

Limiting the "work area" to what was specified by the Write() method's offset and count resolved the issue.

-S
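
Sigurd's point about offset and count can be shown without ASP.NET at all. Below is a minimal sketch, assuming a hypothetical helper named DecodeChunk (it is not part of the article's filter): the array handed to Write() may be larger than the chunk being written, so only the count bytes starting at offset belong to the current call.

```csharp
using System;
using System.Text;

public static class ChunkDecoder
{
    // Hypothetical helper: decode only the slice that Write() was asked to handle.
    public static string DecodeChunk(byte[] buffer, int offset, int count, Encoding encoding)
    {
        return encoding.GetString(buffer, offset, count);
    }

    public static void Main()
    {
        // Simulate a reused buffer: unrelated bytes surround the real chunk.
        byte[] buffer = Encoding.UTF8.GetBytes("STALE<p>hello</p>STALE");
        int offset = 5;
        int count = Encoding.UTF8.GetByteCount("<p>hello</p>");

        // Prints: <p>hello</p>
        Console.WriteLine(DecodeChunk(buffer, offset, count, Encoding.UTF8));

        // Decoding the whole array also picks up the unrelated bytes.
        // Prints: STALE<p>hello</p>STALE
        Console.WriteLine(Encoding.UTF8.GetString(buffer));
    }
}
```

Inside a real filter the same rule applies to every call that touches the buffer: pass offset and count along rather than assuming the chunk starts at index 0 or fills the array.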


Jeff Sargent
on April 13, 2006

Milan,

Love the article - I'm using this technique on the company website to throw an intermediate page before any outbound links. I don't like the practice, cause it's annoying, but we apparently have a lot of complaints about "pages being broken", and we find that the user never realized they followed an outbound link that we don't control.

Anyhow, onto the tech - the Regex I'm using matches links with http:// and https://, processes them a bit and compares them to an XML file of "internal" domains that we don't actually want to be flagged as external to our main website. If it doesn't find the domain in the list of "internal" domains, it prepends "outbound.aspx?link=" to the link.

Here's my problem - I want to exclude "outbound.aspx" from being processed under this httpmodule - when I grab the link out of the querystring (link=) and put it inside the page as a "Continue to..." link, the httpmodule processes that link also, causing a recurring link to "outbound.aspx?link=http://www.the.original.link/". How do I exclude just one page?

Thanks!


Milan Negovan
on April 17, 2006

Jeff, I'm not sure, but this looks like an exercise in regular expressions. Would you like to send me a code snip so I can see what's causing a link loop?
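
Apart from adjusting the regular expression, one other way to approach Jeff's question is to skip the filter entirely for the page in question. The sketch below is only an illustration under assumptions: the class name FilterExclusions, the method ShouldFilter and the excluded path list are all hypothetical, and the check is written as a plain function so it can be tried outside of ASP.NET.

```csharp
using System;

public static class FilterExclusions
{
    // Hypothetical list of pages the filter should leave alone.
    private static readonly string[] ExcludedPages = { "/outbound.aspx" };

    // Returns true when the filter should be installed for this request path.
    public static bool ShouldFilter(string requestPath)
    {
        foreach (string page in ExcludedPages)
        {
            if (requestPath.EndsWith(page, StringComparison.OrdinalIgnoreCase))
                return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(ShouldFilter("/products/default.aspx")); // True
        Console.WriteLine(ShouldFilter("/Outbound.aspx"));         // False
    }
}
```

In the module, the guard would sit around the line that assigns Response.Filter, with Request.Path as the argument. Request.Path does not include the query string, so the link= parameter does not interfere with the comparison.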


Jeff Huck
on May 19, 2006

Thanks for the great article. Does anyone know if this technique still results in the correct Content-length header or why that may not matter?


Milan Negovan
on May 20, 2006

The Content-length header is correct. It's the length of the compressed content.


PavelBure
on June 07, 2006

This code is vulnerable to a simple attack. If a malicious user could enter a </html> tag somewhere in your site (e.g. in forums), this would cause the content to be written twice.

A regex check for the end of the document can be avoided like this:
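using System.IO;
using System.Text;
using System.Web;

public class HttpFilter : Stream
{
    private Stream m_objSink;
    private StringBuilder m_objResponseHtml = new StringBuilder();

    public HttpFilter(Stream sink)
    {
        m_objSink = sink;
    }

    // Buffer every chunk; nothing is sent until Flush().
    public override void Write(byte[] buffer, int offset, int count)
    {
        string strBuffer = HttpContext.Current.Response.ContentEncoding.GetString(buffer, offset, count);
        m_objResponseHtml.Append(strBuffer);
    }

    public override void Flush()
    {
        string strHtmlOutput = m_objResponseHtml.ToString();
        // Clear the buffer so that a later Flush() does not resend the same content.
        m_objResponseHtml.Length = 0;
        //here we can change the content
        byte[] data = HttpContext.Current.Response.ContentEncoding.GetBytes(strHtmlOutput);
        m_objSink.Write(data, 0, data.Length);
        m_objSink.Flush();
    }

    // The remaining abstract Stream members (CanRead, CanSeek, CanWrite, Length,
    // Position, Read, Seek, SetLength) must be overridden too; they are omitted here.
}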

This seems to work at my site.


Jeff Magill
on July 06, 2006

Thanks for the great article Milan. This is exactly the functionality I had been looking for though my situation is a bit different. I do have a couple issues though.

I'm having trouble understanding why you are using an HTTPModule for this. I know you said you wanted to filter the HTML early on in the request, however, it seems to me that an HTTPModule is a lot of work for something that can be accomplished quite easily elsewhere.

I also tried to test out the loss of functionality when Response.Redirect() is used. Perhaps I misunderstood your point, but in my testing, my filters remained intact despite the use of Response.Redirect().


Milan Negovan
on July 10, 2006

It's only a matter of personal preference, I think. I like the modular design of HTTP modules. They give me a lot of flexibility in coding and maintenance.

I believe you can tap into the page filtering from the Page class itself, though.


Marcus
on July 11, 2006

I have been using this method to get XHTML-compliant pages in ASP.NET 1.1, but now it's time to move on to ASP.NET 2.0. The output seems to be pretty nice and the code validates at http://www.w3.org/, but when I run the HTML validator "tidy" I receive one error: "ID "__VIEWSTATE" uses XML ID syntax". Does anyone have any idea how to fix this error?


007dad
on July 28, 2006

I am having the same concerns as Marcus (#49).


Ben Strackany
on November 28, 2006

Yep, ASP.NET 2.0 is much more compliant than 1.1.

You're getting a validation error in your viewstate because it has an id of __VIEWSTATE, and the HTML 4.01 spec says ids must start with a letter, not underscores.

Setting your page doctype to XHTML should resolve the issue, and/or telling Tidy (or whatever validator you're using) to validate the page as XHTML, not HTML. However, it could be that certain validators (like Tidy, perhaps) are going to complain about the underscores no matter what. If that's the case, you can disable viewstate, ignore the validation errors, or hack into the Page class & rename the ViewState to something else.


Ben Strackany
on November 28, 2006

You might find some tips on renaming or getting rid of ViewState here

http://www.codeproject.com/aspnet/ServerViewState.asp

and here

http://www.eggheadcafe.com/articles/20040613.asp

e.g. in the SavePageStateToPersistenceMedium and LoadPageStateFromPersistenceMedium overrides.


Arul
on April 09, 2007

Can anyone tell me the regular expression syntax for replacing an empty div with a properly closed div?

should be replaced with .
Please tell me the syntax using Regex.Replace.
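
The markup in Arul's question did not survive, so the following is only a sketch under an assumption: that the goal is to turn a self-closed div such as <div /> into <div></div>. The class and method names are hypothetical; Regex.Replace and RegexOptions.IgnoreCase are the standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyDivFixer
{
    // Assumption: "empty div" means a self-closed <div ... />.
    // The group captures any attributes so they are kept on the opening tag.
    public static string ExpandEmptyDivs(string html)
    {
        return Regex.Replace(html, @"<div\b([^>]*?)\s*/>", "<div$1></div>", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Prints: <div></div><div id="x"></div>
        Console.WriteLine(ExpandEmptyDivs("<div /><div id=\"x\" />"));

        // A div that already has a closing tag is left untouched.
        // Prints: <div>text</div>
        Console.WriteLine(ExpandEmptyDivs("<div>text</div>"));
    }
}
```

The pattern is deliberately simple. It does not cope with a literal /> inside an attribute value, which is one reason a regular expression is a stopgap rather than a substitute for real HTML parsing.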


Chris
on August 27, 2008

It is a good thing to move to more standards-compliant XHTML, but the resulting changes don't do anything but please purist web-standards types (I am one of them :) ).

In my opinion ASP.NET folks moving toward web standards should check out the CSS Friendly Adapters (http://www.codeplex.com/cssfriendly). This project takes care of creating more accessible, semantic XHTML. The CSS Friendly Adapters, for example, change the RadioButtonList control from table markup to unordered list markup.

That said, creating true perfect shiny XHTML, CSS and JS is next to impossible using 'classic ASP.NET'. My hope is that the ASP.NET MVC architecture will get out of the way!


Qurban Ali
on August 28, 2008

If you click Switzerland on this page, you will see there is a diamond-like character. Can we remove it somehow with the method you described?


Milan Negovan
on August 28, 2008

Let's take this offline. Please see my email.


Matthew Marksbury
on September 25, 2008

What if the underlying page does not have an HTML closing tag, or has multiple closing tags? Sure, this is non-compliant, but with a CMS system and business users managing content, it is very possible for this to happen.

How would you work around that issue?


Milan Negovan
on September 25, 2008

Matthew, I make no attempt to "beautify" HTML. I go after the parts I know are dirty. Or *were* dirty back in 1.x days. If I wanted to clean HTML, I'd have to resort to parsing it, etc, which is a long saga.

In ASP.NET 2.0, this is handled much better and you don't even need my filter. ;)


Magento
on February 02, 2010

Good article; it's good to see more and more people catching on to standards. I really like that I can use this filter on a page-by-page basis or from a base page class in template scenarios. I'd like to find a better-performing approach that transforms the bytes directly as they are read through.