Pragmatic Developer

Ali Özgür

    BlogEngine.NET version 1.3 includes a syntax highlighting extension, but it is still in beta, so I looked around for another syntax highlighter, since this blog relies heavily on code snippets. After looking for a while I found this syntax highlighter extension. It uses Wilco Bauwer's syntax highlighter library, which is impressive. The main problem with the extension is that it does not handle HTML tags and HTML character entities such as &nbsp;, &lt; and &gt; very well. Tags and entities are left behind as garbage after the extension tries to highlight the code. You see something like this:

    private voidTest&nbsp{

    <p>&nbsp;private&nbsp;int&nbsp;i=0;

    <p>}

    After inspecting the SyntaxHighlightingExtension.cs file I saw that the extension matches the source code with a regular expression and feeds Wilco's highlighter the raw HTML (body). This incomplete implementation causes the side effect I mentioned above. We need to strip the HTML tags and decode the special characters in the raw HTML (body) before highlighting, so I changed the Highlight method of the extension. The resulting method looks like this:

    001 private string Highlight(HighlightOptions options)
    002 {
    003  string parsed;
    004  uint id = NextCodeID();
    005  string name = options.Language; 
    006
    007
    008  HighlighterBase highlighter = GetHighlighter(name);
    009  if (highlighter != null)
    010  {
    011   name = highlighter.FullName;
    012   highlighter.Parser = htmlParser;
    013   
    014   string body = Regex.Replace(options.Code,@"<(?![!/]?[>\s])[^>]*>",String.Empty,RegexOptions.CultureInvariant| RegexOptions.IgnoreCase | RegexOptions.Singleline);  
    015   body = HttpUtility.HtmlDecode(body);
    016   parsed = highlighter.Parse(body);
    017   highlighter.ForceReset();
    018  }
    019  else
    020  {
    021   name += " (not highlighted)";
    022   parsed = options.Code;
    023  } 
    024
    025
    026  if (options.DisplayLineNumbers)
    027  {
    028   string[] lines = parsed.Split(new char[] { '\n' });
    029   StringBuilder outputBuffer = new StringBuilder(); 
    030
    031
    032   for (int i = 0; i < lines.Length; i++)
    033   {
    034    outputBuffer.AppendFormat(linenumberingTemplate, i+1, lines[i]);
    035   } 
    036
    037
    038   return string.Format(OutputTemplate, id, name, options.Title, outputBuffer);
    039  } 
    040
    041
    042  return string.Format(OutputTemplate, id, name, options.Title, parsed);
    043 }

    The change is in lines 014 to 016 of the listing above. I simply strip out the HTML tags with a regular expression, then use the HttpUtility.HtmlDecode method to decode the special HTML characters, and feed the parser the normalized body text. And voilà, the extension started performing well.
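
    To try the normalization step on its own, outside BlogEngine.NET, here is a minimal sketch. The SnippetNormalizer class, its Normalize method and the sample input are hypothetical names made up for illustration; the regular expression is the one from the listing. It assumes a plain console project, so it uses WebUtility.HtmlDecode from System.Net in place of HttpUtility.HtmlDecode, which avoids a reference to System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the extension: it isolates the
// "strip tags, then decode entities" step so it can be run stand-alone.
public static class SnippetNormalizer
{
    // Same pattern and options as in the Highlight method.
    private static readonly Regex TagPattern = new Regex(
        @"<(?![!/]?[>\s])[^>]*>",
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Normalize(string rawHtml)
    {
        // 1. Remove HTML tags such as <p> and </p>.
        string withoutTags = TagPattern.Replace(rawHtml, String.Empty);

        // 2. Turn entities such as &lt; and &nbsp; back into characters.
        //    Assumption: WebUtility.HtmlDecode stands in for the
        //    HttpUtility.HtmlDecode call used in the extension.
        return WebUtility.HtmlDecode(withoutTags);
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample of what a rich text editor might store for a code line.
        string raw = "<p>if&nbsp;(i&nbsp;&lt;&nbsp;10)&nbsp;{&nbsp;}</p>";
        Console.WriteLine(SnippetNormalizer.Normalize(raw));
    }
}
```

    The order matters: if the entities were decoded first, a decoded &lt; in the source code could later be mistaken for the start of a tag and stripped. Note also that &nbsp; decodes to a non-breaking space (U+00A0), not an ordinary space, so the highlighter may need to treat that character as whitespace.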

    By the way, if you are using the default editor (TinyMCE), you should first paste your source code into a plain text editor such as Notepad++, then copy it from there and paste it into TinyMCE. I think TinyMCE should consider adding something like a Paste as Plain Text feature, as FCKEditor has.


    Posted in: BlogEngine.NET

    Comments


    April 6. 2008 17:24
    starec.eu
    Thank you very much for article! I was worried that I'm doing something wrong, but I see now that it was not my mistake ;)

    http://www.starec.eu/
