BlogEngine.NET version 1.3 includes a syntax highlighting extension, but it is still in beta, so I looked around for another syntax highlighter, since this blog heavily uses code snippets. After looking for a while I found this syntax highlighter extension. It uses Wilco Bauwer's syntax highlighter library, which is impressive. The main problem with the extension is that it does not handle HTML tags and special HTML characters like < and > very well. Tags and special characters are left behind as garbage after the extension tries to highlight the code. You see something like this:
private voidTest {
<p> private int i=0;
<p>}
After inspecting the SyntaxHighlightingExtension.cs file, I saw that the extension matches the source code with a regular expression and feeds Wilco's highlighter the raw HTML (body). This incomplete implementation causes the side effect I mentioned above: the HTML tags and special characters need to be cleaned out of the raw HTML (body) before it is parsed. So I changed the Highlight method of the extension. The resulting method looks something like this:
001 private string Highlight(HighlightOptions options)
002 {
003     string parsed;
004     uint id = NextCodeID();
005     string name = options.Language;
006
007     HighlighterBase highlighter = GetHighlighter(name);
008     if (highlighter != null)
009     {
010         name = highlighter.FullName;
011         highlighter.Parser = htmlParser;
012
013         string body = Regex.Replace(options.Code, @"<\s*br\s*/\s*>", "\r\n", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline);
014         body = Regex.Replace(body, @"<(?![!/]?[>\s])[^>]*>", String.Empty, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline);
015         body = HttpUtility.HtmlDecode(body);
016
017         try
018         {
019             parsed = highlighter.Parse(body);
020             //parsed = HttpUtility.HtmlDecode(parsed);
021         }
022         catch
023         {
024             name += " (not highlighted)";
025             parsed = options.Code;
026         }
027         finally
028         {
029             highlighter.ForceReset();
030         }
031     }
032     else
033     {
034         name += " (not highlighted)";
035         parsed = options.Code;
036     }
037
038     if (options.DisplayLineNumbers)
039     {
040         string[] lines = parsed.Split(new char[] { '\n' });
041         StringBuilder outputBuffer = new StringBuilder();
042
043         for (int i = 0; i < lines.Length; i++)
044         {
045             outputBuffer.AppendFormat(linenumberingTemplate, i + 1, lines[i]);
046         }
047
048         return string.Format(OutputTemplate, id, name, options.Title, outputBuffer, options.InitialStyle);
049     }
050
051     return string.Format(OutputTemplate, id, name, options.Title, parsed, options.InitialStyle);
052 }
The change I made starts at line 12 and ends at line 21. I simply converted the <br /> tags to line breaks, stripped out the remaining HTML tags with a regular expression, and then used the HttpUtility.HtmlDecode method to decode the special HTML characters, so that the parser is fed normalized body text. And voilà, the extension started performing well.
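If you want to try the cleanup step outside BlogEngine.NET, here is a minimal sketch of the same three steps as a standalone helper. The class name CodeBodyNormalizer and the method name Normalize are made up for this illustration and are not part of the extension. I also assume WebUtility.HtmlDecode from System.Net in place of HttpUtility.HtmlDecode, so the snippet does not need a reference to System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the extension): it isolates the
// normalization steps from the Highlight method so they can be tried alone.
public static class CodeBodyNormalizer
{
    private const RegexOptions Options =
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline;

    public static string Normalize(string rawHtml)
    {
        // 1. Turn self-closing <br /> tags into line breaks.
        //    A bare <br> has no slash, so this pattern does not match it.
        string body = Regex.Replace(rawHtml, @"<\s*br\s*/\s*>", "\r\n", Options);

        // 2. Remove the remaining tags, such as <p> and </p>.
        body = Regex.Replace(body, @"<(?![!/]?[>\s])[^>]*>", string.Empty, Options);

        // 3. Decode entities such as &lt; and &gt; back to < and >.
        //    Assumption: WebUtility.HtmlDecode stands in for HttpUtility.HtmlDecode.
        return WebUtility.HtmlDecode(body);
    }
}
```

The order matters here: the entities are decoded only after the tags are stripped. If the decoding ran first, a < that belongs to the source code could be taken for the start of a tag and removed along with the code that follows it.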
By the way, I want to remind you that if you are using the default editor (TinyMCE), you should first paste your source code into a plain text editor like Notepad++, then copy it from Notepad++ and paste it into TinyMCE. I think TinyMCE should consider adding something like the Paste as Plain Text functionality that FCKEditor has.