Re: Fwd: CERT Advisory CA-2000-02
From: hno () HEM PASSAGEN SE (Henrik Nordstrom)
Date: Sat, 5 Feb 2000 12:13:55 +0100
Marc Slemko wrote:
> Also note that filtering or encoding things is not as easy as you may
> think. There are far too many very annoying things, including characterset
> issues and browser specific extensions.
It is, if you only accept ASCII/ISO-8859-1 (or another defined character
set) with some simple markup extensions. The markup extension could be
a small, strict subset of HTML, or a completely different one.
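A minimal sketch of such a filter, assuming Python's standard-library html.parser. The tag whitelist and the names StrictSubsetFilter and sanitize are hypothetical illustrations. The input is decoded as ISO-8859-1 and rejected if it contains control characters. Only whitelisted tags survive, always with their attributes stripped, while all text is re-escaped and the bodies of script and style are dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: a deliberately tiny subset of HTML. Attributes are
# never kept, which rules out style=, on*= handlers and javascript: URLs.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "pre", "ul", "ol", "li"}
VOID_TAGS = {"br"}                  # whitelisted tags that take no end tag
DROP_CONTENT = {"script", "style"}  # elements whose body text is discarded too


def is_allowed_char(ch):
    """True for printable ISO-8859-1 characters plus tab, LF and CR."""
    o = ord(ch)
    return ch in "\t\n\r" or 0x20 <= o <= 0x7E or 0xA0 <= o <= 0xFF


class StrictSubsetFilter(HTMLParser):
    """Re-emit whitelisted tags without attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of whitelisted tags opened so far
        self.drop_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1
        elif tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attrs are ignored on purpose
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif tag in self.open_tags:
            # Also close anything opened after `tag`, so output stays nested.
            while True:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if self.drop_depth:
            return
        # Character references may decode to any code point; replace those
        # outside the accepted set instead of passing them through.
        data = "".join(c if is_allowed_char(c) else "?" for c in data)
        self.out.append(escape(data))

    # Comments, doctypes and processing instructions fall through to
    # HTMLParser's default no-op handlers, so they are silently dropped.


def sanitize(raw: bytes) -> str:
    """Accept ISO-8859-1 input only and reduce it to the strict subset."""
    text = raw.decode("iso-8859-1")  # every byte maps to exactly one character
    if not all(is_allowed_char(c) for c in text):
        raise ValueError("input contains control characters")
    parser = StrictSubsetFilter()
    parser.feed(text)
    parser.close()
    # Close any whitelisted tags the author left open.
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)
```

For example, sanitize(b'<p onclick="evil()">Hello <b>world</b><script>alert(1)</script></p>') returns <p>Hello <b>world</b></p>. Because this is a whitelist rather than a blacklist, unknown browser-specific extensions are discarded by default instead of having to be enumerated.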
I do not understand why everyone claims that sanitizing HTML content is
that hard. For most applications where it is needed, the fancy features
of HTML simply aren't needed. If you are reading email, it does not
matter much if the layout does not match 100% what the original
author intended, as long as the information content is properly
presented and you know that you can safely view the content.
For the case of publishing information on a shared web site, strict
HTML filtering is also beneficial. It forces all authors to use a
common HTML dialect that is guaranteed not to disturb the site-enforced
layout or presentation, and it helps keep the authors focused on
providing the information rather than fiddling around too much with
layout or presentation details. If you question the validity of this
approach to information processing, pay a visit to your nearest large
newspaper and study the flow of information there.
You need to look at information and layout separately; the two are
largely independent of each other. Defining a strict syntax for
information isn't hard. Doing so for HTML layout without pre-defined
style sheets is a tricky issue.