(Mark: I think this is interesting and it was returned on timeout)
Thanks Jeremiah
One righteous man in the city of Sodom... I had assumed that I would
get only tons of answers showing me the light of "file"...
> -----Original Message-----
> on Monday, January 10 Jeremiah Grossman wrote:
>
> Explicitly trusting the content-type header value and the file
> extension when they match alone is not a good idea.
> Here's a process I have seen done in the past that seems to work
> fairly effectively.
>
> 1) Is the content-type header value a valid file type you're
> expecting to receive (HTML/XML/GIF)? If not, fail.
> 2) Do the content-type header value and file extension match a
> mime-type entry? If not, fail.
> 3) Is the file within certain size constraints? If not, fail.
As this is not directly type related, it belongs to a long list
of checks that an Application IDS should do (RFC compliance, for
example).
> 4) Does the file have the proper format according to what it's
> claiming to be? If not, fail.
> 5) If it is an HTML file, then run it through some security filtering
> libraries.
Probably true of other file types as well...
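
A minimal sketch of steps 1 to 3 above, in Python. The allow-list, the size limit and the function name are assumptions made for illustration; they are not part of the process as described.

```python
import mimetypes

# Hypothetical allow-list and size limit; adjust to the application.
ALLOWED_TYPES = {"text/html", "text/xml", "image/gif"}
MAX_BYTES = 1_000_000

def precheck_upload(filename, content_type, size):
    """Return None if the upload passes steps 1-3, else a failure reason."""
    # Step 1: is the declared content type one we expect to receive?
    declared = content_type.split(";")[0].strip().lower()
    if declared not in ALLOWED_TYPES:
        return "unexpected content type"
    # Step 2: does the file extension map to the same MIME type?
    guessed, _ = mimetypes.guess_type(filename)
    if guessed != declared:
        return "extension does not match content type"
    # Step 3: is the file within the size constraints?
    if size > MAX_BYTES:
        return "file too large"
    return None
```

None of these checks looks at the file content, so a file that passes still has to go through steps 4 and 5.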
>
> * Hopefully I got the steps right. Someone might want to double check
> the logic flow *
>
> With respect to the question, step 4 is where the details really
> matter. While you could use the unix 'file' command (as suggested by
> another poster) to determine actual file type, I would prefer another
> approach.
How right you are regarding "file": while file is a very nice and useful
utility, it is productivity oriented rather than security oriented:
- It matches very short signatures, making it relatively simple to
evade.
- It has some big identification holes, at least in the magic file I'm
using (while it detects sub-versions of a PDF file, it detects both Word
and Excel as a "Microsoft office document").
- It does little to detect the content of text files, so Perl, shell
and JavaScript files are all detected the same.
The reason for these shortcomings is not just that the magic file is
not large enough: the detection operators it supports are rather
limited. For example, it does not support scanning a file for a
signature, only looking for it at a predefined offset.
I'm also not sure that it is very well optimized for the real-time
traffic inspection required by an application security protection system
such as my company's product.
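
To illustrate the difference between the two kinds of test, here is a small Python sketch. The function names and the 4096-byte scan window are assumptions of mine; this is not how "file" is implemented.

```python
def match_at_offset(data: bytes, signature: bytes, offset: int = 0) -> bool:
    """Fixed-offset test: the signature must sit at one predefined offset."""
    return data[offset:offset + len(signature)] == signature

def match_by_scan(data: bytes, signature: bytes, window: int = 4096) -> bool:
    """Scan test: the signature may appear anywhere in the first `window` bytes."""
    return data.find(signature, 0, window) != -1

# A payload that starts with a short GIF signature but carries script content.
payload = b"GIF89a<script>alert('xss')</script>"
is_gif_by_offset = match_at_offset(payload, b"GIF89a")   # True
has_script_by_scan = match_by_scan(payload, b"<script")  # True
```

The fixed-offset test happily reports a GIF here; only the scan notices the embedded script tag.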
>
> Use the content-type header value that the file claims to be and
> parse based on that premise. For instance, if the file claimed to be
> GIF when it hits step 4, run it through an image parser and see if
> there are any errors. Usually when I see files uploaded via a web
> interface, the expected types of file are fairly limited. Normally
> maybe a few types of text files (HTML, CSV, XML), pictures (GIF, JPG,
> PNG), possibly mp3's, etc. I would handle each type of file on a
> case-by-case basis.
>
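
For the GIF case mentioned above, a structural check might look like the following Python sketch. It walks the GIF block layout without decoding any pixels, so it is lighter than a real image parser and also weaker. The function name and the choice to reject trailing bytes after the trailer are assumptions made for illustration.

```python
def looks_like_gif(data: bytes) -> bool:
    """Walk the GIF block structure; True only if it is well-formed end to end."""
    if data[:6] not in (b"GIF87a", b"GIF89a") or len(data) < 13:
        return False
    packed = data[10]                      # flags byte of the logical screen descriptor
    pos = 13                               # 6-byte header + 7-byte screen descriptor
    if packed & 0x80:                      # global color table present
        pos += 3 * (2 << (packed & 0x07))
    try:
        while True:
            block = data[pos]
            pos += 1
            if block == 0x3B:              # trailer: end of the GIF stream
                return pos == len(data)    # strict: reject trailing bytes
            if block == 0x21:              # extension: label byte, then sub-blocks
                pos += 1
            elif block == 0x2C:            # image descriptor (9 bytes)
                flags = data[pos + 8]
                pos += 9
                if flags & 0x80:           # local color table present
                    pos += 3 * (2 << (flags & 0x07))
                pos += 1                   # LZW minimum code size
            else:
                return False               # unknown block type
            while data[pos] != 0:          # skip data sub-blocks
                pos += data[pos] + 1
            pos += 1                       # consume the zero terminator
    except IndexError:                     # ran off the end: truncated file
        return False
```

A file that merely begins with "GIF89a" followed by HTML fails this check at the first block, which a six-byte signature match would not catch.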
The problem with full parsing of each type is that it takes too long
for a real-time product such as ours. I'm looking for an interim
solution that does not require full parsing but does not rely on
limited signatures.
One tool that I've found is trid (http://mark0.ngi.it/soft-trid-e.html).
It is signature based but employs much stronger signatures. It also has
a unique tool to build those signatures from a collection of files.
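
To make the idea of building a signature from a collection of files concrete, here is a deliberately naive Python sketch. It is not TrID's algorithm, only an illustration under my own assumptions: the signature is the set of offsets in the first 64 bytes at which every sample has the same byte value.

```python
def build_signature(samples, prefix_len=64):
    """Return {offset: byte} for every offset where all samples share the same byte."""
    limit = min(prefix_len, min(len(s) for s in samples))
    first = samples[0]
    return {i: first[i] for i in range(limit)
            if all(s[i] == first[i] for s in samples)}

def matches(data, signature):
    """True if data has the expected byte at every offset in the signature."""
    return bool(signature) and all(
        off < len(data) and data[off] == byte for off, byte in signature.items())

# Three toy "PDF" samples that differ in version digit and body.
samples = [b"%PDF-1.4\nabc", b"%PDF-1.3\nxyz", b"%PDF-1.5\nfoo"]
pdf_signature = build_signature(samples)
```

With these samples the signature covers offsets 0 to 6 and offset 8, skipping the version digit that varies. The more sample files are supplied, the fewer accidental matches survive.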
>
> I don't know exactly how the unix 'file' command works, but I believe
> it's going to make its best guess based on certain identifiable format
> indicators. And if this is all you're looking for, it's a great util.
> Personally I think it's better to know if a file would actually parse
> rather than just appear to be something it might not be.
>
> Just preference between the different methods.
>
>
> jeremiah-
>
>
>
> On Sunday, January 9, 2005, at 01:22 PM, Ofer Shezaf wrote:
>
> >
> > Hi Jeremiah,
> >
> > I have lately been researching the issue of ensuring that files
> > (uploaded and downloaded) are of the right type.
> >
> > Do you think that matching the extension and content-type header
> > would be enough? If not, are you aware of any technology to determine
> > a file type according to its content?
> >
> > ~ Ofer
> >
> > Ofer Shezaf
> > CTO, Breach Security
> >
> > Tel: +972.9.956.0036 ext.212
> > Cell: +972.54.443.1119
> > ofers_at_breach.com
> > http://www.breach.com
> >
> >> -----Original Message-----
> >> From: Jeremiah Grossman [mailto:jeremiah_at_whitehatsec.com]
> >> Sent: Saturday, January 08, 2005 3:44 AM
> >> To: Alfred Hitchcock
> >> Cc: webappsec_at_securityfocus.com
> >> Subject: Re: Content monitorting in Application Security
> >>
> >> Sounds like common web site functionality and the resulting security
> >> challenge.
> >>
> >> Here are techniques that may help...
> >>
> >> 1) When receiving an uploaded file of any kind, use various parser
> >> libraries to sanity check the actual format of the data, ensuring the
> >> file being uploaded is what it claims to be, with the incoming file
> >> extension and content-type header in agreement. jpeg's should be
> >> formatted like jpegs, mp3's like mp3's, html like html and so on.
> >>
> >> 2) If you plan on handling files beyond plain text, such as zips and
> >> exe's, you may consider using some type of A/V product as well. A
> >> nice security add-on that can be useful depending on the situation.
> >>
> >> 3) This following method is strictly about XSS and HTML/JavaScript
> >> content.
> >>
> >> While it's fairly easy to filter all HTML tags from a file to prevent
> >> XSS, it's exponentially harder to separate HTML from executable
> >> client-side code (JavaScript), especially when the HTML is freeform
> >> and most tags need to be supported on the web site. I've long said
> >> it's a slippery slope to support user-submitted HTML, but sometimes
> >> it can't be helped.
> >>
> >> There are a few things that can help mitigate the risk of the
> >> uploaded files.
> >>
> >> a. Filter out potentially malicious HTML tags or only allow a
> >> strict set of safe HTML tags.
> >> b. Filter out potentially malicious tag attributes or only allow
> >> a strict set of safe tag attributes.
> >>
> >> * The either/or is a give and take of security vs.
> >> functionality/ease-of-use.
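
The allow-list variant of (a) and (b) can be sketched in Python with the standard library's html.parser. The tag, attribute and URL-scheme lists and the names below are hypothetical choices for illustration, and a sketch like this is no substitute for a maintained filtering library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-lists; a real site would choose its own.
SAFE_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
SAFE_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")

class AllowListFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in SAFE_TAGS:
            return            # drop the tag itself, keep any text inside it
        kept = []
        for name, value in attrs:
            if name not in SAFE_ATTRS.get(tag, ()):
                continue      # drops onclick, style, and so on
            value = value or ""
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue      # drops javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in SAFE_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))   # re-escape text content

def sanitize(html_text):
    """Return html_text rebuilt from allow-listed tags and attributes only."""
    f = AllowListFilter()
    f.feed(html_text)
    f.close()
    return "".join(f.out)
```

For example, `sanitize('<p onclick="evil()">Hi <b>there</b></p><script>alert("xss")</script>')` returns `<p>Hi <b>there</b></p>`.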
> >>
> >> Depending on the programming language you are using, there might be
> >> some libraries available that could help make this process easier. I
> >> haven't used them, but I noticed there are libraries available for
> >> Perl.
> >>
> >> http://cpan.uwinnipeg.ca/dist/HTML-StripScripts
> >> http://cpan.uwinnipeg.ca/dist/HTML-Scrubber-StripScripts
> >>
> >> There might be some available if you use some other language.
> >>
> >>
> >> best of luck!
> >>
> >>
> >> jeremiah-
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Friday, January 7, 2005, at 04:55 AM, Alfred Hitchcock wrote:
> >>>
> >>> Hi All,
> >>> I have a question, and it would be of great help if anybody could
> >>> provide a solution.
> >>> I have a web page which allows users to upload files such as jpeg
> >>> and html files.
> >>> Is there any mechanism which can detect malicious html files? E.g.
> >>> if an html page contains malicious JavaScript such as alert('xss'),
> >>> how can we check for it? One more point to note is that
> >>> files can be uploaded by any user.
> >>>
> >
Ofer Shezaf
CTO, Breach Security
Tel: +972.9.956.0036 ext.212
Cell: +972.54.443.1119
ofers_at_breach.com
http://www.breach.com
Received on Jan 24 2005