On Thursday, January 13, 2005, at 06:04 PM, Ofer Shezaf wrote:
>
> Thanks Jeremiah
>
> One righteous man in the City of Sodom... I had already assumed I would
> get nothing but tons of answers showing me the light of "file"...
Let's hope the Internet doesn't suffer a biblical apocalypse...
>> -----Original Message-----
>> on Monday, January 10 Jeremiah Grossman wrote:
>>
>> 3) Is the file within certain size constraints? If not, fail.
>
> As this is not directly type related, it sort of belongs to a long list
> of checks that an Application IDS should do (RFC compliance, for
> example).
Another fine option. Actually, I guess there are three places to put
this check: the web server config, the web application, and the IDS
device.
Either way, you really don't want someone uploading a Gig of garbage to
your web server.
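A minimal Python sketch of the size check at the web-application layer, assuming the upload arrives as a readable binary stream. MAX_UPLOAD_BYTES, UploadTooLarge and read_limited are hypothetical names, and the 5 MiB limit is an arbitrary example. Counting bytes while reading in chunks lets the check fail early instead of buffering a whole gigabyte first:

```python
import io

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical limit (5 MiB), tune per application


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured size limit."""


def read_limited(stream, limit=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read a binary file-like object, failing once more than `limit` bytes arrive.

    The Content-Length header is not trusted; bytes are counted as they are read,
    so an oversized upload is rejected after reading at most limit + chunk_size bytes.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Usage: a 3-byte body passes a 10-byte limit.
    print(read_limited(io.BytesIO(b"abc"), limit=10))
```

The same limit can, of course, also be enforced earlier in the web server config or the IDS device; the application-level check is simply the last line of defense.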
>> 5) If it is an HTML file, then run it through some security filtering
>> libraries.
>
> Probably true for other file types as well...
Certainly.
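To picture step 5 for HTML, here is a rough Python sketch that uses the standard library's html.parser to flag tags outside a hypothetical allowlist, event-handler attributes, and javascript: URLs. It rejects rather than sanitizes and is far from complete (it does not catch obfuscated URLs, for instance), so a maintained filtering library is the better choice in practice; ALLOWED_TAGS and html_problems are illustrative names only:

```python
from html.parser import HTMLParser

# Hypothetical allowlist of harmless formatting tags; tune per application.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}


class _Inspector(HTMLParser):
    """Collects reasons to reject an HTML upload instead of trying to clean it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"disallowed tag <{tag}>")
        for name, value in attrs:
            if name.startswith("on"):
                self.problems.append(f"event handler attribute {name}")
            elif name in ("href", "src") and value and (
                value.strip().lower().startswith("javascript:")
            ):
                self.problems.append(f"javascript: URL in {name}")


def html_problems(text):
    """Return the problems found in `text`; an empty list means none were detected."""
    inspector = _Inspector()
    inspector.feed(text)
    inspector.close()
    return inspector.problems
```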
>> * Hopefully I got the steps right. Someone might want to double check
>> the logic flow *
>>
>> With respect to the question, step 4 is where the details really
>> matter. While you could use the Unix 'file' command (as suggested by
>> another poster) to determine the actual file type, I would prefer
>> another approach.
>
> How right you are regarding "file": while file is a very nice and
> useful utility, it is productivity oriented rather than security
> oriented:
> - It matches very short signatures, making it relatively simple to
> evade.
> - It has some big identification holes, at least in the magic file I'm
> using (while it detects sub-versions of a PDF file, it identifies both
> Word and Excel files as a "Microsoft Office document").
> - It does little to detect the content of text files, so Perl, shell,
> and JavaScript files are all detected the same.
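To illustrate the text-file point: one cheap content signal that separates Perl, shell and similar scripts is the #! line. The Python sketch below, with a hypothetical interpreter table, reads only that line. It does nothing for files without one, which includes most JavaScript served to browsers, so it is at best one heuristic among several; script_type is an illustrative name:

```python
import re

# Hypothetical interpreter-to-type table; extend as needed.
_SHEBANG_TYPES = {
    "sh": "shell", "bash": "shell", "dash": "shell", "zsh": "shell",
    "perl": "perl", "python": "python", "python3": "python",
    "node": "javascript",
}

# "#!" followed by the interpreter path and, optionally, its first argument.
_SHEBANG = re.compile(rb"^#!\s*(\S+)(?:[ \t]+(\S+))?")


def script_type(data: bytes):
    """Guess a script language from its #! line; return None if there is none."""
    m = _SHEBANG.match(data)
    if not m:
        return None
    interp = m.group(1).rsplit(b"/", 1)[-1]  # "/usr/bin/perl" -> "perl"
    # "#!/usr/bin/env perl": the real interpreter is the first argument.
    if interp == b"env" and m.group(2):
        interp = m.group(2)
    return _SHEBANG_TYPES.get(interp.decode("ascii", "replace"), "unknown-script")
```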
> These shortcomings are not just a matter of the magic file being too
> small: the detection operators it supports are rather limited. For
> example, it does not support scanning a file for a signature, only
> looking for it at a predefined offset.
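The difference between the two detection operators described above, as a small Python sketch (the function names and the 4096-byte bound are hypothetical). A fixed-offset check misses a marker that has been shifted by a few leading bytes, while a bounded scan still finds it:

```python
def match_at_offset(data: bytes, sig: bytes, offset: int) -> bool:
    """Fixed-offset detection: `sig` must appear exactly at `offset`."""
    return data[offset:offset + len(sig)] == sig


def scan_for(data: bytes, sig: bytes, limit: int = 4096) -> int:
    """Scanning detection: look for `sig` anywhere in the first `limit` bytes.

    Returns the offset where it was found, or -1. The bound keeps the cost
    predictable for real-time inspection; 4096 is an arbitrary example value.
    """
    return data.find(sig, 0, limit)
```

For example, with data = b"junk\x00%PDF-1.4\n", match_at_offset(data, b"%PDF-", 0) is False, while scan_for(data, b"%PDF-") returns 5.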
> I'm also not sure that it is well optimized for the real-time traffic
> inspection required by an application security protection system such
> as my company's product.
Very interesting data points. Someone has obviously done their
homework on the matter.
>> Use the Content-Type header value that the file claims to be and
>> parse based on that premise. For instance, if the file claims to be a
>> GIF when it hits step 4, run it through an image parser and see if
>> there are any errors. Usually when I see files uploaded via a web
>> interface, the expected types of file are fairly limited: normally a
>> few types of text files (HTML, CSV, XML), pictures (GIF, JPG, PNG),
>> possibly MP3s, etc. I would handle each type of file on a
>> case-by-case basis.
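A cheap first gate for the approach above, sketched in Python under stated assumptions: strip parameters from the claimed Content-Type, reject anything outside a hypothetical allowlist, and confirm that image uploads start with the standard GIF, PNG or JPEG magic bytes. It is not a substitute for the image-parser step, only a check that runs before it; ALLOWED_TYPES and check_claimed_type are illustrative names:

```python
# Leading magic bytes of the image formats mentioned above.
_IMAGE_MAGIC = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
}

# Hypothetical allowlist: the image types above plus a few text types.
ALLOWED_TYPES = set(_IMAGE_MAGIC) | {"text/csv", "text/xml", "text/html"}


def check_claimed_type(content_type: str, data: bytes) -> bool:
    """Return True if `data` plausibly matches the type the upload claims to be."""
    ctype = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..."
    if ctype not in ALLOWED_TYPES:
        return False
    magics = _IMAGE_MAGIC.get(ctype)
    if magics is not None:
        return data.startswith(magics)
    # Text types: assume UTF-8 and no NUL bytes (a hypothetical policy);
    # format-specific parsing would follow this check.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return b"\x00" not in data
```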
>>
>
> The problem with fully parsing each type is that it simply takes too
> long for a real-time product such as ours. I'm looking for an interim
> solution that does not require full parsing but does not rely on
> limited signatures.
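One possible middle ground between magic bytes and full parsing, sketched for PNG: walk the chunk structure and verify each chunk's CRC-32 without decompressing any image data. This is an illustration rather than a recommendation for any particular product; png_structure_ok is a hypothetical name, and rejecting bytes appended after IEND is a policy choice made here, not something every decoder enforces:

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def png_structure_ok(data: bytes) -> bool:
    """Check PNG chunk framing and CRCs without decoding pixel data."""
    if not data.startswith(PNG_SIG):
        return False
    pos = len(PNG_SIG)
    first = True
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            return False  # truncated chunk
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(ctype + body) & 0xFFFFFFFF != crc:
            return False  # CRC covers the type and data fields
        if first and ctype != b"IHDR":
            return False  # IHDR must come first
        first = False
        if ctype == b"IEND":
            return end == len(data)  # policy: reject data appended after IEND
        pos = end
    return False  # ran out of data before IEND
```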
Performance. A good point I failed to consider. And yes, the process
would lag severely depending on the file type, file length, and number
of connections.
Do you think it might be possible to check a document's data format
validity without actually parsing it into a data structure? I think
some XML tools might already do something like this.
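Streaming (SAX-style) parsers are one example of that idea for XML. A minimal Python sketch: feed the bytes through xml.sax with a handler that does nothing, so the document is checked for well-formedness without a tree ever being built in memory. xml_well_formed is a hypothetical name, and for untrusted input a hardened parser such as the defusedxml package may be preferable:

```python
import xml.sax


def xml_well_formed(data: bytes) -> bool:
    """Stream the document through a SAX parser; no DOM or tree is built."""
    try:
        # The base ContentHandler ignores every event, so memory use stays small.
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True
```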
> One tool that I've found is TrID (http://mark0.ngi.it/soft-trid-e.html).
> It is signature based but employs much stronger signatures. It also
> comes with a unique tool to build those signatures from a collection of
> files.
Interesting. I hadn't come across this before.
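For a feel of how signatures could be learned from sample files, here is a deliberately simplified Python sketch: keep every byte offset (within a hypothetical 64-byte window) where all samples agree, then score a new file by the fraction of those bytes it matches at the same offsets. This is not TrID's actual algorithm or definition format, only an illustration of the general idea; build_signature and match_score are hypothetical names:

```python
def build_signature(samples, window=64):
    """Return {offset: byte} for offsets where every sample has the same byte."""
    if not samples:
        raise ValueError("need at least one sample file")
    n = min([window] + [len(s) for s in samples])
    first = samples[0]
    return {
        off: first[off]
        for off in range(n)
        if all(s[off] == first[off] for s in samples[1:])
    }


def match_score(data, sig):
    """Fraction of signature bytes that `data` matches at the same offsets."""
    if not sig:
        return 0.0
    hits = sum(1 for off, b in sig.items() if off < len(data) and data[off] == b)
    return hits / len(sig)
```

With more (and more varied) samples, fewer offsets survive, which is what makes a learned signature more discriminating than a single short magic string.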
Received on Jan 16 2005