Dailydave mailing list archives

Re: Quick thread on SQLi


From: Nate Lawson <nate () root org>
Date: Thu, 8 Mar 2012 15:47:29 -0800

On Mar 8, 2012, at 11:17 AM, Michal Zalewski wrote:

There are many SQLi patterns that are hard for automated tools to
find. This is an obvious point, so I'm sorry to be pedantic, but I think
a survey based on automated scanning is a misleading starting point
for the discussion.

Well, the definition of a web application is a surprisingly
challenging problem, too. This is particularly true for any surveys
that randomly sample Internet destinations.

Should all the default "it works!" webpages produced by webservers be
counted as "web applications"? In naive counts, they are, but
analyzing them for web app vulnerabilities is meaningless. In
general, at what level of complexity does a "web application" begin,
and how do you measure that when doing an automated scan?

Further, if there are 100 IPs that serve the same www.youtube.com
front-end to different regions, are they separate web applications? In
many studies, they are. On the flip side, is a single physical server
with 10,000 parked domains a single web application? Some studies see
it as 10,000 apps.

[more about various subdomain configurations deleted]

This is actually a researched topic, but in the area of massive web crawlers. The reason is that you need to
balance two things (a rough sketch follows the list):

* Issuing parallel queries to different domains for performance, without overloading a single server that hosts many of them
* Making forward progress across different subdomains, without being vulnerable to a spider-trap DNS server that returns
$(PRNG).example.com
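
To make the trade-off concrete, here is a minimal Python sketch of per-domain budgeting. It is only an illustration of
the balancing idea, not IRLBot's actual STAR/BEAST machinery, and every name and constant in it is made up. Each URL is
charged against its registered domain, so a trap that invents endless hostnames under example.com still draws from a
single budget and a single politeness timer.

import time
from collections import defaultdict, deque
from urllib.parse import urlsplit

# Hypothetical sketch, not IRLBot's algorithms: every URL is charged
# against its registered ("pay-level") domain, so a DNS spider trap that
# invents endless $(PRNG).example.com hostnames still shares one budget
# and one politeness timer with the rest of example.com.

PER_DOMAIN_BUDGET = 1000   # max pages fetched per registered domain (made up)
POLITENESS_DELAY = 2.0     # min seconds between hits to one domain (made up)

def registered_domain(url):
    # Naive approximation: last two labels of the hostname.
    # A real crawler would consult the Public Suffix List here.
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

class Frontier:
    def __init__(self):
        self.queues = defaultdict(deque)   # registered domain -> pending URLs
        self.spent = defaultdict(int)      # pages already fetched per domain
        self.next_ok = defaultdict(float)  # earliest next-fetch time per domain

    def add(self, url):
        dom = registered_domain(url)
        if self.spent[dom] < PER_DOMAIN_BUDGET:
            self.queues[dom].append(url)

    def next_url(self):
        now = time.monotonic()
        # Round-robin across domains: parallelism comes from having many
        # domains ready at once, never from hammering a single one.
        for dom, q in self.queues.items():
            if q and now >= self.next_ok[dom] and self.spent[dom] < PER_DOMAIN_BUDGET:
                self.spent[dom] += 1
                self.next_ok[dom] = now + POLITENESS_DELAY
                return q.popleft()
        return None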

The best paper on this so far is for IRLBot:

H.-T. Lee, D. Leonard, X. Wang, and D. Loguinov, "IRLbot: Scaling to 6 Billion Pages and Beyond"
http://irl.cs.tamu.edu/people/hsin-tsang/papers/tweb2009.pdf

See sections 6 and 7 for their scheme to balance these priorities. It's quite clever how they combine this with a
disk-based queue to avoid running into RAM limits. The result is a web crawler that saturates the network link and has
no weak points where it sits idle waiting on, say, a robots.txt response.
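
For what it's worth, here's a toy Python sketch of the "spill the frontier to disk" idea, just to show the shape of it.
It is nothing like the paper's DRUM structure, and the file format and limits are invented:

from collections import deque

# Toy sketch (not the paper's DRUM structure): keep only a small batch of
# URLs in RAM and append the overflow to a flat file on disk, refilling
# the in-memory batch as it drains.

class DiskBackedQueue:
    def __init__(self, path, ram_limit=10000):
        self.path = path
        self.ram_limit = ram_limit
        self.ram = deque()
        self._read_offset = 0
        open(path, "a").close()               # make sure the spill file exists

    def push(self, url):
        if len(self.ram) < self.ram_limit:
            self.ram.append(url)
        else:
            with open(self.path, "a") as f:   # overflow goes to disk
                f.write(url + "\n")

    def pop(self):
        if not self.ram:
            self._refill()
        return self.ram.popleft() if self.ram else None

    def _refill(self):
        # Pull the next batch of spilled URLs back into RAM.
        with open(self.path) as f:
            f.seek(self._read_offset)
            for _ in range(self.ram_limit):
                line = f.readline()
                if not line:
                    break
                self.ram.append(line.rstrip("\n"))
            self._read_offset = f.tell()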

On your topic, perhaps you can apply some of their algorithms plus some heuristics (exclude "it works" pages, look for .php
extensions, etc.) to get a fair estimate of the number of web apps at the subdomain level. This would leave out
multiple web apps on a single subdomain, but at least it's a start.
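
As a strawman, the kind of heuristic filter I have in mind might look like this in Python; the markers and patterns are
guesses made up for illustration, not anything measured:

import re

# Hypothetical heuristics for deciding whether a crawled subdomain looks
# like a real web application rather than a default install or parked page.
# The markers and patterns below are illustrative guesses.

DEFAULT_PAGE_MARKERS = (
    "It works!",              # stock Apache page
    "Welcome to nginx!",
    "IIS Windows Server",
)
DYNAMIC_HINT = re.compile(r"\.(php|aspx?|jsp|cgi)\b|[?&]\w+=", re.I)

def looks_like_web_app(pages):
    """pages: list of (url, html_body) tuples fetched from one subdomain."""
    if not pages:
        return False
    # Throw out subdomains whose pages are all stock server defaults.
    if all(any(m in body for m in DEFAULT_PAGE_MARKERS) for _, body in pages):
        return False
    # Require at least one sign of dynamic behavior: a .php/.asp/.jsp URL,
    # a query string, or an HTML form.
    for url, body in pages:
        if DYNAMIC_HINT.search(url) or "<form" in body.lower():
            return True
    return False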

-Nate
_______________________________________________
Dailydave mailing list
Dailydave () lists immunityinc com
http://lists.immunityinc.com/mailman/listinfo/dailydave

