comments in-line:
Matt wrote:
> Let me get this straight...
>
> What you want to do is go through someone's SOA (Start of
> Authority) and search for just the keywords you choose, in order to
> find all sites containing those keywords?
-----------------
I just want to search for domain names, similar to what Netcraft is doing,
but on locally downloaded zone files, parsing/matching keywords such
as *sex.*, *hate*.*, *porn*.*, etc.
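A minimal sketch of the zone-file keyword scan described above, assuming the zone is in standard master-file format with one record per line, the owner name in the first column, and `$ORIGIN` directives as a TLD zone typically uses. The helper names (`owner_names`, `match_domains`), the sample zone, and the `.example` domains are all hypothetical, and the parser deliberately ignores `$INCLUDE` and parenthesized multi-line records. The glob patterns are the ones from the message, matched with Python's `fnmatch`.

```python
import fnmatch
from typing import Iterable, Iterator, List

# Glob-style keyword patterns, as written in the message (hypothetical list).
PATTERNS = ["*sex.*", "*hate*.*", "*porn*.*"]

# Hypothetical sample in master-file format; real TLD zones are far larger.
SAMPLE_ZONE = """
$ORIGIN example.
$TTL 86400
@            IN SOA ns1.registry.example. hostmaster.registry.example. 1 7200 3600 1209600 86400
WIDGETS      NS  NS1.WIDGETS.EXAMPLE.
             NS  NS2.WIDGETS.EXAMPLE.
FREEPORNSITE NS  NS1.HOSTER.EXAMPLE.
SUSSEX       NS  NS1.HOSTER.EXAMPLE.
hateful-news.example. NS ns1.hoster.example.
; comments and blank lines are skipped
"""


def owner_names(lines: Iterable[str], origin: str = "") -> Iterator[str]:
    """Yield lowercase owner names (no trailing dot) from zone-file lines.

    Simplified parser: only $ORIGIN is honoured; other $ directives and
    indented lines (records reusing the previous owner) are skipped.
    """
    current_origin = origin.lower().rstrip(".")
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop comments and trailing space
        if not line:
            continue
        if line.startswith("$ORIGIN"):
            parts = line.split()
            if len(parts) > 1:
                current_origin = parts[1].lower().rstrip(".")
            continue
        if line.startswith("$") or line[0].isspace():
            continue
        name = line.split()[0].lower()
        if name == "@":
            name = current_origin
        elif name.endswith("."):
            name = name.rstrip(".")  # already fully qualified
        elif current_origin:
            name = f"{name}.{current_origin}"  # relative name: append origin
        yield name


def match_domains(names: Iterable[str], patterns: List[str]) -> List[str]:
    """Return the sorted, de-duplicated names matching any glob pattern."""
    hits = set()
    for name in names:
        if any(fnmatch.fnmatchcase(name, p) for p in patterns):
            hits.add(name)
    return sorted(hits)


if __name__ == "__main__":
    names = owner_names(SAMPLE_ZONE.splitlines())
    print(match_domains(names, PATTERNS))
```

Note that `sussex.example` matches `*sex.*`: plain keyword matching on domain names produces false positives like this, so the output is better treated as candidates for review than as a finished block list.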
>
> I don't think that's gonna happen. There's no way you're getting the
> entire SOA for any registrar so that you can do that. You would be
> 100,000,000 times better off setting up your own proxy firewall,
> enabling content filtering on it, and using the same keywords to
> prevent people from accessing those sites. If you wanted to, over
> time, you could log attempted traffic matching those keywords, along
> with the sites people are trying to reach, in order to build yourself
> a list of prohibited sites and then drop the keyword filtering. But
> your strongest option is to stay with a proxy with content filtering.
----------
I'm trying to build this list for a content filtering product :-)
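Matt's suggestion of logging attempted traffic to build a prohibited-site list could look roughly like the sketch below. It assumes a Squid-style native access log, where the requested URL is the seventh whitespace-separated field and CONNECT requests are logged as `host:port`; the sample log lines, `.example` hosts, keyword list, and helper names (`host_from_url`, `keyword_hosts`) are hypothetical. The `min_hits` threshold is one possible way to keep one-off requests out of the list.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical keyword list; plain substring matching on the hostname.
KEYWORDS = ["sex", "hate", "porn"]

# Hypothetical lines in Squid's native access.log layout (URL is field 7).
SAMPLE_LOG = [
    "1111111111.123 120 10.0.0.5 TCP_MISS/200 5120 GET http://www.freeporn.example/index.html - DIRECT/192.0.2.10 text/html",
    "1111111112.456 80 10.0.0.6 TCP_MISS/200 2048 GET http://www.widgets.example/ - DIRECT/192.0.2.20 text/html",
    "1111111113.789 95 10.0.0.5 TCP_MISS/200 1024 GET http://www.freeporn.example/img.jpg - DIRECT/192.0.2.10 image/jpeg",
    "1111111114.000 300 10.0.0.7 TCP_TUNNEL/200 4096 CONNECT hatecentral.example:443 - DIRECT/192.0.2.30 -",
]


def host_from_url(url: str) -> str:
    """Extract a lowercase hostname from a logged request URL."""
    if "://" in url:
        return (urlsplit(url).hostname or "").lower()
    # CONNECT requests are logged as host:port with no scheme.
    return url.split(":", 1)[0].lower()


def keyword_hosts(log_lines: Iterable[str], keywords: List[str],
                  min_hits: int = 1) -> List[str]:
    """Return sorted hosts containing a keyword, requested >= min_hits times."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        host = host_from_url(fields[6])
        if host and any(k in host for k in keywords):
            counts[host] += 1
    return sorted(h for h, n in counts.items() if n >= min_hits)


if __name__ == "__main__":
    print(keyword_hosts(SAMPLE_LOG, KEYWORDS))
    print(keyword_hosts(SAMPLE_LOG, KEYWORDS, min_hits=2))
```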
>
> There's a reason why there are companies out there that make big money
> doing this kind of filtering: it's not that simple to do. Cosmin's
> idea is kinda close to a reasonable way to go out and get addresses,
> but it could take a long time to pull down every possibility (e.g. a
> Google search for inurl:porn returns "Results 1 - 10 of about
> 76,800,000 (0.12 seconds)"). Good luck reading all 76 million results.
-----------------
I wish I could *grin*, but I can only get to the first 1,000 results even
though it says 990,000 sites. Just wondering whether having a search engine
running locally (such as Google) would help overcome this limit, besides
the other features it offers.
regards,
/vicky
>
> Just my .02
>
>
> --
>
>
> On Mon, 28 Mar 2005 12:36:50 -0800, Vicky Rode <vicky.rode_at_gmail.com> wrote:
>
>>We've already looked at Netcraft and it has been partially helpful.
>>
>>What I'm looking at doing (besides data that I receive via peering) is
>>searching sync'd DNS zone files for keywords and parsing the output
>>into a filter database, something similar to an update file, if you will.
>>
>>This is being done as a home-grown solution.
>>
>>regards,
>>//vicky//
>>
>>J. Oquendo wrote:
>>
>>>Actually Vicky, you're quite wrong. I'm sure this is more or less what
>>>you specified: Netcraft's Search DNS,
>>>http://searchdns.netcraft.com/?host
>>>
>>>However, I think it only finds sites that have either been checked on
>>>Netcraft or perhaps been queried there. I'm not sure how they obtain
>>>the information.
>>>
>>>On Fri, 25 Mar 2005, Vicky Rode wrote:
>>>
>>>>Absolutely NOT; in fact, the goal is to search for offending sites
>>>>(porn, call-home, etc.) to be blocked at our filtering appliance.
>>>>
>>>>regards,
>>>>/vicky
>>>>
>>>>Alexander Chamandy wrote:
>>>>
>>>>
>>>>>On Wed, 02 Mar 2005 17:42:24 -0800, Vicky Rode <vicky.rode_at_gmail.com> wrote:
>>>>>
>>>>>>Hi there,
>>>>>>
>>>>>>Just wondering if there is any way I could use a scanner (I have a
>>>>>>home-grown script for this) that would go through the DNS registries
>>>>>>from some public source and scan the domain names for keywords.
>>>>>>
>>>>>>I'd appreciate it if someone could point me in the right direction.
>>>>>>
>>>>>>regards,
>>>>>>/vicky
>>>>>
>>>>>
>>>>>You mean to scan whois records for particular domains for keywords in
>>>>>the registration information or scan the registry for domain names
>>>>>with certain keywords? This wouldn't be used for gathering
>>>>>information such as e-mail addresses to spam, would it?
>>>>>
>>>>>
>>>>
>>>=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
>>>J. Oquendo
>>>GPG Key ID 0x0D99C05C
>>>http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x0D99C05C
>>>
>>>sil @ infiltrated . net http://www.infiltrated.net
>>>
>>>"How a man plays the game shows something of his
>>>character - how he loses shows all" - Mr. Luckey
>>>
>>>
>>>
>>
>
Received on Apr 04 2005