Full Disclosure mailing list archives

Re: Squashing supposed hacker profiling
From: "Steven Adair" <steven () securityzone org>
Date: Tue, 19 Jun 2007 10:30:30 -0400 (EDT)

Amazing, you were able to find multiple instances where a script-based
gender guesser was wrong?  This is more profound than the initial research
itself.  I suppose I could post a series of 10 writings where it was
correct, but what would that prove?  Did you try reading this from the
same page:


A few quick notes:

    * The system generates a simple estimate (profiling). While Gender
Guesser may be 60% - 70% accurate, it is not 100% accurate. This is
better than random guessing (50%), but should not be interpreted as
"fact". In particular, men should not be offended if it says you write
like a girl.

    * People write differently in different forums. For example, a single
writing sample may appear MALE for informal writing but test as FEMALE
for formal writing. Be sure to interpret the results based on the
appropriate writing style. (These notes, for example, are more
informal/blog than formal/non-fiction.)

    * Many factors can impact the interpretation from any single person's
writing. The content, knowledge of the material, age of the author,
nationality, experience, occupation, and education level can all
impact writing styles. For example, a woman who has spent 20 years
working in a male-dominated field may write like her co-workers.
Similarly, professional female writers (and experienced hobbyists)
frequently use male writing styles. Gender Guesser does not take any
of these factors into account.

    * Email can blur the lines between formal and informal writing styles.
An informal email from a manager may have traces of formality, and a
formal email from a 12-year-old is likely to be informal compared to a
letter from a 40-year-old. Do not be surprised if email messages sent
to public forums test incorrectly -- when writing for an audience,
people commonly use informal words, phrases, and slang within a formal
writing style.

    * Quotations, block quotes, and included text usually carry the
gender from the initial author. Be sure to remove quoted text from any
pasted content. Also, significant changes from a copy-editor can
result in a different gender analysis. (A male editor may make a
female author's news article appear MALE or as a Weak MALE.)

    * Lyrics, lists, poems, and prose are special writing styles. This
tool is unlikely to classify these texts correctly.

    * The system needs a paragraph or two of text in order to observe word
repetition. A good sample should have 300 words or more. Fewer words
can lead to more variation in accuracy, and a single sentence is
unlikely to generate an accurate result. Pasting the same text
multiple times will not change the results!

    * People tend to write with consistent styles. If the system
misclassifies a particular author, then other writings by the same
author will likely be misclassified the same way.

    * And most importantly: This is an ESTIMATE. Please do not email me
about instances where it made the wrong determination. (I've seen it
generate incorrect results lots of times already.)
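
The sample-preparation advice in the notes above (remove quoted text, aim for 300 words or more) can be sketched roughly as follows. This is only an illustration and not part of Gender Guesser: the function names, the ">" quote-marker convention, and the word-matching pattern are all assumptions.

```python
import re

# Minimum sample size recommended in the notes above.
MIN_WORDS = 300


def prepare_sample(text):
    """Drop quoted lines and count the words that remain.

    Assumes quoted text is marked with a leading '>' (a common email
    convention); other quoting styles are not handled.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(">")]
    cleaned = "\n".join(kept)
    # Rough word count: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", cleaned)
    return cleaned, len(words)


def is_long_enough(text):
    """True if the unquoted part of the text has at least MIN_WORDS words."""
    _, count = prepare_sample(text)
    return count >= MIN_WORDS
```

A short list post with its quoted reply removed will usually fall well under the 300-word mark, which is one reason such posts may test incorrectly.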


I can't tell if you're trolling or you have actually taken the bait.  You
do realize the person that you were responding to in earlier posts is not
actually Neal Krawetz, right?

All female authors...  Your so-called gender guessing mechanism is
flawed either way you want to cut it. You could try fuzzy math based on
theories to profile anyone on this list, but unless you have feasible
and PROVEN without reasonable doubt, it's all a guessing game bottom
line. Anyhow back to security, sociolinguistics is not meant for this.

According to Dr. Krawetz's Gender Guesser...
Genre: Informal
  Female = 104
  Male   = 602
  Difference = 498; 85.26%
  Verdict: MALE
Genre: Formal
  Female = 116
  Male   = 239
  Difference = 123; 67.32%
  Verdict: MALE


Genre: Informal
  Female = 442
  Male   = 555
  Difference = 113; 55.66%
  Verdict: Weak MALE
Genre: Formal
  Female = 364
  Male   = 570
  Difference = 206; 61.02%
  Verdict: MALE


Genre: Informal
  Female = 218
  Male   = 1186
  Difference = 968; 84.47%
  Verdict: MALE
Genre: Formal
  Female = 414
  Male   = 576
  Difference = 162; 58.18%
  Verdict: Weak MALE


(text by Sue Lange)
Genre: Informal
  Female = 210
  Male   = 481
  Difference = 271; 69.6%
  Verdict: MALE
Genre: Formal
  Female = 260
  Male   = 408
  Difference = 148; 61.07%
  Verdict: MALE


Genre: Informal
  Female = 415
  Male   = 559
  Difference = 144; 57.39%
  Verdict: Weak MALE
Genre: Formal
  Female = 180
  Male   = 312
  Difference = 132; 63.41%
  Verdict: MALE


To be fair I had to go to the most feminine place I could think of, even
then it was iffy.

Genre: Informal
  Female = 226
  Male   = 337
  Difference = 111; 59.85%
  Verdict: Weak MALE
Genre: Formal
  Female = 326
  Male   = 314
  Difference = -12; 49.06%
  Verdict: Weak FEMALE


J. Oquendo
echo infiltrated.net|sed 's/^/sil@/g'

"Wise men talk because they have something to say;
fools, because they have to say something." -- Plato

Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
