Full Disclosure mailing list archives
Re: Squashing supposed hacker profiling
From: "Steven Adair" <steven () securityzone org>
Date: Tue, 19 Jun 2007 10:30:30 -0400 (EDT)
Amazing, you were able to find multiple instances where a script-based
gender guesser was wrong? This is more profound than the initial research
itself. I suppose I could post a series of 10 writings where it was
correct, but what would that prove? Did you try reading this from the
same page:
-----
A few quick notes:
* The system generates a simple estimate (profiling). While Gender
Guesser may be 60% - 70% accurate, it is not 100% accurate. This is
better than random guessing (50%), but should not be interpreted as
"fact". In particular, men should not be offended if it says you write
like a girl.
* People write differently in different forums. For example, a single
writing sample may appear MALE for informal writing but test as FEMALE
for formal writing. Be sure to interpret the results based on the
appropriate writing style. (These notes, for example, are more
informal/blog than formal/non-fiction.)
* Many factors can impact the interpretation from any single person's
writing. The content, knowledge of the material, age of the author,
nationality, experience, occupation, and education level can all
impact writing styles. For example, a woman who has spent 20 years
working in a male-dominated field may write like her co-workers.
Similarly, professional female writers (and experienced hobbyists)
frequently use male writing styles. Gender Guesser does not take any
of these factors into account.
* Email can blur the lines between formal and informal writing styles.
An informal email from a manager may have traces of formality, and a
formal email from a 12-year-old is likely to be informal compared to a
letter from a 40-year-old. Do not be surprised if email messages sent
to public forums test incorrectly -- when writing for an audience,
people commonly use informal words, phrases, and slang within a formal
writing style.
* Quotations, block quotes, and included text usually carry the
gender from the initial author. Be sure to remove quoted text from any
pasted content. Also, significant changes from a copy-editor can
result in a different gender analysis. (A male editor may make a
female author's news article appear MALE or as a Weak MALE.)
* Lyrics, lists, poems, and prose are special writing styles. This
tool is unlikely to classify these texts correctly.
* The system needs a paragraph or two of text in order to observe word
repetition. A good sample should have 300 words or more. Fewer words
can lead to more variation in accuracy, and a single sentence is
unlikely to generate an accurate result. Pasting the same text
multiple times will not change the results!
* People tend to write with consistent styles. If the system
misclassifies a particular author, then other writings by the same
author will likely be misclassified the same way.
* And most importantly: This is an ESTIMATE. Please do not email me
about instances where it made the wrong determination. (I've seen it
generate incorrect results lots of times already.)
----
I can't tell if you're trolling or you have actually taken the bait. You
do realize the person that you were responding to in earlier posts is not
actually Neal Krawetz, right?
All female authors... Your so-called gender guessing mechanism is flawed
either way you want to cut it. You could try fuzzy math based on theories
to profile anyone on this list, but unless you have feasible and PROVEN
without reasonable doubt, it's all a guessing game bottom line. Anyhow
back to security, sociolinguistics is not meant for this list.

According to Dr. Krawetz's Gender Guesser...
(http://www.hackerfactor.com/GenderGuesser.html#Analyze)

http://girlygeekdom.blogspot.com/
Genre: Informal  Female = 104  Male = 602  Difference = 498; 85.26%  Verdict: MALE
Genre: Formal    Female = 116  Male = 239  Difference = 123; 67.32%  Verdict: MALE
REALITY: WRONG

http://www.darkreading.com/blog.asp?blog_sectionid=342&WT.svl=blogger1_5
Genre: Informal  Female = 442  Male = 555  Difference = 113; 55.66%  Verdict: Weak MALE
Genre: Formal    Female = 364  Male = 570  Difference = 206; 61.02%  Verdict: MALE
REALITY: WRONG

http://invisiblethings.org/papers/joanna-talk_description-CCC04.txt
Genre: Informal  Female = 218  Male = 1186  Difference = 968; 84.47%  Verdict: MALE
Genre: Formal    Female = 414  Male = 576  Difference = 162; 58.18%  Verdict: Weak MALE
REALITY: WRONG

http://www.techsploitation.com/2007/05/31/what-the-hell-was-i-thinking-about-green-libertarians/
(text by Sue Lange)
Genre: Informal  Female = 210  Male = 481  Difference = 271; 69.6%  Verdict: MALE
Genre: Formal    Female = 260  Male = 408  Difference = 148; 61.07%  Verdict: MALE
REALITY: WRONG

http://thelizardqueen.wordpress.com/2005/06/08/a-thoroughly-and-utterly-girly-blog-post-sorry-4/
Genre: Informal  Female = 415  Male = 559  Difference = 144; 57.39%  Verdict: Weak MALE
Genre: Formal    Female = 180  Male = 312  Difference = 132; 63.41%  Verdict: MALE
REALITY: WRONG

To be fair I had to go to the most feminine place I could think of, even
then it was iffy.

http://groups.ivillage.com/motherdaughter/
Genre: Informal  Female = 226  Male = 337  Difference = 111; 59.85%  Verdict: Weak MALE
Genre: Formal    Female = 326  Male = 314  Difference = -12; 49.06%  Verdict: Weak FEMALE
REALITY: MAYBE THE AUTHOR HERE WAS FLAMINGLY GAY

--
====================================================
J. Oquendo
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1383A743
echo infiltrated.net|sed 's/^/sil@/g'

"Wise men talk because they have something to say;
fools, because they have to say something." -- Plato

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
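
For anyone who wants to check the arithmetic in the quoted results: the
figures appear to be consistent with Difference = Male - Female, and with
the percentage being the male score as a share of the combined score,
truncated (not rounded) to two decimals. That formula is inferred from the
numbers above, not taken from the Gender Guesser source, and the names
score_summary and REPORTED are made up for this sketch.

```python
def score_summary(female, male):
    """Return (difference, male share in hundredths of a percent).

    Assumption: the share is truncated rather than rounded, which is what
    the quoted figures suggest (e.g. 602/706 = 85.269...% is shown as 85.26%).
    """
    difference = male - female
    share = (10000 * male) // (female + male)  # e.g. 8526 means 85.26%
    return difference, share


# (female, male, reported difference, reported share) from the quoted results.
# "69.6%" is entered as 6960. The first Female score for the invisiblethings.org
# sample is taken as 218, which agrees with the reported difference (1186 - 968).
REPORTED = [
    (104, 602, 498, 8526),
    (116, 239, 123, 6732),
    (442, 555, 113, 5566),
    (364, 570, 206, 6102),
    (218, 1186, 968, 8447),
    (414, 576, 162, 5818),
    (210, 481, 271, 6960),
    (260, 408, 148, 6107),
    (415, 559, 144, 5739),
    (180, 312, 132, 6341),
    (226, 337, 111, 5985),
    (326, 314, -12, 4906),
]

# Rows whose reported numbers do not match the inferred formula.
mismatches = [
    row for row in REPORTED
    if score_summary(row[0], row[1]) != (row[2], row[3])
]

if __name__ == "__main__":
    print(f"{len(REPORTED) - len(mismatches)} of {len(REPORTED)} rows match")
```

All twelve quoted rows match under that reading. This only confirms the
numbers are internally consistent. It says nothing about how the tool
arrives at the Female and Male scores in the first place.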
