Web authors: defend against email address harvesters

Mon, 2007-10-15 11:35 by admin

This article is for web authors. The techniques would also work in postings on a content management system like this one, except that entering JavaScript code here is not allowed for security reasons.

Never put real email addresses on web pages. Email address harvesters will find them and use them to send spam to the address.

Here are a few JavaScript-based ways to obfuscate email addresses such that they remain clickable.

Method 1: Use a character other than @

Example using the copyright © character:

name©domain.com

Code to achieve this:

<a href="mailto:name©domain.com" onclick="var obfuscated = this.getAttribute('href'); if (obfuscated.indexOf('©') > 0) { this.href = obfuscated.split('©')[0] + '@' + obfuscated.split('©')[1]; }">name©domain.com</a>

Method 2: Insert an uncommon obfuscation string

Example using xxxremovexxx. Don't use RemoveThis or DeleteThis or upper case markers, as these may be recognized by address harvesters:

email

Code to achieve this:

<a href="mailto:namexxxremovexxx@domain.com" onclick="var obfuscated = this.getAttribute('href'); var obfuscatorString = 'xxxremovexxx'; if (obfuscated.indexOf(obfuscatorString) > 0) { this.href = obfuscated.split(obfuscatorString)[0] + obfuscated.split(obfuscatorString)[1]; }">email</a>

Notes:

  1. Of course you have to replace name and domain.com with your desired email address.
  2. The entire <a ...> tag must not contain any line break. (In fact, line breaks are often harmless, but if you don't want to dig into the details, it is easiest to avoid them altogether.)
  3. After copying the example, replace the email address with the real one and obfuscate it, i.e. replace the @ sign or insert the obfuscator string. The obfuscator string can be put anywhere in the email address on either side of the @ sign.
  4. In method 1, instead of the replacement character © you can use any other similar-looking character, such as: ® Ø ø ¤, or you can even use multi-character combinations, such as () or (a). Don't use anything with "at", because that is already too widely used and may be recognized automatically.
  5. The content of the <a ...> tag, i.e. the text before </a>, can be a readable but obfuscated email address or simply any other text, such as "email". This is what the reader sees as the mail link.
  6. After the reader has clicked the link once, its href holds the proper, de-obfuscated email address for as long as the page stays loaded. The visible link text is not changed by the code above.
  7. The few readers without functioning JavaScript will get the obfuscated email address and will have to correct it by hand.
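
The inline onclick code of methods 1 and 2 can also be written as one small reusable function. The following is only a minimal sketch, not part of the examples above; the function name deobfuscateHref and its parameters are assumptions chosen for illustration.

```javascript
// Hypothetical helper: undo the obfuscation in a mailto: href.
//   marker      - the string that was put into the address,
//                 e.g. '©' (method 1) or 'xxxremovexxx' (method 2)
//   replacement - what belongs in its place:
//                 '@' for method 1, '' (nothing) for method 2
function deobfuscateHref(href, marker, replacement) {
  var pos = href.indexOf(marker);
  if (pos < 0) {
    // Marker not found, e.g. because an earlier click already fixed the link.
    return href;
  }
  // Replace the first occurrence of the marker.
  return href.slice(0, pos) + replacement + href.slice(pos + marker.length);
}

// Possible use in a link, assuming the function is loaded on the page:
//   onclick="this.href = deobfuscateHref(this.getAttribute('href'), '©', '@');"
```

With such a helper, the marker and its replacement are the only things that differ between the two methods.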

Method 3: Yet another obfuscator

You can enter an email address on this projecthoneypot.org page to have it obfuscated in an even more complex way, also using JavaScript.

However, this method differs from the above methods in one respect: it de-obfuscates the link as soon as the page is loaded, rather than later, when the user clicks on the link. This has two consequences:

  1. Disadvantage: If an intelligent harvester used a browser to render the page and then read the resulting generated code, it would pick up the de-obfuscated address. Presumably very few or no harvesters go to this length at this time, but if obfuscated mail addresses gained wider adoption, that might change.
  2. Advantage: The web page shows the de-obfuscated mail address, so even if a user copies and pastes it, it would work.
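
De-obfuscating at load time rather than on click could look roughly like the following. This is a minimal sketch and not the code that projecthoneypot.org generates; it simply reuses the xxxremovexxx marker from method 2, and the names MARKER, stripMarker and deobfuscateAllLinks as well as the DOMContentLoaded hookup are assumptions.

```javascript
// Assumed marker, the same one as in method 2.
var MARKER = 'xxxremovexxx';

// Remove the first occurrence of the marker from a string.
function stripMarker(text) {
  return text.replace(MARKER, '');
}

// De-obfuscate every mailto: link in the given document: the href and,
// if the visible text contains the marker, the text the reader sees.
function deobfuscateAllLinks(doc) {
  var links = doc.querySelectorAll('a[href^="mailto:"]');
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    link.setAttribute('href', stripMarker(link.getAttribute('href')));
    if (link.textContent.indexOf(MARKER) >= 0) {
      link.textContent = stripMarker(link.textContent);
    }
  }
}

// In a browser, run once the page has been loaded.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    deobfuscateAllLinks(document);
  });
}
```

Because the visible text is corrected as well, copying and pasting the address works, which is the advantage named above. The price is the disadvantage named above: the correct address is present in the rendered page.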

Technical background:

When the user clicks on the link with the first two methods, the JavaScript code is activated through the onclick attribute. In the last method, the JavaScript code runs as soon as the link has been loaded.

The little JavaScript program takes the href attribute and removes or replaces the obfuscation. The important point is that the final, correct email address appears nowhere in the original page code, not even in the JavaScript code. Otherwise it could be found through the simple email address harvesting method of downloading the raw HTML page and scanning its code for mail addresses.
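
The simple harvesting method can be imitated with a regular expression scan over the raw HTML. The following is a sketch for illustration only: the pattern is a deliberately simple assumption, not the one any particular harvester uses, and the onclick handlers are left out of the sample links for brevity.

```javascript
// Simplified email pattern; real harvesters may use something more elaborate.
var EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return every string in the raw HTML that looks like an email address.
function harvest(rawHtml) {
  return rawHtml.match(EMAIL_PATTERN) || [];
}

var plain   = '<a href="mailto:name@domain.com">email</a>';
var method1 = '<a href="mailto:name©domain.com">name©domain.com</a>';
var method2 = '<a href="mailto:namexxxremovexxx@domain.com">email</a>';

console.log(harvest(plain));   // finds name@domain.com
console.log(harvest(method1)); // finds nothing, there is no @ to match
console.log(harvest(method2)); // finds only the useless obfuscated address
```

With method 1 such a scan finds nothing, and with method 2 it finds an address that does not exist, so in neither case does it obtain the correct address.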