I’ve wondered about Freenet before, and got quite interested at one point in trying to contribute to a solution to the ThinkCash problem. The basic idea is:
Given a distributed P2P architecture where two of the primary goals are anonymity and the indiscriminately free flow of information, how do you minimize computer-generated insertions of content into the network (or computer-generated requests for information)? That is, how do you ensure that the entity requesting or inserting information is a human and not a computer? Some further constraints:
- The test must be able to be generated by a computer, and the results judged by a computer.
- The test must be relatively easy for a human to solve.
- The test must be relatively difficult for a computer to solve.
After some careful pondering, I gave up and let this one stew. It looks like a solution has emerged, at least in part:
Carnegie Mellon University’s CAPTCHA Project
Yahoo! Mail uses this technology to keep automated registration bots from creating Yahoo Mail accounts and using them to spam. The basic idea is this:
- Randomly select a word from a dictionary.
- Create an image of that word.
- Apply a filter and a background, making OCR exceedingly difficult.
- Ask the user to view the image, and type in the word that appears there.
Yahoo does an admirably good job of this, and further keeps automated registration bots from defeating the validation method by not permitting multiple guesses at the same image. If you type in the wrong word, Yahoo simply returns another randomly generated word, similarly treated.
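To make the one-guess rule concrete, here is a minimal Python sketch of the server side. This is my own illustration, not Yahoo's or CMU's actual code: the `ChallengeStore` class, its method names, and the tiny `WORDS` list are all hypothetical, and the image step (rendering, filter, background) is left out entirely. The word returned by `issue()` would be handed to whatever renders the distorted image, and only the image and the challenge id would go to the user.

```python
import secrets

# Hypothetical word list; a real deployment would draw from a large dictionary.
WORDS = ["harbor", "lantern", "meadow", "pigeon", "thimble"]


class ChallengeStore:
    """Tracks outstanding challenges; each one can be answered exactly once."""

    def __init__(self, words=WORDS):
        self._words = list(words)
        self._pending = {}  # challenge id -> expected word

    def issue(self):
        """Pick a random word and return (challenge_id, word_to_render)."""
        word = secrets.choice(self._words)
        challenge_id = secrets.token_hex(16)
        self._pending[challenge_id] = word
        return challenge_id, word

    def check(self, challenge_id, guess):
        """Consume the challenge and report whether the guess matched."""
        # pop() removes the entry whether or not the guess is right,
        # so a wrong answer cannot be retried against the same image.
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False
        return secrets.compare_digest(
            guess.strip().lower().encode(), expected.encode()
        )
```

After a failed `check()`, the caller would simply call `issue()` again and show a fresh image, which matches the behavior described above: a bot gets one shot per word and cannot brute-force a single challenge.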
Several other problems that arose in the discussion over ThinkCash are also dealt with.
- The code for CAPTCHA is publicly viewable, so anyone can implement it. (I don’t know yet if it’s open source, but the philosophy behind CAPTCHA is that the code must be open, and thus not rely on obscurity to prevent breaking of the algorithm.)
- The concept does not rely on a specific language, so it can be internationalized. A user would have to indicate what language they wanted to take the test in, however.
- I think it’s nearly impossible to defeat without exceedingly good OCR (the type which, to my knowledge, does not yet exist).
So far, I can only find one big, gaping hole: The idea poses serious accessibility problems. Blind users, for example, are shit out of luck and won’t be able to gain access (accessibility, as a rule, relies on the machine-readability of content).
Still, I do wonder. An audio version of this test would probably not work quite as well, but I’m not entirely sure. (Voice recognition technology seems to be quite a bit more evolved than OCR, to the point where many specialized systems need no training and are able to extract sensible commands from a high noise environment. And on the side of the tester, it’s markedly more difficult to generate human-sounding speech, in multiple languages, dynamically.)