A new spam tactic…
August 9th 2003
Here’s one I haven’t seen before. Of course, it makes no sense to a computer; to the computer it looks like this:
________________________R___V_______U____P____E___I___A___N____R____A___A___T___R____I____L___G_______E____C________R_______A____E________A_______L____S__
That is absolutely nonsensical. But to us, who can pick out letters in sequence in any direction, it looks like this:
______________________
__R___V_______U____P__
__E___I___A___N____R__
__A___A___T___R____I__
__L___G_______E____C__
______R_______A____E__
______A_______L____S__
Which is readable, of course: read down the columns, it says REAL VIAGRA AT UNREAL PRICES.
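A small sketch of the two views, assuming the block is stored as a list of equal-width lines. The names `ROWS`, `flatten`, and `read_column` are hypothetical. A filter that reads left to right effectively sees the lines run together, while reading a single column top to bottom gives a word.

```python
# The block from this post, one string per line; every line is 22 characters wide.
ROWS = [
    "_" * 22,
    "__R___V_______U____P__",
    "__E___I___A___N____R__",
    "__A___A___T___R____I__",
    "__L___G_______E____C__",
    "______R_______A____E__",
    "______A_______L____S__",
]


def flatten(rows):
    """The computer's view: the lines simply run together."""
    return "".join(rows)


def read_column(rows, col, spacer="_"):
    """Our view: the letters in one column, read top to bottom, spacers skipped."""
    return "".join(row[col] for row in rows if row[col] != spacer)


print(read_column(ROWS, 6))  # VIAGRA
```

Column 6 is the one holding V, I, A, G, R, A; the flattened string starts with the 22 spacers of the blank first line, followed by the next line's 2 leading spacers and its "R".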
How do we combat this? A simple program could parse out the text:
1) Find the most frequently used character in the block (the spacer). In this case, "_".
2) Find the width (number of spacers) of each gap between characters on each line. Example: on the first line of letters, the space-array would be (2, 3, 7, 4, 2).
3) Go through each line and compute this array. Example: the second line of letters has an extra character, which tells us a new word starts there.
4) Take the (n+1)th character of each line, where n is the first entry in the space-array. Once that succeeds, remove the first n+1 characters from each line and repeat.
5) The result is a new array in which each element is a word, and the words are in order from left to right.
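The five steps above can be sketched in Python. This is a minimal sketch, not the author's code: the function names (`find_spacer`, `gaps`, `extract_words`) are hypothetical, every line is assumed to be the same width, and step 4's n is interpreted as the smallest leading gap among lines that still contain letters, because some lines (like the last two) start further to the right than others.

```python
from collections import Counter

# The block from this post, one string per line (all 22 characters wide).
ROWS = [
    "_" * 22,
    "__R___V_______U____P__",
    "__E___I___A___N____R__",
    "__A___A___T___R____I__",
    "__L___G_______E____C__",
    "______R_______A____E__",
    "______A_______L____S__",
]


def find_spacer(rows):
    """Step 1: take the most frequent character in the block as the spacer."""
    return Counter("".join(rows)).most_common(1)[0][0]


def gaps(row, spacer):
    """Step 2: lengths of the spacer runs on one line, e.g. [2, 3, 7, 4, 2]."""
    runs, n = [], 0
    for ch in row:
        if ch == spacer:
            n += 1
        else:
            runs.append(n)
            n = 0
    runs.append(n)  # trailing run after the last letter
    return runs


def extract_words(rows):
    """Steps 3-5: peel letter columns off the left edge, one word per column."""
    spacer = find_spacer(rows)
    words = []
    while True:
        # Step 3: the first gap-array entry of each line that still holds letters.
        leads = [gaps(r, spacer)[0] for r in rows if r.strip(spacer)]
        if not leads:
            break  # only spacers left
        n = min(leads)  # assumption: the leftmost letter column
        # Step 4: the (n+1)th character (index n) of each line, spacers skipped.
        words.append("".join(r[n] for r in rows if len(r) > n and r[n] != spacer))
        # ...then drop the first n+1 characters from every line and repeat.
        rows = [r[n + 1:] for r in rows]
    return words  # Step 5: words in left-to-right order


print(" ".join(extract_words(ROWS)))  # REAL VIAGRA AT UNREAL PRICES
```

Trimming every line by the same n+1 characters keeps the columns aligned, so the next column of letters always becomes the new left edge.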
Simple. I could write it if I wanted… but I have to pack. Blah.