Back in the 1980s, when I was a student, I was also an avid shortwave listener (SWL). This was in the days before the web, satellite TV or 24-hour international cable news coverage. I had an upper floor room in Cabot Hall, and each night I would surreptitiously dangle out the window a 40-foot wire antenna attached to a small weight.
At first I listened only to the big broadcasters like the BBC World Service, Deutsche Welle, Radio Moscow, and then moved on to smaller ones: Tirana, Malta, South Africa, etc. It was a great way to get a global perspective beyond the two minutes allocated to international news on a typical US-based evening news program.
Eventually I started writing the broadcasters and received many QSL cards. Some of my letters were read on the air. I’m sure I ended up on some FBI watch list for those letters to Radio Prague and Radio Havana. My subscription to Soviet Life magazine and a Cambridge address probably didn’t help either.
But you don’t go far as a SWL before you notice that there are a lot of strange things going on in the aether. Some were easily explained: the Soviet Union jamming broadcasts of Voice of America, or Cuba jamming broadcasts of Radio Martí. And then there were the commercial voice broadcasts, ship-to-shore, international aviation, time signals, etc. Then the various data services, radio teletype, weather fax, etc. And then there were the mysterious coded transmissions, which were rumored to be SAC transmissions, “Sky King, Sky King, Do not answer”, followed by various authentication codes, which were either recall or go-ahead codes for nuclear attack. It was an eerie feeling, in the hotter days of the Cold War, to lie awake at night, listening to the radio and wondering whether the sun would rise in the morning. Now I just wonder if my 401(k) will still be there.
Stranger yet were the cryptic transmissions of the “numbers stations”, which would transmit on a semi-regular schedule and merely read off a long list of numbers for 10 minutes. For months I transcribed one particular woman’s transmissions, trying to find the pattern. I did some computer analysis, but the numbers were random in frequency, with no discernible patterns. Presumably they were encoded against a one-time pad.
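For the curious, a frequency check of that sort might look something like the sketch below. It is not the analysis I ran back then, just a minimal illustration, and the function name and sample strings are made up. It computes a chi-square statistic for how far the digit counts stray from a uniform distribution.

```python
from collections import Counter

DIGITS = "0123456789"

def digit_chi_square(transcript: str) -> float:
    """Chi-square statistic of digit counts against a uniform distribution.

    Non-digit characters (spaces, line breaks) are ignored.
    """
    counts = Counter(ch for ch in transcript if ch in DIGITS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("transcript contains no digits")
    expected = total / 10  # each digit equally likely under the null hypothesis
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in DIGITS)

# Hypothetical transcripts, not real intercepts.
print(digit_chi_square("01234 56789 01234 56789"))  # perfectly uniform
print(digit_chi_square("11111 11111"))              # wildly non-uniform
```

With ten digits there are nine degrees of freedom, so a statistic well above roughly 16.9 would suggest, at the 5% level, that the digits are not uniformly distributed. A flat digit count says nothing about patterns in the ordering, of course, which is where a one-time pad would defeat any further analysis.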
And then there were the “pirate” radio stations like “The Voice of the Purple Pumpkin”.
Although most people knew about the BBC World Service, I don’t think many appreciated that a large portion of the shortwave universe was strange, that the fringe was everywhere.
I’m starting to have a similar view of the web. There are major content providers, minor content providers, even individual content providers like me. And then there is the weirdness, the strange corners of the web, the space between the channels, where you are not even sure whether you are listening to signal or noise.
Here are a few random examples of web sites with no discernible purpose. They appear to be garbled republications of news stories.
Let’s start with the “Wet Paint Body Notes” blog, newly created, with only three posts. One is called “Microsoft Gets Foot in Mass. Office Door”. It starts:
In what could be a coup inwardly favour of Microsoft (Nasdaq: MSFT) and a biff to the friendly wellspring league, the stipulate of Massachusetts personal added Microsoft’s Office Open XML norm to its document of give your declaration standards it will allow for elected representatives exploit.
This is a strange kind of English. It almost seems like a poor translation, or even a poor machine translation, of a document written in another language. But if you poke around a little, you find that this blog post is an unattributed, garbled derivation of a 2007 article in Linux Insider. Not only was the original article in English, but the reposted version also truncates it, posting only the first few paragraphs.
So what’s up with that? There are no banner ads or other obvious sources of revenue on the garbled version of the article. It is not a link farm. In fact it has no outgoing links. So why did someone bother?
Another example. The blog “75Software-News48” has a new article, “Microsoft shows support for ODF”, posted just two weeks ago, with the intro:
Amid organization hassle surrounded by wish of interoperability, Microsoft (Nasdaq: MSFT) protected Thursday announced the discovery of the Open XML Translator Project. The overhang will fry in the air permitted software to allow Word, Excel and PowerPoint to knob documents in contrary technology format.
Again, this reads like it is a poor translation from another language. But look further and you can find that the original article is actually in English, from a 2006 TechNewsWorld article.
Again, no obvious intent here. It isn’t a link farm, and there is no evident source of revenue. It isn’t informative and it certainly isn’t timely. So why did they do it?
One more example, this time a LiveJournal blog called “All Microsoft”, again newly created, with a post called “Ecma Approves MS Office Format, IBM Dissents”. It opens:
Microsoft’s (Nasdaq: MSFT) Open XML bureau software format, broad of via the tech giant to chase near the Open Document Format (ODF), cleared a standards hurdle this week, successful approbation from the Ecma global standards article.
Same modus operandi here. Original source, unattributed, is from a 2006 Linux Insider article.
I have dozens of examples of this kind of thing, all within the last couple of months, mainly articles about Microsoft and ODF. Something new is afoot. But what? Anyone have any idea of what this is and who benefits from it? Is this just a contest between Blogger and LiveJournal to see who can claim the most hosted blogs? Or is it some SEO ploy? It has me stumped.
Isn’t this just an SEO tactic? At the end of these pages there is a telltale link, so at the end of the “knob documents in contrary technology format” one (hah) there is a link to “buy openpim”: the target site has “oem downloads” and Vista Ultimate is only $99 there. Hmmmmm.
It does look like the pages are auto-generated; maybe some software selects the text based on known terms which are traffic drivers (for a time ‘Microsoft’ and ‘ODF’ together would have been hot tickets here).
There might be an idea that these pages will directly drive traffic to the target sites (in which case, they need to improve their software); but in any case the inbound links should improve the rank of the targeted pages in search engines …
I’ve been getting a ton of these returned by Google Blogs Alert set with terms like “ODF” and “OOXML”. I’d guess that it’s only been over the last couple of weeks, if that.
One last night was a site with Corel’s name up in lights that had Amazon links to buy Corel programs. One product per page. The pages on the site each looked like they had begun as Corel pages and maybe run through a translator program to another human language and then back to English.
All of the others I have seen that I bothered to check have one page of the kind you describe but the other pages on the same sites looked like link farms intended to up the linked sites in web search engine rankings. I didn’t bother to follow any of the links to see where they led.
But this kind of thing is outnumbering legitimate blog hits returned by Google Blogs Alert for me, roughly 2-1.
Hmmm… I think I see the technique. The syntax of the new articles is identical to the original text. But it appears that they have a thesaurus and are doing a random substitution of synonyms. No attention is given to the grammatical context. So “handle” used as a verb is replaced by “knob”.
The net effect of this is that the article is not immediately traceable to the source document. Also, terms that are not in the dictionary, such as ‘ODF’, ‘Microsoft’ and proper names are not changed. Maybe they are trying to get hits from people searching on these proper names?
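Here is a minimal sketch of how such a substitution could work. I have no idea what software these sites actually use, so the function name and the tiny thesaurus below are made up for illustration. The point is that a context-blind word swap leaves anything not in the dictionary untouched.

```python
import random
import re

# Hypothetical toy thesaurus. A real one would have thousands of entries
# and would happily mix up nouns and verbs, as seen in the examples above.
THESAURUS = {
    "handle": ["knob", "grip"],
    "announced": ["declared", "proclaimed"],
    "project": ["overhang", "scheme"],
    "allow": ["permit", "let"],
}

def spin(text: str, thesaurus: dict, rng: random.Random) -> str:
    """Replace each known word with a randomly chosen synonym.

    Words missing from the thesaurus (e.g. "Microsoft", "ODF") pass
    through unchanged. Capitalization and grammar are ignored on purpose.
    """
    def swap(match: re.Match) -> str:
        word = match.group(0)
        choices = thesaurus.get(word.lower())
        return rng.choice(choices) if choices else word

    return re.sub(r"[A-Za-z]+", swap, text)

rng = random.Random()
print(spin("Microsoft announced software to handle ODF documents", THESAURUS, rng))
```

Because the word order and punctuation survive intact, the output keeps the syntax of the source while no longer matching it word for word.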
The first article I mention doesn’t seem to have any outgoing links, however. Maybe the idea is to set it up, wait a few weeks for the PageRank of the page to go up, then auction off outgoing links? But that will only work if the blog post has high PageRank links to it. And right now I don’t see anyone linking to these pseudo-posts. Uh… except me… Doh!
If meaningfully mutilated articles about Open Standards adoption hidden in the dark corners of the blogosphere are not yet being used as modern day numbers stations someone is going to have to start. Because that’s so much more exciting than some idiotic SEO tactic to sell pirated software.
Răzvan Sandu says
Happy to hear that someone did the same shortwave radio listening there, on the other coast of the Atlantic Ocean. I personally did it for many years here, in Romania… (I still do, from time to time) ;-)
As a person interested in the ODF/OOXML battle here, in Romania, I’ve noticed the same phenomenon here, about the Web materials written IN ROMANIAN on the OOXML matter.
Namely, the blogs of some prominent Romanian free software advocates, who blogged frequently about ODF, OOXML and their situation here, got superseded (in Google rankings) by this type of “automatic translations” you’ve mentioned.
It seems to me like an effort to “bury” those blogs in a lot of “noise”, trying to make them less visible in search engines…
Christian Walde says
If you’re looking for actual noise on the internet, here’s a fine example: http://ricedoutyugo.com/
That is weird. But I at least can understand the motivation. Whether Dadaist art, the poetry of Gertrude Stein, the music of John Cage, or the teachings of the Church of the SubGenius, there are times when the unexpected is expected, the inappropriate is de rigueur.
But these Microsoft/ODF web sites are not nearly bad enough to be interesting from an artistic standpoint. They need to make them truly bad.
And I’m not convinced it is a commercial motivation either. If you want to get links to sell illegal OEM copies of Vista, then you load up a page with words like “Brittany Spears” or “free clip art”. If your most distinguishing key words are Microsoft and ODF, then you are not going to get many clicks. So I don’t think the OEM software link explains this.
Similarly, when the police come across a body shot in the back, and notice his wallet is gone, they don’t automatically assume that means he was shot during a robbery. It could just as well be premeditated murder by someone who took the wallet to make it look like a robbery.
So I’m not sure we’re any closer to a solution. Is it: 1. search engine optimization, 2. art, 3. selling OEM software to people who happen to be searching for Microsoft and ODF, 4. an attempt to make searches for these keywords less effective by decreasing the signal/noise ratio, or 5. all of the above?
I think it’s #4, intentional pollution.
Porn purveyors suffered this fate. Probably religiously-motivated folk polluted the Internet with phony sites so that Googling for an on-topic site is damned difficult. Try searching for a particular niche, e.g. tentacle porn, and you’ll have to wade through a few hundred pollution sites deliberately designed as interference before you find the good stuff.
David Gerard says
It’s just blog spam. Word salad (generated by a Markov chain) is too obvious, so they grab any old text and run it through a translator and back again. The result is almost sensible, but, importantly, isn’t word-for-word identical to the original, so doesn’t get penalised in Google.
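For anyone who hasn’t seen the Markov chain kind of word salad, here is a rough sketch of a first-order, word-level generator. The function names and the seed text are only illustrative, and real spam tools will differ.

```python
import random
from collections import defaultdict

def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def word_salad(chain: dict, start: str, length: int, rng: random.Random) -> str:
    """Walk the chain from `start`, picking a random follower at each step.

    Stops early if the current word never had a follower in the source.
    """
    out = [start]
    while len(out) < length:
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Illustrative seed text only.
seed = "the format is open and the standard is open and the vote is over"
print(word_salad(build_chain(seed), "the", 12, random.Random()))
```

Each adjacent pair of words in the output did occur somewhere in the source, but nothing holds the sentence together beyond that, which is why this kind of text is so easy to spot.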
Appropriate response: report as spam blog.
Răzvan Sandu says
Here’s an actual example that I came upon, not about ODF, but about Linux:
“Shell” is translated into Romanian as “raft”, which means not the CLI but the piece of furniture in a store that holds and displays the goods for sale. ;-)
The text is almost unreadable in Romanian: an attempt to do an automated translation of a HOWTO?
Dave Leigh says
I used to maintain the particular network that generated those “Sky King” messages. In fact, I helped to build two of the stations. Your rumor mill is not entirely inaccurate.