Simple spam filter idea:
if (customentropy / #links*10 is below the typical result for a post of that length) and (there is more than 1 such post in an hour), then hide all low-entropy posts when viewed from outside the low-entropy poster's AS# (https://github.com/psvz/tirexASN, to map logged IPs to ASNs).
This way, to keep a lot of spam threads visible, spammers would have to post from many different ASes.
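A rough sketch of the hiding rule, in Python. The post fields (`id`, `asn`, `time`, `low_entropy`) and the function name are made up for illustration, and it assumes the "more than 1 such post in an hour" count is taken per AS rather than per account:

```python
from collections import defaultdict

HOUR = 3600  # seconds; the "1 hour" window from the idea above


def hidden_post_ids(posts, viewer_asn, max_flagged_per_hour=1):
    """Return ids of posts to hide from a viewer in AS `viewer_asn`.

    posts: list of dicts with keys id, asn, time (epoch seconds) and
    low_entropy (bool). All field names are hypothetical.
    """
    flagged_by_asn = defaultdict(list)
    for p in posts:
        if p["low_entropy"]:
            flagged_by_asn[p["asn"]].append(p)

    hidden = set()
    for asn, flagged in flagged_by_asn.items():
        if asn == viewer_asn:
            continue  # viewers inside the poster's AS still see the posts
        flagged.sort(key=lambda p: p["time"])
        # Is there any one-hour window with more than max_flagged_per_hour posts?
        burst = any(
            flagged[i + max_flagged_per_hour]["time"] - flagged[i]["time"] <= HOUR
            for i in range(len(flagged) - max_flagged_per_hour)
        )
        if burst:
            hidden.update(p["id"] for p in flagged)
    return hidden
```

The spammer, browsing from their own AS, keeps seeing their posts, so they get no immediate signal that anything was hidden.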
Resulting entropy values:
- 7: "Hi there!"
- 10: "Hi there, bob!"
- -4: "ababababababababababababab"
- 25: "We're calculating entropy of a string ..."
(The algorithm is from the 6-upvote post, whose author writes: "just whipped this algorithm together, so I have no idea how good this is. I fear that it will cause an overflow exception if used on very long strings.")
Edit: a couple of fixes ... & link determination
The message entropy should be calculated against a set of words taken from previously accepted posts plus a dictionary of the language spoken on the forum, so that wrong-language use is recognized as possible spam. For links, look for valid TLDs in the post and count each one as a link, with a rule that excludes common words that happen to be valid TLDs (so " is " would not count as a link but " de " would).
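A minimal sketch of both checks, in Python. The TLD and common-word sets below are tiny hypothetical stand-ins (a real filter would load the full TLD list and a stop-word list for the forum's language), and `unknown_word_ratio` is only a simple proxy for the wrong-language signal, not the entropy algorithm itself:

```python
import re

# Hypothetical, tiny stand-ins for illustration only.
VALID_TLDS = {"com", "net", "org", "de", "is", "in", "it", "me"}
COMMON_WORDS = {"is", "in", "it", "me"}  # English words that are also TLDs


def count_links(text):
    """Count tokens that are valid TLDs but not common words of the language."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return sum(1 for t in tokens if t in VALID_TLDS and t not in COMMON_WORDS)


def unknown_word_ratio(text, vocabulary):
    """Fraction of words not found in `vocabulary` (accepted posts + dictionary)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w not in vocabulary) / len(words)
```

For example, `count_links("this is it")` gives 0, while `count_links("cheap pills at shop . de and shop.com")` gives 2, because splitting on punctuation and spaces also catches links that are deliberately broken up.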
The entropy/links ratio should meet some threshold (stats needed) that depends on the post length, so that less text and more links increase the chance of the post getting hidden if several such low-entropy posts are made within ~1 hour or whatever window is chosen.
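Something like this, in Python, could express the length-dependent threshold. The length buckets and minimum ratios are placeholder numbers, not measured values (the real ones would have to come from statistics on accepted posts), and treating posts without links as never suspicious is an assumption:

```python
import bisect

# Placeholder values for illustration; "stats needed" for real ones.
LENGTH_BOUNDS = [50, 200, 1000]     # upper bounds of the post-length buckets
MIN_RATIO = [2.0, 5.0, 10.0, 20.0]  # minimum entropy per link, one per bucket


def min_ratio_for(length):
    """Pick the minimum entropy/links ratio for a post of this length."""
    return MIN_RATIO[bisect.bisect_left(LENGTH_BOUNDS, length)]


def is_low_entropy(entropy, n_links, length):
    """True if the entropy/links ratio is below the threshold for this length."""
    if n_links == 0:
        return False  # assumption: a post without links is never flagged here
    return entropy / n_links < min_ratio_for(length)
```

With these placeholders, "Hi there, bob!" (entropy 10, length 14) passes with one link but would be flagged if it carried six links.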