http://herbsutter.spaces.live.com/Blog/cns!2D4327CC297151BB!917.entry
Herb Sutter, the C++ guru, has finally left Live Space because the comment spam there is out of control.
I like Live Spaces, and I really
hate moving a blog, but unfortunately the comment spam is out of
control and I just can’t keep up with the tools available to manage it
here — other than accepting a blog with no comments at all, which I’m
unwilling to do. Your comments are too valuable to give up.
I have been looking into blog comment spam for a long time, ever since my blog was spammed for the first time (in Chinese). I also created a semi-automatic solution (in Chinese) for deleting the spam. I am surely not the only one suffering from blog comment spam, so I have been thinking about making the solution fully automatic and turning it into a tool that cleans up comment spam for all of us spammer haters.
However, even if I build that tool, it does not mean that we bloggers should take on the responsibility for anti-spam work. Anti-spam is the service provider's business. Live Space SHOULD improve its comment system, or it will become Live Spam and more and more people will leave.
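Today I discovered the Akismet website.
It provides an API over a REST architecture that lets blog software check whether each comment is spam and then decide whether to publish it to the blog. WordPress already has a corresponding plugin.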
We can’t stand spam.
Who can? You have better things to do with your life than deal with
the underbelly of the internet. Automattic Kismet (Akismet for short)
is a collaborative effort to make comment and trackback spam a
non-issue and restore innocence to blogging, so you never have to worry about spam again.
Akismet is free for personal use.
In the spirit of helping the blogosphere as much as possible, we’ve
decided to make Akismet free for as many people as possible. We have free API keys available for your personal blog.
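The comment check that Akismet offers over REST can be sketched roughly as follows. This is a minimal sketch, not Akismet's official client: the endpoint URL and the form field names are recalled from Akismet's public documentation and should be verified against the current docs, and the function names `build_comment_check` and `check_spam` are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: endpoint as recalled from Akismet's public docs; verify before use.
AKISMET_URL = "https://rest.akismet.com/1.1/comment-check"


def build_comment_check(api_key, blog_url, user_ip, user_agent, author, content):
    """Build (but do not send) a form-encoded comment-check POST request."""
    form = {
        "api_key": api_key,          # the personal API key mentioned above
        "blog": blog_url,            # front page URL of the blog
        "user_ip": user_ip,          # IP address of the commenter
        "user_agent": user_agent,    # User-Agent of the commenter
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    body = urlencode(form).encode("utf-8")
    return Request(AKISMET_URL, data=body, method="POST")


def check_spam(request):
    """Send the request; the service is expected to answer 'true' for spam."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip() == "true"
```

Blog software would call `check_spam` before publishing a comment and hold back anything that is flagged.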
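Work has been busy lately, so I could only tinker with this after hours. Today I finally got it working, and watching dozens of spam comments disappear in an instant was very satisfying.
The whole thing comes down to two questions: first, how to get a list of spam comments; second, how to delete them.
Take the second question first. However complicated the Space code is, every operation on the blog ultimately travels over HTTP. There are many ways to capture HTTP traffic; I used the Firefox extension TamperData. I deleted a couple of comments by hand and found that the corresponding HTTP request looks like this: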
POST http://ftofficer.spaces.live.com/parts/blog/script/BlogService.fpp?cnmn=Microsoft.Spaces.Web.Parts.BlogPart.FireAnt.BlogService.delete_items&ptid=0&a=&au=undefined HTTP/1.1
Host: ftofficer.spaces.live.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: zh-cn,zh;q=0.7,en-us;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: UTF-8,*
Connection: close
Content-Type: application/x-www-form-urlencoded
X-FPP-Command: 0
sc: [...]
Referer: [...]
Content-Length: 117
Cookie: [....]
Pragma: no-cache
Cache-Control: no-cache
cn=Microsoft.Spaces.Web.Parts.BlogPart.FireAnt.BlogService&mn=delete_items&d=[{1,%22cns!423B72634E2F6B7E!611%22}]&v2
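The most important parts are the ones marked in red in the original capture: the Cookie header and the string inside the d= parameter. The Cookie is the proof that you are logged in to Live Space, and the string at the end is the ID of the comment to delete. So all we need to do is obtain that ID and then replay the HTTP request. In the end I took the lazy route and used NetCat: I wrote a request template, used Perl to substitute the ID into it, connected to Live Space with nc, and replayed it. That is the whole method. Turning it into proper software would be a bit more work; I will look at it when I have time, since I have been far too busy lately.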
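The template-and-substitute step can be sketched in Python instead of Perl. This is only an illustration under stated assumptions: the request line uses the path form rather than the absolute URL seen in the capture, the `sc` and `Referer` headers are elided in the capture and so are left to the hypothetical `extra_headers` argument, the body ends in `&v2` exactly as captured (it may be truncated), and `build_delete_request` is a name made up here.

```python
REQUEST_LINE = (
    "POST /parts/blog/script/BlogService.fpp"
    "?cnmn=Microsoft.Spaces.Web.Parts.BlogPart.FireAnt.BlogService.delete_items"
    "&ptid=0&a=&au=undefined HTTP/1.1\r\n"
)

# Doubled braces are literal braces for str.format; only comment_id is replaced.
BODY_TEMPLATE = (
    "cn=Microsoft.Spaces.Web.Parts.BlogPart.FireAnt.BlogService"
    "&mn=delete_items&d=[{{1,%22{comment_id}%22}}]&v2"
)


def build_delete_request(host, cookie, comment_id, extra_headers=None):
    """Return the raw bytes of one delete request for the given comment ID."""
    body = BODY_TEMPLATE.format(comment_id=comment_id).encode("ascii")
    headers = {
        "Host": host,
        "Connection": "close",
        "Content-Type": "application/x-www-form-urlencoded",
        "X-FPP-Command": "0",
        "Cookie": cookie,                    # proof of being logged in
        "Content-Length": str(len(body)),    # recomputed for every ID
    }
    headers.update(extra_headers or {})      # e.g. the elided sc / Referer values
    head = REQUEST_LINE + "".join(f"{k}: {v}\r\n" for k, v in headers.items())
    return head.encode("ascii") + b"\r\n" + body
```

Writing the returned bytes to standard output and piping them into nc connected to the blog host on port 80 would replay the request in the same way as the Perl and NetCat combination.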
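Now back to the first question: we need a list of all the spam comments. That unavoidably means crawling the whole Space, enumerating the content of every comment, and judging them one by one. If a comment matches some condition, its ID gets recorded. Analyzing the Space page structure from scratch and enumerating the IDs and content would be a big job, though, so it made more sense to look for an existing solution. After some Googling I found the Live Space Mover project. It includes a feature that enumerates all the comments in a Live Space, and it works well. Its shortcoming is that it only retrieves the comment content and sender information, not the ID, which is what matters most to me. That is not a big problem: the Python script is right there, so I just changed the code. With the modified code, it took 10 minutes to crawl the blog and generate the list. Then I parsed the list with Perl, called nc, and it was done.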
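The enumerate, judge, and record loop described above might look like the sketch below. It does not reproduce Live Space Mover's code: the `Comment` record, the function names, and the one-ID-per-line output file are assumptions made for illustration, and the crawling itself is left out.

```python
from typing import Callable, Iterable, List, NamedTuple


class Comment(NamedTuple):
    """One crawled comment; the field names are hypothetical."""
    comment_id: str   # e.g. "cns!423B72634E2F6B7E!611"
    author: str
    content: str


def collect_spam_ids(comments: Iterable[Comment],
                     is_spam: Callable[[Comment], bool]) -> List[str]:
    """Judge every crawled comment and record the IDs of those that match."""
    return [c.comment_id for c in comments if is_spam(c)]


def write_id_list(ids: Iterable[str], path: str) -> None:
    """Write one ID per line, ready for a later delete-and-replay step."""
    with open(path, "w", encoding="utf-8") as handle:
        for comment_id in ids:
            handle.write(comment_id + "\n")
```

Keeping the condition as a separate `is_spam` callable means the crawl and the list format stay the same when the spam rule changes.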
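So the approach exists. When I have time I will tidy it up into something that works with a few mouse clicks, or with a single command line.
That is for later, though; I am very busy. Any volunteers who want to try it out?
This morning I got to the office, checked my Gmail, and got a shock: the whole page was filled with Live Space notification emails. I wondered what had happened to my blog, which normally gets hardly any traffic. I clicked through and found that it was spam again.
After deleting a few I realized something was wrong: this spammer seemed to have spammed every single post on my space, obviously with a script. Deleting by hand can never keep up with automation. So after removing some of the most recent ones, I left the old ones alone for now and went looking for a way to automate the job. Obviously one exists; after all, the spammer is using a script.
The last spam incident was not long ago, so it is quite likely that another spammer will target me in the near future. I might as well build the browser-based idea I had back then, so that deleting this stuff is at least a little easier next time.
Let me do some research first.
The post above about blog comment spam was spammed again by the same person, and the spam landed right between two of Ace's replies like a sandwich filling. That really got me down.
Since I cannot make Live Space stop accepting spam comments, all I can do is keep them out of my own sight. I only find out whether my blog has been spammed when I visit it myself, and then I delete the spam comments by hand, so I will simply automate that process.
I will build it on GreaseMonkey.
First I will set up a skeleton, then fill in an algorithm that decides whether a comment is spam. For now it will be keyword-based: any comment with a link called boubo gets deleted.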
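The keyword rule can be sketched as below. The plan above is a GreaseMonkey script, which would be written in JavaScript; Python is used here only to show the logic, and the names `LinkCollector`, `SPAM_KEYWORDS`, and `is_spam` are made up for illustration. The sketch assumes that a comment is available as an HTML fragment and treats it as spam when a keyword appears in a link's target or in its visible text.

```python
from html.parser import HTMLParser

SPAM_KEYWORDS = ("boubo",)  # the one keyword named in the post


class LinkCollector(HTMLParser):
    """Collect the href and visible text of every <a> in a comment."""

    def __init__(self):
        super().__init__()
        self.links = []      # each entry is [href, text]
        self._open = None    # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open = [dict(attrs).get("href") or "", ""]
            self.links.append(self._open)

    def handle_data(self, data):
        if self._open is not None:
            self._open[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._open = None


def is_spam(comment_html, keywords=SPAM_KEYWORDS):
    """Return True if any link in the comment mentions a spam keyword."""
    parser = LinkCollector()
    parser.feed(comment_html)
    parser.close()
    for href, text in parser.links:
        haystack = (href + " " + text).lower()
        if any(keyword in haystack for keyword in keywords):
            return True
    return False
```

Only links are inspected, so a legitimate comment that merely mentions the keyword in plain text is left alone.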
A reply to yesterday's post:
This kind of blog spam is really infuriating.
And it is not just me; plenty of other people are running into the same problem.
In this Web 2.0 era, anti-spam work is no longer only about mail.