Friday, June 20, 2008

Anti-Cheating Rating Algorithm

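One of the most common features of Web 2.0 sites is user voting. Ratings are usually chosen from [1, 2, 3, 4, 5], and the final score is the arithmetic mean of the users' ratings. This approach has an obvious loophole that makes cheating easy: if an item has received only a single vote of 5, its final score is 5.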

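Ideally, the fewer ratings an item has, the less those ratings should count toward its score. The simplest fix is the Bayesian average, which pulls each item's score toward the collection-wide average; the pull is strongest for items with few ratings.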

    b(r) = [ W(a) * a + W(r) * r ] / [ W(a) + W(r) ]

    r = average rating for an item
    W(r) = weight of that rating, which is the number of ratings
    a = average rating for your collection
    W(a) = weight of that average, which is an arbitrary number, but should be higher if you generally expect to have more ratings for your items; 100 is used here, for a database which expects many ratings per item
    b(r) = new bayesian rating

    Example, with a = 6.50, W(a) = 100, r = 10, W(r) = 3 (a 10-point scale):

    b(r) = [100 * 6.50 + 3 * 10] / (100 + 3)
    b(r) = 680 / 103
    b(r) ≈ 6.60
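
The formula above can be sketched in Python. This is only a minimal illustration, not the implementation of any particular site; the function name `bayesian_average` and its parameter names are assumptions made for this example.

```python
def bayesian_average(item_avg, item_count, collection_avg, prior_weight=100):
    """Return b(r) = (W(a) * a + W(r) * r) / (W(a) + W(r)).

    item_avg       -- r, the plain average rating of the item
    item_count     -- W(r), the number of ratings the item has received
    collection_avg -- a, the average rating across the whole collection
    prior_weight   -- W(a), an arbitrary weight given to the collection average
    """
    total_weight = prior_weight + item_count
    if total_weight <= 0:
        raise ValueError("prior_weight + item_count must be positive")
    return (prior_weight * collection_avg + item_count * item_avg) / total_weight


# The worked example from the text: a = 6.50, W(a) = 100, r = 10, W(r) = 3.
score = bayesian_average(item_avg=10, item_count=3, collection_avg=6.50)
print(round(score, 2))  # 6.6
```

With no ratings at all (`item_count=0`) the function simply returns the collection average, so new items start from a neutral score.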

When W(a) is large enough, a small number of fraudulent votes cannot noticeably change the result.
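
To see the effect of W(a), the sketch below scores the same five fraudulent 5-star votes under several weights. The collection average of 3.0, the vote list, and the helper name `smoothed_score` are all hypothetical values chosen for illustration.

```python
def smoothed_score(votes, collection_avg, prior_weight):
    """Bayesian average of a list of votes.

    sum(votes) equals W(r) * r, and len(votes) equals W(r).
    Assumes prior_weight + len(votes) is greater than zero.
    """
    return (prior_weight * collection_avg + sum(votes)) / (prior_weight + len(votes))


fake_votes = [5] * 5    # hypothetical: one cheater submits five 5-star votes
collection_avg = 3.0    # hypothetical site-wide average on the 1-5 scale

for prior_weight in (1, 10, 100):
    score = smoothed_score(fake_votes, collection_avg, prior_weight)
    print(f"W(a) = {prior_weight:3d} -> score = {score:.2f}")
```

As W(a) grows from 1 to 100, the score for this item falls from about 4.67 to about 3.10, close to the collection average. The trade-off is that honest items also need more votes before their score moves away from that average.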
