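One of the most common features on Web 2.0 sites is user voting. Ratings are usually chosen from [1, 2, 3, 4, 5], and the final score is the arithmetic mean of the users' ratings. This approach has an obvious flaw that makes it easy to game: if an item has received a single vote of 5, its final score is 5.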
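Ideally, the fewer ratings an item has, the less its raw average should count, so that a handful of high votes cannot push it to the top. The simplest fix is the Bayesian average. It pulls each item's score toward the overall average, and the pull is strongest for items with few ratings.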
b(r) = [ W(a) * a + W(r) * r ] / [ W(a) + W(r) ]

r = average rating for an item
W(r) = weight of that rating, which is the number of ratings
a = average rating for your collection
W(a) = weight of that average, which is an arbitrary number, but should be higher if you generally expect to have more ratings for your items; 100 is used here, for a database which expects many ratings per item
b(r) = new bayesian rating
For example, with a = 6.50 and W(a) = 100, an item with 3 ratings averaging 10 gets:

b(r) = [100 * 6.50 + 3 * 10] / (100 + 3)
b(r) = 680 / 103
b(r) = 6.60
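The same arithmetic can be sketched in Python. This is only an illustration: the function name `bayesian_average` and its parameter names are hypothetical, chosen here to mirror r, W(r), a and W(a) from the formula.

```python
def bayesian_average(item_mean, item_count, prior_mean, prior_weight):
    """Compute b(r) = (W(a) * a + W(r) * r) / (W(a) + W(r)).

    item_mean    -> r,    the item's own average rating
    item_count   -> W(r), the number of ratings the item has
    prior_mean   -> a,    the average rating across the whole collection
    prior_weight -> W(a), an arbitrary weight given to that average
    """
    total_weight = prior_weight + item_count
    if total_weight <= 0:
        raise ValueError("prior_weight + item_count must be positive")
    return (prior_weight * prior_mean + item_count * item_mean) / total_weight


# Values from the worked example: a = 6.50, W(a) = 100, r = 10, W(r) = 3.
score = bayesian_average(item_mean=10, item_count=3,
                         prior_mean=6.50, prior_weight=100)
print(f"{score:.2f}")  # prints 6.60
```

Three perfect ratings move the item only about 0.10 above the collection average, because the prior weight of 100 dominates the 3 real votes.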
When W(a) is large enough, a small number of fraudulent votes cannot noticeably change the result.
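A rough way to see this effect is to score the same three fake 5-point votes under different values of W(a). The sketch below is illustrative only: the collection average of 3.0 on a 1 to 5 scale and the helper name `bayesian_score` are assumptions, not figures from any real site.

```python
def bayesian_score(ratings, prior_mean, prior_weight):
    """Bayesian average of a list of ratings.

    sum(ratings) equals W(r) * r and len(ratings) equals W(r),
    so this is the same formula written against raw votes.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


prior_mean = 3.0        # assumed collection-wide average on a 1-5 scale
fake_votes = [5, 5, 5]  # a new item with only three (fraudulent) top votes

# Score the same votes under increasingly heavy priors.
scores = {w: bayesian_score(fake_votes, prior_mean, w) for w in (1, 10, 100)}
for weight, value in scores.items():
    print(f"W(a) = {weight:>3}: score = {value:.2f}")
```

With W(a) = 1 the three votes lift the score to 4.50, while with W(a) = 100 it stays near 3.06, so a larger prior weight makes the ranking harder to manipulate at the cost of reacting more slowly to genuine votes.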