Historical Interest Only

This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.

Detecting Web Spam using Machine Learning

14 January 2010 - 3:10pm — Jano.van.Hemert

Student:

Andrejs Mironovs

Grade:

second1

Primary goal: to develop a classification algorithm to detect Web Spam.

Web Spam refers to a set of techniques intended to artificially raise the ranking of a page in a search engine. From the point of view of search engine providers and Web users, Web Spam degrades the quality of information search on the Web [1] [2] [3]. Web Spam can be broadly classified into two types: content spam and link spam. Detecting it is a critical and challenging task, and successful detection has high commercial value for industry.

The goal of Web Spam detection is to decide whether a given page or website is spam or not. This is a typical classification problem in the field of machine learning.

This project will focus on developing a classification algorithm to detect Web Spam. It is expected to target one or more Web Spam types, namely content spam and/or link spam. The outcome of the project is a classification algorithm together with a prototype. The WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets [4] will be used for training and testing.
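
As a minimal sketch of the classification framing described above, the following Python snippet trains a nearest-centroid classifier on a handful of invented feature vectors and labels new pages as spam or normal. The two features (`keyword_density`, `outlink_ratio`), the toy values, and the choice of a nearest-centroid classifier are all assumptions made for illustration; none of them is taken from the project itself or from the WEBSPAM-UK datasets.

```python
from math import dist
from statistics import mean


def fit_centroids(rows, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        centroids[label] = tuple(mean(column) for column in zip(*members))
    return centroids


def predict(centroids, row):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Invented toy data, NOT drawn from WEBSPAM-UK2006/2007.
# Hypothetical features: (keyword_density, outlink_ratio), both scaled to [0, 1].
train_rows = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
train_labels = ["spam", "spam", "normal", "normal"]

centroids = fit_centroids(train_rows, train_labels)
print(predict(centroids, (0.7, 0.75)))
print(predict(centroids, (0.2, 0.3)))
```

With this toy data the first page is labelled `spam` and the second `normal`. A real detector would replace the hand-made vectors with content-based or link-based features extracted from labelled hosts, and would be evaluated on a held-out test split.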

Project status:

Finished

Degree level:

MSc

Background:

Machine learning, knowledge of databases, programming in Java or another language

Supervisors @ NeSC:

Liangxiu.Han

Jano.van.Hemert

Subject areas:

Machine Learning/Neural Networks/Connectionist Computing

Student project type:

MSc student project

References:

* [1] Z. Gyongyi, H. Garcia-Molina and J. Pedersen. Combating Web Spam with TrustRank. In VLDB 2004.

* [2] L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi. Link Analysis for Web Spam Detection. ACM Transactions on the Web (TWEB), 2(1) (2008) 2.1-2.45

* [3] H. Najada and I. Himeidi. Web Spam detection using Machine Learning in Specific Domain Features. Journal of Information Assurance and Security. 3 (2008) 220-229

* [4] WEBSPAM-UK2007, http://barcelona.research.yahoo.net/webspam/datasets/uk2007/
