When you use a search engine such as Google or Yahoo, you reveal what you are looking for by typing in the search terms. In most cases this is nothing to worry about, but if you live in a country where your traffic is monitored and the political regime disapproves of what you are searching for, you might run into trouble.
The research that we performed under the title Privacy in resource discovery tries to deal with this problem. The approach is that the person searching reveals too little information for an observer to know exactly what he or she is searching for, but enough to limit the number of potential results to reasonable proportions.
For example, if somebody wants to find all documents containing information about human rights, instead of typing in the whole query ('human rights'), the person could type 'hum', which by substring matching matches the desired documents but also resources that are irrelevant to the user, such as those about 'hummer' or 'humbug'.
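The 'hum' example can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation used in the research: the function names, the sample documents and the choice to filter the candidates locally afterwards are all assumptions made for the sketch.

```python
def server_search(fragment, documents):
    """What the search service sees and does: plain substring matching
    on the (short) fragment, case-insensitively."""
    fragment = fragment.lower()
    return [doc for doc in documents if fragment in doc.lower()]


def filter_locally(candidates, full_query):
    """What the searcher does in private: keep only the candidates that
    match the full query, which is never sent over the network."""
    full_query = full_query.lower()
    return [doc for doc in candidates if full_query in doc.lower()]


# Hypothetical document collection, for illustration only.
documents = [
    "Universal declaration of human rights",
    "Hummer H2 review",
    "Bah, humbug!",
    "Cooking with garlic",
]

candidates = server_search("hum", documents)          # includes false positives
relevant = filter_locally(candidates, "human rights")  # the desired documents
```

Here the service only learns the fragment 'hum'; the three returned candidates include two false positives that the searcher discards locally.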
The main question is how long the substring should be to obtain the right number of 'false positives' (i.e. the returned resources that in the end do not match): not too many, for efficiency reasons, and not too few, for privacy reasons. In this research we analyzed a couple of algorithms for determining the length of the strings, using a large dataset of websites.
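To make the trade-off concrete, here is one very simple way such a length could be chosen. It is a sketch under stated assumptions and not one of the algorithms analyzed in the research: it assumes the searcher can estimate how many known terms match a fragment (here via a small hypothetical vocabulary), and the thresholds `min_candidates` and `max_candidates` are invented parameters standing for the privacy and efficiency bounds.

```python
def count_matches(fragment, vocabulary):
    """Number of vocabulary terms that contain the fragment as a substring."""
    return sum(1 for term in vocabulary if fragment in term)


def choose_fragment(query, vocabulary, min_candidates, max_candidates):
    """Return the shortest prefix of `query` whose match count is at most
    `max_candidates` (efficiency bound), provided it still matches at least
    `min_candidates` terms (privacy bound). Return None if no prefix fits.

    The match count can only shrink as the prefix grows, so the first
    prefix that satisfies the efficiency bound is the most private one.
    """
    for length in range(1, len(query) + 1):
        fragment = query[:length]
        n = count_matches(fragment, vocabulary)
        if n <= max_candidates:
            return fragment if n >= min_candidates else None
    return None


# Hypothetical lowercase vocabulary, for illustration only.
vocabulary = ["human", "humane", "hummer", "humbug",
              "hungry", "house", "garlic", "rights"]

fragment = choose_fragment("human", vocabulary,
                           min_candidates=3, max_candidates=4)
```

With this vocabulary, 'h' and 'hu' match too many terms (seven and five), while 'hum' matches four, so 'hum' is the shortest prefix within both bounds. A query such as 'house' has no suitable prefix here: 'h' is too broad and 'ho' already identifies the term uniquely.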
To achieve scalability at a low cost, many researchers have turned to a peer-to-peer paradigm, leading to the development of a multitude of protocols and algorithms, with implementations still lagging behind. In this work we consider the privacy implications of peer-to-peer discovery systems and propose a framework for discovery of private resources. Furthermore, we propose and evaluate an architecture and a series of methods using distributed hash tables. Finally, we provide an implementation for the OpenKnowledge kernel [1].
Related publications: