IR-1006: (2015) Dori-Hacohen, S. and Allan, J., "Automated Controversy Detection on the Web," Proceedings of the 37th European Conference on Information Retrieval (ECIR 2015), Vienna, Austria, March 29 - April 2, 2015, pp. 423-434.

Abstract

Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the “filter bubble” effect, and therefore would be a useful feature in a search engine or browser extension. In order to implement such a feature, however, the binary classification task of determining which topics or webpages are controversial must be solved. Earlier work described a proof of concept using a supervised nearest neighbor classifier with access to an oracle of manually annotated Wikipedia articles. This paper generalizes and extends that concept by taking the human out of the loop, leveraging the rich metadata available in Wikipedia articles in a weakly-supervised classification approach. The new technique we present allows the nearest neighbor approach to be extended on a much larger scale and to other datasets. The results improve substantially over naive baselines and are nearly identical to the oracle-reliant approach by standard measures of F1, F0.5, and accuracy. Finally, we discuss implications of solving this problem as part of a broader subject of interest to the IR community, and suggest several avenues for further exploration in this exciting new space.
