We have integrated several new recommendation algorithms into Mr. DLib. Some serve only as baselines for our researchers; others will hopefully further increase the effectiveness of Mr. DLib.

Overall, Mr. DLib now uses the following algorithms:

Random

This approach randomly picks the set of documents to recommend to the user. We experiment with a language filter, which we apply at random in 50% of cases. With the language filter, the recommended documents share the same language as the input document.
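
The random baseline with its coin-flip language filter can be sketched as follows. This is a minimal illustration, not Mr. DLib's actual code: the function name, the dictionary fields `id` and `language`, and the parameter `k` are assumptions made for the example.

```python
import random

def recommend_random(input_doc, corpus, k=3, rng=random):
    """Pick up to k random documents; filter by language on about half of the calls.

    `input_doc` and the items of `corpus` are hypothetical dicts with
    "id" and "language" keys (an assumption for this sketch).
    """
    # Coin flip: apply the language filter with probability 0.5.
    use_filter = rng.random() < 0.5
    # Never recommend the input document itself.
    candidates = [d for d in corpus if d["id"] != input_doc["id"]]
    if use_filter:
        candidates = [d for d in candidates
                      if d["language"] == input_doc["language"]]
    picked = rng.sample(candidates, min(k, len(candidates)))
    # Returning the flag lets an evaluation compare filtered and unfiltered sets.
    return picked, use_filter
```

Returning whether the filter was applied alongside the recommendations makes it possible to compare both variants later.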

Lucene’s More Like This

This is one of the most commonly applied recommendation approaches for content-based filtering. The approach concatenates and tokenizes a document’s title, abstract, keywords, and journal name using Apache Lucene’s out-of-the-box Standard Tokenizer. The tokens are then indexed, and recommendations are made using Lucene’s More Like This feature.
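
To illustrate the idea behind this pipeline, here is a small Python stand-in. It does not call Lucene: the regular-expression tokenizer only approximates the Standard Tokenizer, and the scoring is a plain TF-IDF term overlap rather than Lucene's More Like This query. The field names and function names are assumptions for the example.

```python
import math
import re
from collections import Counter

# Hypothetical field names for the four concatenated metadata fields.
FIELDS = ("title", "abstract", "keywords", "journal")

def tokenize(doc):
    """Concatenate the metadata fields and split them into lowercase word tokens."""
    text = " ".join(doc.get(field, "") for field in FIELDS)
    return re.findall(r"\w+", text.lower())

def more_like_this(input_doc, corpus, k=3):
    """Rank corpus documents by TF-IDF weighted term overlap with input_doc."""
    # "Index" step: term frequencies per document.
    doc_tokens = {d["id"]: Counter(tokenize(d)) for d in corpus}
    n_docs = len(corpus)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in doc_tokens.values() for term in tokens)

    def idf(term):
        return math.log(1 + n_docs / df[term])

    query = Counter(tokenize(input_doc))
    scores = {}
    for doc_id, tokens in doc_tokens.items():
        if doc_id == input_doc["id"]:
            continue  # do not recommend the input document itself
        score = sum(query[t] * tokens[t] * idf(t) ** 2
                    for t in query if t in tokens)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The sketch omits stop-word handling and length normalization, both of which a production search engine would typically apply.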

Stereotype Recommendations

Stereotyping uses a primitive user modeling strategy with fixed recommendation classes. Users are classified, or stereotyped, into generic groups, and each group is assigned the same set of recommendations, which we hand-picked. For Mr. DLib’s recommender system, we assume that all users are researchers or students; hence, we hand-picked a number of documents relating to academic writing, peer review, and research methods.
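
In code, stereotyping amounts to a fixed lookup table. The sketch below is purely illustrative: the class name and the document identifiers are hypothetical placeholders, not the documents that were actually hand-picked.

```python
# Hypothetical placeholder IDs, one per hand-picked topic area.
STEREOTYPE_RECOMMENDATIONS = {
    "researcher_or_student": [
        "doc-academic-writing",
        "doc-peer-review",
        "doc-research-methods",
    ],
}

def classify_user(user):
    """Assign a user to a stereotype class.

    Every user is assumed to be a researcher or student,
    so there is only one class and the input is ignored.
    """
    return "researcher_or_student"

def recommend_stereotype(user, k=3):
    """Return the first k hand-picked recommendations for the user's class."""
    return STEREOTYPE_RECOMMENDATIONS[classify_user(user)][:k]
```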

Most Popular

This recommendation approach recommends the most viewed and exported documents of the past months.
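
A popularity ranking of this kind can be sketched as a count over recent usage events. The event format, the 90-day window, and the equal weighting of views and exports are all assumptions made for this example; the post does not specify them.

```python
from collections import Counter
from datetime import datetime, timedelta

def most_popular(events, now, window_days=90, k=3):
    """Return the k documents with the most views and exports in the window.

    `events` is a hypothetical list of dicts with "doc_id", "type"
    ("view" or "export"), and "timestamp" (datetime) keys.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(
        event["doc_id"]
        for event in events
        if event["type"] in {"view", "export"}
        and cutoff <= event["timestamp"] <= now
    )
    # most_common sorts by count, highest first.
    return [doc_id for doc_id, _ in counts.most_common(k)]
```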

Keyphrase Content-Based Filtering

This is an advanced approach adapted from the keyphrase-based approach of Ferrara et al. (2011). Whereas the original approach requires the full text of a paper to build acceptable keyphrases, we adapted the approach to do so even with only the title of the paper as input.

Ferrara, F., Pudota, N., and Tasso, C. A Keyphrase-Based Paper Recommender System. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 14–25.
