Meetup

Join us on Thursday night for an exciting, informal event featuring lightning talks from different areas of Lucene and Solr search. Speakers and topics include:

Social Media Scheduler based on Solr + Hadoop + Amazon EC2, Pablo Aragón, Cierzo Development
Introduction to Collaborative Filtering using Mahout, Frank Scholten, Jteam
Enterprise Search meets Enterprise CMS - TYPO3 and Apache Solr, Olivier Dobberkau, d.k.d Internet Service
BM25 Scoring for Lucene - From Academia to Industry, Yuval Feinstein, Answers Corporation
How We Scaled Solr to 3+ Billion Documents, Jason Rutherglen, Director of Enterprise Search, Biz360

Social Media Scheduler based on Solr + Hadoop + Amazon EC2

Pablo Aragón Slides

Speaker bio: Pablo Aragón is the Systems Analyst at Cierzo Development for the SMMART (Social Media Marketing Analysis Tool) project. Pablo has adapted SMMART processes to Hadoop, and he has also developed the features of the current release with technologies such as Nutch, Lucene, and Apache Solr on Amazon Web Services. He holds a Computer Engineering degree (B.Sc IT + M.Sc IT) from the University of Zaragoza (Spain) and earned 60 ECTS credits in Computer Science at Luleå Tekniska Universitet (Sweden).

Introduction to Collaborative Filtering using Mahout

Frank Scholten Slides

Speaker bio: Frank Scholten is a Java developer at JTeam with 3 years of experience. He has an M.Sc. in Computer Science from the University of Twente, with a Software Engineering major. Frank has worked on e-commerce sites, web-based administrative systems and systems integration projects, mostly working with Spring, JPA/Hibernate and Wicket. Currently he is researching recommendation engines and Apache Mahout / Taste as part of the Enterprise Search group at JTeam.

Enterprise Search meets Enterprise CMS - TYPO3 and Apache Solr

Olivier Dobberkau Slides

TYPO3 is an Open Source Content Management System and Framework, well suited for internet, intranet, and extranet applications. Its flexible plugin architecture makes TYPO3 highly extensible, which has helped make it one of the most popular Open Source CMSs worldwide. Apache Solr for TYPO3 is a TYPO3 extension that provides an interface to index and search TYPO3 content with Solr. By integrating Solr in TYPO3, web site visitors can use improved search capabilities and functions. We will explain the steps we have taken to implement Apache Solr with the TYPO3 project, including which milestones are completed and which still have to be tackled, and we will share our experience along the way. The presentation will show some other implementations as well.

Speaker bio: Olivier Dobberkau was born in 1968 in Vevey, Switzerland. He studied computer science, political science and law at the J.W. Goethe-University in Frankfurt am Main, Germany. During his studies he worked for American Express. In 1996 Olivier established d.k.d, where he is responsible for the following divisions: Software/IT Development and Projects, IT Infrastructure and Services, and IT Consulting. He is an avid open source evangelist and well known in the TYPO3 community as his alter ego: The Reverend Neverend.

BM25 Scoring for Lucene: From Academia to Industry

Yuval Feinstein Slides

This presentation describes an implementation of the BM25 scoring algorithm under Lucene. A relevance problem in Answers.com’s Lucene-based similar question search algorithm led us to Joaquin Pérez Iglesias’ BM25 library for Lucene, which gave promising initial results. BM25 performs well in benchmarks and is included in Lucene’s competitors: Sphinx, Lemur, Xapian and Terrier. In order to use the library in a multi-million query environment, we productized it: we added unit tests as scaffolding and refactored the library to be faster and more robust. We also adapted the library to work with Lucene 2.9.1. Submitting a patch to Apache Lucene revived a conversation about the Lucene scoring module and how to make it more flexible, hopefully aiming for a pluggable scoring model. Because BM25 requires storing extra data to work, Lucene’s indexing and retrieval need to be extended to accommodate this and other methods (possibly language models).

Speaker bio: Yuval Feinstein has been working as a software engineer at Answers Corporation, Jerusalem, Israel since 2006. He is part of the NLP and IR team, developing search applications for the answers.com web site. The team’s products enable tens of millions of daily search queries in Answers’ user-generated Q&A and reference topics. Formerly, Yuval worked at Silicon Design Systems, building VLSI place-and-route (EDA) software. Yuval holds an M.Sc. in Computer Science from the Hebrew University in Jerusalem. He specialized in NLP, including parsing, text classification and summarization.
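For readers unfamiliar with BM25, the following is a minimal sketch of the textbook scoring formula, not the code from the library or the Lucene patch discussed in the talk. The function name `bm25_score`, its parameters, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration. It shows why extra data is needed at scoring time: each document's length and the average document length across the collection.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with the textbook BM25 formula.

    All names here are hypothetical and for illustration only:
      doc_freqs   -- maps a term to the number of documents containing it
      num_docs    -- total number of documents in the collection
      avg_doc_len -- average document length (in terms) over the collection
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        # A commonly used idf variant that never goes negative.
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1 - b + b * doc_len / avg_doc_len
        score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return score
```

With the same term frequency, a shorter document scores higher than a longer one, which is the length normalization that depends on the stored per-document lengths.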

How We Scaled Solr to 3+ Billion Documents

Jason Rutherglen Slides

A high-level overview of the Biz360 architecture, which consists of a Hadoop-based pipeline system on EC2 that processes 5+ million documents per day, performs natural language processing and machine learning, and then indexes the documents into Solr. The talk covers how SOLR-1301 is implemented to rapidly index the 3 billion documents using Hadoop, how Solr Cloud with ZooKeeper will help Biz360 attain improved uptime while enabling server failover, and the challenges of running on EC2. The BloomFilter on a field and the CommonGrams filter will also be discussed.

Speaker bio: Jason Rutherglen is the Director of Enterprise Search at Biz360. He spoke on the topic of Lucene realtime search at ApacheCon 2009 in Oakland and assisted in developing the NRT patch for Lucene 2.9.
