Software Project "Personal Search Engine"

Check the ProjectWiki regularly; all information will be posted there!

Useful links

Participants

  • Przemyaslaw Rys. Role: 1st developer. Responsibility: development of the indexing components, developer's guide.
  • Dianjun Liu. Role: 2nd developer. Responsibility: development of the search component and user interface, user manual.
  • Michal Kopycki. Role: marketer and spokesperson. Responsibility: market analysis, requirements specification, customer interaction, presentation of the final results.
  • Paul-Gabriel Müller. Role: quality analyst. Responsibility: IR consulting, testing lead.
  • Yasmin Al-Iriani. Role: manager. Responsibility: progress monitoring, project planning support, the final documentation package.

General project description


    Many of you have faced the problem of having hundreds or thousands of useful documents but being unable to find them when they are needed. This is the right task for a personal search engine, which finds information on your computer from a keyword query. This project aims to develop an application for indexing and searching a collection of personal documents.

    Lucene is a popular Java library for building search applications, and we will use it to create the personal search engine. A major advantage of this library is the existence of the book "Lucene in Action", which can be used both as a tutorial and as a reference manual. We will not invent another set of classes, but rather reuse components provided by top-class developers. The book will be available in the lab, but you are also welcome to buy a copy.

    The task is flexible, and its complexity depends on your qualifications and your eagerness to learn more about information retrieval. The simplest setup assumes a command-line interface and basic keyword search functionality. If you want to go further, we will build a search interface with advanced search features.
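
    As a rough illustration of what "indexing" and "basic keyword search" mean in the simplest setup, here is a minimal sketch in plain Java. It is not Lucene and not the project's design: the class TinyIndex and its methods tokenize, add and search are hypothetical names made up for this example, and the sketch assumes that a document matches only if it contains every query term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, for illustration only): term -> ids of documents containing it. */
class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    /** Splits text into lowercase terms on any run of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexing step: remember, for each term, which documents it occurs in. */
    void add(String name, String text) {
        int id = names.size();
        names.add(name);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Search step: names of documents containing every query term, in insertion order. */
    List<String> search(String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(docs); // copy, so the stored postings are never modified
            } else {
                hits.retainAll(docs);       // intersect with the documents for this term
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(names.get(id));
            }
        }
        return result;
    }
}
```

    A command-line front end would call add once per file in the collection and then pass each query typed by the user to search. A real library such as Lucene adds text analysis, ranking and a persistent on-disk index on top of this basic idea.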

    Slides with the project proposal: [ppt].

Tentative schedule

    First Iteration:

  • 1st week. IR basics, introduction to Lucene.
  • 2nd week. Requirements analysis, survey of desktop search engines.
  • 3rd week. Requirements specification, dataset preparation, test case design.
  • 4th week. First implementation.
  • 5th week. Testing and debugging.
  • 6th week. Documentation preparation.

    Second Iteration:

  • 7th week. Refined requirements analysis.
  • 8th week. Refined requirements specification, dataset preparation, test case design.
  • 9th week. Second implementation.
  • 10th week. Testing and debugging.
  • 11th week. Final tuning and documentation.
  • 12th week. Final tuning and documentation.
  • Final presentation. Public presentation of project results.

Literature


    1. "Lucene in Action" by Erik Hatcher and Otis Gospodnetic

    2. And any of these:
  • 2.a "Modern Information Retrieval" by Ricardo Baeza-Yates, Berthier Ribeiro-Neto
  • 2.b "Managing Gigabytes: Compressing and Indexing Documents and Images" by Ian H. Witten, Alistair Moffat, Timothy C. Bell
  • 2.c "Mining the Web: Analysis of Hypertext and Semi Structured Data" by Soumen Chakrabarti

Contact


    M.Sc. Sergey Chernov [email]
Last update 02.11.2005