
Welcome to the Northwest Database Society (NWDS)


Sponsored by: Yahoo!


Mission Statement

The goal of NWDS is to bring together researchers and practitioners working on databases and data management systems in the Pacific Northwest.

One of our main activities is a talk series with a variety of distinguished speakers from academia and industry. These talks are also part of the Yahoo! Database Talk Series, sponsored by Yahoo!


Upcoming Talks

Speaker: Ricardo Baeza-Yates, Yahoo! Research

Title: Towards a Distributed Search Engine

Where: University of Washington, Seattle.
Computer Science and Engineering Department.
Paul Allen Center, Database Lab, CSE 405.

When: Friday, May 14, 2012, 3:30pm-4:30pm.

Abstract:
In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly (180 million active Web servers in February 2012), and there are hundreds of billions of pages that could potentially be indexed. On the other hand, there are more than one billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, which suggests the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed Web retrieval system and our research on all the components of a search engine (crawling, indexing, and query processing), showing that such an engine is feasible.

Bio:

Ricardo Baeza-Yates is VP of Yahoo! Research for Europe, Middle East and Latin America. He has led the labs in Barcelona, Spain and Santiago, Chile since 2006, and has supervised the lab in Haifa, Israel since 2008. He has also been a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain, since 2005. Until 2005 he was Professor and Director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile. He obtained a Ph.D. from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and an electrical engineering degree from the University of Chile, Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, as well as co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 300 other publications. He has received the Organization of American States award for young researchers in exact sciences (1993) and the CLEI Latin American distinction for contributions to CS in the region (2009). In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences. In 2007 he was awarded the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Speaker schedule here.


Speaker: Christopher Ré, University of Wisconsin

Title: Statistical Data-Analysis in an RDBMS Almost for Free

Where: University of Washington, Seattle.
Computer Science and Engineering Department.
Paul Allen Center, Database Lab, CSE 405.

When: Friday, April 13, 2012, 3:30pm-4:30pm.

Abstract:
The main question driving my research is: how does one deploy statistical data-analysis tools to enhance data-driven systems? Our goal is to find the abstractions that one needs to deploy and maintain such systems. In this talk, I describe my group's attack on this question by building a diverse set of statistics-based, data-driven applications: a system whose goal is to read the Web and answer complex questions, a muon detector in collaboration with a neutrino telescope called IceCube, and social-science applications involving rich content (OCR and speech data). Even in this diverse set, we have found common abstractions that we are exploiting to build systems. In the technical portion of the talk, I discuss one such abstraction that we found while attempting to answer the question: how can we bring sophisticated data-analysis tools to data that lives in an RDBMS? My technical message is that the algorithmic problems underlying many statistical data-analysis techniques can be solved with a classical algorithm called incremental gradient descent that is no more difficult to compute than a SQL AVG. To demonstrate our point, we have implemented this method on top of a handful of commercial and open-source databases. Our approach is often faster than special-purpose tools and avoids a messy export-reimport cycle. Papers, software, virtual machines containing installations of our software with data, and links to applications that are discussed in this talk are available from http://www.cs.wisc.edu/hazy.
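As a rough illustration of the abstract's comparison between incremental gradient descent and a SQL AVG, the sketch below writes both as user-defined aggregates with the same shape: initialize some state, update it once per row, and return a result at the end. This is not the speaker's implementation. The choice of SQLite (through Python's standard sqlite3 module), the one-weight model y ~ w * x, the table name points, and the names my_avg, igd_fit, and fit_slope are all assumptions made for the example.

```python
import sqlite3


class AvgAggregate:
    """Running average written as init / step / finalize, for comparison."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        # Called once per row: fold the row into the running state.
        self.total += value
        self.count += 1

    def finalize(self):
        return self.total / self.count if self.count else None


class IgdAggregate:
    """One pass of incremental gradient descent for the toy model y ~ w * x.

    The state is the single weight w; each row triggers one gradient step
    on the squared error 0.5 * (w * x - y) ** 2.
    """

    def __init__(self):
        self.w = 0.0

    def step(self, x, y, learning_rate):
        gradient = (self.w * x - y) * x
        self.w -= learning_rate * gradient

    def finalize(self):
        return self.w


def fit_slope(rows, learning_rate=0.5):
    """Load (x, y) rows into an in-memory table and run both aggregates."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("my_avg", 1, AvgAggregate)
    conn.create_aggregate("igd_fit", 3, IgdAggregate)
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    conn.executemany("INSERT INTO points VALUES (?, ?)", rows)
    mean_y, slope = conn.execute(
        "SELECT my_avg(y), igd_fit(x, y, ?) FROM points", (learning_rate,)
    ).fetchone()
    conn.close()
    return mean_y, slope


# Hypothetical noise-free data on the line y = 2x, with x in (0, 1].
rows = [(i / 100.0, 2.0 * i / 100.0) for i in range(1, 101)]
mean_y, slope = fit_slope(rows)
print(mean_y, slope)
```

Both aggregates touch each row exactly once and keep a constant amount of state, which is the sense in which the gradient pass costs about as much as an average. A real model would carry a weight vector rather than a single number and would typically need several passes and a tuned step size.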

Bio:
Christopher (Chris) Ré is an assistant professor in the Department of Computer Sciences at the University of Wisconsin-Madison. The goal of his work is to enable users and developers to build applications that more deeply understand and exploit data. Chris received his PhD from the University of Washington, Seattle under the supervision of Dan Suciu. For his PhD work in the area of probabilistic data management, Chris received the SIGMOD 2010 Jim Gray Dissertation Award. Chris's papers have received four best-paper or best-of-conference citations (best paper in PODS 2012 and best-of-conference in PODS 2010, twice, and one in ICDE 2009). Chris received an NSF CAREER Award in 2011 and was recently granted his first patent.


Speaker: Chris Lintott

Title: Infrastructure for 600,000 scientists

Where: University of Washington, Seattle.
Computer Science and Engineering Department.
Paul Allen Center, Database Lab, CSE 405.

When: Monday, April 16, 2012, 3:30pm-4:30pm.

Abstract:
Zooniverse.org hosts a large collection of 'citizen science' projects which provide the hundreds of thousands of registered users with authentic opportunities to engage in the process of research, whether by classifying galaxies, transcribing papyri or listening to whale calls. Project lead Chris Lintott will describe the design and infrastructure behind supporting science at web scale, with a particular focus on the tools needed to allow communities of volunteers to make serendipitous discoveries and lead their own research.

Bio:
Christopher Lintott is currently serving as the Director of Citizen Science at the Adler Planetarium. He is a post-doctoral researcher who is involved in a number of popular science projects aimed at bringing astronomical science to a wider audience. He is the co-presenter of Patrick Moore's BBC series The Sky at Night and a co-author of the book Bang! - The Complete History of the Universe with Patrick Moore and Queen guitarist Brian May.


Past Talks

Listed in reverse chronological order. Click here for Abstracts


Mailing List

Please sign up for the nwds mailing list here. We use this list primarily to send announcements for upcoming events. After you register, you can send mail to that list at nwds at cs...

To become a member, please contact Magda.


History

The Northwest Database Society was founded on January 1, 2006 by Dan Suciu and Magdalena Balazinska. It was inspired by the New England Database Society.