Welcome to the Northwest Database Society (NWDS)
The goal of NWDS is to bring together researchers and practitioners in the field of databases and data management systems working in the Pacific North-West.
One of our main activities is a talk series with a variety of distinguished speakers from academia and industry. These talks are also part of the Yahoo! Database Talk Series, sponsored by Yahoo!
Speaker: Ricardo Baeza-Yates, Yahoo! Research
Title: Towards a Distributed Search Engine
Where:
University of Washington, Seattle.
Computer Science and Engineering Department.
Paul Allen Center, Database Lab, CSE 405.
When: Friday, May 14, 2012, 3:30pm-4:30pm.
Abstract:
In the ocean of Web data, Web search engines are the primary way to
access content. As the data is on the order of petabytes, current
search engines are very large centralized systems based on replicated
clusters. Web data, however, is always evolving. The number of Web
sites continues to grow rapidly (180 million active Web servers
in February 2012), and there are hundreds of billions of potentially
indexable pages. On the other hand, there are more than one billion
Internet users, and hundreds of millions of queries are issued each day.
In the near future, centralized systems are likely to become less
effective against such a data-query load, thus suggesting the need for
fully distributed search engines.
Such engines need to maintain high quality answers, fast response
time, high query throughput, high availability and scalability, in
spite of network latency and scattered data. In this talk we present
the main challenges behind the design of a distributed Web retrieval
system and our research in all the components of a search engine:
crawling, indexing, and query processing, showing that such an engine
is feasible.
Bio:
Ricardo Baeza-Yates is VP of Yahoo! Research for Europe, Middle East
and Latin America, leading the labs at Barcelona, Spain and Santiago,
Chile, since 2006, as well as supervising the lab in Haifa, Israel
since 2008. He is also a part-time Professor at the Dept. of Information
and Communication Technologies of the Universitat Pompeu Fabra in
Barcelona, Spain, since 2005. Until 2005 he was Professor and Director
of the Center for Web Research at the Department of Computer Science
of the Engineering School of the University of Chile. He obtained a
Ph.D. from the University of Waterloo, Canada, in 1989. Before that, he
obtained two master's degrees (M.Sc. CS & M.Eng. EE) and an electrical
engineering degree from the University of Chile, Santiago. He is
co-author of the best-selling textbook Modern Information Retrieval,
published in 1999 by Addison-Wesley with a second enlarged edition in
2011, as well as co-author of the 2nd edition of the Handbook of
Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of
Information Retrieval: Algorithms and Data Structures, Prentice-Hall,
1992, among more than 300 other publications. He has received the
Organization of American States award for young researchers in exact
sciences (1993) and the CLEI Latin American distinction for
contributions to CS in the region (2009). In 2003 he was the first
computer scientist to be elected to the Chilean Academy of Sciences.
During 2007 he was awarded the Graham Medal for innovation in
computing, given by the University of Waterloo to distinguished
alumni. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.
Speaker schedule here.
Speaker: Christopher Re, University of Wisconsin
Title: Statistical Data-Analysis in an RDBMS Almost for Free
Where:
University of Washington, Seattle.
Computer Science and Engineering Department.
Paul Allen Center, Database Lab, CSE 405.
When: Friday, April 13, 2012, 3:30pm-4:30pm.
Abstract:
The main question driving my research is: how does one deploy
statistical data-analysis tools to enhance data-driven systems? Our
goal is to find abstractions that one needs to deploy and maintain
such systems. In this talk, I describe my group's attack on this
question by building a diverse set of statistical-based data-driven
applications: a system whose goal is to read the Web and answer
complex questions, a muon detector in collaboration with a neutrino
telescope called IceCube, and social-science applications involving
rich content (OCR and speech data). Even in this diverse set, we have
found common abstractions that we are exploiting to build systems.
In the technical portion of the talk, I discuss one such abstraction
that we found attempting to answer the question: how can we bring
sophisticated data-analysis tools to data that lives in an RDBMS? My
technical message is that the algorithmic problems underlying many
statistical data analysis techniques can be solved with a classical
algorithm called incremental gradient descent that is no more
difficult to compute than a SQL AVG. To demonstrate our point, we have
implemented this method on top of a handful of commercial and
open-source databases. Our approach is often faster than
special-purpose tools and avoids a messy export-reimport cycle.
Papers, software, virtual machines containing installations of our
software with data, and links to applications that are discussed in
this talk are available
from http://www.cs.wisc.edu/hazy.
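The comparison with SQL AVG in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: it only assumes that a database aggregate is defined by three functions (initialize, per-row transition, finalize), and all function names, the least-squares model, the step size, and the toy data below are hypothetical choices made for illustration.

```python
from functools import reduce

# AVG written as an aggregate: the state is (running sum, row count).
def avg_init():
    return (0.0, 0)

def avg_step(state, value):
    total, count = state
    return (total + value, count + 1)

def avg_final(state):
    total, count = state
    return total / count if count else None

# Incremental gradient descent with the same three-part shape.
# The state is the model (w, b) of a least-squares fit y ~ w*x + b,
# and each row nudges the model along that row's gradient.
def igd_init():
    return (0.0, 0.0)

def make_igd_step(step_size):
    def igd_step(state, row):
        w, b = state
        x, y = row
        err = (w * x + b) - y  # residual of the current model on this row
        return (w - step_size * err * x, b - step_size * err)
    return igd_step

def igd_final(state):
    return state

def run_aggregate(init, step, final, rows):
    # One sequential pass over the rows, as an aggregate would make.
    return final(reduce(step, rows, init()))

# Toy table (assumed data): rows lie exactly on y = 2x + 1.
rows = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]

mean_y = run_aggregate(avg_init, avg_step, avg_final, [y for _, y in rows])

# Unlike AVG, gradient descent usually needs several passes (epochs);
# each epoch reuses the previous model as its starting state.
model = igd_init()
step = make_igd_step(0.05)
for _ in range(500):
    model = reduce(step, rows, model)
model = igd_final(model)

print(mean_y, model)
```

In this sketch the only differences from AVG are the contents of the state and the need to repeat the pass, which is one way to read the claim that the method is no harder to compute than an average.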
Bio:
Christopher (Chris) Ré is an assistant professor in the department of
Computer Sciences at the University of Wisconsin-Madison. The goal of
his work is to enable users and developers to build applications that
more deeply understand and exploit data. Chris received his PhD from
the University of Washington, Seattle under the supervision of Dan
Suciu. For his PhD work in the area of probabilistic data management,
Chris received the SIGMOD 2010 Jim Gray Dissertation Award. Chris's
papers have received four best-paper or best-of-conference citations
(best paper in PODS 2012 and best-of-conference in PODS 2010, twice,
and one in ICDE 2009). Chris received an NSF CAREER Award in 2011 and
was recently granted his first patent.
Speaker: Chris Lintott
Title: Infrastructure for 600,000 scientists
Where:
University of Washington, Seattle.
Computer Science and Engineering Department.
Paul Allen Center, Database Lab, CSE 405.
When: Monday, April 16, 2012, 3:30pm-4:30pm.
Abstract:
Zooniverse.org hosts a large collection of 'citizen science'
projects which provide the hundreds of thousands of registered users
with authentic opportunities to engage in the process of research,
whether by classifying galaxies, transcribing papyri or listening to
whale calls. Project lead Chris Lintott will describe the design and
infrastructure behind supporting science at web scale, with a
particular focus on the tools needed to allow communities of
volunteers to make serendipitous discoveries and lead their own
research.
Bio:
Christopher Lintott is currently serving as the Director of Citizen
Science at the Adler Planetarium. He is a post-doctoral researcher who
is involved in a number of popular science projects aimed at bringing
astronomical science to a wider audience. He is the co-presenter of
Patrick Moore's BBC series The Sky at Night and a co-author of the
book Bang! - The Complete History of the Universe with Patrick Moore
and Queen guitarist Brian May.
Listed in reverse chronological order. Click here for Abstracts
Please sign up for the nwds mailing list here. We use this list primarily to send announcements for upcoming events. After you register, you can send mail to that list at nwds at cs...
To become a member, please contact Magda
The North-West Database Society was founded on January 1st 2006 by Dan Suciu and Magdalena Balazinska. It is inspired by the New-England Database Society.