Objective
The objective of this exercise is to learn to query data collections using the map-reduce programming model. For this purpose you will use CouchDB, a document-oriented NoSQL database in which data is stored and retrieved as JSON documents.
Requirements
CouchDB in a nutshell
CouchDB is a NoSQL database that completely embraces the web:
- Data is stored as JSON documents.
- Documents are created and accessed via HTTP (e.g., using a browser or a command-line client such as curl).
- Queries are expressed as JavaScript map-reduce functions.
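To make the map-reduce idea concrete, here is a minimal in-memory sketch of how a view could be evaluated. It is an illustration only, not CouchDB's implementation: `runView`, the sample documents, and the artist names and links are all hypothetical, key sorting is simplified to plain string comparison, and the `keys` argument passed to the reduce is simplified to a list of plain keys.

```javascript
// Rows collected by emit(); CouchDB provides emit() as a global inside map functions.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical helper: apply map to every document, sort the rows by key,
// then (optionally) reduce the values of each distinct key.
function runView(docs, map, reduce) {
  rows = [];
  docs.forEach(function (doc) { map(doc); });
  const sorted = rows.slice().sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  if (!reduce) return sorted;

  // Group values by key, similar in spirit to querying a view with group=true.
  const groups = new Map();
  sorted.forEach(function (row) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row.value);
  });
  return Array.from(groups, function (entry) {
    const key = entry[0];
    const values = entry[1];
    return { key: key, value: reduce(values.map(function () { return key; }), values) };
  });
}

// Made-up sample documents (placeholder names and links, not real catalogue data).
const sampleDocs = [
  { _id: "related", data: [
    { name: "Artist B", link: "https://example.org/b" },
    { name: "Artist A", link: "https://example.org/a" }
  ] },
  { _id: "other", data: [] }
];

// Map only: one row per artist, keyed by name.
const related = runView(sampleDocs, function (doc) {
  if (doc._id === "related") {
    doc.data.forEach(function (artist) { emit(artist.name, artist.link); });
  }
});

// Map + reduce: count the entries of each document.
const counts = runView(
  sampleDocs,
  function (doc) { doc.data.forEach(function () { emit(doc._id, 1); }); },
  function (keys, values) { return values.reduce(function (s, v) { return s + v; }, 0); }
);

console.log(related);
console.log(counts);
```

The map functions passed to `runView` have the same shape as the ones you will type into CouchDB: they receive one document and call `emit` zero or more times.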
The following instructions illustrate how to create and populate a database in CouchDB using data from the Deezer music catalogue.
Create and populate a database
- Create the deezer database:
# Assuming CouchDB default address (http://localhost:5984)
curl -X PUT http://localhost:5984/deezer
- Download Muse's albums and similar artists:
curl -X GET http://api.deezer.com/artist/705/albums > MuseAlbums.json
curl -X GET http://api.deezer.com/artist/705/related > MuseRelatedArtists.json
# Verify the existence of the files
ls *.json
- Populate the deezer database with the retrieved data:
curl -X PUT http://localhost:5984/deezer/muse_albums --upload-file "MuseAlbums.json"
curl -X PUT http://localhost:5984/deezer/muse_related_artists --upload-file "MuseRelatedArtists.json"
- Verify the content of the database:
curl -v http://localhost:5984/deezer/muse_albums
curl -v http://localhost:5984/deezer/muse_related_artists
- Open the deezer database in Futon (the CouchDB web user interface) and inspect it:
http://127.0.0.1:5984/_utils/index.html
Querying the database
Queries are defined in Futon as temporary views composed of a map function and, optionally, a reduce function. For instance:
- Retrieve the name and the web page of the groups that are similar to the rock band Muse.
// Map
function(doc) {
  var artists = doc.data;
  if (doc._id == "muse_related_artists") {
    for (var i in artists) {
      emit(artists[i].name, artists[i].link);
    }
  }
}
- Compute the total number of albums produced by the rock band Muse. Tick the Reduce check box to apply the reduce function (see figure); if it does not appear, refresh the page.
// Map: emit one row per album stored in the muse_albums document
function(doc) {
  var albums = doc.data;
  if (doc._id == "muse_albums") {
    for (var i in albums) {
      emit('muse_albums', 1);
    }
  }
}
// Reduce: add up the emitted 1s
function(keys, values) {
  return sum(values);
}
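A detail worth knowing about the reduce above: CouchDB may call a reduce function a second time on its own partial results, passing `true` as a third argument (`rereduce`). Summing is safe in both phases, but a reduce that counts with `values.length` is not. The sketch below illustrates the difference outside CouchDB; the local `sum` is a stand-in for CouchDB's built-in helper, the chunking into 3 + 2 rows is a made-up example, and `null` is passed for `keys` in both phases for brevity.

```javascript
// Local stand-in for CouchDB's built-in sum() so the sketch runs in plain Node.js.
function sum(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// The reduce from the text: correct in both the reduce and the rereduce phase.
function sumReduce(keys, values, rereduce) {
  return sum(values);
}

// A counting reduce must distinguish the two phases.
function countReduce(keys, values, rereduce) {
  // First phase: values are emitted values, so count them.
  // Rereduce phase: values are partial counts, so add them up.
  return rereduce ? sum(values) : values.length;
}

// A naive counting reduce that ignores rereduce.
function naiveCount(keys, values) {
  return values.length;
}

// Five emitted rows, reduced in two chunks and then combined (hypothetical split).
const chunkA = [1, 1, 1];
const chunkB = [1, 1];

const sumTotal = sumReduce(null, [sumReduce(null, chunkA, false),
                                  sumReduce(null, chunkB, false)], true);
const countTotal = countReduce(null, [countReduce(null, chunkA, false),
                                      countReduce(null, chunkB, false)], true);
const naiveTotal = naiveCount(null, [naiveCount(null, chunkA),
                                     naiveCount(null, chunkB)]);

console.log(sumTotal, countTotal, naiveTotal);
```

Here the naive version reports the number of chunks rather than the number of rows, which is why emitting 1 and summing is the usual way to count in a view.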
TODO
For this practical work you will use the Allocine Data Collection, which contains JSON documents with information about the films presented in Grenoble in 2011 (cf. allocine.fr). Each of these documents contains the films presented in one cinema of Grenoble at that time (i.e., there is one file per cinema, nine in total).
The following commands help you create and populate the allocine database:
# Create database allocine
curl -X PUT http://localhost:5984/allocine
# Populate database allocine
curl -T "allocineGrenoble1.txt" http://localhost:5984/allocine/allocineGrenoble1
curl -T "allocineGrenoble2.txt" http://localhost:5984/allocine/allocineGrenoble2
curl -T "allocineGrenoble3.txt" http://localhost:5984/allocine/allocineGrenoble3
curl -T "allocineGrenoble4.txt" http://localhost:5984/allocine/allocineGrenoble4
curl -T "allocineGrenoble5.txt" http://localhost:5984/allocine/allocineGrenoble5
curl -T "allocineGrenoble6.txt" http://localhost:5984/allocine/allocineGrenoble6
curl -T "allocineGrenoble7.txt" http://localhost:5984/allocine/allocineGrenoble7
curl -T "allocineGrenoble8.txt" http://localhost:5984/allocine/allocineGrenoble8
curl -T "allocineGrenoble9.txt" http://localhost:5984/allocine/allocineGrenoble9
Using the allocine database, try to answer the following questions. Do not forget to save each query as a view named qX (where X is the question number) in the design document answers (_design/answers).
- Define a view in MapReduce that contains, for each theatre, the films presented in it. Hint: You do not need a reduce here.
- Modify your previous answer to filter out the theatres outside Grenoble (e.g., do not include the theatres in Saint Martin d’Hères).
- Give the number of films that each theatre is presenting. Hint: You need a reduce here.
- Give the list of films with a press rating higher than 4 stars. Attention: filter duplicates.
- Give the list of films presented on 10.12.2011, and for each film, the theatre where it was presented and its schedule.
- BONUS! Give the list of films, and for every film, the list of theatres that present it (this question is a challenge but we encourage you to try to solve it).
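Saving a query as a view in the answers design document amounts to storing a JSON document whose `_id` is `_design/answers` and whose `views` field holds the functions as strings. The sketch below builds such a document and the URL used to query a saved view. It is a minimal illustration under assumptions: `q1Map` is a placeholder map function (not a solution to question 1), and `viewUrl` is a hypothetical helper.

```javascript
// Placeholder map function: it only shows the expected shape, not an answer.
const q1Map = function (doc) {
  emit(doc._id, null);
};

// Design document layout: one entry per view under "views".
// A view that needs a reduce adds a "reduce" field next to "map".
const designDoc = {
  _id: "_design/answers",
  language: "javascript",
  views: {
    q1: { map: q1Map.toString() }
  }
};

// Hypothetical helper that builds the URL for querying a saved view.
function viewUrl(base, db, design, view, params) {
  const query = new URLSearchParams(params || {}).toString();
  return base + "/" + db + "/_design/" + design + "/_view/" + view +
    (query ? "?" + query : "");
}

// JSON text that could be saved to a file and uploaded to CouchDB.
const designJson = JSON.stringify(designDoc, null, 2);
console.log(designJson);

// URL of view q1, and of a view q3 queried with one reduced row per key.
const q1Url = viewUrl("http://localhost:5984", "allocine", "answers", "q1");
const q3Url = viewUrl("http://localhost:5984", "allocine", "answers", "q3", { group: "true" });
console.log(q1Url);
console.log(q3Url);
```

If the printed JSON is saved to a file (say answers.json, a hypothetical name), it can be uploaded in the same way as the other documents, for example with curl -X PUT http://localhost:5984/allocine/_design/answers --upload-file "answers.json". Futon's "Save As" dialog produces an equivalent document.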
Resources