
Sunday, April 8, 2012

Simile Exhibit @ VGSoM

Introduction

Simile stands for "Semantic Interoperability of Metadata and Information in un-Like Environments". It was a joint research project run by the World Wide Web Consortium (W3C), the Massachusetts Institute of Technology Libraries and MIT CSAIL (Computer Science and Artificial Intelligence Laboratory), and it was funded by the Andrew W. Mellon Foundation. The project ran from 2003 to August 2008. It focused on developing tools to increase the interoperability of disparate digital collections, and much of its technical work was oriented toward Semantic Web technology and standards such as the Resource Description Framework (RDF).
The Semantic Web is a collaborative movement led by the W3C that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims to convert the current web of unstructured documents into a "web of data". It builds on RDF, a family of W3C specifications that has come to be used as a general method for the conceptual description or modeling of information implemented in web resources, using a variety of syntax formats.
SIMILE focused on developing robust, open-source tools that empower users to access, manage, visualize and reuse digital assets, and it sought to enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata, and services.
SIMILE leverages and extends DSpace, enhancing its support for arbitrary schemata and metadata, primarily through the application of the RDF and Semantic Web techniques mentioned above. The project also aims to implement a digital asset dissemination architecture based on web standards. This dissemination architecture will provide a mechanism to add useful "views" to a particular digital artifact (i.e. an asset, schema, or metadata instance) and to bind those views to consuming services.
The SIMILE Project and its members are fully committed to the open-source principles of software distribution and open development. For this reason, the project releases the intellectual property it creates (both software and reports) under a BSD-style license.

The various tools published under SIMILE Project are as follows:
1.      Zotz
Zotz is a Firefox add-on giving you the ability to publish citations from your Zotero* to an Exhibit (via Citeline) in one step.
* Zotero is a powerful research tool that helps you gather, organize, and analyze sources (citations, full texts, web pages, images, and other objects), and it lets you share the results of your research in a variety of ways.

2.      Longwell
A highly configurable, web-based faceted browser for RDF datasets. Longwell combines the flexibility of the RDF data model with the effectiveness of the faceted-browsing UI paradigm. It lets you visualize and browse arbitrarily complex RDF datasets and build a user-friendly website from your data within minutes, without writing any code.

3.      Piggy Bank
An extension to the Firefox web browser that turns it into a Semantic Web browser. It lets you use existing information on the Web in more useful and flexible ways than the original websites offer. In effect, Piggy Bank turns your browser into a mashup platform by letting you extract data from different websites and mix them together.

4.      Solvent
A Firefox extension that helps you write JavaScript screen scrapers for Piggy Bank.

5.      Semantic Bank
The server companion of Piggy Bank that lets you persist, share and publish data collected by individuals, groups or communities.

6.      Welkin
A graph visualizer for RDF data, capable of displaying graphs with real-time interactive visualization.

7.      Timeline
A DHTML AJAX timeline widget for visualizing temporal information. With this widget, you can make beautiful interactive timelines.


8.      Gadget
An inspector for large quantities of XML data. It is normally useful in situations such as:
·         data understanding and exploration
·         data migration/transformation
·         data cleanup
·         data complexity evaluation
·         schema adherence understanding
·         schema emergence

9.      Referee
Referee reads your web server logs, crawls your referrers (the pages that link to yours), and extracts metadata from those pages and from the text around the links that point to your pages. The website says: “Ever wondered who links to your pages and what they say about them? Ever thought that trackback might be missing something? Ever subscribed an ego feed and hated the fact that the same stuff keeps coming up over and over like it was new? Ever hated those referrer spammers that pollute your autotrackback scripts? If so, Referee is for you”.

10.  Babel
Babel lets you convert between various data formats. In particular, it lets you convert data into the Exhibit JSON format and preview the data right inside Exhibit.

11.  Exhibit
Exhibit lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and, optionally, some CSS and JavaScript code. No database or web application technologies are involved.

12.  Appalachian
Appalachian is a Firefox add-on that adds the ability to manage and use several OpenIDs to ease the login parts of your browsing experience.

13.  Timeplot
Timeplot is a cross-browser DHTML (canvas-based) time series plotting widget.

14.  Seek
Seek adds faceted browsing features to Mozilla Thunderbird and lets you search through your email more effectively.

15.  Potluck
Potluck is a research prototype for a user interface to mix and align structured data coming from different exhibits.

16.  jsTeX
jsTeX is a JavaScript library that can interpret some basic TeX encodings and transform them into HTML directly on a web page.

17.  Citeline
A web application that facilitates the web publishing of bibliographies and citation collections as interactive exhibits, and the sharing of this type of data.

SIMILE EXHIBIT


Exhibit is a publishing framework for data-rich interactive web pages.
Exhibit lets you easily create web pages with advanced text search and filtering functionality, interactive maps, timelines, and other visualizations. Here are some examples:
Example-1: CSAIL Principal Investigators grouped by positions and their office


Example-2: Billionaires in history: where are they from?


Example-3: US Cities by Population. Using Exhibit, this map can be made with just two simple files.


Example-4: CIA World Factbook, People. It shows birth-rate vs. death-rate statistics. Because the user has clicked “Chinese” under the language group, the exhibit shows statistics for the 9 countries where Chinese is spoken:


Example-5: Exhibit Timeline. 63 MIT-related Nobel Prize winners are shown on a timeline according to the year in which they won the Nobel Prize:


EXHIBIT 3.0


Exhibit 3.0 is the latest version of SIMILE EXHIBIT which lets you publish data-rich web pages without complicated programming. It can be used in two forms:

a)      Exhibit 3.0 Scripted (rc1): In Scripted mode, you can visualize data in a Web browser with a simple HTML-based configuration. No programming or server-side setup is required. Exhibit 3.0 Scripted is designed for smaller data sets. It publishes rich interactive exhibits, with thousands of items, right in your Web browser.

b)      Exhibit 3.0 Staged (beta2): Staged mode requires server software and is meant for publishing bigger data sets. It extends the capacity of Exhibit by combining the in-browser software with the greater capacity of a server-based component. The server stores and indexes the data and handles browser queries.

How to choose between Scripted and Staged Exhibit?


Smaller data sets of a few dozen, a few hundred, or up to a thousand items can run in the browser using Exhibit's Scripted mode. There is no set item limit for Scripted mode. If your items are small, with few properties and short values, Scripted mode may handle a few thousand of them. No programming is required beyond the basic HTML you use to author an Exhibit page.
Larger data sets – up to hundreds of thousands of items – are better suited to Staged mode. Running Staged mode requires that you host the server software yourself or locate a provider who can host it for you.
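The guidance above can be condensed into a rough rule of thumb. The sketch below is only that. The helper name and both cutoffs are assumptions read off the prose: 1,000 items in general, and 5,000 as a stand-in for "a few thousand" small items. Exhibit itself enforces no item limit in Scripted mode.

```python
# Rough rule of thumb distilled from the guidance above. The cutoffs are
# assumptions: Exhibit enforces no item limit in Scripted mode.
SCRIPTED_COMFORT_LIMIT = 1_000      # "up to a thousand items"
SCRIPTED_SMALL_ITEM_LIMIT = 5_000   # assumed figure for "a few thousand smaller items"


def suggest_exhibit_mode(item_count: int, small_items: bool = False) -> str:
    """Return "Scripted" or "Staged" for a data set of the given size."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    limit = SCRIPTED_SMALL_ITEM_LIMIT if small_items else SCRIPTED_COMFORT_LIMIT
    return "Scripted" if item_count <= limit else "Staged"


print(suggest_exhibit_mode(300))                      # Scripted
print(suggest_exhibit_mode(3_000))                    # Staged
print(suggest_exhibit_mode(3_000, small_items=True))  # Scripted
print(suggest_exhibit_mode(200_000))                  # Staged
```

If your data set sits near either cutoff, try Scripted mode first. It needs no server, so it is cheap to test.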

Installing and Setting up Exhibit 3.0:

Scripted
Simply include the scripts hosted at simile-widgets.org in your page: 

Alternatively, you can host Exhibit yourself. Within the repository, the code that needs to be served is in scripted/src/. Making the entire scripted/src/ directory available from your HTTP server is sufficient. Access your deployed Exhibit in your pages by using:
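Any static web server can serve scripted/src/. For local testing, one option is the simple HTTP server in Python's standard library. The sketch below assumes you have checked out the repository in the current directory and that port 8000 is free. It is meant for previewing pages on your own machine, not for production hosting.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build (but do not start) a static file server rooted at `directory`."""
    # SimpleHTTPRequestHandler takes a `directory` argument (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Listen on localhost only, which is enough for previewing pages locally.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage, assuming the repository is checked out in the current directory
# (serve_forever() blocks until interrupted with Ctrl+C):
#     server = make_server("scripted/src", port=8000)
#     server.serve_forever()
```

On Python 3.7 or later, the command python -m http.server 8000 --directory scripted/src does the same thing from the command line.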

Staged
Exhibit 3.0 Staged runs on the Backstage server. Working with Backstage is not as simple as working with Exhibit Scripted. In addition to writing HTML and publishing Web pages, you must run Backstage, which is Java software that acts as a server.

·         Get the source:

Creation of webpage using Exhibit

This section describes, step by step, how to create a webpage with SIMILE Exhibit. This widget from the SIMILE project helps you build semantic web pages without advanced programming. The steps are as follows:

Step 1: Gathering the tools

Before you start preparing a webpage with Exhibit, you need to gather a couple of tools:

1) Text Editor
The first thing you need is an editor in which to write the HTML code. Common editors such as Notepad or WordPad will do, but Notepad++ (Windows users) or TextWrangler (Mac users) are better choices. This tutorial uses Notepad++. Mac users can download TextWrangler from its website.



Notepad++ has many features that are useful for coding. Visit http://notepad-plus.en.softonic.com/ to download the software.




Once the download has finished, double-click the installer to start the installation. Choose your installation language and installation directory, then complete the installation. Move on to the next step.

2) Web Browser
You need a browser in order to run the webpage. Any of the following browsers will work: Mozilla Firefox, Google Chrome, Internet Explorer. This tutorial uses Mozilla Firefox. Visit http://www.getfirefox.net/ and download the latest version of the browser.


3) Microsoft Excel/Spreadsheet
You also need an application that can create spreadsheets. The spreadsheet is the input from which Exhibit creates the webpage. You can use Microsoft Excel, Google Spreadsheets or any other spreadsheet software. Google Spreadsheets is available free of cost. One needs to follow these steps:

Step 2: Collect Data & Build Spreadsheet

For this tutorial we decided to build a webpage listing the movies released in a given month in various industries. The data was collected from various Wikipedia pages related to Bollywood movies and then compiled into a spreadsheet.

As you can see, the first row contains the headers, which give the field names of the columns. Each row from the second onwards represents one record. A record holds the movie name, the industry in which it was released, its release date, and so on. Keep the following in mind while constructing the spreadsheet:
a) Name one column “Label”. This column must contain unique values, because the label identifies each record.
Label | YEAR | GENRE | CAST
… | 2007 | Drama | Shreyas Talpade


b) To store multiple values in a single cell, separate them with semicolons. For example, suppose a movie is listed under two genres, Romance and Comedy. To record both under the Genre header, enter Comedy; Romance.
TITLE | YEAR | GENRE | GENRE
Fool n Final | 2007 | Action | Comedy; Romance


c) Store dates in ISO 8601 format (YYYY-MM-DD) so that the date field is compatible with Exhibit.
d) Do not use quotes, exclamation marks, commas, etc. when naming a field.
e) Be consistent when entering data. If you enter the genre Comedy for one movie and comedy for another, the two movies will be treated as belonging to separate genres.
You can also use multiple spreadsheets to create the input data for Exhibit. In that case, the tables must be distinguished by “type”. This is a separate column that indicates what kind of data each spreadsheet contains.
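Rules a) to e) are easy to break by accident in a large sheet, so it can help to check them before converting. The sketch below is a hypothetical helper, not part of Exhibit or Babel. It makes these assumptions:

- Each row has been read into a Python dict, for instance with csv.DictReader from a tab-separated export.
- The label column is named "Label", as in rule a).
- ISO 8601 means the plain YYYY-MM-DD date form. Python 3.11 and later also accept a few other ISO spellings here.

```python
import re
from datetime import date

BAD_FIELD_CHARS = re.compile(r"[\"'!,]")   # rule d): quotes, exclamation marks, commas


def check_rows(rows, date_fields=()):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for field in (rows[0].keys() if rows else []):
        if BAD_FIELD_CHARS.search(field):
            problems.append(f"bad field name {field!r}")               # rule d)
    seen_labels = set()
    spellings = {}   # (field, lower-cased value) -> first spelling seen
    for n, row in enumerate(rows, start=2):   # spreadsheet row 1 holds the headers
        label = row.get("Label", "").strip()
        if not label:
            problems.append(f"row {n}: missing Label")
        elif label in seen_labels:
            problems.append(f"row {n}: duplicate Label {label!r}")    # rule a)
        seen_labels.add(label)
        for field, cell in row.items():
            for value in (v.strip() for v in cell.split(";")):         # rule b)
                if not value:
                    continue
                if field in date_fields:
                    try:
                        date.fromisoformat(value)                      # rule c)
                    except ValueError:
                        problems.append(f"row {n}: {field} {value!r} is not an ISO 8601 date")
                first = spellings.setdefault((field, value.lower()), value)
                if first != value:                                     # rule e)
                    problems.append(f"row {n}: {field} {value!r} differs from earlier {first!r}")
    return problems


rows = [  # hypothetical sample data
    {"Label": "Movie A", "Genre": "Comedy; Romance", "Release": "2007-03-30"},
    {"Label": "Movie B", "Genre": "comedy", "Release": "30/03/2007"},
]
for problem in check_rows(rows, date_fields={"Release"}):
    print(problem)
```

For the sample rows, the helper reports two problems. The first is "comedy" spelled differently from the earlier "Comedy". The second is "30/03/2007", which is not an ISO 8601 date.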

The spreadsheets can then be combined while converting them to a format Exhibit can use (JSON). The next step explains the conversion process.
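The idea of a "type" column can be sketched in a few lines. This is an illustration under assumptions, not how Babel does it. Each sheet is assumed to be a list of dicts already. The type names ("Movie", "Director") and sample values are hypothetical. The output follows the basic Exhibit JSON shape: an "items" list whose entries carry "label" and "type".

```python
import json


def merge_sheets(sheets):
    """Merge several sheets into one {"items": [...]} structure.

    `sheets` maps a type name to that sheet's rows, each row being a dict
    of header -> cell text.
    """
    items, labels = [], set()
    for type_name, rows in sheets.items():
        for row in rows:
            # Exhibit JSON uses lower-case "label"; other headers are kept as-is.
            item = {("label" if k == "Label" else k): v for k, v in row.items()}
            item["type"] = type_name    # the extra "type" column that tells sheets apart
            if item["label"] in labels:  # rule a): labels stay unique across sheets
                raise ValueError(f"duplicate label {item['label']!r}")
            labels.add(item["label"])
            items.append(item)
    return {"items": items}


sheets = {
    "Movie": [{"Label": "Movie A", "Director": "Director X"}],
    "Director": [{"Label": "Director X", "Country": "India"}],
}
print(json.dumps(merge_sheets(sheets), indent=2))
```

Here the movie's "Director" value matches the director item's label, which is how the two sheets relate to each other.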

Step 3: Converting to Exhibit usable format

Before you can use the spreadsheet to build the webpage, you must convert it into a JSON (JavaScript Object Notation) file. JSON is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. Exhibit requires the data fed into it to be in JSON format. You can convert the spreadsheet to JSON using Babel. Visit http://service.simile-widgets.org/babel/ to use the application.
Besides JSON, Babel can convert between several other file formats. The following table lists the input and output types Babel provides.
Input Types | Output Types
BibTeX | Exhibit JSON
Excel | Exhibit JSONP
Exhibit JSON | N3
Exhibit embedding Web Page | RDF/XML
JPEG | RSS 1.0
KML | Text
N3 |
RDF/XML |
Tab-Separated Values |
A snippet of the JSON file created:


As can be seen, each column has become a property of the item labeled “Hari Puttar: A comedy of Terrors”. The contents of each column, such as cast and director, appear in the JSON file. When the webpage loads, it combines the styling defined in the HTML file with this data.
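Babel does this conversion for you, but the shape of the result is simple enough to reproduce, which helps when debugging a data file. The following is a rough stand-in for Babel's Tab-Separated Values input, not Babel's code. It assumes the first row holds the headers and does three things:

- It turns the "Label" header into Exhibit's "label".
- It splits semicolon-separated cells (rule b) into JSON arrays.
- It declares date columns in a "properties" section with "valueType": "date".

Babel's real output may name or type things differently, so compare against what Babel produces for your own sheet.

```python
import csv
import io
import json


def tsv_to_exhibit_json(tsv_text, date_fields=()):
    """Convert tab-separated text (header row first) into Exhibit-style JSON."""
    items = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        item = {}
        for header, cell in row.items():
            if header is None:          # extra cells with no header are ignored
                continue
            key = "label" if header == "Label" else header
            values = [v.strip() for v in (cell or "").split(";") if v.strip()]
            if len(values) == 1:
                item[key] = values[0]
            elif values:
                item[key] = values      # "Comedy; Romance" -> ["Comedy", "Romance"]
        items.append(item)
    data = {"items": items}
    if date_fields:
        # Assumed: declaring valueType "date" so Exhibit treats these as dates.
        data["properties"] = {f: {"valueType": "date"} for f in sorted(date_fields)}
    return json.dumps(data, indent=2)


sample = "Label\tGenre\tRelease\nMovie A\tComedy; Romance\t2007-03-30\n"  # hypothetical row
print(tsv_to_exhibit_json(sample, date_fields={"Release"}))
```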

Step 4: Creation of the webpage

This stage involves creating the HTML page that defines the content and formatting of the webpage. We provide ready-to-use code that you can copy directly. Adapt it to the type of data in your spreadsheet and you are ready to go. Follow these steps:
A) Preparing the basic structure
  • Open Notepad++ and go to File > New
  • Copy the following into the file and save it as movies.html (remember to choose HTML as the file type)

The above figure shows the webpage created with the code just written.
Before moving on, some important points regarding Exhibit:
  • div and span elements act as templates for data display
  • An attribute is added to a div or span to tell Exhibit to handle it specially
  • Views: These define how a collection of data is displayed on screen
  • Lenses: Lenses are used to define the formatting and styling of individual items
  • Facets: These are used to include filtering features in the webpage 
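To make the idea of a facet concrete, here is a small Python model of what a list facet does with the data. It counts how many items carry each value of a property. Once the user selects values, it keeps only the matching items. This is a conceptual sketch, not Exhibit's implementation, and the sample items and property names are hypothetical. It assumes the usual behaviour: selecting several values in one facet widens the result (OR), while combining facets narrows it (AND).

```python
from collections import Counter


def values_of(item, prop):
    """Return an item's values for `prop` as a set (single values are wrapped)."""
    value = item.get(prop, [])
    return set(value if isinstance(value, list) else [value])


def facet_counts(items, prop):
    """How many items carry each value of `prop`: what a list facet displays."""
    counts = Counter()
    for item in items:
        counts.update(values_of(item, prop))
    return counts


def apply_facet(items, prop, selected):
    """Keep items having at least one selected value (OR within one facet)."""
    wanted = set(selected)
    return [item for item in items if values_of(item, prop) & wanted]


items = [
    {"label": "Movie A", "genre": ["Comedy", "Romance"], "industry": "Bollywood"},
    {"label": "Movie B", "genre": "Action", "industry": "Bollywood"},
    {"label": "Movie C", "genre": "Comedy", "industry": "Hollywood"},
]
print(sorted(facet_counts(items, "genre").items()))
# [('Action', 1), ('Comedy', 2), ('Romance', 1)]
comedies = apply_facet(items, "genre", ["Comedy"])
print([i["label"] for i in comedies])   # ['Movie A', 'Movie C']
# A second facet narrows the result further (AND across facets).
print([i["label"] for i in apply_facet(comedies, "industry", ["Bollywood"])])   # ['Movie A']
```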

B) Adding filtering features

Suppose you want to let users filter the movies shown on the page by genre, release date, cast, etc. As anyone who has tried will tell you, doing this the conventional way is complex and requires programming skills. Exhibit reduces the whole task to a few clicks (or rather, a few Ctrl+C and Ctrl+V). Filtering is added with the facet feature, by including the following code in the “Codes Area” of movies.html.
Just copy and paste it above the facet code you just added. This search button can be used to search for any text on the webpage.
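At its simplest, a text search keeps the items whose property values contain the typed text. The sketch below models that with a case-insensitive substring match over every property. Exhibit's own matching rules may differ, and the item data is hypothetical.

```python
def text_search(items, query):
    """Keep items where any property value contains `query`, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return list(items)   # an empty search box filters nothing

    def texts(item):
        for value in item.values():
            for v in (value if isinstance(value, list) else [value]):
                yield str(v).lower()

    return [item for item in items if any(needle in t for t in texts(item))]


items = [
    {"label": "Movie A", "cast": ["Actor One", "Actor Two"]},
    {"label": "Movie B", "cast": "Actor Three"},
]
print([i["label"] for i in text_search(items, "two")])     # ['Movie A']
print([i["label"] for i in text_search(items, "MOVIE")])   # ['Movie A', 'Movie B']
```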

C) Adding lenses: Lenses define how each individual item looks on the webpage. They contain the specific formatting for each element shown on the page.
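A lens is essentially one template applied to every item. The Python model below illustrates that idea with string.Template. In Exhibit the template is HTML written in the page itself, so treat the placeholder names and markup here as hypothetical.

```python
from collections import defaultdict
from html import escape
from string import Template

# A hypothetical movie lens: the same template is filled in for every item.
MOVIE_LENS = Template(
    '<div class="movie">'
    "<h3>$label</h3>"
    "<p>Genre: $genre</p>"
    "<p>Released: $release</p>"
    "</div>"
)


def render_with_lens(item, lens=MOVIE_LENS):
    """Fill the lens template from one item; missing properties render empty."""
    fields = defaultdict(str)
    for key, value in item.items():
        values = value if isinstance(value, list) else [value]
        fields[key] = ", ".join(escape(str(v)) for v in values)
    return lens.substitute(fields)


print(render_with_lens({"label": "Movie A", "genre": ["Comedy", "Romance"]}))
# <div class="movie"><h3>Movie A</h3><p>Genre: Comedy, Romance</p><p>Released: </p></div>
```

Because values are HTML-escaped before substitution, a title containing "&" or "<" cannot break the surrounding markup.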
Also add the following code inside the 

Final Webpage:

D) Adding timeline

Another feature offered by SIMILE is Timeline, which creates a visual display of events arranged by when they occurred. It can be included in the webpage with the following code:
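Timeline relies on dates it can parse, which is one reason rule c) asks for ISO 8601. The sketch below is a rough model of what a timeline does with the data, not Timeline's code. It orders hypothetical items by release date and groups them by year. The property name "release" is an assumption.

```python
from datetime import date
from itertools import groupby


def timeline_by_year(items, date_prop="release"):
    """Order dated items chronologically and group their labels by year."""
    dated = sorted(
        (date.fromisoformat(item[date_prop]), item["label"])
        for item in items
        if item.get(date_prop)      # undated items cannot be placed
    )
    return {
        year: [label for _, label in group]
        for year, group in groupby(dated, key=lambda pair: pair[0].year)
    }


items = [
    {"label": "Movie A", "release": "2008-05-02"},
    {"label": "Movie B", "release": "2007-03-30"},
    {"label": "Movie C", "release": "2008-01-18"},
    {"label": "Movie D"},
]
print(timeline_by_year(items))
# {2007: ['Movie B'], 2008: ['Movie C', 'Movie A']}
```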




Prepared by:
Kanishka Chakraborty
Rahul Aggarwal
2010-2012, VGSoM
IIT Kharagpur
