New User Special Price Expires in

Let's log you in.

Sign in with Facebook


Don't have a StudySoup account? Create one here!


Create a StudySoup account

Be part of our community, it's free to join!

Sign up with Facebook


Create your account
By creating an account you agree to StudySoup's terms and conditions and privacy policy

Already have a StudySoup account? Login here


by: Anindita Mondal

Preview These Notes for FREE

Get a free preview of these Notes, just enter your email below.

Unlock Preview
Unlock Preview

Preview these materials now for free

Why put in your email? Get access to more of this material and other relevant free materials for your school

View Preview

About this Document

My research paper
Research paper
Dr. Ramez Elmasri
75 ?




Popular in Research paper

Popular in Computer Science and Engineering

This 7 page Bundle was uploaded by Anindita Mondal on Wednesday September 7, 2016. The Bundle belongs to cse5194 at University of Texas at Arlington taught by Dr. Ramez Elmasri in Fall 2016. Since its upload, it has received 4 views. For similar materials see Research paper in Computer Science and Engineering at University of Texas at Arlington.

Popular in Computer Science and Engineering


Reviews for Paper


Report this Material


What is Karma?


Karma is the currency of StudySoup.

You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/07/16
Anindita Mondal 1001358370 Report on Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia Abstract The paper aims at shedding light on a faceted retrieval system called Facetedpedia which can be used as a tool for information discovery and search in Wikepedia. A faceted interface which is used for sifting through result article is generated by Facetedpedia when a set of Wikipedia articles ensuing from a keyword search is given to Facetedpedia. Facet generation and hierarchy construction are both dynamic as well as automatic in Facetedpedia unlike other facet retrieval systems. The use of collaborative vocabulary in Wikipedia such as hyperlinks and folksonomy leads a large possibility of choices of faceted interfaces. Metrics are proposed for ranking individual facet hierarchies based on user navigational cost while metrics for ranking interfaces based in the average navigational cost and average pair wise similarities. To optimize the ranking metrics, facet interface discovery algorithms are developed. An experimental evaluation and user study is done to test if the system’s performance. Introduction Over the years Wikipedia has earned the title of being the biggest encyclopedia ever to be created boasting over 3 million articles as of date. The most common way in which users access these articles is through keyword search. This is one of the most efficient and ubiquitous means of searching for specific web pages with matching keywords. Although a user might want to access relevant topics and not just articles containing the particular keyword if she desires a more sophisticated information discovery and search tasks. This is not possible to do with keyword search alone as doing so might be error prone and time taking. Hierarchical faceted categories can be a convenient method for information exploration. Here, for a set of objects the faceted interface would be a set of category hierarchies where each hierarchy relates to a different facet of the object. The user gains access to the intersection of the chosen objects on individual facets when sifting through multiple facets. The technique hence takes after recurring constructions of conjunctive queries with selection conditions on multiple dimensions. For example, if a user is trying to probe into more information on action films, the Facetedpedia takes a keyword as input and gives a list of k values as output. If the user types in a query saying ‘us action films’ Facetedpedia will create a faceted interface dynamically. For ‘us action films’ the selected facets could be the actors, the companies, the directors etcetera, Just like Wikipedia, the user is allowed to navigate through the parent-child relationships with the help of navigation paths and titles of articles. Whenever the user clicks on a result link in Facetedpedia, the respective page in Wikipedia opens up. In the process of doing this, the user can filter out large amounts of data and find the data that is of particular interest much more easily. Overview of Challenges and Solutions Two challenges regarding dynamic discovery of query-dependent faceted interfaces are discussed and their challenges and probable solutions are discussed. When Wikipedia produces the top ranked set of s articles, Facetedpedia produces an interface which has multiple dimensions which help in exploring the articles. Both dynamic and automatic faceted dimensions are considered for the study. Because of the query-dependent nature of the study, pre-computed facets could not be considered. If faceted interfaces are considered for objects which have their schema, the objects can be captured from the schema. In such cases, a query-independent static interface can be used. Wikipedia articles fall short of having a clearly defined dimensions of their schema. Therefore, trying to implement using static interface would be futile attempt. Even though if we assume that the facets could be computed beforehand, it is futile as the results should be automatic and dynamic. The humongous size of Wikipedia and its fast changing nature prove a challenge to the scalability of the application. The two main challenges that are posed in developing Facetedpedia are: Challenge 1: Lack of availability of facets and their respective category hierarchies. Facetedpedia is built upon two important concepts, facets and their respective category hierarchy. Therefore, facet identification and hierarchy construction become two most important tasks. Challenge 2: No proper metrics to measure the goodness of the facets collected. Only those facets which are useful for user navigation need to be considered for each search. The main problem is identifying which facet can be considered as a ‘good’ facet for a particular search. Also since a search results in a collection of facets, the interoperability between the facets also need to be considered. A facet which can be ‘good’ individually may not be a ‘good’ facet when considered along with the other facets collectively. Challenge 3: Designing an efficient interface. Due to the numerous search results, it is impossible to apply direct ranking metrics to all the amount of data. The communication between the various facets make it difficult to calculate the exact cost. Calculating the cost of each facet individually without considering its interactions between the various facets would also result in wrong calculations of the total cost. Facetedpedia Outline  Concept: Facetedpediais supposed to be a unique system which is loosely based on Wikipedia. It makes use of collaborative vocabulary to select faceted interfaces for each search.  Metrics: The authors propose a technique named faceted ranking which measures the ‘goodness’ of a facet based on the navigational activities of the user. The value of goodness of the facet is graded both individually and collectively based on its interactions with other facets.  Algorithms: For faceted interface discovery, algorithms are developed. Which help search for facets in large search spaces.  System Evaluation: The quality and efficiency of the system is measured qualitatively and quantitatively based on feedback given by several users. Faceted Retrieval Systems Faceted interface has developed to be more compelling throughout the most recent couple of years and we have seen an exponential increase of its application. Commercial sellers, for example, Endeca, IBM, and Mercado, and E-commerce Websites like, etc., have formally endorsed faceted interfaces. The utility of faceted interfaces was researched in different studies, where it was demonstrated that clients occupied with exploratory undertakings frequently lean toward such result gathering over basic positioned result list (normally gave via web indexes), and also over option methods for arranging recovery results, for example, clustering. The following paragraphs would be a discussion on the various taxonomies related to existing faceted interfaces. These are then compared to Facetedpedia. Existing exploration models or business faceted recovery frameworks for the most part can't be connected to meet their objectives, since they either depend on manual or static dimension development, or are for organized records or content accumulations with endorsed metadata. Not very many have examined the issue of element disclosure of both aspect measurements and their related classification chains of importance. The authors are the first to propose a query based retrieval system for Wikipedia. A technique known as complete search helps in completion of query and refinement of the query to match the standard query patterns. CompleteSearch underpins question fulfillments and inquiry refinement in Wikipedia by an special kind of "facets" on three measurements that are altogether different from the idea of general features: completing the query as per the matching the query terms; classification names coordinating the query terms; and classifications of result articles. As of late, a faceted Wikipedia search interface came out of DBpedia venture around the same time. The dimensions there give off an impression of being query independent from normal Wikipedia information box characteristics. Taxonomy by Facet Types and Semantics The earlier systems can be broadly categorized into two groups based on this aspect. Some systems have facets on relational data and some have it based on structured attributes in the schema. In the latter, the attribute hierarchies are defined based on domains. The hierarchies could also be done manually. Manually creating the hierarchies could result in a richer web of information. In a few systems, the hierarchy is built based on IS-A relationships. These systems cannot be used much in semantic networks. In comparison, Facetedpedia uses hierarchies which are semantically rich. It does not use text attributes(eg., hyperlinked keywords in text attributes). If predefined schemata is not available, Facetedpedia uses the rich semantic networks to build a semantic rich facet based upon collaborative vocabulary. It neither relies upon IS-A relationship nor subsumption relationship. Taxonomy by Degree of Automation or Dynamism Earlier in this article, we have discussed about the two pillars of faceted interface which are the facet and the hierarchy. Facetedpedia is both automatic and dynamic in nature which differentiates it from the other faceted interface systems existing today. All the other systems existing today are fully automatic in either identifying the facets or constructing the hierarchies, not both. In a few systems that are existing today, the dimensions and hierarchies are defined beforehand and hence do not have a facet identification/discovery system or a hierarchy construction system. Any interesting facts or a subset of the available facts are taken from a set of predefined sets. In a few systems, the dimension set is pre-defined whereas the network hierarchies are constructed based upon the subsumption technique. There are a few systems where only a single special dimension is generated automatically and the remaining sets of dimensions are defined beforehand. Concerning the mechanization of faceted interface revelation, the nearest work similar to the authors work is the Castanet calculation. The calculation is planned for short printed portrayals with restricted vocabularies in a particular space. It naturally makes features from an accumulation of things (e.g., formulas). The chains of importance for the different features are gotten by first producing a solitary scientific classification of terms by IS-A connections and afterward expelling the root from the scientific classification. Faceted interface for Wikipedia by collaborative vocabulary For building on the faceted interfaces for Wikipedia, the authors have used a technique named as collaborative vocabulary. User generated collaborative vocabulary such as grass roots system is used. The internal hyperlinks in each Wikipedia article is a perfect example of collaborative vocabulary. These hyperlinks indicate a collaborative endorsement of the various relationships by the users. The collaborative vocabulary contains a set of all searches by a group of users. The searches that are done more frequently are ranked high in the hierarchy. Collaborative vocabulary is end product of all the users’ intelligence as well as the rich set of semantic information. It thus serves as a favorable base for beginning the study on faceted systems. The fact that each article in Wikipedia is hyperlinked to many other articles is enough to say that collaborative vocabulary is being used extensively. The way that the writers of an article cooperatively made hyperlinks to different articles means that the importance of the connected articles in depicting the given article. This to a great extent improves the semantic data connected with the facet. Wikipedia also uses the concept of category hierarchy. It uses a category-subcategory relationship between its various categories which also allows users to go from a very generic description of the topic to a very specific description of a specific topic. The authors then defined their proposed solution as a set of 5 definitions c1– cnare the category which can again be sub categorized. p’1 – ane the hyperlinks to the target articles p p are the target articles 1 – n Definition: Given a keyword query q, the set of top-s ranked Wikipedia articles, T ={p1, ..., ps}, are the target articles of q. Given a target article p, each Wikipedia article p’ that is hyperlinked from p is an attribute article of p. This relationship is represented as p′ ← p. Given T, the set of attribute articles is A={p’ , ..., p’ }, 1 m where each p’isian attribute article of at least one target article pj∈T. Four more such definitions are defined for each attribute such as category hierarchy, facet, navigational path, faceted interface etc. Based on the formal definitions, we can say that, if we have a category hierarchy H, and a query q and its target results T and a set of attributes A, the main problem would be to find the goodness of the facets that are thus developed. The goodness of the articles can be measured using Facet Ranking method. Facet Ranking Method For a typical faceted interface discovery system, the search space is usually very large. Given a set s of target articles in Wikipedia for a query T, the result would be an extensively large set of articles which have very complex hierarchies. If we have a very large search space, we need to devise very good ranking techniques The different facet ranking methods that have been used are: Single-Facet Ranking and Multi- Facet Ranking. User Navigation model: A client explores diverse components in a k- highlight interface. Toward the starting, the course begins from the bases of all the k highlights. At every stride, the client picks one point and looks at the strategy of subcategories accessible at the present course of action on that component. She tails one subcategory to advance go down the class chain of centrality. On the other hand, the client may pick one of the property articles reachable from the present portrayal. The choices made on the k perspectives together packaging a conjunctive request. After the choice at every stride, the rundown of target articles that fulfill the conjunctive solicitation are gone on to the client. The course closes precisely when the client surmises that she has seen engaging target articles. Algorithms One of the methods for interface discovery is to elaborate on all the possible k-facet interfaces for each hierarchy. But this method would result in a large dataset of hierarchies and would be a very naïve technique which would be extremely costly to implement. Therefore, there is a need to find the best k- faceted interface. The two important aspects of a k-faceted algorithm would be a) reducing the search space b) searching the search space effectively Reducing the search space: Search space could be reduced by focusing on the subset of the safe reaching facets. Searching the space: For efficiently searching the search space, a hill climbing algorithm can be used to find a solution that fits. It does not use the regular exhaustive technique of examining all possible solutions manually and uses hill climbing technique instead. The hill climbing technique considers both average navigational cost and pair-wise similarity of the dimensions. The k-faceted algorithm is built stepwise by following steps: construction of relevant category hierarchy, ranking of a single facet, searching for a k-facet interface. After implementing the above algorithms, Facetedpedia is developed. Experimental studies are then performed on this. Experimental Settings The language used to develop Facetedpedia is C++. Database used is MySQL. The dataset is loaded into the database. This experiment uses a Dell PowerEdge 2900 II server with a Linux kernel with dual quad- core processors which are 2.0GHz each. 8GB RAM is used with three 1TB hard drives Dataset: Wikipedia dump of July 24, 2008 was considered, particularly the tables page.sql, pagelinks.sql, categorylinks.sql and redirect.sql as these tables have all the data such as the hyperlinks between various articles, category of articles etc. Extensive querying was done on this data to build category hierarchy and also to remove any loops(circles) in the category hierarchy. The famous depth first algorithm is used to detect elementary circles. Other processing includes the removing irrelevant hyperlinks and articles and also removing special articles such as lists and stubs. 20 keyword queries were developed for the data. Characteristics of Generated Facts The results were tested upon three algorithms: Hill climbing algorithm, top k- facets, randomly k facets. Regardless of the way that hill-climbing had a hardly more terrible target article scope than the other two (5% less), it beat them in pair-wise closeness which infers the k angles picked have smaller front of navigational ways. The clear after results exhibit that hill-climbing started from picking top-k viewpoints and regulated supplanted similar components by less practically identical ones. The last k viewpoints picked by hill-climbing when in doubt were still inside the fundamental 30%, while the ones picked by k were consistently spread among the results from single-perspective situating. The ordinary width and stature of the perspectives created by the three techniques were about the same, beside that unpredictable k now and again picked some significantly more broad angles. Their ordinary width and stature were when in doubt around 10 and 6, independently. Subsequently the fan out of inside center points and the length of navigational routes are inside a sensible reach for the customers. All around, hill-climbing helps us reducing covering perspectives without losing much extent of target articles. It was also noted that Facetedpedia scaled well in time. The execution time of the system increased linearly with the increase in the number of articles. It was also noted that this application achieved fast response without performing much optimization on the system. Critique The authors have tried their level best to develop a Wikipedia like search based system to improve searching by developing multiple dimensions on which a search can be made. In this paper they proposed Facetedpedia, a faceted search framework over Wikipedia. This framework gives a dynamic and mechanized faceted quest interface for clients to skim the articles that are the aftereffect of a catchphrase look question. Given the sheer size and many-sided quality of Wikipedia and the huge space of conceivable faceted interfaces, they proposed measurements for positioning faceted interfaces and in addition productive calculations for finding them. The most important aspect of this application was the identification of facets and construction of the hierarchies of the facets. These were the two pillars upon which this application was built. They have used many algorithms such as Facet Ranking methods and category hierarchy algorithms to identify facets and build hierarchy. They have used hierarchical facet discovery to build on the articles. The more the number of articles, the better the results. I feel that the dataset that they have considered was very small when compared to the actual size of data that the application might be handling in future. References: [1]Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia [2]S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. In 6th Int.l Semantic Web Conf., 2007. [3] H. Bast and I. Weber. The CompleteSearch engine: Interactive, efficient, and towards IR & DB integration. In CIDR, pages 88–95, 2007.


Buy Material

Are you sure you want to buy this material for

75 Karma

Buy Material

BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.


You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

Why people love StudySoup

Bentley McCaw University of Florida

"I was shooting for a perfect 4.0 GPA this semester. Having StudySoup as a study aid was critical to helping me achieve my goal...and I nailed it!"

Anthony Lee UC Santa Barbara

"I bought an awesome study guide, which helped me get an A in my Math 34B class this quarter!"

Jim McGreen Ohio University

"Knowing I can count on the Elite Notetaker in my class allows me to focus on what the professor is saying instead of just scribbling notes the whole time and falling behind."


"Their 'Elite Notetakers' are making over $1,200/month in sales by creating high quality content that helps their classmates in a time of need."

Become an Elite Notetaker and start selling your notes online!

Refund Policy


All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email


StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here:

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.