DESIGN FOR DIGITAL REPOSITORIES: CONCEPTUALIZING NEW COMMUNICATIVE PARADIGMS TO FILTER AND VISUALIZE SCIENTIFIC

This article aims to conceptualize a new communicative paradigm applied to academic scientific repositories. The publication and querying of articles, papers, journals, books and other documents are an integral part of the research process. However, the querying and information visualization process in a scientific academic repository often proves to be inefficient, because the wide range of results rarely fits the user’s specific subject. In this sense, a brief analysis of major reference projects is presented. Although these projects are based on the metric of article citations (impact factor), their primary goal lies in the visualization of an extensive citation structure and of the relations established between the different scientific fields. Based on the modus operandi of these visualization interfaces, the main objective of this paper is to propose a new approach, in which the filtering and visualization of information are based on the user’s experience instead of the usual citation "object" centered approach.


INTRODUCTION
The publication and querying of scientific articles, journals, books and other documents are an important part of the academic research process and of the researcher's daily routine. Digital knowledge repositories (DKR) have facilitated numerous tasks related to the querying of knowledge objects (KO) (e.g. articles, journals, books, others). Despite the easy accessibility, the search for relevant information in a DKR proves to be an arduous and time-consuming task. Normally, the standard search engines used in DKR only allow a limited refined search based on keywords, author names, title, year, relevance and related articles, among other examples. Data related to user characteristics, e.g. academic field, academic degree or which articles were consulted by the user, are practically nonexistent. In fact, the search and information visualization process in DKR often proves to be time-consuming and inefficient, in part because a large share of the results obtained are not aimed at the specific interests of the researcher. This is also a cross-cutting issue in digital academic knowledge repositories (DAKR). Although the primary function of DAKR is the storage, structuring and search/querying of KO, they can also be redesigned to better support researchers. In fact, the specific problematic related to the search and information visualization of KO in DAKR is defined by the filtering and framing of the results in the perceptual and cognitive field of the user. In this sense, InfoVis has enabled the structuring of a precise and efficient relationship with information (Card et al. 1999; Tufte 2009; 2011; Chen 2006; Manovich 2010; Fry 2008; Mazza 2009; Lima 2011; 2014; Meirelles 2013; Liu et al. 2014, 1373-93). It makes it possible to go far beyond simple data gathering, as it allows information to be viewed in an analytic and synthetic way, and also to give data a form and an efficient framework in the perceptual and cognitive field of the user (Ciuccarelli 2009). Therefore, the main objective of this article is the conceptualization of an interface directed to the viewing of the structures that emerge from the relation established between the user and the KO stored in DAKR.

CONTEXT OF THE SPECIFIC PROBLEMATIC
The artifacts developed over several ages, such as maps, libraries, encyclopedias, digital libraries and digital repositories, show the cultural evolution of information systems. The current development of Information and Communication Technologies (ICT) has enabled significant progress, e.g. in logistics, financial management, accessibility and knowledge systems. However, the efficient communication of information is a complex task that the global networked society faces. A typical example is that there is still great difficulty in effectively communicating information in various sectors and services of society (Wurman 2001, 9). According to Wurman (2001, i), what this in fact reflects is not an excess of information, but an explosion of "non-information", data that simply doesn't inform. In this sense, the main question that arises around the abundance of information leads us to another subject, related to the problem that arises during the continuum of the understanding process (Shedroff 1999, 4), specifically when we feel overwhelmed, either cognitively or perceptually, by a type of information that does not correspond to our specific interest (Wurman 2001, 14-5). In fact, a large part of the published and accessed information is not subject to a process of efficient filtration (Thackara 2006, 163), a process that should consider not only the state of knowledge (Shedroff 1999), but also the shape, structure and framework as fundamental aspects of the relationship between the user and the information. Wurman (2001, 9) states that the task of developing and exploiting new forms of communication that aim for a more efficient meaning of content is entirely the responsibility of the Design/er. The DKR are one such example, as they improve the browsing and the retrieval/searching of information. The DKR are characterized as complex and multifaceted information structures. Normally, the organization and information browsing/search process is based on an indexing system (database), which enables the user to find KO by querying for relevant metadata, e.g. subject, title, ISBN, DOI, year of publication, authors, publishers, reviews, detailed descriptions such as abstracts or summaries, number of downloads, impact factor, relevance and descendant articles (as in the "cited by" section of the ACM Digital Library or the IEEE Xplore Digital Library). However, faced with the exponential volume of stored data, and taking into account the specific research interest of the user, the standard search engines prove to be inefficient, in part because of the wide range of results obtained. The user mostly chooses the first result, despite the additional filters available to support search and browsing tasks. According to Chen et al. (1998, 583-4), navigation and search tasks are susceptible to the problem of information overload. Usually, the browsing behavior is adopted when the user does not have a specific research objective. This proves to be an inefficient process for the user who wants a more targeted approach, because results are typically found in a serendipitous manner. When the user has a specific purpose in mind, the adopted behavior is the searching mode (Zhu et al. 2006, 160), performing additional tasks to obtain details about the searched subject, such as reading the abstracts or the references section. In this sense, the current analysis context describes the specific problematic, which is the relationship between the user, the DKR and the search and browsing tasks performed.

DIGITAL SOCIAL FILTERING: FOLKSONOMIES AND REPUTATION SYSTEMS
The increased storage capacity and the resulting exponential publication of data have led to a constant search for information sources based on self-interest (Wurman 2001, 8). As stated by Darlin (cit. in Johnson 2010, 118), "Everything we need to know comes filtered and vetted. We are discovering what everyone else is learning, and usually from people we have selected because they share our tastes". This in turn enables access to an independent type of information that was not easily available previously (e.g. product features), allowing the user to take a more oriented approach. Currently, the user has at his disposal a number of tools that allow more sustained research. A typical example is that we seek the feedback of other users, such as comparisons, ratings and comments, within a networked community with a common interest in a given product or content. Content tagging systems are not an innovation of current ICT. According to Wright (2008, 25), the first taxonomic systems precede the first pre-literate civilizations. In this sense, it should be noted that the first taxonomic systems are not based on a scientific culture, but on an oral culture rooted in tribal communities, directly related to the need to categorize species (idem 2008). These were used to classify and organize into categories a body of knowledge related to the natural world (e.g. plants, animals, environment) (idem 2008, 22-38). The use of classification systems, allied to a strong survival instinct, triggered the need to categorize, collect and thus spread a set of valuable information about the natural world (ibid. 2008, 24). In fact, folksonomies were fundamental tools for the group's survival, since domain knowledge about flora and fauna guaranteed the perpetuation of the community/human species (idem 2008, 24-25). The form and the categorization mechanisms used to this day were shaped by folksonomies. In fact, these mechanisms are directly influenced by the principles that shape hierarchical and relational structures.
For example, a folksonomy is a hierarchical system which depends on the agreement (consensus) of the meaning within a social network community (relational structure), where the categorical hierarchy establishes the framework for the acceptance of the meaning, while the underlying social network structure establishes the cultural consolidation of that meaning (Wright 2008, 22-38).
The current digital collaborative tagging systems are an example of the evolution of folksonomies. In these systems, the interaction between users and content is supported by an open system of collaborative tagging, which allows the user to publicly classify the resources available (e.g. Del.icio.us, Flickr). However, it should be noted that the stability of a community is the result of immediate and conscious feedback, whether at the individual or the collective level (Golder & Huberman 2006, 39). Thus, the contribution of each individual user gives rise to an emergent social feedback of categorization standards (Golder & Huberman 2006; Obreiter & Nimis 2003).
According to Quintarelli (2005), a folksonomy emerges from an association between keywords and content, based on the "wisdom of the crowds". It should also be noted that, according to Quintarelli (2005), folksonomies trigger serendipity, which means they are not an objective solution for a targeted search; they do, however, constitute a valuable resource for the labeling of contents in open systems.
In this sense, and according to Jacob (cit. in Mathes 2004), collaborative categorization of content is less accurate and has unclear boundaries. Because it is usually based on a synthesis of similarity, it is also less focused on the systematic ordering of the KO. A KO may also have various associated terms, which are related and vary depending on the user's culture. As a filtration system, collaborative tagging of content is therefore not a plausible option to answer the stated problem.
With regard to reputation systems, their development is directly related to the evolution of biological (cooperation between organisms) and cultural information systems. In the context of the evolution of cultural information systems, the formalization of daily activities was performed, until a certain period of history, through social behaviors, as in the case of formal agreements sealed with a simple handshake. With the introduction of documents, social behavior gave rise to written agreements, including documents and treaties, in order to promote the consolidation and expansion of each individual's social network (Wright 2008, 106-7).
According to Rheingold (2002, 128), reputation is one of the key factors of cooperation. In fact, reputation systems are characterized as the point of convergence between technology and cooperation (Rheingold 2002, 114), and therefore go beyond quantitative efficiency, enabling the rapid performance of tasks and processes considered slow and expensive (e.g. product analysis). According to Rheingold (2002, 114), "connecting human social proclivities" to the efficiency of information technologies triggers an unprecedented scale of cooperation. In this sense, Resnick (2000, 45-8) states that reputation systems store, publish and aggregate feedback/experience and/or past user behaviours. Therefore, although the users of these systems are usually unknown to each other, they help each other in decision-making by indicating whom to trust, encouraging safe behaviours and discouraging less correct ones.
One negative aspect of open reputation systems is the credibility of the assigned ratings. In fact, the peer-to-peer reputation/rating systems developed to date have a few limitations in terms of credibility (Thackara 2006, 163), more specifically concerning the ratings and reviews that we use as a reference, e.g. when buying products online. Dellarocas (cit. in Rheingold 2002, 127) and Resnick (2000) emphasized that the main problem detected in open reputation systems based on user feedback (e.g. Amazon, eBay) lies in the vulnerability associated with the manipulation of ratings and reviews (e.g. user identification). One of the main factors contributing to the limitations of the evaluation systems applied to the Web comes from the fact that these systems are an open network structure (Resnick 2000). In other words, the main problem detected in open reputation system technologies lies precisely in their vulnerability and consequent susceptibility to manipulation (Dellarocas, cit. in Rheingold 2002, 127).
Despite the issues of relevance and reliability of the reviews and ratings used on open network systems, the underlying concept of reputation systems allows users to play an individual role in a large cooperative network, in which the individual feedback of each user contributes to building a broad view of a particular product or service. This implies that if the user of open networked communities shares "what he knows and how he feels", it is then possible to create a reliable "database" from which to extract knowledge and create opportunities (Smith, cit. in Rheingold 2002, 30).
In the specific case of the problematic referred to in this article, the importance of collaboration concerning the evaluation of the contents acts as a complement to the KO research, according to the user's perspective.

RELATED WORK
Bibliographic citation is a common practice in various types of academic publications and an important measure of credibility. The citation ranking developed by Garfield (1955) is a tool that makes it possible to measure the impact factor of scientific papers by the number of citations. This means that the relevance/impact factor of a paper stems from the number (frequency) of citations (Wright 2008, 203). In this sense, the science citation index (SCI) allows measuring the impact factor of a particular scientific paper based on the cumulative value of citations. This means that the importance of a scientific paper is determined collectively by the research community (Wright 2008, 204). The bibliographic references section of a scientific paper is a key element that makes it possible to verify the existence of a relational structure. In fact, a large part of the quantitative studies (e.g. scientometrics, bibliometrics) in the field of science are characterized by the analysis of scientific citation flows, which are based not only on the references/citations between publications, but also on co-authored publications, including collaborative structures between researchers (Staudt 2011, 1; Börner 2014, 55). Quantitative analyses of scientific structures are mainly defined by the number of papers written, the number of authors of a paper, the number of researchers involved, the existence and extent of a network of researchers, and the degree of clustering (Newman 2001). Taking into account the problematic of visualizing and filtering information, it is important to analyze some major reference interfaces dedicated to the visualization of scientific knowledge networks, aimed at the visualization of trends and citation patterns and at the classification and tagging of contents. Thus, the following three interfaces are highlighted:
The Well-formed Eigenfactor is an academic research project which results from a collaboration between the Eigenfactor Institution (data analysis) and Stefaner (visualization) (2009). It is an interface that consists of four interactive visualizations (in this paper we highlight only two modes), aimed at the exploration of citation patterns based on Eigenfactor metrics.
The main objective of the interface lies in the mapping and visualization of citation patterns between various scientific journals. Given that academic references incorporate a vast network of citations, the Eigenfactor metric uses the overall structure of a network of scientific publications to evaluate the impact factor of each journal, based on the citation counts of the Thomson Reuters Journal Citation Reports from 1997 to 2005. The aggregation of different networks results from the use of a theoretical method developed by Rosvall & Bergstrom (2008). With regard to the visualization techniques used in the interface, we highlight the relational structure and the hierarchical edge bundling algorithm developed by Holten (2006) [Fig. 1], and the tiling algorithm (treemap) of Johnson & Shneiderman (1991) [Fig. 2]. Regarding the radial hierarchical clustering algorithm, it is important to highlight that the hierarchical grouping of the edges allows a reduction of visual clutter (Holten 2006). The treemap visualization technique developed by Johnson & Shneiderman (1991) consists of a hierarchical containment structure, where the size of the rectangles representing the journals varies according to the Eigenfactor score. The arrow size also indicates the amount of citation flow, where black indicates the outgoing citation flows and white the incoming citation flows (Stefaner 2009).
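To make the containment principle of the treemap concrete, the following is a minimal sketch of a slice-and-dice layout, in which each rectangle receives an area proportional to its weight and the split direction alternates at each level of the hierarchy. It is an illustration only and not the code of the Well-formed Eigenfactor; the names Node and slice_and_dice, the field and journal labels, and the scores are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float = 0.0                      # leaf weight (made-up journal score)
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        """Weight of a leaf, or the summed weight of all leaves below a group."""
        if not self.children:
            return self.score
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return {leaf name: (x, y, width, height)} for the subtree under node.

    Even depths split the rectangle along x, odd depths along y, and each
    child receives a share of the side proportional to its total weight.
    """
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical hierarchy: two fields containing three journals.
root = Node("science", children=[
    Node("field A", children=[Node("journal 1", 3.0), Node("journal 2", 1.0)]),
    Node("field B", children=[Node("journal 3", 4.0)]),
])
layout = slice_and_dice(root, 0.0, 0.0, 100.0, 100.0)
```

With these made-up scores, the three journal rectangles cover areas in the ratio 3:1:4, which is the property the treemap relies on to encode the score of each journal.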
The Citeology: Visualizing Paper Genealogy, developed by Matejka (2012), is an interactive display aimed at representing the relationships between scientific papers, based on a sample of 11,699 citations between 3,502 scientific papers published between 1982 and 2010 at two conference series of the Association for Computing Machinery: the Conference on Human Factors in Computing Systems (ACM CHI) and User Interface Software and Technology (UIST). The relational structure represents the genealogy of the selected paper, where the blue branches establish the connections to the descendant papers and the red branches establish the connections to the ancestor papers. The lines connecting nearby generations are thicker and opaque, while the lines connecting distant generations are thinner and transparent (Matejka et al. 2012, 181-90).
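The genealogy described above can be sketched as a breadth-first traversal of the citation graph in both directions: following the papers that the selected paper cites yields its ancestors, and following the papers that cite it yields its descendants, each with a generation distance. This is a minimal sketch under our own assumptions and not the implementation of Citeology; the function names, the paper labels and the opacity falloff are hypothetical.

```python
from collections import deque


def generations(start, neighbors):
    """Breadth-first search: map every reachable paper to its generation distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        for other in neighbors.get(paper, ()):
            if other not in distance:
                distance[other] = distance[paper] + 1
                queue.append(other)
    del distance[start]                     # the selected paper is not its own relative
    return distance


def genealogy(paper, citations):
    """citations is an iterable of (citing, cited) pairs.

    Returns (ancestors, descendants), each a {paper: generation} mapping.
    """
    cites, cited_by = {}, {}
    for citing, cited in citations:
        cites.setdefault(citing, []).append(cited)
        cited_by.setdefault(cited, []).append(citing)
    return generations(paper, cites), generations(paper, cited_by)


def line_opacity(generation, falloff=0.5):
    """Hypothetical styling rule: lines to nearer generations are more opaque."""
    return falloff ** (generation - 1)


# Made-up citation sample: C cites B, B cites A, D and E cite C.
sample = [("C", "B"), ("B", "A"), ("D", "C"), ("E", "C")]
ancestors, descendants = genealogy("C", sample)
```

For the made-up sample, paper C has B as a first-generation ancestor and A as a second-generation ancestor, while D and E are first-generation descendants.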
The Metadata Platform for Architectural Contents in Europe (MACE), closed in 2013, is an interdisciplinary project aimed at students, teachers and architecture professionals. The platform consists of an interconnected infrastructure of repositories spread throughout Europe. The MACE platform is a service for accessing and efficiently searching the stored learning objects (LO). It should be noted that the content search is based on a collaborative tagging system. For the content enrichment (tagging), different types of metadata are used (Stefaner et al. 2008, 29). The browsing of the tagging vocabulary is supported by an interactive structure of the terms and their relationships, namely a radial hierarchical structure (Lima 2011, 132), which provides an overview of the classification terms used. It shows more than 2,800 tags used by the platform in a variety of languages (Lima 2011, 132). It should be noted that the radial hierarchical structure [Fig. 3] is based on the algorithm developed by Yee (2001, 43-50), with improvements at the level of the edges based on the Gestalt law of good continuation (Stefaner et al. 2008, 44). The varying sizes of the circles reflect the number of resources related to the tag as well as the volume of usage.

DISCUSSION AND FUTURE WORK
Of the three analyzed interfaces, the Well-formed Eigenfactor is based on the visualization of journal citation patterns. This means that within a given field or subject it becomes possible, based on the Eigenfactor metric, to observe trends and patterns. In the case of Citeology, the interface provides a temporal and chronological perspective of the citation network, starting from one selected scientific paper. At the level of interactivity, we highlight the absence of a zoom feature, an issue reported by the authors as well.
The wide range of results obtained incites, in the first place, the adoption of a search behavior. Taking into account the specific research topic of the user, it forces a brief reading of the selected papers. However, as mentioned in the previous point, reading each KO individually is a time-consuming and inefficient procedure.
Whereas the previous cases provide solutions for viewing patterns and trends, specifically interfaces aimed at the visualization of scientific knowledge network structures based on the impact factor of a journal, as in the case of the Well-formed Eigenfactor, the MACE interface simultaneously incorporates a content enrichment process based on a collaborative tagging system and an interactive structure that provides an overview of the terms used. However, it should be highlighted that the credibility of the classification and tagging process is one of the main problems, as previously identified in section 2. Yet in the MACE platform, the terms used are subjected to an approval process conducted by specialists (Stefaner et al. 2008, 38). It should also be noted that, according to Quintarelli (2005), collaborative tagging systems do not provide a solution for a more targeted approach/search.
Despite the different approaches presented, the techniques and strategies adopted provide fundamental clues for the conceptualization of new ways to interact with DAKR. However, one of the main problems of the DAKR interfaces is an approach exclusively centered on achieving results (more data), which does not include the user's feedback. In this sense, there is a clear need to develop new paths aimed at the visualization of the structures that emerge from the relationship between the community and the search for KO, and a scenario that includes the participatory role of the user in the enrichment of the contents.
Regarding future lines of research, it is necessary to briefly explain a problem that emerges from the relationship of the user with a DAKR. The following example illustrates the problem metaphorically: when we stand before a large number of KO related to our particular subject, we frequently face a vast informational ocean (Wright 2008, 171-5). In this sense, the question that arises from this experience is which KO is the most appropriate for a user's search theme, taking into account the user's specific interest. The specific problematic enunciated, namely the relationship between the user and an academic repository such as the UPV RiuNet (Institutional Repository of the Polytechnic University of Valencia), is defined by the filtering and visualization of results.
Such repositories only allow a statistical view of the number of times that the KO were downloaded, or specify a distribution by typology (e.g. by authors, date, keywords, area of knowledge, relevance, tags and comments, as in UPV Polibuscador). Even when this data is available, it is not possible to understand the pertinence and relevance of the information for its users, as the experience structure that emerges from the interaction of users with the queried information is neither perceivable nor viewable. In this way, the problematic is related to the objects that best suit the specific research. But if we consider that the KO are accessed by a significant number of users with a specific interest in a subject, and that in the course of their research they handle a significant number of KO, it is then possible to consider the existence of a structure of evidences, as a result of the relationship between the various users and their specific interests. The proposal to solve the stated problem results from the conceptualization of a collaborative interface directed to the rating of the KO, based on a reputation system, and to the visualization of the structures that result from that action. In this sense, the goal is to interpret, summarize and present, dynamically and interactively, the emerging hierarchical and relational structure of evidences resulting from the connections between users' interactions and the search for KO. Therefore, instead of the usual "citation object" centred approach of the Well-formed Eigenfactor and Citeology, an approach based on the user experience will be established. In this sense, the interface architecture [Fig. 4] is defined by the relationship established between the community and the rating of the KO, the user's feedback (comments) and the interactive structures to be generated. An important aspect for future work is the study of the weight of the assigned evaluation, which will have a direct relationship with the field and academic degree of each user. For instance, a rating from a professor will have more weight than a student's evaluation; or, when users from different fields evaluate the same paper, the user who is directly related to the specific field of the paper will have more impact. Different scenarios are being considered.
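One of the scenarios under consideration can be sketched as a weighted average, in which each rating is multiplied by a weight derived from the academic degree of the user and increased when the user's field matches the field of the paper. The weights below are purely hypothetical placeholders, since the actual weighting is left as future work; the names Rating, DEGREE_WEIGHT, SAME_FIELD_BONUS and weighted_rating are likewise illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical weights: the actual values are a subject of future study.
DEGREE_WEIGHT = {"student": 1.0, "phd": 1.5, "professor": 2.0}
SAME_FIELD_BONUS = 1.5


@dataclass
class Rating:
    score: float        # e.g. on a 1 to 5 scale
    degree: str         # academic degree of the user who rated
    field: str          # academic field of the user who rated


def weighted_rating(ratings, paper_field):
    """Weighted mean of the scores, or None when there are no ratings."""
    total = 0.0
    weight_sum = 0.0
    for rating in ratings:
        weight = DEGREE_WEIGHT.get(rating.degree, 1.0)   # unknown degrees count as 1.0
        if rating.field == paper_field:
            weight *= SAME_FIELD_BONUS                   # same-field users count more
        total += weight * rating.score
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# Made-up example: a design professor and a physics student rate a design paper.
ratings = [Rating(5.0, "professor", "design"), Rating(2.0, "student", "physics")]
result = weighted_rating(ratings, "design")   # (3.0 * 5 + 1.0 * 2) / 4.0 = 4.25
```

In this made-up example the plain mean of the two scores would be 3.5, whereas the weighted value of 4.25 leans towards the evaluation of the professor working in the field of the paper.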
It is a fact that the DAKR solved the issues related to the storage, retrieval and search of information. However, given the exponential growth of information, a query centered exclusively on results proves not to be efficient for the user who is looking for a specific subject. In this sense, the need to structure an interactive, efficient and functional relationship with a wide range of KO reveals, in the current paradigm of information abundance, a large-scale problem. Thus, there is an urgent need to develop tools that allow users to play an active social role. However, this is an approach that contradicts the ingrained thinking of the Design discipline, which thinks of and describes the user as a simple potential consumer, when in fact it is imperative to think of him as an actor (Thackara 2006, 221).