Category Archives: Text Analysis

Data Visualization: Research and Creative Practice

 

Josh On, They rule: http://www.theyrule.net/

The following are the notes of my presentation, part of a workshop at ADRI on October 3, 2017. They are made available on Remix Data for those who attended the workshop, and for anyone online who may be interested in data visualization as a creative tool. The notes are meant to function as an entry point for anyone who is not familiar with data visualization, but hopefully even those who are well acquainted with it will find some of the concepts, and especially the examples discussed, relevant. You can download the technical tutorials on SVGs and on a basic bar chart developed with D3, which were discussed at the end of the workshop:

Basic_SVGs_To_Share

Basic_Chart_To_Share

This session focuses on how data visualization relates to creative practice. First I will go over the basics of data visualization to then provide a few examples of artists, designers and researchers who have implemented some form of data visualization to produce creative works of art or design.

Three basic media forms to analyze:

Image
Sound
Text

No matter which form is analyzed, the data will usually be organized as textual data that can then be visualized and represented in some way.

The data, in turn, can also be represented in the same three forms:

Image (data visualization)
Sound (sonification)
Text (datasheets)

There are many tools available for analysis at this point; for this reason, one should become familiar with as many of the tools related to one’s interests as possible before deciding to engage in advanced development.

Owen Mundy: https://owenmundy.com/

One thing to keep in mind is that information-based production, which is the platform that informs our way of working at the moment, consists of key stages, which include:

  • An early stage in which development was carried out by institutions and corporations that had major resources. This period goes back to the days when ENIAC was completed in 1945; in that case it was the military and a research university (the Moore School at the University of Pennsylvania) that developed a room-sized computer.
  • The next stage consisted of the privatization and commercialization of computing, leading to the computing culture of the 1960s and early 1970s closely linked with Xerox PARC in the Bay Area of California.
  • The next stage led to the principles of open source, emerging throughout the 1970s, which now inform our current culture. This stage is what made the Internet possible.
  • The attitude found in open source is, in effect, the backbone of contemporary collaborative production, manifested in the arts and humanities as new media culture from the nineties to the mid-2000s, which has now evolved in academia into the interdisciplinary field of the Digital Humanities.
  • Our current stage can be considered “Permanently Beta” (a term I borrow from Gina Neff and David Stark). [1] The term points to the second of two ways of working:
    1) The first is using tools that are available, as one would use established software such as Photoshop, Illustrator, etc.
    2) The other is developing tools for one’s research that are shared with others for possible enhancement based on their own research interests. (It must be noted that the permanently beta principle is encouraged and promoted by corporations in order to get feedback as well as ongoing development based on a crowd sourcing model closely intertwined at this point with social media.)
  • In effect, the scholarship of the digital humanities is largely focused on how technology can be used for the evaluation and better understanding of cultural production.

One thing that I find Digital Humanists should be aware of is the risk of becoming too focused on technological innovation. One may feel a certain pressure, in order to become an established researcher, to develop something that appears technologically innovative as opposed to using tools that may already be available. This could lead to a model of innovation for innovation’s sake, which could be paralleled with moments in the history of art framed as “art for art’s sake.”

Brooke Singer: http://www.toxicsites.us/

The key thing, then, is to begin with questions that are of real interest to a researcher, and based on those questions, one should search for tools that will be useful. Eventually, one may develop a need for specialized tools which are not available, and that is when one may become an actual developer.

To drive the point home on this issue, in the spirit of remix, I borrow a quote from Charlie Gere, who in turn borrowed it from Gilles Deleuze: “the machine is always social before it is technical. There is always a social machine which selects or assigns the technical elements used.” [2]

If one has never had exposure to Digital Humanities methods, then one may not see how they can be relevant to one’s research. So, one should understand that the questions that may already be informing one’s research can be reshaped once one is exposed to different tools. With this in mind, I will explore some questions that I usually introduce to students who may not be familiar with text-mining or image-mining.

Key Questions for all analysis (image, sound, text):

  • How does the data help you evaluate your experience/understanding of the work?
  • Does your data analysis expose some elements of the work that you could not perceive by just viewing/reading/listening?

The above questions have embedded within them the foundational question for any subject: How does “it” work?

To engage with this basic question, one will usually analyze the material (image, sound or text) carefully, and then write down one’s observations, which, in theory, others could review and analyze. With data-mining tools, one can actually use quantification to evaluate patterns, to then consider how such patterns may shed light on one’s question, which oftentimes leads to more complex questions on the subject.

This process then consists of four parts:
• Perform research based on a question (image, sound, text)
• Gather data
• Evaluate data
• Represent the data (image, sound, text)

Giorgia Lupi and Stefanie Posavec, Data drawing pen pals, http://www.dear-data.com/

Evidently, this process cannot be covered in one session, so what I will show at this point are a few data visualization examples for image, sound and text, so that we get a sense of how this process takes place. Given that this session is about data visualization, I will finish by focusing on the development of basic shapes that can be used to visualize any type of data. This might be basic for people acquainted with web development and programming, but the point is to keep in mind the conceptual process that informs how and why we come to choose certain forms to represent/visualize our research.

Artists and designers have actually used data visualization to produce creative work. Some examples include (discussed with participants during the live session):

Josh On, They rule: http://www.theyrule.net/
Owen Mundy: https://owenmundy.com/
Brooke Singer: http://www.toxicsites.us/

Image Visualization as a video presentation:
Josh Begley, Every NYT front page since 1852
https://vimeo.com/204951759
Cyrus Kiani
The Hawaiian Star
https://vimeo.com/37001373

Sound represented visually:
Manny Tan & Kyle McDonald,
Bird Sounds
https://experiments.withgoogle.com/ai/bird-sounds

Dear Data MoMA Collection:
https://medium.com/@giorgialupi/dear-data-has-been-acquired-by-moma-but-this-isnt-what-we-are-most-excited-about-bdaa3376d9db
Entry explaining the process:
Data drawing pen pals
http://flowingdata.com/2015/03/19/data-drawing-pen-pals/
Actual Project
http://www.dear-data.com/

In the spirit of the above examples, particularly the last one, I will focus on basics, because, as you can see, we don’t have to program to begin visualizing information. We can draw our interpretation of data. It is, however, important to eventually program your visualization so that it can reach a fuller potential. What is worth noting here is that when we visualize data, we are actually interpreting it. Data visualization can only be a fair presentation of raw data when we consider how and why we choose colors, line shapes, spacing between objects, etc. Such basic graphic elements affect how the data is perceived. We must be conscious of this aspect of data visualization, and make sure we are being fair to the project when we decide to use specific visual elements to present factual information.

Data visualization is in large part developed with SVGs. For this reason, I will go over how basic SVG shapes are developed. It is my experience that having a basic hands-on understanding of the foundations of the tools one will use helps one develop an intimacy that will never emerge if one only functions as a user. In other words, knowing what the code behind tools actually does helps in deciding how to use such tools.
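As a point of reference for readers who did not attend the live walkthrough, the following is a minimal sketch of the kinds of SVG shapes covered, written here with D3 (v3 syntax, the version current at the time of the tutorials) so that it matches the bar chart example further below; the sizes and colors are placeholder values of my own, not material from the workshop files:

// a minimal sketch of basic SVG shapes, appended to the page with D3 (v3 assumed)
var svg = d3.select("body").append("svg")
    .attr("width", 400)
    .attr("height", 120);

svg.append("rect")                       // rectangle
    .attr("x", 20).attr("y", 20)
    .attr("width", 80).attr("height", 50)
    .attr("fill", "steelblue");

svg.append("circle")                     // circle
    .attr("cx", 180).attr("cy", 45)
    .attr("r", 30)
    .attr("fill", "tomato");

svg.append("line")                       // straight line
    .attr("x1", 250).attr("y1", 45)
    .attr("x2", 380).attr("y2", 45)
    .attr("stroke", "black").attr("stroke-width", 2);

The same rect, circle, and line elements can also be written directly as SVG markup; D3 simply generates them programmatically, which is what makes data-driven charts possible.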

If you don’t know how to use SVGs, you can download the tutorial by clicking the links below. A basic bar chart is also included, which can help in developing an initial sense of how data visualization functions.

Basic_SVGs_To_Share

Basic_Chart_To_Share
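As a rough indication of what the bar chart tutorial covers, here is a minimal D3 bar chart sketch; it assumes D3 (v3) is loaded in the page, and the data values are placeholders of my own rather than material from the downloadable files:

// a minimal horizontal bar chart with D3 (v3 assumed); data values are placeholders
var data = [4, 8, 15, 16, 23, 42];

var width = 420, barHeight = 20;

var x = d3.scale.linear()                 // map data values to pixel widths
    .domain([0, d3.max(data)])
    .range([0, width]);

var chart = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", barHeight * data.length);

chart.selectAll("rect")                   // one rect per data value
    .data(data)
    .enter().append("rect")
    .attr("y", function(d, i) { return i * barHeight; })
    .attr("width", function(d) { return x(d); })
    .attr("height", barHeight - 1)
    .attr("fill", "steelblue");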

footnotes:

[1] Gina Neff and David Stark, “Permanently Beta: Responsive Organization in the Internet Era,” http://www.umass.edu/digitalcenter/events/2002Workshop/Papers/stark_permanently.beta.pdf, accessed October 2, 2017.

[2] Charlie Gere, “Introduction,” Digital Culture (London: Reaktion Books, 2008), 17.

Preliminary Notes on Theodor Adorno’s Minima Moralia Part 3

MinimaMoP2LongShot

Figure 1: Detail of Minima Moralia Redux Remixes 51 – 55, the first set of entries in the second part of Minima Moralia Redux.

Note: This entry was updated on April 19, 2015 in order to add details on formal aspects of the project.

Minima Moralia Redux, a selective remix of Theodor Adorno’s Minima Moralia, enters a second phase in 2015. This was not foreseen when I began the project back in 2011, because the work is not only a work of art, but also research on data analytics, as well as a critical reflection on networked culture.

The first part of Minima Moralia Redux (entries one to fifty) consisted of updating Theodor Adorno’s aphorisms–that is, of remixing them as contemporary reflections on the way global society and culture engage with emerging technology. When I finished the first section, I realized that the project’s aesthetics were changing. This was for a few reasons. In terms of research, the first section provided more than enough data for me to data-mine Adorno’s approach to writing; therefore, I came to see no need to continue with this methodology. I plan to make my findings about this aspect public in a formal paper in the future.

Truth

Figure 2: “of the truth comes,” part of a sentence of Minima Moralia’s aphorism 54 that, when clicked, opens a Google search with relevant results.

In terms of art as a form of reflection on the times in which it is produced, it became evident to me that Adorno’s writing needed to be connected directly with the network on which it functions as a remix; for this reason, I opted for the current format, which consists of the text as it is available in English at Marxists.org, translated by Dennis Redmond.

metaphysics

Figure 3: “Truth,” part of a sentence of Minima Moralia’s aphorism 54 that, when clicked, opens a Google search with relevant results.

The format for part two of my art and research project consists of linking phrases, parts of phrases, or just single words, devoid of any punctuation, to corresponding Google searches. (Figures 2 and 3) The reader can click on any link and be taken to the respective search results, which will change according to Google’s updates to its search engine. The concept of making every word in a text a link is certainly not new. Heath Bunting had already explored this concept with his early net.art piece called readme (Figure 4), in which he took every single word in a review of his own work published in The Telegraph and linked it to a corresponding .com URL, some of which did not exist at the time, but Bunting’s proposition was that one day they likely would.

BuntingReadMe

Figure 4: Heath Bunting, “Readme,” (1998). Online artwork that links all words of a Telegraph review of Bunting’s work to  specific .com URLs.

The second phase of Minima Moralia Redux functions more or less in the same way as Bunting’s work, only in this case, the words or phrases are linked to search results as opposed to specific URLs. In this way, Adorno’s second section of his book becomes interconnected with an apparatus (networked communication) that he likely would have been quite critical of. The second part of Minima Moralia Redux, consequently, is an intertextual mashup of Bunting’s early net.art piece and Adorno’s writing.
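Mechanically, the linking format can be reduced to a simple transformation: each selected phrase becomes an anchor that points to a Google query for that phrase. The snippet below is only an illustrative sketch of that idea (the helper name and the direct use of Google’s q parameter are my own illustration, not the code used to build the entries):

// turn a selected phrase into a link to its Google search results
function searchLink(phrase) {
    var query = encodeURIComponent(phrase);              // e.g. "of the truth comes"
    return '<a href="https://www.google.com/search?q=' + query + '">' + phrase + '</a>';
}

Because the link stores only the query, not the results, what the reader finds behind each phrase keeps shifting as Google updates its index and ranking, which is part of the point of the project.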

The above is made possible by a complementary technical and formal implementation of data visualization. To develop visualizations in support of the new format of links to Google searches, I decided to keep using word-cloud visualizations, at this point Wordle (I used Many Eyes in the past), which provide an emphasis on certain terms for each entry. The number of words varies, and may range from three to as many as sixteen. These words are then used to perform an image search on Google. I then take a screenshot of the top results and place it at the top of the entries. I also input the text for each entry into Voyant in order to see the overall word frequency and the words’ contextual relations (I don’t include my qualitative analysis of the entries at this point, but it is important for future in-depth evaluations of the project). The text-mining results, along with the word-cloud visualization, are placed at the bottom of the page to share the sources used in the development of each blog entry.

Beyond these formal and technical aspects of the project, there is another layer in the second phase of Minima Moralia Redux. It also borrows from Traceblog, a previous blog project which I finished in 2013. In Traceblog I made public ghost logs of my online searches. The logs were fake results produced with a free plug-in for Firefox called TrackMeNot. In Traceblog I wrote nothing, but rather made available material that was produced based on the “disguising” of my online activity. The second part of Minima Moralia Redux is similar in that I don’t produce anything, but rather repurpose pre-existing content, in this case Adorno’s aphorisms, which, when put through a search process, give way to showing how search is connected to content online. This type of writing is more a form of curating the sentences, phrases, and words to provide searches that do not lead directly to Adorno’s book available on Marxists.org, but to the phrases that form Adorno’s argument as they may appear on thousands of websites at divergent times.

This led me to realize an important aspect of sampling in general, one that I plan to develop further in theoretical essays, but that anyone who has followed my argument thus far will find evident in this brief reflection, which is why I think I must elaborate on it at this point.

What the linking of phrases or words for each entry in the second part of Minima Moralia Redux makes evident is how we come to develop what we may consider original content. I realized that if I put an entire sentence into a Google search, I would have Adorno’s publication at the top of the results, which meant that Google was able to recognize the string of terms as a direct “sample” from Adorno’s writing. I wanted to get diverse results that did not lead to Adorno directly, so I decided to adjust the search to a string of words that would not bring Adorno’s text within the first page of results. This at times is not possible, but what becomes evident in this process is that we develop our own work from things that are pre-existing. A single-word search is likely to provide the most diverse results on Google, but the more specific the string of words, the more likely one is to reach a specific “sample” that may be deemed the original work of a particular person.

In effect, this second phase of Minima Moralia Redux exposes that what we tend to recognize as someone’s creation in any media is really a specific combination of elements that are mashed together by producers according to what they want to communicate or express. Even when we speak, we are borrowing from a set of samples (words) archived in a database we carry, called our “memory.” Such combinations are seen as “property” when they are placed in a format that is more static–a product. With the speed of network communication, however, the static state of things is coming to an end, and the ever-changing state of the forms produced (viral memes are an early example of this) will become valued more than a single instance of production.

Nothing is original, just unique.

Preliminary Notes on Analysis of Theodor Adorno’s Minima Moralia Part 2

OverallView

Figure 1: Overall view of a force layout visualization  of Theodor Adorno’s Aphorism one, Minima Moralia, developed using D3, Eduardo Navas, February 2015.

Note: For the first post on Adorno’s text, see “Preliminary Notes on Analysis of Theodor Adorno’s Minima Moralia” part 1.

During the month of September of 2014, I began evaluating different types of visualization methods which could be used to examine the quantifiable relationship of image and text  in closer relation to semantics. This can be a challenge for researchers who are using quantification to analyze visual and literary works, which in the past were solely evaluated with comparative methods relying on close readings of the subjects of interest.

My decision to develop D3 force layout visualizations of selected aphorisms of Theodor Adorno’s Minima Moralia came about in part due to extensive discussions on research methods I have been having since the beginning of the 2013 academic year with Graeme Sullivan, Director of the School of Visual Arts (SoVA) at Penn State. At one point Professor Sullivan wondered if a tool could be developed or repurposed that would show the relationship of different elements in clusters. He was interested in using such a tool for studies in Art Education and related fields of research. I thought that, in my case, I could use it for visualizing different projects of my own.

In effect, I evaluated D3 and realized that its force layout features could be effective in developing visualizations for the multiple purposes I had been considering for over a year. In my case, this meant an in-depth understanding of Theodor Adorno’s writing for my project Minima Moralia Redux.

In what follows I explain how I will be implementing the D3 visualization template that I developed in order to write a critical analysis of Theodor Adorno’s Minima Moralia as part of my project Minima Moralia Redux. The visualization I share in this entry is the first of six that I will be developing in the next few months.

 

The_Detail

Figure 2: Visualization detail highlighting the number of recurrences of the article “the” throughout the text.

Prior to the force layouts, I had been using Many Eyes to visualize the importance of words in the aphorisms. (Figure 5) These visualizations were not meant to provide a quantifiable presentation of the number of words, but rather to offer at a glance an idea of how the aphorisms tended to have certain words repeat more than others, and to consider this in relation to one’s reading of the actual text. It must also be kept in mind that Minima Moralia Redux is an actual artwork, so the quantification of information is used in part for aesthetic exploration. At the same time, I aim to develop a precise understanding of Adorno’s writing approach. One could argue that there are some limitations to my project given that I am only analyzing the English translations of Adorno’s book, not the German version. Nevertheless, doing the analysis of the English version available online in comparison to the official Verso English publication, in my view, is a valid pursuit, given that most people will be exposed to these versions of Adorno’s writing, due to the prevalence of English around the world. For this visualization, I used the online version of aphorism one.

The force layout is able to expose certain elements that a word cloud is not able to make evident. For one, it shows the recurrences of each word with its actual number. This was deliberate on my part. I could have shown only the word with a circle corresponding in size to the word’s recurrence. This means that, if I had not included the actual number of recurrences, the force layout would, in a way, function similarly to the word cloud–by giving a somewhat abstract idea of the relation of words. But I decided to include the actual number of recurrences because I want the size of each circle to be equated with a specific number that can then be compared with other circles and numbers in the visualization. The connections among the words, in effect, are developed in relation to the recurrence of each word, which is interlinked to the words that come before and after. The result is a visualization that provides a sense of the actual relation of vocabulary to repetition, and how this might play a role in the argument of the text itself. What follows explains how this takes place.
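For readers curious about how such a layout is assembled in code, the sketch below is a simplified reconstruction of the approach just described, not the actual template used for the project: each repeated word becomes a node whose circle size and label carry its recurrence count, and links connect words that follow one another in the text. It assumes the D3 v3 force layout API, and the word list and adjacency are placeholders rather than the full data for aphorism one.

// simplified force layout sketch (D3 v3 assumed); nodes sized by word recurrence
var words = [
    { word: "the", count: 24 },
    { word: "of",  count: 19 },
    { word: "is",  count: 14 }
];

// link each word to the word that follows it in the text (placeholder adjacency)
var links = [ { source: 0, target: 1 }, { source: 1, target: 2 } ];

var svg = d3.select("body").append("svg").attr("width", 600).attr("height", 400);

var force = d3.layout.force()
    .nodes(words)
    .links(links)
    .size([600, 400])
    .linkDistance(80)
    .charge(-200)
    .start();

var link = svg.selectAll("line").data(links).enter().append("line")
    .attr("stroke", "#999");

var node = svg.selectAll("circle").data(words).enter().append("circle")
    .attr("r", function(d) { return d.count; })            // circle size maps to recurrence
    .attr("fill", "steelblue")
    .call(force.drag);

var label = svg.selectAll("text").data(words).enter().append("text")
    .text(function(d) { return d.word + " " + d.count; }); // word plus its actual number

force.on("tick", function() {                              // reposition elements on every tick
    link.attr("x1", function(d) { return d.source.x; })
        .attr("y1", function(d) { return d.source.y; })
        .attr("x2", function(d) { return d.target.x; })
        .attr("y2", function(d) { return d.target.y; });
    node.attr("cx", function(d) { return d.x; })
        .attr("cy", function(d) { return d.y; });
    label.attr("x", function(d) { return d.x + 8; })
        .attr("y", function(d) { return d.y; });
});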

of_Detail

Figure 3: Visualization detail highlighting the number of recurrences of  the preposition “of” throughout the text.

Frequency counts show that the most common words will always be articles, prepositions, and common verbs. These are the binders of a sentence. Without them, we would not be able to communicate as we currently do. Many word clouds, such as the ones available on Many Eyes and Wordle, omit these common words. The results are visualizations that give a vague but decent sense of what the content may be about. What is not evident is how the vocabulary at play is enacted to make the argument effective.

With my implementation of a D3 force layout visualization, I aim to examine a few things about Adorno’s approach to critical writing. For one, the particular entry being examined here has three words that are used the most, and they are usually among the most common in all texts: “The” (24), “of” (19), and “is” (14). (See figures 2-4.) The number of recurrences after these diminishes drastically. Here is the list of the top ten words that recur more than once, which I extracted using Voyant:

the: 24
of: 19
is: 14
a: 8
and: 8
to: 7
he: 6
that: 6
who:  6
not:

For a complete list, access the text on Voyant.

The text totals 368 words, 212 of which are unique. It must be noted that not all the words in the text were visualized, only the words that repeated at least twice. This was done for the sake of simplicity. If all words had been included, the visualization would have been completely interconnected except for two words, because all the words would be linked to the terms that come before and after, thus creating a hard-to-read visualization.

What we have instead is a force layout that shows how the most repeated words bring together other words to develop a text that is dense in content but minimal in actual repetition. The first hint of this is the number of unique words, 212 out of 368–in other words, well over half of the text is composed of unrepeated terms.
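The counts themselves are straightforward to reproduce. The following is a minimal sketch of the kind of tally Voyant reports (total words, unique words, and per-word frequencies); the tokenizer here is deliberately crude and will not match Voyant’s segmentation exactly:

// count total words, unique words, and word frequencies in a text
function wordCounts(text) {
    var words = text.toLowerCase().match(/[a-z']+/g) || [];   // crude tokenizer
    var freq = {};
    words.forEach(function(w) { freq[w] = (freq[w] || 0) + 1; });
    return {
        total: words.length,                // e.g. 368 for aphorism one (counts vary with the tokenizer)
        unique: Object.keys(freq).length,   // e.g. 212
        freq: freq                          // e.g. freq["the"] would be 24
    };
}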

is_Detail

Figure 4: Visualization detail highlighting the number of recurrences of  the verb “is” throughout the text.

Based on my assessment of this visualization in relation to the repeated and unique words, it appears that a text that makes use of a large vocabulary will show a few words that are used only when necessary to connect unique words. That is, a redundant essay will result in a greater number of large circles, unlike the visualization of aphorism one shown in this entry. The implication of this is that Adorno’s aphorism was carefully constructed, given that it would take great effort not to repeat words while trying to develop a complex argument. Trying not to repeat words for the sake of not repeating them would be an empty exercise, so it must be kept in mind that the high number of unique words comes from a rigorous process of selecting from a large vocabulary in order to produce the best argument possible.

This, however, does not automatically mean that a text with fewer unique words is not as effective, or unable to convey meaning. What can be assessed from this brief analysis is that Adorno took the time to think about the composition of his text, and that he did actually practice what he promoted in his critical writings. This could be perceived when a person reads the text, of course, but the numbers and visualization make this evident with concrete data. The entire premise of Minima Moralia, in fact, is to be critical of mindless repetition. Adorno believed that one should be wary of repetition as a regressive mode leading to passive thinking. As the force layout visualization shows, Adorno’s approach to life was implemented in the formal development of his work.

MinimaMoralia_1_2

Figure  5:  Word cloud of “For Marcel Proust” as visualized for Minima Moralia Redux, Aphorism 1 Remix.

There are more details that could be discussed about the relation of words to the actual content of the text, such as the relation of the word counts to the actual message being conveyed. In this case, Adorno is reflecting on the importance of Proust, and Proust’s name is only mentioned once, as part of the title of the aphorism, but never within the actual text. One would never be able to assess Adorno’s views on Proust by analyzing the visualization alone. This means that one must always be engaged with the actual text, and consider any type of visualization supplementary to the experience of the content. The quantification of data can be helpful only in verifying what one may perceive is at play within the aesthetic experience of a work of art. This, I find, appears to be a constant challenge to individuals who quantify data to evaluate aesthetics, which may appear to be a paradox.

 

One_Detail

Figure 6: Visualization detail highlighting the number of recurrences of  the word “one” throughout the text.

Other things that I will be evaluating once I visualize the remaining aphorisms I have selected are how word usage may fluctuate from entry to entry over the development of the actual book, and how this may relate to the number of unique and repeated words in relation to the overall focus of the work.

The general potential of a D3 force visualization lies in helping us understand how the size of words or images (this type of visualization can be used for image data as well) can become an effective mapping tool, once the relations among the nodes are standardized and consistently read with a specific meaning. In this case, this means that a few large circles versus a large number of small circles implies diversity in vocabulary. For this to happen, however, parts of the process need to be automated, so that the input of information is much faster.

PlayDetail

Figure 7: Visualization detail highlighting the number of recurrences of  the word “play” throughout the text.

@Poemita Selected Poems in D3 Force Layout

Subaltern

Figure 1: Eduardo Navas,  #Subaltern, 2010 tweet rewritten as a poem, February 2015

Selected Poems in D3 Force Layout
2010:
#Nature || #Opinions || #Fatty || #Subaltern || #Migrants
2012:
#Modules || #Nano_specs || #Standards
2013:
#Abundance || #Plastic || #Universals || #Predominance

__________

During the months of January and February of 2015, I began to consider how to reconfigure selected tweets from my @poemita Twitter account as poems. The first outcome of this process was three sets of image layouts of selected poems from the years 2010-2013, which I called “Poem Portraits.” They are available on the main @Poemita project page:

Poem Portraits 2010
Poem Portraits 2012
Poem Portraits 2013

Simultaneously, I had been working with D3 to develop a force layout for visualizations of selected entries from my project Minima Moralia Redux (this set of visualizations will be discussed in a separate entry). Such a layout is designed mainly to show the relevance of words within each of Adorno’s aphoristic essays. At one point in this process, it occurred to me that I could use D3 force layouts not only for research-based visualizations, but as an actual medium to rewrite poems. Hence, I repurposed D3 features to develop a set of poems as shown in figures 1 – 4.

Standards

Figure 2: Eduardo Navas, #Standards, 2012 tweet rewritten as a poem, February 2015

Some of the poems I selected are also part of the Poem Portraits, but the interactivity that D3 offers led to a very different feel for each piece of writing. The basic premise behind the force layout poems is that each line of text should be connected to the line below it and above it, so that each poem could be read similarly to a print version. One can get a sense of this when comparing the D3 layouts to the Poem Portraits.

Abundance

Figure 3: Eduardo Navas, #Abundance, 2013 tweet rewritten as a poem, February 2015

D3’s force feature allows the words to push away from each other, and it is the lines that keep the words linked; otherwise they would float away randomly. These simple features offer unexpected results, which make the reading of the poems potentially different every time they are accessed. There is also a sense of discovery as the reader needs to figure out how the words are connected. Some poems are more ambiguous than others, and one could read the lines in a different order.

 

Predominance

Figure 4: Eduardo Navas, #Predominance, two 2013 tweets rewritten as a poem, February 2015

I found that poems with longer lines took on a layout similar to their print counterparts, while poems that had more unconventional formats turned out more ambiguous. #Subaltern (figure 1) and #Abundance (figure 3), for instance, read very much like their corresponding print versions. Nevertheless, when each poem is launched, the first line could begin at the top or bottom of the page; the layout can also be reconfigured by dragging the circles with the mouse to any area. This offers some play with the layout of the words. And as is well known, the layout of words is always of utmost importance to poets.

#Standards (figure 2) and #Predominance (figure 4) offer more ambiguous layouts. Each poem may take longer to read given that the words are connected in ways that make sense in the corresponding print version, but due to D3’s random feature at initial launch in combination with its force feature, the reading of the lines turns out to be open-ended.

At this moment I have a set of tweets from the year 2014 that I am working on, which likely will be influenced by my awareness of the creative potential behind a force layout and a more conventional print layout. I’m not sure how the next set will turn out, but the experimentation with the years 2010-13 is definitely informing the ongoing rewrites of tweets.

 

 

 

 

Analysis of the Films In Cold Blood, Capote, and their Corresponding Novel and Biography

InColdBloodCapote

Figure 1: selected shots from Capote (left) and In Cold Blood (right).

Interdisciplinary Digital Media Studio is a class in the IDS program in The School of Visual Arts (SoVA) at Penn State in which students are introduced to methodologies and conceptual approaches of media design. For the class, I taught them how to research and develop design presentations with the implementation of data analytics for moving images and texts.

One of the assignments consisted of analyzing the films Capote (2005), directed by Bennett Miller, and In Cold Blood (1967), directed by Richard Brooks, in relation to their corresponding books, Capote by Gerald Clarke and In Cold Blood by Truman Capote. We viewed the films in class, and read both the novel and the biography. The class then analyzed the respective books by doing word searches, examining specific passages and the creative approaches of the respective authors, and then evaluating those findings in relation to the films. For the films I provided montage visualizations, which are selected screenshots representative of all the scenes (figures 2 and 3).

InColdBloodSmall

Figure 2: Visualization of the film In Cold Blood (1967). Click for a larger image.

The students were free to use the film visualizations and the data-mining of the texts according to their interests. Some of the options I suggested, which were certainly not the only possibilities, included: the number of times names or any other terms were mentioned, differences in the way particular moments appeared in the films and the books, and which parts were omitted.

CapoteMontSmall

Figure 3: Visualization of the film Capote (2005). Click for a larger image.

The students had to design an infographic, or a similar form of visual design, that presented their findings in an easy-to-understand format. The students also had to write an assessment of their findings. What follows are selected projects which I consider successful for reasons that I explain in each case. All my students worked very hard on their visualizations, and ideally, I would like to show all of their projects, but for the purpose of this post, I selected designs that show unexpected or experimental approaches in the implementation of analytics to find possible answers to diverse questions on the same subject. The samples that follow are successful in some ways, while having shortcomings in others.

M_Regan_Capote

Figure 4: Visualization of common elements between Capote, the book and its respective film adaptation, by Michael Regan. Click image for larger file.

Michael Regan looked at commonalities and differences between Capote the book and its film adaptation. In this regard he writes:

The focus and content of my infographic is in the arrangement of common elements between a book and its film adaptation. This is a very relevant way of viewing the comparison of two media representations of the same content. It is very interesting to view how differently the film creators had to move content to best adapt the form. The In Cold Blood film rearranges everything from the book into a different order to better fit a film format, while the Capote film takes most of its content entirely from one section of the book. These are two ways of adapting a book into a film, and looking at the infographics allows a very quickly visual way of understanding these techniques. It also shows the way that the books are constructed in the first place. The In Cold Blood book shifts back and forth often among the murderers, the victims and townspeople of Holcomb, and the investigation. This helps to explain the switching back and forth of elements and their ordering between the film and the book. This is contrasted by the writing style of the Capote biography. The small relative amount of time the book spends on the In Cold Blood writing process, as shown by the infographic, shows the focus of Gerald Clarke, the author. He chooses to explore the entirety of Capote’s life, without giving an undue amount of attention or priority to the In Cold Blood part of his life, however sensational a time it may have been.

Regan’s visualization represents the emphasis of the film on particular aspects of the book, and where they may or may not overlap. One can get a decent sense of the adaptation process, and to some degree, assess how such a process is enacted in this particular production. His visualization is indexical, meaning that it allows the user to go to specific areas that Regan considers important to the process of adaptation from book to film.

AndrewHeoAdjusted

Figure 5: Visualization of color mentioned in the two films and books analyzed in class by Drew Heo. Click image for larger file.

Drew Heo decided to focus on the role color played in the films and the books. In his assessment paper he writes:

Color is a visual component, and Truman’s novel and biography are black and white on paper. Due to the two differing mediums, it’s only fair to allow both works to shine in the fields they are best in. The bar graph is a visual indicator of the specific mentions of colors in the texts of both books, as well as the amount of times a color is verbally spoken in the films. As expected, the films fall short when compared to the writings, as color is a visual thing and not used as much in speech. In order to compensate for the lack of mentions, below the graph is a condensed form of the colors present in both films in a “Movie Barcode” format, which has become recently popular on the internet for analyzing films.

To consider how color plays a role in the film is a unique question certainly worth pursuing. The accuracy of the numbers does need to be double-checked, but what is worth noting here is his particular approach to evaluating a concept across media that is not easy to quantify. If anything, his focus and approach demonstrate a potential for more in-depth analysis of details shared across adapted works and related productions that are often compared, but hardly ever analyzed quantitatively.

EthanInfoGramEdited

Figure 6: Recurrence of characters in the biography, novel, and respective film adaptations, by Ethan Jones. Click image for larger file.

Ethan Jones focused on the number of times the names of particular characters appeared in the text, and the number of times each character appeared in the film.  He writes:

There was a decision early on to reduce the commonalities between all four works by omitting what was exclusive to one story or the other. For example, the Truman Capote character is omitted because of his lack of presence in the In Cold Blood story (book and film). On the other side of that, minor characters from In Cold Blood (book and film) were discarded because of their lack of involvement in the Capote biography. Several characters remain constant, such as Perry, Dick, Alvin Dewey, Susan Kidwell, and the Clutter family. Simply looking upon the amount of times a given character was mentioned proved to be sufficient.

With Jones’s bar chart we can evaluate how the characters are at play in all four works. One thing to note is that he counts the appearances of a character in the respective films every five frames. This, however, is dependent on the selected shots I provided, so the number of times a character may actually appear is not based on every five frames of the actual film, but on frames from my selection of shots representing each scene. Nevertheless, the chart does provide a general idea of the prevalence of the characters in the respective publications and films. This can be cross-examined with the analyses of other students, which are not included in this entry.

Nikki_in_cold_blood

Figure 7: Analysis of parts of the book and film where dialogue did not play a major role, by Nikki Tatsumi. Click image for larger file.

Tatsumi focuses on parts that had next to no dialogue in both the film and the book In Cold Blood. She writes:

For this project I focused on the parts of the movies and books where dialogue did not play a major part in the creation of the scene. This was most relevant in the book In Cold Blood. This book, borderlining a research novel, was beyond descriptive. Its specificity allowed the reader to truly identify and imagine these people to the point of skin color, body language, tattoos and countless other tidbits that Capote decided to include. If not enough, Capote’s rendition of each character’s backstory also helps to fuel the reader’s imagination. It is through these scenes that I think Capote is the most successful. Though dialogue gives a character a voice, Capote was able to utilize his research and observations to create characters who did not need one. By their mannerisms alone, Capote could write this whole story. But when these heavily described sections of the book meld with the movie. Directors used the score to help make up for lost information through the text. In Cold Blood and its movie, had the best transition.

In the movie version, heavy use of jazz influenced music helped to create the mood. It was a lot more foreboding due to the choice of score. It was the type of music that was not soothing or collected, it was erratic and soft at times, loud and uneasy at others, as if the viewer did not expect what was next. The scenes that this score was utilized really well were the juxtaposed part of Perry’s past and the beginning scene with Perry at the bus terminal. They utilize the uneasy jazz to mirror Perry’s feelings at both times.

In Tatsumi’s design we can note that she was selective about which parts to feature, as there were many more moments and passages in both the book and the film that would fit the subject of her research. In this case, we can consider her focus and design similar to Heo’s, with potential for more in-depth exploration. Her focus also shows how analytics, while at times presented as objective, are actually informed by particular questions and interests that drive the research. This is something that individuals using quantitative tools need to be more honest about, and not claim that what they found is entirely “objective.” Tatsumi’s approach is unique in that she concentrates on moments in which dialogue did not play a major role. No other student considered this, and I never would have thought of this focus. In this sense her research points to an aspect of the film that could be very interesting to analyze further.

One thing that I should note is that her design could be much better. However, given that many of the students were dealing with multiple elements of creativity and research, I did not push her to redesign her presentation; she had touched on an interesting aspect of both the film and the book that she could have explored more in depth.

 
MeganColdBlood_Capote

Figure 8: Diagram of biography, novel and both films by Megan Coren. She focused on similarities of the four works. Click image for larger file.

Megan Koren focused on the similarities of all four works. Similarly to Tatsumi, her diagram shows that she was rather selective. She writes:

The nature of my infographic does not show many of the differences between the books and films; rather, it shows the similarities of In Cold Blood and a follow-up comparison of how Capote visually paralleled this information. Because Capote was a secondary addition, I feel I provided a skewed representation of the biography. I included text from the book that aids the comparisons in my infographic, but Capote is not really about the novel In Cold Blood.  It is about Truman Capote’s life, spanning childhood through adulthood.  Although interviews with the Clutters cover only roughly 50 pages of the monstrous 547-page biography, the film diminishes this biographical time-line and completely focuses on Capote’s writing of the book.  The film concentrates on the obsessive Truman himself, while shot-wise it accurately parallels the In Cold Blood film. We see the similarities between Capote and Perry and why Capote sympathized with him so strongly. Regardless of their different perspectives, I was genuinely impressed at how many connections the directors were able to make between scenes in the Capote and In Cold Blood movies.

This particular presentation is easy to read, and one is able to figure out how the four works are related according to Koren’s interest. However, it does have inaccuracies that are evident upon closer observation. For instance, the moment when Perry speaks on the public phone is not at 0 minutes of the film, but actually more or less around 10 minutes into the film. Koren should have spent more time with the films to evaluate their respective chronologies. If she had done this, her assessment, quoted above, would be stronger. Her selection of passages from the books also lacks a systematic approach, which makes one wonder why the citations were chosen over any others. In any case, Koren’s design shows potential in exploring formal aspects of presenting a large amount of material. She needed to spend more time working on the accuracy of the data and information–conceptually her design shows promise, although it could be further polished for a more elegant read.

 
Yoder_final_v3-04

Figure 9: Visual analysis by Caleb Yoder of the names “Dick” and “Perry” in book publications Capote and In Cold Blood, and their respective film adaptations. Click image for larger file.

Caleb Yoder decided to be very specific in his analysis and focused just on the names of the two killers.  He writes:

After data-mining information for Dick and Perry, I found that the books are often able to provide a lot more in terms of description for the characters. On page 325 of Capote, Clarke is able to give a thorough description of Perry’s appearance, personality, and childhood. The films need to rely on the actions of the characters or their conversations, and in some cases even flashbacks (like Perry’s multiple flashbacks to his childhood in the film version of In Cold Blood), in order to convey this information that the book lists off so easily. Film must also take some creative liberties with how the story functions visually–the book doesn’t describe every set piece, and so the film inevitably conveys some information independent of the book.

Both mediums make the most of their capabilities, though. The book In Cold Blood familiarizes us with the Clutter family in a way that the film fails to do (most likely due to time constraints). Our greater intimacy with the Clutter family and characters like Bobby Rupp in the book serves to make the Clutters’ deaths and the impact they have on the Holcomb community more salient to us. The film is also able to stir emotional responses through its use of music, composition, and the pacing of shots. A great example of this appears in both films when Perry is hanged at the end. The audio swells, Perry breathes heavily, and the intensity increases until the trapdoor opens abruptly.

Yoder’s is one of the most accurate analyses submitted. His focus on only two terms allowed him to develop a detailed visualization of the films and books. In turn, he is able to provide specific pages in the books and times in the films when particular moments take place in which both Dick and Perry are prominent. This gives us a strong sense of how the stories of Dick, Perry, and Capote were intertwined in real life as well as in the semi-journalistic fiction created by Capote himself.

The visualizations presented here should be considered starting points for the future of analysis by designers. The students in my class eventually applied the methods and critical skills developed in this particular analysis to develop a research and design project of their own, which was the final assignment of the class.

There is much to be said about each of the analyses covered here, as well as others. For instance, we can see how, by using analytic tools with quantitative capabilities, we are able to dissect the grammar of visual and textual languages to better understand their intersections. All of the projects in class, including those not featured here, did show a decent balance of quantitative and qualitative analysis. The challenge for a few of the students, as is evident in some of the examples in this entry, is to push further for accuracy in their results. This is a skill students keep working on as they move on with their studies.

The Beginnings of Remix Data: Poemita, an Experimental Online Writing Project

Figure 1: The five most repeated words from 2010-2013.

Poemita began in 2010. The name means “little poem” in Spanish. The basic premise was to experiment with tweets as new forms of writing. I eventually decided to use it as a resource (think of it as data mulch) for various projects. Some of the tweets are being repurposed as short narratives, which I have not released. Poemita was actually preceded by writing I developed for my video [Re]Cuts, a project influenced by William Burroughs’s cut-up method. I am in the process of producing a second video that uses actual tweets from Poemita.

I worked on Poemita on and off, sometimes not posting for months at a time. In fact, I don’t have a single post for the year 2011. But during the month of August 2014, I realized that Poemita had become a project closely related to my ongoing remix of Theodor Adorno’s work in Minima Moralia Redux. It could be thought of as a negative version of that project (I am using the term “negative” here in dialectical terms). To allude to this relation, I inverted the color scheme for the word cloud visualizations of Poemita to be the opposite of Minima Moralia Redux’s. Poemita takes the concept of the aphorism as Adorno practiced it and tries to make the most of each tweet. Most of the postings are well under 140 characters, and they all try to reflect critically on different aspects of life and culture. I try to do this creatively, and write content that may appear difficult to understand, and ultimately may not even make sense; the aim is to create the possibility for the reader to see things that would not be possible otherwise. In short, it is an experiment in creative writing, and this is why the project was titled Poemita.

I may not be able to post consistently, but I will certainly be posting tweets more regularly than before. And I will eventually be repurposing the tweets in different ways to explore how context and presentation, along with selectivity, are ultimately major elements in the creative act. This will become clear as I release the tweets in different formats in the future. This, in essence, is a way of remixing data.

To reflect on where this project is going, I decided to analyze it as I would other texts, to understand how it is constructed and to evaluate the type of patterns that may be at play in my online writing. What follows, then, is a set of studies of the tweets for the years 2010, 2012 and 2013. I will be releasing an analysis of 2014 later, after the year is over.

First, it is worth looking at word clouds for the three years:

Poemita_2010

Figure 2: Word cloud of tweets for 2010

 

Poemita_2012

Figure 3: Word cloud for tweets of 2012

 

Poemita_2013

Figure 4: Word cloud for tweets of 2013

Poemita_2010-13

Figure 5: Word cloud of tweets from 2010-2013.

We can note the top four or five words for each cloud for the respective years of 2010, 2012, and 2013 and consider how they eventually become part of the larger cloud for all of the years of 2010-2013. The number of occurrences could be accounted for yearly, but for the current purpose of this analysis, it should be sufficient to evaluate the number of words in the largest cloud for all three years (figure 5).

In the cloud above (figure 5), then, there are a total of 1,712 words and 863 unique words. The most used words, besides articles and prepositions, appear much larger. These words appear the following number of times in the actual body of the text:

Time: 12
Thought: 11
Sound: 7
Space: 5
Thoughts: 5

The word trend chart at the top of this page (figure 1) shows how these words relate to each other in terms of writing sequence. If you were to choose a particular node, you would be taken to the actual text and shown how the word appears in its context. The tool I used for this word analysis is Voyant. Seeing the words in a diagram provides a visual idea of how they relate to each other within the actual writing.

This gives a sense of repetition, and may even allude to certain interests in terms of content and ideas within the corpus of the text, but it does not provide a clear sense of how the words actually function, or in what context they recur. For this, the way the words are used in actual sentences can be mapped. In the following word trees, the top five words (in order of times repeated), Time, Thought, Sound, Space, and Thoughts, are linked to all the phrases that follow them:

PoemitaTime10_13

Figure 6: The word “time” linked to the phrases that come after it. Click on the image to view a larger file.

PoemitaThought_10_13

Figure 7: The word “thought” linked to the phrases that come after it. Click on the image to view a larger file.

PoemitaSound10_13

Figure 8: The word “sound” linked to the phrases that come after it. Click on the image to view a larger file.

PoemitaSpace_10_13

Figure 8: The word “space” linked to the phrases that come after it. Click on the image to view a larger file.

 

PoemitaThoughts10_13

Figure 9: The word “thoughts” linked to the phrases that come after it. Click on the image to view a larger file.

The word trees above show how each of the words is implemented to create particular statements. At this point, it is possible to make certain assessments. Let’s take the word “thoughts” (figure 9). We can see that three out of five times it comes at the end of a sentence. We can also note that the exception to this is a reflective statement: “thoughts of grandeur.” Let’s take a look at the word “thought” (figure 7), and we can notice that it is part of a much more complex set of phrases. Twice, the word is part of the branching recurrences “Improvisation fills one with…” and “the very thought of…” But notice that in the last one, thought is also followed by a period.
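Conceptually, each of these word trees simply gathers, for a chosen word, every phrase that follows it in the corpus. The sketch below illustrates that gathering step only; it is not the tool used to draw the trees above, and the function name and the fixed phrase length are my own placeholders:

// collect the phrases that follow a target word, the raw material of a word tree
function phrasesAfter(text, target, span) {
    var words = text.toLowerCase().match(/[a-z']+/g) || [];   // crude tokenizer
    var phrases = [];
    words.forEach(function(w, i) {
        if (w === target) {
            phrases.push(words.slice(i + 1, i + 1 + span).join(" "));
        }
    });
    return phrases;   // e.g. phrasesAfter(corpus, "time", 5)
}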

Finally, we can consider the words that come before these words. Let’s take the word “thought” for a brief example. For this we can use Voyant:


At this point we can get a full sense of how the word recurs and how it functions each time it appears. This approach puts me in a position to compare the similarities and differences in how the words are implemented, in order to evaluate particular tendencies I may have in my writing.

We could go on and examine the other top words in the same way, but this is enough to make my point. It becomes evident that the way the word “thought” and its plural “thoughts” are used varies greatly within the creative approach to tweeting. At the least, I, as the actual writer, become aware of the way I tend to relate to the singular and plural instantiations. This in the end is a reflective exercise that enables me to be critically engaged in understanding my own tendencies as a writer. I plan to use this analytic approach to further the possibilities of writing tweets that can offer a lot more content in just under 140 characters.

One of the issues that I assess in all this is the role of repetition. One may think that repetitive occurrences are bad for creativity, but in practice, it is through repetition that we come to improve our craft and technique in any medium. In terms of how words are used or repeated, with analytical exercises like this one, a writer can come to understand how certain words recur and in what context, to then decide whether to implement them differently or omit them altogether in future writing.

I certainly was not thinking that I would use these words the most when I began writing in 2010. They appear to recur and I’m not sure why, but the point is that now I can use this awareness to improve my own creative process.

This analysis can get very detailed, obviously, but this should be enough at this point. This is just a brief sample of how I am data-mining my own writing to also develop other projects by remixing the content. I will also be mining Twitter postings to evaluate how what I learn in this focused project may or may not appear to be at play in the way online communities communicate.

The Beginnings of Remix Data as Research: Preliminary Notes on Analysis of Theodor Adorno’s Minima Moralia

The following post was originally released on Remix Theory on August 14, 2013. It is reposted to give an idea of the research process that led to Remix Data as an online resource.

Detail of Minima Moralia 21 and 22 and their respective remixes

Image 1: Word cloud visualizations of Theodor Adorno’s Minima Moralia, aphorisms 21 and 22 on the left and their corresponding remixes on the right. (Click image for detail)

My first post for Minima Moralia Redux is dated October 16, 2011, but I had done much research prior to this date. I had been reading extensively on Theodor Adorno and his work, while also creating visualizations of YouTube viral memes for my post-doc at the Department of Information Science and Media Studies at the University of Bergen, in affiliation with The Software Studies Lab in San Diego, now also based in NYC. As I analyzed meme patterns, it became evident that much of the material that is discussed in terms of remixing in music and video, which is also quite popular across media culture, usually relies on acts of selectivity–meaning that with the ubiquity of cut/copy & paste, people tend to re-contextualize pre-existing material, much as DJs and producers used sampling to remix in dance music culture during the eighties. [1]

Image 2: Word cloud visualization of the first thirty aphorisms in Theodor Adorno’s Minima Moralia. (Click image to view large file)

Minima Moralia Redux is itself a type of mashup of art, writing as a literary act, and media research that explores how data visualization provides new possibilities for understanding creative processes. The project explores the selective remix, which is arguably quite popular across culture since cut/copy and paste became a common act through daily use of computers. Certainly this is the type of remixing that most people debate in remix culture. The selective remix consists of evaluating the source material and deciding what to keep, what to omit, and what to add, all while making sure that the source material remains recognizable.[2] This means that large parts are kept as originally produced while others may be radically different. A tension in authorship develops, as the remixer clearly shows creativity quite similar to an "author's." At the same time, the remixed work relies heavily on the cultural recognition of the author and his/her work. Much has been written about such tensions, but it is my hope that the research I am introducing here in preliminary fashion will contribute to understanding how we come to create works that appear autonomous and credited to a single person, and how we can move past such conventions toward more productive approaches that do justice to the ever-increasing pace at which culture communicates.

 

Image 3: Word cloud visualization of the remixes of the first thirty aphorisms in Theodor Adorno’s Minima Moralia (Click image for large file)

Minima Moralia Redux has various layers of significance. First, I wanted to explore, as I already explained, how the selective remix functions. I decided to do this by embedding myself in the process, as opposed to studying another person's remix. In this project, I examine each entry carefully, do research on it, and eventually rewrite it to make it relevant to issues taking place in contemporary times. While doing this, I keep in mind that it is the voice of Adorno that is at play here. This means that I need to make sure that Adorno's theories remain his. In other words, it is not necessarily my opinion that is expressed in the remixes, although I do take creative license and adjust, and even critique, Adorno's views within his own writing. This is no different from a music remixer who will often create a different piece of music, one which nevertheless is not credited to them as author/artist, but only as the person who remixed the author's work. In the case of music this is done in the commercial sector to increase sales, but in remix culture it is done because people may simply love doing it and/or are fans of the artist/author. Taking this approach with Adorno's work, I argue, is only fair given that Adorno himself believed in revising one's view of life and the world. In the 1960s, he admitted that some of the critical analysis in Dialectic of Enlightenment, which he co-wrote with Horkheimer, no longer stood its ground in 1969. He considers the book "a piece of documentation." In this way, Adorno and Horkheimer let the book stand as part of history. [3] Based on this critical position on his part, it is very unlikely, for instance, that in 2013 he would use the word "savage" as he did when he wrote aphorism 32.[4] The result of this approach in Minima Moralia Redux is a new text that is clearly still in large part Adorno's, but one which I hope resonates with the language and issues of the twenty-first century.

I rewrite each aphorism one sentence at a time, evaluating it word for word. I study the history of particular words and evaluate each sentence's relevance at the time when the book was written. I then consider how it may be understood and at play in contemporary times. When I rewrite the aphorisms I am conscious of the way remixing functions in music and video, and I apply it to writing to see what the results may be. At the same time, I become immersed in a creative process based on intuition, as I am also interested in exploring aesthetics. I use two translations for the rewriting of each entry. The first is by Dennis Redmond, available on Marxists.org, and the other is the official English publication of Minima Moralia translated by E. F. N. Jephcott for Verso. I combine parts from both sources, adjusting sentence structure, and I add and delete material to arrive at a statement that is relevant to contemporary times.

For the word cloud visualizations I use Many Eyes, an online resource developed by Martin Wattenberg for IBM. The clouds are useful for evaluating how often words are repeated in the original entries. The visualization of the original text appears at the top of each blog entry. The main section of each post consists of the remixed text with a link to the original source available on Marxists.org. At the bottom is a thumbnail of the same visualization along with a second visualization of the actual remix. These thumbnails are presented with each post to provide a quick understanding of how key terms are reused, others omitted, and others added in accordance with the principles of selective remixing. The reader can click on each thumbnail to view a detailed version and compare them. I provide visualizations of two aphorisms at the top of this entry (image 1).
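Readers who want to generate similar clouds on their own can use the open-source Python wordcloud package. The snippet below is a minimal sketch of that approach, not the tool used in this project, and the filename is a hypothetical placeholder for the plain text of one aphorism or remix.

```python
# pip install wordcloud
from wordcloud import WordCloud, STOPWORDS

# Hypothetical filename standing in for the text of one aphorism or its remix.
with open("aphorism_21.txt", encoding="utf-8") as f:
    aphorism_text = f.read()

cloud = WordCloud(
    width=800,
    height=400,
    background_color="white",
    stopwords=STOPWORDS,  # drop common function words so content words stand out
)
cloud.generate(aphorism_text)           # word size is scaled to word frequency
cloud.to_file("aphorism_21_cloud.png")  # save the rendered cloud as an image
```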

The visualizations expose the constant usage of particular words, and when comparing the original entries to the remixed versions, it becomes evident how selectivity is at play. For instance, one can notice in aphorisms 21 and 22 that some of the words that are more pronounced in the original entries are still repeated often in the remixed versions, while others disappear and others are added (larger words mean more repetition; smaller words, less). This is similar to how remixing functions in music. I am also evaluating sentence structure and the actual number of word repetitions for each visualization. I will release a concrete analysis of all this in the future in connection with viral memes, as well as a set of YouTube video mashups. I have not made the latter research available online, but two of the videos that are part of it can be found on page 106 of my book Remix Theory. My research on the selective remix, as found in the thirty entries that I share in this post, is part of my examination of selectivity in other forms of online media production. The idea to look at how remixing functions in text developed out of my research analyzing video. My findings so far are that there are patterns that cross over among image, music and text, which enable the viewer or reader to sense how remixing is at play in particular pieces.
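One simple way to make this selectivity concrete in code is to compare the word frequencies of an original aphorism with those of its remix and see which terms are kept, dropped, or added. The sketch below uses only Python's standard library; the filenames are hypothetical placeholders, and the length filter is again only a rough substitute for proper stop-word handling.

```python
from collections import Counter
import re

def word_counts(text, min_length=4):
    """Lowercase word frequencies, ignoring very short function words."""
    words = re.findall(r"[\w']+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)

# Hypothetical filenames standing in for an original aphorism and its remix.
original = word_counts(open("aphorism_21.txt", encoding="utf-8").read())
remix = word_counts(open("aphorism_21_remix.txt", encoding="utf-8").read())

kept = set(original) & set(remix)     # words that survive the remix
dropped = set(original) - set(remix)  # words omitted from the remix
added = set(remix) - set(original)    # words introduced by the remix

print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
print("added:", sorted(added))
```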

So far I have remixed thirty-five aphorisms, and I provide visualizations of thirty of them as part of this post. Image 2 offers an overall sense of the originals, and image 3 a comparative sense of how they changed once remixed. The process behind each remixed entry takes quite some time, so it will be a while before I can release the final version of this project. This brief entry should at least provide some detail on the process that makes Minima Moralia Redux possible.

Below I provide a two-column comparative visualization of the first thirty aphorisms (image 4). On the left are the original entries, and on the right appear the remixes. Examining one next to the other provides an idea of how different patterns are at play within and across the originals and the remixes, while looking at them as a large group gives a sense of the aesthetics of writing as a creative act, something that certainly cannot be fully measured but that one can hope to appreciate.

Image 4: A two-column comparison of the first thirty aphorisms of Theodor Adorno's Minima Moralia and their remixed versions. Comparing each aphorism with its corresponding remix shows the process of selectivity that takes place in remixing text, which in this case is deliberately performed along the lines of music remixing.

 

[1] I go over much of this in my book, Remix Theory: The Aesthetics of Sampling.

[2] If too much material is omitted, then the remix may start to lean towards other types of remixes, which will not be discussed here. See chapter three in Remix Theory.

[3] Max Horkheimer and Theodor Adorno, Dialectic of Enlightenment, trans. Edmund Jephcott (Stanford: Stanford University Press, 1987), xi-xii.

[4] See my remix, which is an extensive critique of Adorno's conflicted bourgeois position using his own words: http://minimamoraliaredux.blogspot.com/2013/06/minima-moralia-32.html