Thursday, October 23, 2014

Digital_Humanities, Weekly Update 10/20: Positivity Among Failure

Sinking my teeth into Mallet and Overview was relatively painless. There were still some issues that arose, mostly with directories, but overall, the process of getting to know these tools was easy. I think, after experiencing productive failure so often for the past few months, I have become used to the feeling. Instead of rage-quitting when something doesn't work, I backtrack and reconsider what step I skipped or what part of my code threw the computer off. I get frustrated, to be sure, but it isn't as paralyzing as before.

In my first experiment with Mallet, the Command Line did not want to run the program or even list what items were in the directory. Mallet was on my C:\ drive as instructed, but I was missing the Java installation required to run it. Similar failures occurred throughout the process, from trying to convert txt files into a readable Mallet file as recommended in the Programming Historian tutorial (great tutorial, by the way; I highly recommend it. Check it out here.) to simply listing what was in a specific directory or txt file. But, with patience and a lot of back and forth between the tutorial, the GUI view of the Mallet directory, and my command prompt, I was able to extract the information I wanted and get some pretty interesting results.
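
For anyone hitting the same two stumbling blocks (a missing Java install and the import step), here is a rough Python sketch of how one might check for Java and assemble the import command before typing it into the command prompt. It is not what I ran. The install folder, the input folder, and the output file name are placeholder assumptions, and the flags are the ones I understand the tutorial to use. The script only builds and prints the command; it does not run Mallet.

```python
import shutil
from pathlib import Path
from typing import List


def java_available() -> bool:
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


def build_import_command(mallet_home: str, input_dir: str, output_file: str) -> List[str]:
    """Build the argument list for Mallet's import-dir step.

    All three arguments are placeholders chosen for illustration;
    point them at your own install and documents.
    """
    launcher = Path(mallet_home) / "bin" / "mallet"
    return [
        str(launcher),
        "import-dir",
        "--input", input_dir,
        "--output", output_file,
        "--keep-sequence",
        "--remove-stopwords",
    ]


if __name__ == "__main__":
    if not java_available():
        print("Java was not found on the PATH; Mallet needs it to run.")
    # Hypothetical paths, assumed for this example only.
    command = build_import_command("C:/mallet", "sample-data/web/en", "tutorial.mallet")
    print(" ".join(command))
```

Printing the command first makes it easier to spot a wrong directory before anything runs, which is where most of my errors came from.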


The documents I analyzed were just for the purpose of the exercise and won't help me in my research, but I'm starting to see how text mining might be useful in the future. I guarantee that whatever files I deal with will have to be cleaned to make them suitable for these tools, but text mining could reveal some interesting connections that I would not normally have considered. If I do come across a large corpus, these tools will enable greater analysis and will also help me understand exactly how the documents are connected and how they might be useful.

The content of the documents can also be made clearer by tools such as Overview. I did find this tool easier and more appealing than Mallet, in part due to my own avoidance of the Command Prompt (it's a bit hard for me to read the text in this format, even if I open it in a Notepad or Word doc). But I could see its limitations. The tool is very specific in what it can do with text files and with text modeling. It shows the same strings of words that Mallet does, but a portion of the original connections is lost. Overview breaks down the texts to a greater degree and makes distinct divisions between different documents, which is useful in understanding what kind of files you might be dealing with, but it may not reveal as clearly how they connect or how certain topics are addressed in the material as a whole. This view of Overview might just be due to my own novice status with the program, but that was my initial impression nonetheless.


I still plan to use this tool in the future, but in conjunction with Mallet to see what different information each tool can extract. Overview's user-friendliness is a point in its favor as well, and it can be a great way to initially approach digital tools and text mining, so long as the other programs are not ignored in the long run.

My own preferences and views of these programs aside, I'm starting to feel how much my toolkit has grown over this past semester and how my general understanding of and approach to these once foreign tools have developed. It's a good feeling. I still have a long way to go (there are many more failures to make and moments of frustration to experience), but it's a start.
