Tag Archives: EDX


How to download courses from Coursera in 2021

To download the COURSERA.ORG courses one subscribes to, there are two options: write one's own bot, which must solve the authentication challenge and be able to crawl, identify, and fetch all the relevant course files; or learn to use “COURSERA-DL”, a free and open-source software (FOSS) project, mostly written in Python, available from:
https://github.com/coursera-dl/coursera-dl/

The first option is great for learning the corresponding skills, but it is hard work.

The second option is immediately available and is much more sensible for quick results, especially for those who are only focused on getting the course materials for offline study.

This post is about installing and using COURSERA-DL. It assumes Python is properly installed. The commands shown were tested on a Python installation on Windows 10.

To install or update COURSERA-DL, the following sequence of commands will work. Enter them from any command-line console (CMD.EXE on Windows). If COURSERA-DL is already installed, it keeps its configuration and is simply updated. The commands go a bit beyond COURSERA-DL, because I also care about EDX courses.
One project similar to COURSERA-DL is EDX-DL, for courses at EDX.ORG. Both learning sites host materials on YOUTUBE.COM, so yet another related FOSS project is YOUTUBE-DL.

python -m pip install --upgrade pip
pip install --upgrade coursera-dl
pip install --upgrade edx-dl
pip install --upgrade youtube-dl

Once these FOSS solutions are made available on the system, they can be called from the command-line.
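Based on my reading of each project's documentation (the account and URLs below are placeholders and examples, not commands verified for your setup), the usage pattern should be similar across the tools: EDX-DL can list the courses an account is enrolled in and then download one by its course URL, while YOUTUBE-DL takes a video or playlist URL directly.

edx-dl -u "user@email.com" --list-courses
edx-dl -u "user@email.com" "https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/course/"
youtube-dl "https://www.youtube.com/watch?v=VIDEO_ID"

(VIDEO_ID is a placeholder.)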

To learn the technical name of a COURSERA.ORG course, pay attention to its URL when studying in a browser session. For example, when starting to learn the Coursera course named “Build a Modern Computer From First Principles”, the URL is
https://www.coursera.org/learn/build-a-computer/home/welcome

The technical name is “build-a-computer”, i.e., the string after “https://www.coursera.org/learn/” and before the subsequent forward slash (“/”). This parsing rule should work for any course.
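For those who prefer to automate that parsing rule, here is a minimal Python sketch (the function name is mine, not part of COURSERA-DL):

from urllib.parse import urlparse

def course_slug(url):
    # The technical name is the path segment right after "learn".
    segments = urlparse(url).path.strip("/").split("/")
    return segments[segments.index("learn") + 1]

# Example from above; prints: build-a-computer
print(course_slug("https://www.coursera.org/learn/build-a-computer/home/welcome"))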

To download a COURSERA.ORG course named “XPTO”, logging in as “user@email.com” with password “1234”, in theory it should suffice to launch a command-line window (CMD.EXE on Windows) and enter:

coursera-dl -u "user@email.com" -p "1234" "XPTO"

These days, this will probably FAIL, due to the introduction of CAPTCHAs, which defeat many bots.

As of February 2021, COURSERA-DL does NOT defeat the COURSERA CAPTCHA, the kind that asks you to pick the images that solve some challenge. Defeating CAPTCHAs can be quite a project on its own, so it is understandable that this is happening. The workaround is easy, but not automatable.

For each COURSERA.ORG course you are subscribed to, when you use a web browser to study it, a cookie named “CAUTH” for the domain “.coursera.org” is created on the local computer. In my case, I always use Firefox with the “Cookie Quick Manager” extension to inspect cookies per domain. Using that extension, or an equivalent, just observe, select, and copy the string value of the CAUTH cookie, which can be a long string (hundreds of characters).
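The cookie can also be read straight from Firefox's cookie store, which is an SQLite database. Here is a minimal Python sketch, assuming a standard Firefox profile; the profile path is a placeholder you must adjust, and the database may be locked while Firefox runs, so close the browser first or work on a copy of the file:

import sqlite3

# Placeholder path: substitute your own Firefox profile folder.
db_path = r"C:\Users\you\AppData\Roaming\Mozilla\Firefox\Profiles\xxxxxxxx.default-release\cookies.sqlite"

# Firefox keeps cookies in the moz_cookies table (host, name, value, ...).
connection = sqlite3.connect(db_path)
row = connection.execute(
    "SELECT value FROM moz_cookies WHERE host LIKE '%coursera.org' AND name = 'CAUTH'"
).fetchone()
connection.close()

print(row[0] if row else "CAUTH cookie not found")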

Then, provide the value of that string upon calling COURSERA-DL:

coursera-dl -u "user@email.com" -p "1234" "XPTO" -ca "hundreds of chars go here"

That is it.
For a better workflow, find the folder containing the Python script for coursera-dl, i.e., search for the local file “coursera-dl.py”.

If you have Python installed at

c:\python

the file will be at

c:\python\scripts

In the scripts folder, create a NEW text file named “coursera.conf”, consisting of the sensitive data and any other arguments you can learn about by reading COURSERA-DL’s documentation.

For example:

-u "user@email.com" -p "1234" --subtitle-language en --download-quizzes

The text above is the content of the text file “coursera.conf”, saved in the same folder that contains the coursera-dl.py script.

Now, to download course “XPTO”, just do:

coursera-dl "XPTO" -ca "hundreds of chars go here"

Intro to Digital Humanities, Day 6, Lesson 2 Start

I am a student of the “Introduction to Digital Humanities” course @edx.org.

Today I started lesson 2 and completed lesson 1.5 (check the previous post in this blog).

Lesson 2 is on “Digital Humanities Projects, tools, and questions they support”.

The lesson starts with a discussion.

Discussion

Read and consider this quote taken from the book, Digital Humanities:
“The digital environment offers expanded possibilities for exploring multiple approaches to what constitutes knowledge and what methods qualify as valid for production. This implies that the 8-page essay and the 25-page research paper will have to make room for the game design, the multi-player narrative, the video mash-up, the online exhibit and other new forms and formats as pedagogical exercises. Playful, imaginative, participatory work is not the enemy of education but its exuberant and vital engine. New standards of assessments will be necessary as skills change. We struggle less to remember facts than we do to remember where and how to find them–and how to assess their validity.” (Digital Humanities External link, 24-25)
Do you agree or disagree that in addition to writing, other teaching and research practices grounded in digital tools and formats should be considered part of the “vital engine” of education?

My answer:
Title:
I strongly agree and see it as natural

Body:
I strongly agree, and I see the expansion as only natural. Languages evolve, writing evolves, and it was never literally about scribbling on a medium. “Writing” is about capturing ideas. Sculptors may write on stone, photographers may write on stills, videographers on video, etc. Now we have more tools and different media than ever before in human history, so some will prefer brushes to pencils, cameras to brushes, an artificial programming language to natural English, 3D virtual models to maquettes, and so on.

The way we express ourselves should fit who (or what) we want to communicate with, but that was also true in the past. The big difference is in the diversity available to us. Diversity can be tough to accept, for many reasons, some rooted in fear, some in the need to defend one approach, but it is here and now, and with opportunity costs so low, there is nothing to lose in at least trying different forms of expression.

Some new forms of expression will quickly establish themselves as the preferred choice for certain interactions but, more commonly, all forms require a maturing time. One good example is Virtual Reality, and Augmented Reality in particular, which has been slowly proving itself, when done properly, a highly effective learning tool.

I published my answer as a new post @ https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/discussion/forum/186cebae9711244392f2bad2da5e7ac33d033da6/threads/5d87b8fb84452a07c6002ac8

Intro to Digital Humanities, Day 6, Lesson 1.5

I am a student of the “Introduction to Digital Humanities” course @edx.org.

Today I started lesson 2 and completed lesson 1.5.

Lesson 1.5 ends with two interesting questions.

Question 1
The student is given two texts (biography excerpts) and has to identify categories of information present in BOTH.

My original answer was: name, gender, profession, and level of education.
The answer considered correct adds the “birth country” category. I do NOT agree that BOTH texts allow the reader to extract the birth-country information. You can judge for yourself later in this post, because I copy/paste the passages.

My reasoning is that for Sir Derek, the reader is given a very complete birth address (but without an explicit country) that allows inferring that the birth country is England, but not with absolute certainty. Is there no other place in this world with that same address? I reckon not, but is it safe to assume so, from that small text alone, as instructed?

For Lü Zuqian, the text makes no mention of a birthplace or country. From the very beginning, the reader thinks “China”, and everything that follows reinforces that, but the paragraphs refer to where the man lived, studied, and worked, not to where he was born. Based solely on the given short text, I think one cannot take for granted that Lü Zuqian’s birth country is China.

After being told that my answer was not correct, I feared that I was misinterpreting the question and that I should check categories present in either text, so I resubmitted the answer, selecting all categories. Having failed again, I started ticking off the most improbable choices, until my submission was accepted. It was a frustrating first “graded” moment.

Question 2
The student is asked for “three words that come to your mind when you think about how the structure or use of a database could have unintended meaning or negative consequences?”
I approached the problem by first simplifying the question to “how can the use of a database have negative consequences?”
Then I formulated one quick answer, not exactly in single words: “there can be exposure of sensitive information, to unmerited users”.
Finally, I picked three related single words:

  • sensitivity
  • security
  • necessity

It seems that no one approached the problem my way, because my words ranked like this:

  • sensitivity: 0%
  • security: 1%
  • necessity: 0%

What follows are the “biographies”, if you want to reason by yourself.

Biography #1
From the biography of Lü Zuqian in the History of the Song dynasty (China)
Lü Zuqian, whose style name was Bogong, was a grandson of the Right Assistant Director to the Imperial Secretary [Lü] Haowen. His family lived in Wuzhou beginning in his grandfather’s generation. The learning of Zuqian was based on family tradition, and embodied the textual transmission from the Central Plain of the north. When he grew up, Zuqian studied with Lin Zhiqi, Wang Yingchen, and Hu Xian. He also was friends with Zhang Shi and Zhu Xi, and thereby his understanding gained in clarity.
At first he obtained official rank by way of the protection privilege but later he obtained his Presented Scholar degree and also passed the special decree examination for “Erudite Learning and Exceptional Literary Composition.” Then he was appointed as the Instructor at School for the Imperial Clan in the Southern Outer Office of the Hostel for the Imperial Clan.
Song shi, chapter 434, translation by Peter Bol

Biography #2
From the Oxford Dictionary of National Biography (UK)
Wanless, Sir Derek (1947–2012), banker and policy adviser, was born on 29 September 1947 at The Gables, Elswick Road, Newcastle upon Tyne, the only child of Norman Hall Wanless (1911–1980), lorry driver, and later storeman at a Tyneside cement works, and his wife, Edna Mary, née Charlton (1915–2008). Educated at the Royal Grammar School, Newcastle upon Tyne, he had a Saturday job at the Darlington branch of the Westminster Bank, and won a Westminster Bank scholarship to King’s College, Cambridge, in 1967, to read mathematics. In 1970 he was senior wrangler (awarded the top first-class degree in mathematics). On 25 September 1971, at the parish church in Walker, Newcastle, he married Vera West (b. 1949), clerical officer, and daughter of William West, shipyard caulker; they had one son and four daughters.
Wanless joined the National Westminster Bank (formed in 1968 from the merger of the Westminster Bank and the National Provincial Bank) in 1970, rising rapidly to become area director for the north-east, based in Leeds, in 1982. In 1986 he moved to London as director of personal banking, and as such led the team which developed the Switch debit card. Following his appointment as chief executive, UK financial services, in 1990, he was promoted to the position of NatWest group chief executive in 1992. But he was held responsible by the board for problems which developed at NatWest in the 1990s, culminating in a £90 million trading loss in NatWest Markets, the investment arm, in 1997, followed by the failure of proposed mergers with Abbey National, and then Legal and General, in 1999. When, later that year, the share price collapsed, he was forced to resign, and shortly after this, early in 2000, NatWest was taken over by the Royal Bank of Scotland.
Anne Pimlott Baker, January 7, 2016

Intro to Digital Humanities – Day 5

I am a student of the “Introduction to Digital Humanities” course @edx.org.

Today, I learned “Critical Reflections on Digital Humanities” (lesson 1.5).
I consumed the following lessons:

  1. Instructions
  2. DH Timeline
  3. History of Digital Humanities
  4. Critical Code Studies
  5. Digital Humanities and Design
  6. Computation is Not Value Neutral

I made a GIF of the timeline, available at the end of this post.

There were two discussions.

Discussion #1

Think about this timeline.
As you study the items we’ve included on this timeline, we encourage you to raise questions about them and discuss items that you think should be added or removed. You may wish to come back to this throughout the course, but the point we’re trying to make with this timeline is that many different kinds of people, technologies, and other infrastructure have come together to create the practice we know today as digital humanities. Undoubtedly there are many ways to summarize or evaluate the influences that have shaped humanities research. And a timeline is only one way to organize that summary.
We want to hear from you about your impressions regarding the historical events that have led to current digital humanities practice. Which items should be added or removed? Enter your post in the discussion forum below.

My answer (#1) follows.

Title:
I would not include all of the current entries; I reason about “enabler and multiplier” solutions vs. particular projects

Body:
I regard “writing” as humanity’s all-time greatest invention. No writing, no memory, making it much harder (impossible?) for new generations to fully benefit from their predecessors’ progress. So, I do agree with the inclusion of the 1440s printing press in the timeline, because it represents a significantly higher level of “memory technology”, not only for recording purposes, but also for information diffusion.

The 1800’s “humanities” landmark, with a reference to the 15th century, is relevant in the sense that it acknowledges some of the first dedicated and explicit critical studies of human creative works.

Father Roberto Busa’s 1946 project with IBM was unknown to me. I understand its inclusion, but I am hesitant about its relative relevance; I think it ranks much lower on the “scale” than the printing press, for example. One of the reasons I think this is that, at the time, software development was so tied to the hardware itself that there were not even standards for how to encode characters; ASCII (a very relevant standard for encoding characters) is something of the early 1960s. This means that software creators could not agree on details as low-level as how to represent an “A” (or any other symbol) in code. The consequence is that, whatever tool IBM created for the study, its operation was limited to a very specific IBM machine, requiring highly specialized people to do anything with it. In other words, my perspective is that any digital tool is only worth mentioning once its inputs and outputs have reached a more “open”, or at least “standardized”, maturity.
OCR, at least after standards for character encoding existed, is an understandable entry in the timeline.

“Situationist International”, which I also learned about from this timeline, enabled experimentation by its contributors that is worth the record. I will assume other collectives were doing the same, but they did not achieve the same notoriety.

The 1960s’ Geographical Information Systems are the precursors of my favorite “leisure” software: Google Earth. Google Earth is underappreciated. It is so empowering to be able to virtually travel anywhere on Earth (and beyond!), and learn more from there! Tomlinson’s system was very different, but it was the seed of everything that followed, so I find his contribution highly deserving of the timeline.

“The medium is the message” is certainly not a consensus when interpreted from an importance perspective, but it does apply to much of the communication happening today, via all the media. I would NOT include this sentence in the timeline.

Instead of the “first two-node” network (a classification I do not agree with), I would pick the underlying key technology as one of the greatest all-time inventions: “packet switching” is about digitizing information and dividing/organizing it into digital packets whose sending/receiving order is NOT relevant, contrary to what happens in analogue conversations. Hence the packets can travel different routes, some longer, some shorter, some readily available, some found inaccessible (in case of war, some physical paths can get destroyed, yet it might still be possible to deliver the packets through alternative structures), and get assembled at the destination according to metadata in each packet: its sequence number. To me, this ranks as high as the “printing press”.
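To illustrate the reassembly idea with a toy Python sketch (the packets and payloads are invented for illustration):

# Packets carry a (sequence_number, payload) pair and may arrive in any order.
packets = [(2, "c"), (0, "a"), (3, "d"), (1, "b")]

# The destination restores the original message by sorting on the sequence number.
message = "".join(payload for _, payload in sorted(packets))
print(message)  # prints: abcd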

The Internet changed, and will keep changing, everything. We all stand on the shoulders of giants, and any technology is only possible as the top layer of a big stack of all the previous supporting technologies; so it can sound unfair to say that the Internet is the most important landmark of them all in the timeline under discussion, but that is how I see it.

I agree with the PC in the timeline. It was the first tool enabling individuals’ access to the Internet.

As with “Situationist International”, specific organizations’ work, no matter how interesting, is only subjectively more or less important than others’, so I would not include MIT’s “Interactive Cinema Group”, the “Index Thomisticus”, and the “Dartmouth Dante Project” in the timeline. If they are to be included, what to say of many of Douglas Engelbart’s projects (a fundamental person in multimedia thinking and doing)? And what about Theodore Nelson’s hypertext works (the person who coined many of the hyper* expressions and many related ideas)?
I would instead look to include technologies, even if only conceptualized, such as Vannevar Bush’s “Memex”.

TEI and the WWW are foundations, platforms, on which people build; so, as “enablers”, they fit my view of what is most justifiable in the timeline. On the other hand, Google Books and Wikipedia are superb, wonderful projects, built on top of “enabler technologies”, but not exactly at the same multiplier level.

I published my answer (#1) as a new post at the following URL: https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/discussion/forum/9b86718d2363f1a198f41058d31f2dc1a61a4c3c/threads/5d86390c8149fd09370029dd

Discussion #2

Now that you have learned more about some of the critical theory behind digital humanities work, we want to know how your thinking has changed? What topics or ideas surprised you the most? What might have been most relevant to your own research interests?

My answer (#2) follows:

Title:
Confirming the growing reach of Digital Humanities and two “surprising” situations, to be picky

Body:
The further I proceed in the course, the more I feel the constant opportunities for Digital Humanities studies in today’s world, a personal growing interest in the field, and how the label might even apply to some projects of mine.
The strong multidisciplinarity of the field is not surprising me, nor is its heavy reliance on digital/computational techniques, technologies and tools. In that sense, my thinking has not changed.
I have to be picky, but I can identify two situations which I still need to understand better: one is the relevance given to specific projects (“Corpus Thomisticum”, “Dartmouth Dante” and “The Complete Writings and Pictures of Dante Gabriel Rossetti”); the other is a different perspective I might have of the concept of “scale” and its handling by humans.
Regarding the specific projects, I understand their merits and even their pioneering nature, but I am unconvinced that they are/were engines for the advancement of Digital Humanities. I see them as examples of it, but not as the engines for it. For this reason, I was expecting complementary sentences stating “there are other examples” and/or clarifications of their main contributions.
Regarding the perspective of “scale”, I sometimes perceived the one-sided idea that computational approaches scale up without issue, contrary to humans. In fact, scaling up is a huge computational challenge for non-linear problems, and humans can be surprisingly good at handling massive sets of data, for a mix of reasons. One popular example is how only very recently Artificial Intelligence became able to beat humans at the game of Go.

I published my answer (#2) as a new post at the following URL: https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/discussion/forum/b89ce8e812f9c61c8b19bb9b161119fe1ad65886/threads/5d865c448149fd0978002b49



Images from this lesson (attachments):
https://arturmarques.com/wp/wp-content/uploads/2019/09/02_DigHum_01_v1_119_Critical_Code_Studies_20190410_RC-_edxmstr_v1-en_768.jpg (image/jpeg)
https://arturmarques.com/wp/wp-content/uploads/2019/09/03_DigHum_01_v1_120_DigHum_and_Design_20190520_RC_v2-_edxmstr-en_768.jpg (image/jpeg)
https://arturmarques.com/wp/wp-content/uploads/2019/09/04_DigHum_01_v1_121_Computation_Not_Value_Neutral_20190410_RC-_edxmstr_v1-en_768.jpg (image/jpeg)

The animated timeline GIF:
https://arturmarques.com/wp/wp-content/uploads/2019/09/dh_animated_timeline.gif (image/gif)


Intro to Digital Humanities – Day 4

I am a student of the “Introduction to Digital Humanities” course @edx.org.

Today, I learned “Why Data Matters” (lesson 1.4).
I consumed the following lessons:

  1. What is Data?
  2. What is Digital Scholarship

Today’s discussion was: “In his essay, “How Not to Teach Digital Humanities,” Ryan Cordell writes, “…I have become increasingly convinced that DH will only be a revolutionary interdisciplinary movement if its various practitioners bring to it methods of distinct disciplines and take insights from it back to those disciplines.” (Debates in the Digital Humanities, 2016, 463)
Based on what you already know about the humanities and any categories of computing or digital technology, can you identify some of the benefits that digital tools of analysis provide to humanities research?
Based on what you already know, can you identify some of the drawbacks or risks that we should all keep in mind when considering how digital tools, methods, and sources shape our understanding of specific research questions?
Please write your own post and then comment on the posts of a few other learners in the course.”

My answer follows.

Title:
Benefits: automation, scale, self-reinforcing research platforms; possible drawbacks: quality, trust, IP questions

Body:
One of the easiest-to-understand benefits that digital tools of analysis can bring to humanities research is the same one they can bring to any other area of research: performing repetitive tasks over quantities of materials of “arbitrary” size. Of course, the “repetitive” task must be coded (and that is much easier said than done), and the “arbitrary” size is not so irrelevant: depending on many factors, software might not scale up as expected, eventually requiring resources not within the reach of the typical researcher. For example, software might behave without issues with x sources of something to work on, but it might become unusable just by doubling that quantity, if something exponential is at play.

To clarify what I mean by “repetitive task”, I could pretend to be researching “what is the most painted fruit in Portuguese paintings of the 18th century?”. Imagine that I have at my disposal digital pictures corresponding to all those paintings. Now I “only” need a solution to automate fruit identification. Maybe someone already trained a neural network for that. Then, such a digital tool will be able to build me a histogram of fruit occurrences; something like: apple: 756, pear: 567, etc. Doing this without computer assistance would be much harder. However, this observation assumes the availability of the files and the quality of the identification tool. If I had to write the tool alone, starting from zero, including all the fundamentals of the underlying Artificial Intelligence, I would surely have the job done faster by checking the pictures myself.
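To make the histogram idea concrete, here is a minimal Python sketch; the per-painting labels are invented, standing in for the output of a hypothetical fruit-identification tool:

from collections import Counter

# Invented labels: one entry per fruit detected across all digitized paintings.
detected_fruits = ["apple", "pear", "apple", "grape", "apple", "pear"]

# Tallying the identifications yields the histogram described above.
for fruit, count in Counter(detected_fruits).most_common():
    print(f"{fruit}: {count}")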

This repetitive task involves a single identification problem. We can all imagine more challenging research questions, such as “can a link between fruits and social status be derived from Portuguese paintings of the 18th century?”. In this second example, the problem of identifying the presence of people in a picture and, harder than that, assigning each person a social status based on his/her clothes, or hair style, or both, or something else, seems incredibly difficult. Yet, it remains “repetitive” and, to yield credible results, it should be performed over a large quantity of sources.

It might be very complex software, but in the end, the tool I am imagining for the previous examples outputs simple metadata. For each picture, it produces three types of tags: the fruit tag, the person tag (e.g. “with person”/”no person”), and the social status tag (e.g. “noble”, “undetermined”). This data can be helpful to other projects, so one enormous benefit of digital tools is seeding, and contributing to, future projects. Many alternative research questions can be formulated and assisted by data made previously available. It is a self-reinforcing virtuous mechanism. The more tools are made available, the more data is possible, the more questions get support, the more results can potentially be harvested.

I see three main classes of problems/drawbacks:

  • the digital tools themselves have associated intellectual property rights, such as copyright and/or specific licenses, and those can be hard to decipher;
  • the quality of the digital sources is relevant: one bad digital input can subvert results, hence the relevance of many museums digitizing their own collections and not pushing researchers to alternative digital materials;
  • tools can (and will) fail and if no monitoring is performed, researchers risk having trusted what was not trustworthy.

I could mention the potential neglect of “originals” as one possible drawback of the availability and comfort of using digital versions, but that is not intrinsic to the digital resources; it is rather one possible behavior.

I published my answer as a new post, at the following URL: https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/discussion/forum/340689f62e7b18aeea76197f3df4ac342e5a5807/threads/5d84f63c8149fd0955002901

Intro to Digital Humanities – Day 3

I am back to the “Introduction to Digital Humanities” course @edx.org.

Today, I learned about “Building and Using Collections” (lesson 1.3).
I consumed the following lessons:

  1. Van Gogh’s Three Pairs of Shoes
  2. Collaboration in Digital Museum Scholarship
  3. Benefits of Technology in Museum Scholarship
  4. Digital Access and Museum Curatorial Practice
  5. New Approaches to Museum Scholarship
  6. Harvard Art Museum and Open Access

At some stage, the student is asked to “recall the way in which Jeffrey Schnapp described two categories of digital humanities projects: those that identify and study large scale trends and those that focus on smaller, specific examples or exceptions. How does his perspective relate to Martha Tedeschi’s study of James McNeill Whistler or Francesca Bewer’s study of Van Gogh’s painting Three Pairs of Shoes?”

My answer follows.

Title:
I see Tedeschi’s study as a “specific example”, and Bewer’s as identifying a “large” scale trend

Body:
Martha Tedeschi’s study of the James Whistler letters was done on location, at the University of Glasgow, before the letters were digitized and made available online. It is reasonable to assume that her research required an increased relative focus, more time, and more money, and that it did not nurture as many opportunities for comparisons and other forms of bridging to other works as current conditions potentially allow. Her research conditions absorbed not only the object of study itself and its content, but also its tangibility and even properties not yet digitized (such as weight, feel, smell, etc.). In other words: Tedeschi’s study is intensely focused on the Whistler letters, which I consider to fit Schnapp’s “smaller examples or exceptions” category more than the “large scale trends” classification.

These days, with many more collections available online and technology helping to cut time and costs, maybe with existing or with custom software tools, researchers can find the opportunity to leverage or complement the findings from any particular case study with chapters framing their main object in “alternative perspectives”.

Regarding Bewer’s study of Van Gogh’s “Three Pairs of Shoes”, the same argument for Schnapp’s “specific” category could be made. But in this case, several technologies were used (X-ray fluorescence analysis, X-radiography, spot analysis, etc.) which, when applied to other Van Gogh paintings, highlight evidence of a larger-scale pattern (materials reutilization) in Van Gogh’s art. That, plus the way such techniques can be used to study any other creators’ creations, makes it closer to a digital humanities project that contributes to the identification of “large scale trends”, with “large” being the more subjective word here.

I published my answer as a new post at the following URL: https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/discussion/forum/19ca7036f6396a5f53af59c9a65c8d6169564b93/threads/5d84e0a584452a07b1002914



Images from this lesson (attachments):
https://arturmarques.com/wp/wp-content/uploads/2019/09/02_DigHum_01_v1_112_Collaboration_In_Digital_Museum_Scholarship_20190411_RC-_edxmstr_v1-en_768.jpg (image/jpeg)
https://arturmarques.com/wp/wp-content/uploads/2019/09/03_DigHum_01_v1_113_Benefits_of_Technology_20190408_RC-_edxmstr_v1-en_768.jpg (image/jpeg)
https://arturmarques.com/wp/wp-content/uploads/2019/09/04_DigHum_01_v1_114_Digital_Access_20190408_RC-_edxmstr_v1-en_768.jpg (image/jpeg)
https://arturmarques.com/wp/wp-content/uploads/2019/09/05_DigHum_01_v1_115_New_Approaches_Museum_Scholarship_20190408_RC-_edxmstr_v1-en_768.jpg (image/jpeg)
https://arturmarques.com/wp/wp-content/uploads/2019/09/06_DigHum_01_v1_116_Harvard_Art_Museum_Open_Access_20190411_RC-_edxmstr_v1-en_768.jpg (image/jpeg)


Intro to Digital Humanities, day 2

In my second day studying “Digital Humanities” (DH), I watched two videos and answered two related questions.

#1 – About computational methods in DH

The student is asked to watch a video (its poster image appears at the end of this post), then respond to “How would you describe computational methods applied to humanities research? Can you imagine applying computational methods to your own work in the humanities? How do Jeffrey Schnapp’s comments change or challenge your thinking about Digital Humanities?”

My contribution was:
The computational methods applied to humanities research will depend not only on what the research subject is, but also on a custom research path and other practical factors, namely the resources available (including time). In this sense, the computational methods in DH will vary as they would in other fields: they adapt or are adapted to the task, in a context that includes the researchers’ own preferences.

In the video, Jeffrey Schnapp presents one perspective, where researchers in DH tend to work either on pattern identification, or on exceptions that may break the monotony. Schnapp seemed to focus his comments on the differences, but as he spoke, he also implicitly hinted at the similarities: one cannot point out exceptions without a general case.

What I see as distinctive in DH research is the higher probability of the need for a multidisciplinary approach to problems.
Human behavior and human expression, in any form, can eventually be modelled as computational data and logic, and even one day be automatically researched (!) with Artificial Intelligence (A.I.). If that day is to arrive, the A.I. must learn from what I perceive as an infinite pool of different possible questions, different desired visualizations, different sensibilities, different audiences in need of answers, etc. Handling this beautiful diversity may be a strong and appealing characteristic of DH research.

#2 – About what is DH?

The student is asked to watch a video (its poster image appears at the end of this post), then respond to “In what you’ve seen so far, how do these examples fit with your own work and your own professional interests? What opportunities can you identify that you might like to explore further or learn more about?”

My contribution was:
I enjoy writing, including writing computer software. For years, I felt computers demanded more than what they gave me back; hence, I gained an interest in task automation, including automatic file organization and several forms of automatic Internet activities. I think that part of my software developer experience can be helpful in DH research; for example, in ingesting and processing data from different sources.
But there is a very significant shift going on, towards the use of certain Artificial Intelligence (A.I.) and Machine Learning (M.L.) frameworks, for many potentially DH related tasks.
The effectiveness of that AI/ML approach can be stellar; yet it may come with a “freedom” and “pleasure” cost. Researchers have to abstract ever-greater layers of logic: at this stage, many researchers become mere users of processes that they do not understand, and do not have to, since their focus is the “results”.
In my view, there is an “abstraction frontier” that can be set at a critical level; once the line is crossed, one risks paying a “motivational” and “pleasure” price, factors all too often not acknowledged as important for sustainable research.
Suzanne Blier mentions “fun” in the video. Racha Kirakosian mentions “you don’t have to be an expert in everything”, hinting at the abstraction now required to handle different and complex computational tools.
Long story short: I would probably have more fun using digital tools developed entirely by myself, but that has become impossible. I should be grateful if I can understand the required minimum to make effective use of the tools available.

#3 – I also commented on a colleague’s (Alex Kashkine) post:

Yes, to me, that also seems significant in DH. Yet, I think DH goes beyond digitalization, statistics, and open access for collaborative work. Tools can produce new data, not directly available in the input documents. For example, one day I watched an NHK Japan documentary about how researchers, after having trained software to recognize ancient calligraphy, were able to “complete” poorly preserved scripts and extract full text from originals with many missing bits. This would be an example based on tangible historical evidence, made “intangible” and subject to a digital interpretation process.
In other applications, totally new data can be created.



Video poster images (attachments):
https://arturmarques.com/wp/wp-content/uploads/2019/07/edx_idh_computational_methods_and_the_humanities_poster.png (image/png)
https://arturmarques.com/wp/wp-content/uploads/2019/07/edx_idh_what_is_dh_poster.png (image/png)


Studying "Digital Humanities" @HarvardX

Studying “Digital Humanities” @HarvardX

Months ago, I enrolled in Harvard’s “Digital Humanities”, via EDX:
https://courses.edx.org/courses/course-v1:HarvardX+DigHum_01+1T2019/course/

Then I procrastinated, other subjects got in the way, and I did not complete a single lesson. The course subscription remained active, and on the last possible day to resume my studies with access to a certificate (in case of success), I took the opportunity to retry.

This course has an appealing syllabus, covering what “digital humanities” is; facilitating contact with several related projects worthy of the classification; and (this is my expectation, since I have just begun) exposing methods, approaches and tools that may help students in their own projects.
Eventually I will become better prepared to leverage some of my digital ventures to a research level, answering or posing interesting questions, producing and/or processing valuable data.

Today I adored the first hour I invested in the course, but there is a serious risk that I might “not belong”. In the first interactive moment, the student is asked to say the first four words that come to his mind related to “digital humanities”. Among hundreds of answers already available, I managed to reply with two words/expressions with a presence of… 0% (!) and two others with a presence of 1%. Big, big miss!
My words/expressions were:
– “computer assisted” (I was thinking about computer assisted research, based on languages, frameworks, technology stacks, etc.), and this input scored 0%;
– “expression” (I was thinking about humanities in general, and how such subjects study the culture, history, art, and interactions of humans, which I broadly regard as human “expressions”), and this input also scored 0%;
– “human”, just because it honestly came to mind, as it did to 1% of others;
– “social”, for the same reason above, with same 1% popularity.

Shaken, but not deterred, I proceeded to learn about five amazing digital projects:
1) CHINA BIOGRAPHICAL DATABASE (CBDB)
In its essence, this is a database of biographical data about people, available online and offline, upon which many visualizations, questions, etc., can be built and answered.

The course asks the student for his perception of the “main purpose” of the project, and my answer (“create a relational database”) was in accord with the most common answer to date.

I took note of the following resources.

CBDB main site is @
https://projects.iq.harvard.edu/cbdb/home

The standalone DB:
http://projects.iq.harvard.edu/cbdb/download-cbdb-standalone-database

Related video:
https://www.youtube.com/channel/UChgYFvs116M-esBcUfcHKfQ

I also learned the word “Prosopography”, meaning “the investigation of the common background characteristics of a group of actors in history”.

2) The Imperiia Project
I perceive this project as maps of the economic and cultural infrastructures of the Russian Empire.

Project page:
http://dighist.fas.harvard.edu/projects/imperiia/

Interactive version:
https://worldmap.harvard.edu/maps/886

More:
https://dataverse.harvard.edu/dataverse/ImperiiaGIS

Most people stated that the main purpose of this project is to “analyze geography”. I did not answer that. I answered “visualize data”, because the analysis is (mostly?) map-based.

3)
The Neural Neighbors project

Start here:
http://dhlab.yale.edu/projects/neural_neighbors.html

This is an application of neural networks to compute the proximity/similarity of images in sets of images. Very, very interesting.

This project’s main purpose is to “arrange and compare images” – I got that right!

4)
The “Explore the Oxford Friars” project

The main page is at:
https://oxfordfriars.wordpress.ncsu.edu/

I watched a related video and perceived the project as a digital reconstruction of a disappeared building. It is also that, but it is mostly about answering historical questions, regarding its location, architecture, dimensions, etc.

5)
HARVARD LIBRARY SCANNED MAPS

The main page is at:
https://library.harvard.edu/collections/scanned-maps

I cannot wait to write a solution to harvest the maps in this project, whose “main purpose” is to “digitize library holdings”. I got it right.

It was a very well spent hour, and I hope this post captures the juice of it.