Sunday, March 17, 2019

#el30: The Complexity of Data?

Tony873004 [CC BY-SA 4.0], from Wikimedia Commons
Stephen Downes frames his E-Learning 3.0 MOOC in Connectivism theory, which claims "that knowledge is essentially the set of connections in a network, and that learning therefore is the process of creating and shaping those networks." As Downes demonstrates later in his course, the connections in a network are composed of nodes and edges. I'm guessing, then, that knowledge--especially the intellectual knowledge that makes up the educational economy--is made up of data (nodes) and the connections (edges) among them that result in some pattern that we call knowledge. Knowledge formation, then, is something like selecting a handful of stars, drawing the connections among them, and calling the resulting network Orion, a name that functions as a hashtag pointing to a body of knowledge about a "giant huntsman whom Zeus placed among the stars as the constellation of Orion" and the various stories about this huntsman and his gods (Wikipedia).

If this is so, then it makes sense that Downes begins his MOOC with a discussion of data, but as I read through his own writing and the suggested readings, I don't find a useful definition of what the MOOC means by data. This becomes a problem for me especially when Downes says that the MOOC addresses "two conceptual challenges: first, the shift in our understanding of content from documents to data; and second, the shift in our understanding of data from centralized to decentralized." This imprecise use of data also disturbs me because shared data and shared arrangements of that data, especially in stories, form the basis for most communities, so for me, data is the key term in his course, but it remains undefined. Perhaps Downes assumes that the concept of data is obvious, but this is exactly the issue for me. Data is not obvious.

Data is complex, and recognizing, selecting, analyzing, and utilizing data is not an exercise in the domain of the simple. Lots of conceptualizing has to happen before we can glibly proceed with any discussion about data or use data as a basis for further discussion. Wikipedia offers a short definition of data that might be useful as a starting point for clarifying some of the issues I have with the concept as used in EL30: data is "a set of values of subjects with respect to qualitative or quantitative variables."

This seems simple enough; however, just a little reconsideration of the definition points us to some immediate problems with data. Data are values, or characteristics, of subjects that we associate with both qualitative and quantitative variables such as scales, numbers, pictures, and words. It doesn't take us long to question if the values or characteristics belong to the object observed, belong to the perceptions of the observer, belong to the notational system employed, or belong to some interactions among the observed, the observer, and the notational system. For instance, does a grade such as an A, one data point, belong to the student graded, the teacher grading, the scale used for grading, or the interactions among all these parts of the system? Traditionally, educators have assumed that grades indicate some characteristic of the student herself. Many of us have come to think that grades indicate just as much about the teachers and testing regimes doing the grading. I think that the single data point emerges rather problematically from the interactions of the student, teacher, testing regime, and the general environment of all.

This is a long and rich conversation that highlights why I'm uncomfortable with the use of data in EL30, and though I will not resolve this issue in this post or even clarify my own developing position, I can say a few things.

I find data to be complex, nontrivial, and problematic for a number of reasons, but first because data is always context dependent. The data that we recognize and the meanings we assign that data depend mostly on the context within which we as observers and the data as the observed are interacting. This immediately puts me in conflict with lots of people who seem to define data as a contextless, and therefore meaningless, collection of points that can be processed into information in some context, as this conversation on ResearchGate suggests. Perhaps this distinction between data and information is useful in certain applications, but it seems ultimately to be misleading.

I don't think we perceive data outside of some context. True, we can change contexts and give data new meaning, but I don't know that we ever perceive data without context, even if our context is confused. For example, consider the period at the end of this sentence --> . That single data point, of course, makes sense only because it is appropriately placed within the context of this blog post, but what if your screen suddenly blanked to white with only the period showing? I think it entirely possible that you might not even see the period, or if you did, you'd think it a faulty pixel, because of course, the frame around your computer screen provides a familiar context for that single data point and you will try to interpret the period within that context. You may not be able to give the period useful meaning--in other words, you may be confused by the single data point--but confusion owes as much to context as does meaning. It's quite possible that perception itself depends upon context.

Wikipedia, "Stars in orion constellation (connected)," Attribution-Share Alike 4.0 International
So data is always in some context, and a different context creates different meaning, but data is also dependent on its internal arrangement, which is itself context dependent. The constellation Orion can be helpful here. As the image to the right shows, Orion looks very much like a graph, a network of data points connected by edges. The stars are the data points, and our imaginations draw the edges to match some story. Of course, we could draw different edges using pretty much the same set of data and match different stories, and in fact, we have done just that in various cultures throughout history. For instance, the ancient Babylonians saw "The True Shepherd of Anu," the ancient Egyptians saw the god Sah, and ancient Indians saw Nataraja, an avatar of Shiva--all by redrawing the connections among the same data points of light. In other words, by changing the arrangement of the data, we get different stories, and by changing the stories, we get different arrangements. Again, the interactions of all the elements yield the meaning of the data, or to say it differently, the meaning of data emerges from the interactions.
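To make the point about redrawn edges concrete, here is a minimal sketch in Python. It assumes seven of Orion's bright stars as the shared nodes, and the two edge sets (named HUNTER and SHEPHERD here) are invented for illustration only; they are not records of how any culture actually drew its figure.

```python
# Seven bright stars of Orion serve as the shared nodes (the data points).
STARS = {"Betelgeuse", "Bellatrix", "Alnitak", "Alnilam", "Mintaka", "Saiph", "Rigel"}

# Two hypothetical ways of drawing edges among the same stars.
# These edge lists are illustrative assumptions, not historical figures.
HUNTER = {
    ("Betelgeuse", "Bellatrix"),
    ("Betelgeuse", "Alnitak"),
    ("Bellatrix", "Mintaka"),
    ("Alnitak", "Alnilam"),
    ("Alnilam", "Mintaka"),
    ("Alnitak", "Saiph"),
    ("Mintaka", "Rigel"),
}
SHEPHERD = {
    ("Betelgeuse", "Alnilam"),
    ("Bellatrix", "Alnilam"),
    ("Alnilam", "Saiph"),
    ("Alnilam", "Rigel"),
    ("Alnitak", "Alnilam"),
    ("Alnilam", "Mintaka"),
}


def nodes_of(edges):
    """Collect every star touched by at least one edge."""
    return {star for edge in edges for star in edge}


def normalize(edges):
    """Treat edges as undirected by ignoring the order of the endpoints."""
    return {frozenset(edge) for edge in edges}


def shared_edges(a, b):
    """Return the undirected edges that two drawings have in common."""
    return normalize(a) & normalize(b)


if __name__ == "__main__":
    print("Same nodes:", nodes_of(HUNTER) == nodes_of(SHEPHERD))
    print("Same edges:", normalize(HUNTER) == normalize(SHEPHERD))
    print("Edges in common:", len(shared_edges(HUNTER, SHEPHERD)))
```

Both drawings touch exactly the same stars, yet they are different graphs; only the edges, and so the story, have changed.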

Not only do the edges in a data set, or graph, change, but the data points, or nodes, also shift. We want to think that the stars are immutable--after all, they do not noticeably change during our lifetimes--but our high school science class reminds us that all the seemingly immutable stars are moving at enormous speeds across unimaginable distances. The little animation at the top of this post shows the calculated shift of Orion's stars between 40000 BC and 52000 AD. The perceived immutability of the stars is due mostly to the idiosyncratic perception afforded, or I could say imposed, by our position and scale in space/time. During any given lifetime of observing the night sky, the stars seem to stay in place because of the great distances in space/time between us, the observers, and them, the data. If we could readily shift our position and scale in space/time, then we could see quite clearly that our data are moving (the animation above captures a neat shift in scale by compressing 92000 years into a few seconds).

If we do a 3D fly around of Orion--as does this nifty YouTube video--we see that our arrangements of our data are totally owing to our position in space/time relative to the data. If we assume that we are starting at 6 o'clock facing the hunter, then by the time we move a quarter-way counterclockwise to 3 o'clock, we see something more like a flattened kangaroo, not a hunter or shepherd. And then we remember that Einstein told us a hundred years ago that what we see and measure depends a great deal on our position in space/time relative to the data that we are observing and measuring and that, contrary to our everyday intuition, two different measurements can both be true.

So not only do data points move relative to the observer and to each other, but they also morph within themselves. Data ain't immutable. Consider the data points, the points of light, in Orion: "Betelgeuse … is a massive M-type red supergiant star nearing the end of its life" when it will explode in a supernova sometime within the next million years or so--thereby erasing Orion's right shoulder, assuming you think he's facing us rather than facing away. Betelgeuse is also a runaway star, racing through our galaxy alone and apparently unattached to any star cluster. Mintaka, the westernmost star in Orion's belt, is not a single star but "a multiple star system, composed of a large B-type blue giant and a more massive O-type main-sequence star." It looks to the naked eye like a single star only because of its great distance from us. Orion's sword contains the Orion Nebula, not a star at all but a giant nursery for new stars.

Our dataset is breaking down. Rather, our dataset is assuming new arrangements and demonstrably, measurably different values as we change our position in space/time. The old values are not lost, but they are certainly expanded, and at times, supplanted as our relation to and use of the dataset changes. I'm convinced that all data are like this: a collection of characteristics to which we attach certain values depending on the configuration of the artifact and the relative position of the observing node. Let's break this down.

Note first that data is a set of qualitative or quantitative variables associated with an object. Data is always about something else, something real. I draw this assumption from Karl Maton's discussion of ontological realism in his book Knowledge and Knowers: Towards a realist sociology of education (2014). Maton relies on Roy Bhaskar's critical realism when he insists that "knowledge is about something other than itself, that there exists an independently existing reality beyond discourse that helps to shape our knowledge of the world" (10). This is important. As I understand Maton, knowledge is a complex system, or network, of real nodes (real means for Maton entities that possess "properties, powers, and tendencies that have effects" [9] on other entities) that interact with other nodes. Moreover, each node is itself a complex system of other real nodes and their interactions, and each system is a node in enclosing complex systems. The data about any given node emerges from the interactions of all the nodes across all the scales of this system. This understanding is largely consistent, I think, with the Connectivism theory of Downes and Siemens.

Think about a student, Maya, in a classroom. Maya is real in the sense that she has "properties, powers, and tendencies that have effects" upon other students, teachers, books, rooms, heating systems, and so forth. Maya is not, however, just a single node, a single student. She is also a complex system herself, composed one scale down or in of organs, tissues, and the interactions among all those nodes. One scale up or out, she is a node within her class, which itself is a node within a school, and so on. All of these nodes across all these scales are real. They all have properties which we can observe and measure both quantitatively and qualitatively and which seduce us into the essentialism of the positivists: that these properties are essential to the entity, that they are, in fact, the entity itself.

Not so, says Maton. The data a teacher collects about a student such as Maya emerges not merely from Maya herself but also from the teacher, from the larger and smaller systems to which both Maya and the teacher belong, and from the knowledge systems of both the teacher and Maya. To my mind, the role of knowledge in complex systems is a key component of Maton's argument. Knowledge becomes a real entity in its own right within whatever system it finds itself. Maton says, "Knowledge practices are both emergent from and irreducible to their contexts of production -- the forms taken by knowledge practice in turn shape those contexts" (11). Just like Maya or her teacher, knowledge has properties, powers, and tendencies that have effects upon other nodes across systems. What is known about Maya affects Maya, her teachers, her classmates, her school, her family, and so on. Of course, effects are reflexive; thus, the knowledge about Maya is in turn affected by the interactions of the other nodes across the systems. Thus, data and knowledge are dynamic and variable, which seduces us into relativism.

But not so fast, says Maton. Knowledge is not merely an individual construct; rather, it emerges from the interactions of all the nodes within a system: the things known, the knowers, and the body of knowledge. Maton is arguing against the epistemological dilemma he finds in much of educational research that is trapped between a positivist essentialism on one hand and a subjectivist relativism on the other. For the hard positivist, qualitative and quantitative data are integral, intimate features of the object itself, unmediated by human intelligence. Red Delicious apples really are red, and all normally functioning humans will see the same red. For the subjective relativist, qualitative and quantitative data are constructs of the observer, fabrications of human intelligence. Red Delicious apples are red because I see them that way in this light, and other humans may see, or construct, different colors based on their culture and personal capacities.

Maton argues for a third way and, to my mind, a more complex way. In his book, he says:
Against positivism, knowledge is understood as inescapably social and historical but, against constructivism, knowledge is not reduced to social power alone, as some knowledge claims have greater explanatory power than others. … Knowledge practices are both emergent from and irreducible to their contexts of production—the forms taken by knowledge practice in turn shape those contexts. … Knowledge is not constructed by individuals as each sees fit but rather produced by actors within social fields of practice characterized by intersubjectively shared assumptions, ways of working, beliefs and so forth. (11)
Knowledge and the data that make it up are not dependent merely on the objects known or the entities that know, but on both, and on the existing body of knowledge with its notational regimes and on the dynamic interactions within this system. Maton says:
Though knowledge is the product of our minds, it has relative autonomy from knowing—knowledge has emergent properties and powers of its own. This can be seen in the ways knowledge mediates: creativity; learning; and relations among knowers. ... Once formulated as knowledge, 'objectified', our ideas can reshape our knowing. We can both improve and be improved by what we create. (12)
It seems to me then that identifying and using data to form knowledge is not so easy a task as we might think. Though we usually think that data are natural, given, somewhat inert characteristics of the objects under consideration, the case is not so clear. Data associated with any system are complex, emergent properties of the interactions within the system, interactions among the system observed, the system observing, and other systems at the same scale, and finally the interactions among the observed system and the enclosing systems. Nothing about this is trivial, or simple, and the complexity of data holds great significance for any discussion of data.

First, the idea that the observer is an integral node in whatever system is being observed is one of the great insights of twentieth-century science and a necessary corrective to classical science's assumption of objectivity--that scientists can stand apart from their experiments and observe and report without affecting the observed system. Complexity science says that observers are an integral, functioning part of the system being observed and that their relative position in space/time must always be accounted for. In short, observations depend on what both the observed and the observer bring to the observation.

This does not mean, however, as Maton has argued to my satisfaction, that data depend solely on either the mental constructs of the observer or the objective characteristics of the observed. The object observed does really exist in its own right and brings its own agency, powers, and presence to bear on any observation or measurement of it. The data observed, collected, and analyzed about the student Maya depend as much on Maya herself as on the teachers and schools collecting the data. More properly stated, the data emerge from the relative positions and interactions between Maya and her teachers.

But this is not the whole story. Observed and observers alike exist and interact within systems of knowledge that can be complementary and consistent or contradictory and conflicting. These stories, paradigms, and belief systems affect what the observed can reveal about itself and what the observer can see, or know. What Maya reveals about herself to teachers and schools and what the teachers and schools can see of Maya depends not just on Maya and the teachers and their interactions but also on the stories, paradigms, and belief systems that each brings to the observation.

Maton is quite clear about the reality and agency of a system of knowledge when he says:
We do not learn about the world in an unmediated and direct fashion but rather in relation to existing and objectified knowledge about the world. We can 'plug into' existing knowledge and so do not have to start from scratch or attempt by ourselves to recreate what has taken, in the case of 'academic' knowledge, thousands of years and even more minds to develop.
What data teachers can recognize and collect about Maya depends a great deal on the system of knowledge, the paradigms and belief systems, out of which they function. I think EL30 would have benefited from some discussion of data prior to using it so extensively in the class.

It now occurs to me, though, that Downes might have assumed that Connectivism provides an adequate context for his use of data. If that's the case, then he could have easily mentioned it, but then I might not have taken the opportunity to look more carefully into it myself.

Sunday, December 16, 2018

#el30: A Communal Experience

This is my community experience for #el30, in which Stephen asks us to "create an assignment the completion of which denotes being a member of the community."

I am still focused on text, so I took one post that mentioned community in either the title or the text from the following #el30 blogs:
I then entered each link into a new analysis space at Voyant-tools to create a collection of #el30 posts about community. For the sake of this particular analysis, I removed the names of months from the word cloud as they were clouding (pun intended) the results. Voyant generated the following word cloud:

The word cloud presents the most common nouns and verbs from all of the posts; however, the word cloud is live, which means you can change it. Click on the Scale drop down in the lower left corner to select a specific post, and slide the Terms slider to include more or fewer words. Your assignment is to play with the various posts and collections of terms to create different word clouds and to see if any meaning emerges for you. Then leave a comment on this post to tell the rest of us what you learned.

My failure to post most weeks during the MOOC does not reflect my interest; rather, I'm at the end of my school term, in the middle of some vexing family issues, and about to leave for two weeks in the Bahamas (so don't feel sorry for me). I just couldn't focus on writing, but I did much of the reading and watched most of the videos. I'll carry this conversation forward for the next half year, I suspect.

Thanks to Stephen and all for doing this.

Saturday, November 17, 2018

#el30: Prepositions on the Edge of Identity

Last week, Stephen Downes assigned an identity graph for those participating in #el30. Like Jenny Mackness and Matthias Melcher, I was initially perplexed that the graph “should not contain a self-referential node titled ‘me’ or ‘self’ or anything similar”. Surely, I thought, any picture of my connections should include me, right? Then I had a light-bulb moment and realized that the web of connections, the graph, was me, and that this view of identity is in keeping with Downes' connectivism theory which says, among other things, that meaning emerges within the network of relationships (edges) among nodes rather than in a single node itself. I subscribe to this belief, but old mental habits are difficult to break. I still want to see me as … well, just me, a single, individual node. So building an identity graph could be a therapeutic exercise for me.

I examined the graphs built by others in #el30 for some clues about how to go about this—I always like a model to use even if I intend to violate the model. Melcher based his identity graph on his Twitter and library interests. Mackness used a variety of life events, roles, and locations. Roland Legrand based his map on his spiritual/philosophical beliefs and life roles. I found them all to be wonderful insights into the people who created them, but none of them clicked for me—not wrong, mind you, just no click.

For one thing, I was troubled by the edges, the links, between the nodes. The nodes at least have labels, but the links are nothing more than a thin line from one node to another. This strikes me as a serious oversight. If the meaning is in the relationships, then the links ought to mean something. In most of the graphs I've examined and the tools I've tried for generating graphs, the links are just skinny little lines. At best, they might have an arrow to indicate directional flow. That did not satisfy me.

Then, I teach writing, and I write. Writing seems to be a solid chunk of reality out of which to build an identity graph, and the chunk is definitely related to how I identify myself. Moreover, writing includes those built-in links (prepositions, conjunctions, commas, and other linking devices) that can add texture and color to the edges. Of course, most language scholars (in both poetics and rhetoric) tend to favor the nodes--the nouns and verbs, or actors and actions--of writing and ignore the little words. We don't capitalize them in titles, for instance (Gone with the Wind); yet, it's the little words that connect the big words to each other and create much of the meaning, as I discussed in a handful of posts as part of Rhizo14 four years ago. Prepositions were on my mind because of some remarks by Michel Serres. In his book Conversations on Science, Culture, and Time (1995) with Bruno Latour, Serres suggests that prepositions mean almost nothing or almost anything, which turns out to be about the same thing, but that they do the critical work of arranging and connecting the actors, actions, and settings. It seemed rich at the time, but I did not pursue the ideas very far.

In his earlier essay "Platonic Dialogue" in Hermes: Literature, Science, Philosophy (1982), Serres says, "writing is first and foremost a drawing, an ideogram, or a conventional graph" (65). I do not think that Serres is speaking of graphs as we have this week in #el30--he almost certainly means something like a mark or picture--but I want to play with this connection between writing and graphs. My intent is to build an identity graph using the #el30 posts that I have written thus far. The four posts result in a fairly short 3,147-word document when the text is aggregated. I'm using Voyant Tools to analyze the #el30 text, which I also used back in Rhizo14, and you can see my Voyant dashboard here. I also used a dashboard that distinguished each post here. This dashboard has some interesting data about my posts as posts within my blog, but I will not use this dashboard in this post.

Unfortunately, Voyant by default uses a stopword list to eliminate prepositions, conjunctions, and other classes of small words from its tools, deeming those words irrelevant and mostly meaningless. The documentation for Voyant says it this way: "Typically stopword lists contain so-called function words that don’t carry as much meaning, such as determiners and prepositions (in, to, from, etc.)." I'm dismayed but not surprised. Prepositions get no respect from writers, rhetors, and grammarians. However, I intend to use prepositions as edges in my identity graph. I suspect that the prepositions and other connectors will give the links texture, color, spin, and direction that will enrich the meaning of the local connection and the network of connections.
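For readers who have not seen a stopword list at work, here is a minimal sketch in Python of the general technique. It is not Voyant's code or its API; the tiny STOPWORDS set, the SAMPLE sentence, and the function names are all assumptions made up for this illustration.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; real lists (Voyant's included) are far longer.
STOPWORDS = {"of", "in", "to", "from", "the", "a", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text, drop_stopwords=True):
    """Count words, optionally filtering out the stopwords first."""
    words = tokenize(text)
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return Counter(words)


# A hypothetical sample sentence, not a quotation from the analyzed posts.
SAMPLE = "The movement of energy and the flow of information in the cloud"

if __name__ == "__main__":
    print(word_counts(SAMPLE).most_common())
    print(word_counts(SAMPLE, drop_stopwords=False).most_common(3))
```

With the filter on, of never appears in the counts at all; with it off, the little words turn out to be the most frequent ones in the sentence.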

You can see a word cloud of my posts here:

We are all familiar with word clouds, but I'm thinking now that they are proto-graphs with all the nodes and none of the edges. Thus, they are limited in what they reveal. I do, however, like the different sizes and colors of the nodes, and I think I want a graph tool that keeps the different sizes and colors of the nodes and includes the different edges. The Voyant suite of tools does not quite do that--or at least, I have not found the tool that does.

So I'm following Jenny Mackness' lead and also using Matthias Melcher’s think tool – Thought Condensr. I thought I would map the top 5 words in my posts, but I managed to do just one: data, the most common major word in my four posts. The graph looks like this:

My writing over the past month reveals a preoccupation with data (the most common of the big words in my posts at 39 occurrences), and the identity graph above expresses my particular orchestration of nodes and edges that identifies me like a thumbprint. Everyone in #el30 is interested in and thinks about data, but I daresay that none have a print like mine. Yes, they have similar prints, perhaps, but not exactly this one. All fingerprints have lines, arches, loops, and whorls, but no two have them arranged in the same way. That graph above identifies Keith Hamon--or at least a bit of him at a certain scale. This graph orchestrates drowning in data from backyard (in general, read the node/edge clusters from the blue node on the left, through a green connector, then to the red data node, and on to another green connector and a yellow node) with the other clusters of nodes and links to create a unique yet still recognizable fractal image.

Unfortunately, I cannot identify a given cluster of nodes and edges, so you can actually create new ones by following left into data and then out to any other node. You can also read from right to left to create even more clusters that generate different meanings. I think these are limitations of the graphing tools, my skills with the tools, or both. I need a graphing tool that will allow me to identify both nodes and edges and the resulting clusters and to view them in a 3-D or 4-D space. 2-D is too limiting.

I realize, however, that I've made the same error that the Voyant Tools creators did: I've put the noun (in this case, data) in the center, putting all the focus on it and building all the meaning around it. I should have put the focus on one of the prepositions—say, of with its 104 occurrences. I simply do not have time just now to graph all 104 instances of of, so I did just the first 10, and it looks like this:

Look at what a workhorse this little word of is. Consider how it connects all these nodes to create meaning at various scales, to make this particular arrangement of nodes and edges identifiably me. Consider one cluster: University of Miami. All of us in #el30 have some university node, but I may be the only one with a UM node. Even if another of us has a UM node, all the other nodes stitched together by of quickly identify me. I'm the one with a University of Miami node and a movement of energy, matter, information, and organization node. Add the 102 other of clusters, and you've pinned me to the wall. That's me.
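The idea of letting a preposition serve as a labeled edge can be sketched in a few lines of Python. This is a deliberately naive illustration, not the method used to draw the graphs in this post: it assumes the nodes are just the single words on either side of of, and the of_edges function and the SAMPLE sentence are hypothetical.

```python
import re


def of_edges(text):
    """Return (left word, "of", right word) triples found in the text."""
    words = re.findall(r"[A-Za-z]+", text)
    edges = []
    for i, word in enumerate(words):
        # An edge needs a neighbor on both sides of the preposition.
        if word.lower() == "of" and 0 < i < len(words) - 1:
            edges.append((words[i - 1], "of", words[i + 1]))
    return edges


# A hypothetical sentence assembled from phrases that appear in these posts.
SAMPLE = "I wrote at the University of Miami about the movement of energy."

if __name__ == "__main__":
    for left, label, right in of_edges(SAMPLE):
        print(f"{left} --{label}--> {right}")
```

A real identity graph would want whole phrases as nodes (movement of energy, matter, information, and organization), which takes more than a one-word window, but even this crude pass keeps the edge label that a plain line between two nodes throws away.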

I'm not really satisfied with these graphs, but I think they are a wonderful start to thinking about writing and how it creates an identity. I'm very happy Downes assigned this. I'm even happier that I tried it. Seems it was great therapy and substantial learning for me.

Sunday, November 4, 2018

#el30: Interpreting the Cloud

The point of the computing cloud for me has been the continued abstraction of data and services from the computing platform. I've been using computers since the early 1980s (in 1982, I wrote my dissertation on the University of Miami's UNIVAC 1100), and I became a Mac user in 1987, so I am well-versed in the problems with exchanging data on one platform with users on another platform. I'm glad those wars are mostly over. I now use a PC at work, a MacBook Pro at home, an Asus Chromebook on the road, and an iPhone everywhere. The underlying hardware and operating systems are almost transparent windows to my online data, documents, and communications. I'm writing this post at home on my MacBook, but I've written posts on all my devices, including my iPhone. I also no longer ask my students what kind of device they have when I make an assignment as they all have at least a smartphone (again, I don't care which) that will let them access the class wiki and do the work. However, they do need a Google account to do most of the work.

And here is one more platform layer that I want to remove: Google (or Facebook or Twitter). Some of the technologies that Tony Hirst and Stephen Downes discussed in their video chat (over Google, of course) seem to be taking the first steps toward separating a cloud service (say, video chat) from a monolithic platform such as Google. This continues a long progression in computing: we were freed from particular hardware, then from particular operating systems, and maybe soon from particular cloud platforms. So someday I may be able to fire up a container (made by Tony Hirst and released into the commons) on any of my half-dozen devices and hold a video chat with others peer-to-peer on their different devices and containers. I may even write my own containers for special services and release those containers into the commons where they can be used or remixed into different containers to render different services.


My understanding of complex systems is all about the movement of energy, matter, information, and organization within and among systems. As a complex system myself, I self-organize and endure only to the degree that I can sustain the flows of energy (think food) and information (think EL 3.0) through me. The cloud is primarily about flows of information, and the assumption I hear in Stephen's discussion is that I, an individual, should be able to control that flow of information rather than some other person or group (say, Facebook) and that I should be able to control the flow of information both into and out of me. I find this idea of self-control, or self-organization, problematic—mostly because it is not absolute. As far as I know, only black holes absolutely command their own spaces, taking in whatever energy and information they like and giving out nothing (well, almost nothing—seems that even black holes may not be absolute).

It helps me to walk outside for discussions such as this, so come with me into my backyard for a moment. The day is cool and sunny, so I'm soaking in lots of energy from sunlight. I've had a great breakfast, so more energy. I've read all the posts about the cloud in the #el30 feed, so I have lots of information. Of course, I'm pulling in petabytes of data from my backyard, though I'm conscious of only a small bit. Even with the bright light, I can see only a sliver of the available bandwidth. I hear only a little of what is here, and I certainly don't hear the cosmic background radiation, the echo of the big bang that is still resonating throughout the universe. I'm awash in energy and information. I always have been. Furthermore, I can absorb and process only a bit (pun intended) of the data and energy streams flowing around me, and very little of this absorption is my choice. Yes, if the Sun is too bright, I can go back inside, put on more clothing, or put on sunscreen, but really, what have I to do about the flow of energy from the Sun? And what have I to do with the house to go into, the clothing to put on, or the sunscreen? All of those things are complex systems that came to me through other complex systems (bank loans, retail stores, manufacturing factories, Amazon, and my own income streams). Most of the energy and information streams that I tap into owe little to me, not even the energy and information that I feed back.

In his post "Post-it found! the low-tech side of eLearning 3.0 ;-)", AK quotes George Siemens as saying something like "what information abundance consumes is attention", and this gets me, I hope, to a point about all this: Siemens is talking about only a tiny subset of information available to me, even though it tends to be the information that consumes most of my attention. There are other far more important streams of energy and information that I should attend to, I think.

Ahh ... maybe this is my point: even if I can avail myself of more access to more information, I'm already drowning in data. What I desperately need are better filters for selecting among the data and better models for organizing that selected data into useful, actionable knowledge. This is what my students need. Everyone in the U.S. needs better filters and models, especially with national elections on the horizon. In this sense, we are not so different from all the humans and other living creatures who have existed, except that our social systems are so much more complex and complicated than those that came before. What data do I trust, and after I've determined that, how do I arrange this data into actionable knowledge? Facebook and Google are filtering data for me now, and they are even arranging that data into actionable knowledge, but I don't think I trust them. Can the cloud help me interpret the cloud?

Saturday, October 27, 2018

#el30 Data and Models

I should be grading student documents this morning, but I'm thinking about #el30. I may have an assessment of that next week.

Anyway, as I was reading some posts about Data, I was struggling with our previous discussion about the differences between human and machine learning, when something that AK wrote sparked some coherent ideas (at least dimly coherent for my part). AK said: "This got me thinking about the onus (read: hassle) of tracking down your learning experiences as a learner. ... As a learner I don't really care about tracking my own learning experiences."

I thought, no, I, too, don't want to track all my learning experiences. Tracking all those experiences would take all my time, leaving no time for more learning, much less time for grading my students' papers. So maybe computers can be useful for tracking my learning experiences for me? A computer can attend me--say, strapped to my wrist, in my pocket, or embedded in my brain--and collect data about whatever my learning experiences are. After all, computers can collect, aggregate, and process data much faster than I can, and as Jenny notes, computers don't get tired.

But what data does a computer identify and collect? Even the fastest computer cannot collect all the bits of data involved in even the simplest learning task. How will the computer know when I'm learning this and not that? Well, the computer will collect the data that some human told it to collect. Can the computer choose to collect different data if the situation changes, as it certainly will? Perhaps. But again, it can only ever collect a subset of data. How will it know which is the relevant, useful subset? The computer's subset of data may be quantitatively larger than my subset, but will it be qualitatively better? How might I answer that question?

Turning experience into data is a big issue, and I want to know how the xAPI manages it. Making data of experience requires a model of experience, and a model always leaves out most of the experience. The hope, of course, is that the model captures enough of the experience to be useful, but then that utility is always tempered by the larger situation within which the learning and tracking take place. Can a computer generate a better model than I can? Not yet, I don't think.
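
To make the point about models concrete, here is a minimal sketch of how one experience might be reduced to an xAPI-style statement. The actor, verb, and object fields follow the general shape of an xAPI statement as I understand it. The helper name, the learner's address, the activity URL, the verb identifier, and the list of things left out are all hypothetical illustrations of mine, not anything confirmed by the xAPI documentation or by the course.

```python
import json


def make_statement(actor_email, verb_id, verb_label, activity_id):
    """Build a dictionary shaped like an xAPI statement (actor, verb, object).

    Hypothetical helper for illustration only; it does not talk to any
    real learning record store.
    """
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{actor_email}"},
        "verb": {"id": verb_id, "display": {"en-US": verb_label}},
        "object": {"objectType": "Activity", "id": activity_id},
    }


# A made-up slice of experience: what is going on while I read a blog post.
experience = {
    "read_post": True,
    "sunlight_on_face": True,
    "birdsong": True,
    "half_formed_doubt_about_data": True,
}

# Assumed values: the address, verb identifier, and activity URL are examples.
statement = make_statement(
    "learner@example.com",
    "http://adlnet.gov/expapi/verbs/experienced",
    "experienced",
    "http://example.com/el30/data-post",
)

# The model keeps one aspect of the experience; the rest never becomes data.
captured = {"read_post"}
left_out = sorted(set(experience) - captured)

print(json.dumps(statement, indent=2))
print("Left out of the model:", left_out)
```

The statement records that someone experienced something, and nothing more. Everything in `left_out` is exactly the part of the experience that the model discards, which is the utility question raised above.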

If both the computer and I are peering into an infinity of experience, and I can capture only about six feet in data while the computer can capture sixty feet, or even six hundred feet, we are both still damned near blind, quantitatively speaking. Reality goes a long way out, and there is still something about constructing models to capture that reality that humans have to do.

I've no doubt that computers will help us see farther and wider than we do now, just as telescopes and microscopes helped us. I've also no doubt that computers will help us analyze and find patterns in that additional data, but I'm not yet convinced that computers will create better models of reality without us. When I see two computers arriving at different views of Donald Trump and arguing about their respective views, then I might change my mind.

The #MeToo Text: From Documents to Distributed Data #el30

This week's Electronic Learning 3.0 task is about distributed data, and it gives me a way to think about the #MeToo document that has occupied me for the past year and that has been the topic of several posts in this blog. In short, I take the #MeToo text (all several million tweets of it and more) to represent a new kind of distributed document that is emerging on the Net. Thus, it may be a manifestation of the kind of shift in how we handle data that Downes discusses.

Downes introduces his topic this way:
"This week the course addresses two conceptual challenges: first, the shift in our understanding of content from documents to data; and second, the shift in our understanding of data from centralized to decentralized.
"The first shift allows us to think of content - and hence, our knowledge - as dynamic, as being updated and adapted in the light of changes and events. The second allows us to think of data - and hence, of our record of that knowledge - as distributed, as being copied and shared and circulated as and when needed around the world."
I teach writing--both the writing of one's own and the writings of others--which since the advent of Western rhetoric in Greece some twenty-five hundred years ago has focused on centralized documents. By that I mean that the function of a document (this blog post, for instance, or a poem or report) was to gather data, organize that data into a format appropriate for a given rhetorical situation, and then present that data in a single spoken or written text. This is generally what I teach my students to do in first-year college composition. This is what I'm trying to do now in this blog post. This is, at least in part, what Downes has done in his Electronic Learning 3.0 web site. Most Western communication has been built on the ground of individual documents or a corpus of documents (think The Bible, for instance, or the Mishnah or the poems of John Berryman).

This idea of a centralized document carries several assumptions that are being challenged by the emergence of distributed data, I think. First, the Western document assumes a unified author--either a single person or a coherent group of people. Western rhetoric has a strong tendency to enforce unity even where it does not exist (think of the effort to subsume the different writers of The Bible, for instance, under the single author God). The Western notion of author-ity still follows from this notion of a single, unified author, and the value and success of the document depend in great part upon the perceived authority of this author.

Along with a single, unified author, the Western document assumes a unity within itself. A document is supposed to be self-contained, self-sufficient. It is supposed to include within it all the data that is necessary for a reader to understand its theme or thesis. I don't believe that any document has ever been self-sufficient, but this is the ideal. A text should be coherent with a controlling theme (poetic) or thesis (rhetoric). The integrity and value of the text are measured by how well the content relates to and supports the theme or thesis.

And of course, a document should have a unity of content. It should have a single narrative, a single experience, a single argument. Fractured, fragmented narratives bother us, and they never make the best-seller lists. Incoherent arguments seldom get an A or get published.

There may be other unities that I could mention, but this is sufficient to make my point that we have a long history of aggregating, storing, and moving data in documents with their implied unities. And then along comes #MeToo: a million tweets and counting over days, weeks, and months. We have this sense that surely #MeToo is hanging together somehow, but is it really a single text?

Well, not in the traditional sense. It has no unified author. Just when we thought that Alyssa Milano started it, we learn that another woman, Tarana Burke, used the phrase about ten years earlier. #MeToo isn't even a unified group. A million women are not a unified group. It has no unified thesis. It isn't even an argument. There is no dialectic or rationale. It has no unified content. We think it does because of the single hashtag, but each woman brings a unique set of experiences to her tweet: some have a leer or catcall, some gropings, others rapes or years of beatings. All of them have something different, something unique. They cover the gamut, the field, the space.

#MeToo is a swarm, and we really don't like swarms. Who's speaking here, to whom, and about what? What's the point? And what kind of document is this? How do I read it? How do I respond?

#MeToo is a rhizome, a fractal, and I'm thinking we will come to write and to read this way. We will think this way. Perhaps we always have, and our documents obscured that for us. #MeToo makes explicit a million neurons firing.

And finally, I must recognize that #MeToo could neither have been written nor read without our technology. This way of knowing, thinking, and expressing is possible only with help--in this case, Twitter to write it and somewhat read it--though reading millions of tweets is rather impossible for a single human to do. We need the data analysis powers of our computers to even approach a comprehensive reading of #MeToo. We need something like Valentina D'Efilippo's reading strategies and tools in her article "The anatomy of a hashtag — a visual analysis of the MeToo Movement".

I'm wondering, then, what happens when not only data is distributed and decentralized, but when documents themselves become distributed and decentralized. Is this fake news?

Monday, October 22, 2018

Being Human among Computers: #el30

With a number of other online colleagues, I'm starting a new MOOC with Stephen Downes entitled "E-Learning 3.0". According to Stephen's introduction:
"This course introduces the third generation of the web, sometimes called web3, and the impact on e-learning that follows. In this third generation we see greater use of cloud and distributed web technologies as well as open linked data and personal cryptography."
The first week featured a Google Hangout between Stephen in Canada and George Siemens in Australia. I've posted the video here, starting it about seven-and-a-half minutes in to avoid the setup issues.

As Jenny Mackness notes in her blog post about the conversation, Siemens and Downes wax philosophical in their conversation, centering "around what it means to be human and what is human intelligence in a world where machines can learn just as we do."

While I understand the fascination of such a question as computer technologies increasingly approximate many of our intellectual capabilities, in some ways the question seems moot. For me, part of what it means to be human is to use tools and technologies that enhance our innate human capabilities. Admittedly, most of our early tools enhanced our physical capabilities, making us stronger and faster and warmer, but from the beginning, we created technologies that enhanced our intellectual capabilities. I think of language as a technology, and I am not yet convinced that computers will change us more than language in both spoken and written forms has already done. I can almost see computers as a refinement and extension of language, which started with speech, eventually developed into writing—making marks also led to math and drawings—and is being expressed now through computers. Few things distinguish us from other life forms as much as our tools and technologies do.

Did Shakespeare write Hamlet or did the English language? Well, both actually.

Part of the fascination of this question about human vs. computer intelligence comes from our apprehension that computers will become more powerful than we are. This is an old fear, as the American folk tale of John Henry demonstrates, but for me, the lesson of John Henry is that we will continue to use computers to make us smarter despite our fears. I suppose the fearful prospect is that computers will use us to make themselves smarter or that they will simply come to ignore us, having become so smart themselves that our abilities add nothing to them. I don't think they will destroy us; rather, they'll abandon us. This is a problem mostly if you think that humans are the smartest thing in the universe and that computers will usurp our position. It seems rather chauvinistic to think that humans are the crowning achievement in this wondrously large and varied universe. The odds are surely against it, I think.

Almost all complex systems that I know about can learn: taking in information from the ecosystem, processing that information, making structural adjustments to better fit their environments, and then feeding back information into the ecosystem, which likewise is trying to make a better fit for itself. I have no doubt that computers will do the same, and if our ecosystem comes to include smart machines, then we and the rest of the ecosystem will have to adapt to those new entities. The universe will manage that adaptation quite nicely and count itself more advanced for it.

But that's the long game. In the short game, I am keen to explore how smart machines can help me and my students learn differently, maybe better.