Representation of information

Posted on June 7th, 2011 in JoelKotarskiPersonalBlog, PatternSmithIndividualBlogs by Joel's Entropic Flow

This blog post was sparked by a simple statement I had seen many times in my field, but worded quite differently than I had seen before: you can edit the code you type in any text editor; any will do, as long as it is a plain text editor and not a word processor.

Of course, that makes sense, because the target audience for the typed code is a compiler, a program that must read the source code as plain text.  Yet I thought: what if the code (this information with the potential to cause effects) were written in something like a word processor, where added markup for emphasis helps a human being understand it, while the underlying plain text code the machine demands stays simple and stripped of formatting?

I thought of the versatility we possess in processing information.  It is something I have been thinking about as I teach my soon-to-be-toddler son about the patterns he will leverage in this world, knowing that I will be amazed as he takes the same journey all of us have taken in conceptualizing, seemingly effortlessly, the complexity of the world we have collectively built, a world that in every generation is much more complex than the one faced by our ancestors 100 years before.

Engineers are making the computers we coexist with more adept at processing patterns, and with their large data stores (which linked information is rapidly enlarging) computers can find connections between patterns that many humans cannot (for instance, see info on perceptual hashing).  Nonetheless, human minds deal with such a rich world.  We have embedded our thought into language, we have embedded that language onto many objects, and we have embedded the text on those objects in increasingly creative ways.  Examples abound: typefaces (sometimes multiple within one single page), logos (artistic renderings of common text), ideograms (capturing an idea or metaphor with an artistic rendering), wrapping or warping text around the shape of a surface (real or imaginary), etc.  In all of these cases, human consciousness is quick to process this complexity, sometimes at blinding speed and without hesitation.

I’m saying all of this to set the stage and to ponder some deeper questions.  I am working on a project which is finally coming together this year, but I still face an intriguing problem: how best to store representations of people’s thoughts, and of the things they create out of them or with them, and persist them in ways that benefit equally well the creator of the thought; those who may (if the creator chooses to share) consume, build upon, or be inspired by these thoughts and things; and the algorithms that help make some of the magic possible with this raw material, and thus in turn greatly aid those who work with the algorithms.

The deeper questions surround whether to store minute details on every single piece, with lots of injections of extra or meta information potentially surrounding each piece, thus making it a bit expensive to reconstitute the representation that either a human or a computer is expecting to see; or to store it in a simple format that is easily searchable or transformable for the intended purpose.

This puzzle of course faced our early pioneers of this massive uncharted territory called the world wide web.  The early spiders made the choice to capture, store, and analyze the raw information just as it is (and then to transform it to the desired formats as needed).  And we have taken the same approach to much of human knowledge – indexes of common words and raw text predominate over more complex storage representations.  It also is of course cheaper, and if the more analytical representation is needed, processing power can be spent to make the more in-depth analysis.
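As a rough illustration of the "index of common words over raw text" approach, here is a minimal sketch in Python.  The document ids, the sample text, and the function name build_index are all made up for this example; real crawlers and search engines are far more elaborate.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids whose raw text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            word = token.strip(".,;:!?")  # crude punctuation trimming
            if word:
                index[word].add(doc_id)
    return index

# Hypothetical raw documents, stored just as they are.
docs = {
    "a": "Raw text is cheap to store.",
    "b": "Store the raw page, analyze later.",
}
index = build_index(docs)
```

The raw text stays untouched in docs; the index is a derived structure that can be rebuilt or replaced by a deeper analysis whenever the processing power is worth spending.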

To take this question into the realm of the absurd: a long time ago in computer technology, we chose to assign a unique code to each of the signs and symbols that make up text.  The words in this blog post are stored as a sequence of characters based on this code and not as a graphical representation of each letter.  That is, the word “Knowledge” can be stored as “75 110 111 119 108 101 100 103 101” (the ASCII codes for each letter/symbol) rather than as the graphical shape of each letter.  And the fact that it is in bold, along with the shape of each letter, can be stored as markup around that sequence indicating the font and the weight of the word, rather than again as the actual shapes used.  To picture storing the shapes seems ludicrous, because this is a commonly known sequence of characters in a well-defined font and weight (bold), and to store it as if it were a work of art seems wasteful and inefficient.  Given this persistence scheme (the letters and the specifications on how to render them), any computer in the world can easily render an identical result.
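To make the character-code idea concrete, here is a small Python sketch.  The record layout, the font name, and the render function are my own assumptions for illustration, not any standard format; only the codes themselves come from ASCII.

```python
word = "Knowledge"

# Each character is stored as its code, not as a picture of the letter.
codes = [ord(ch) for ch in word]

# Hypothetical record: the codes plus rendering hints kept as metadata.
record = {
    "codes": codes,
    "style": {"font": "Georgia", "weight": "bold"},  # assumed values
}

def render(rec):
    """Rebuild the text and wrap it in simple HTML-like markup for bold."""
    text = "".join(chr(c) for c in rec["codes"])
    if rec["style"]["weight"] == "bold":
        return f"<b>{text}</b>"
    return text

print(codes)           # [75, 110, 111, 119, 108, 101, 100, 103, 101]
print(render(record))  # <b>Knowledge</b>
```

The shapes of the letters never appear in the record; they are left to whatever font the rendering machine applies.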

But what if it were not a commonly known expression – what if it were a unique expression that one did not recognize?  Perhaps a picture of it would suffice?  But what if a computer algorithm did some pattern analysis and found that it was an artistic rendering of something that was well-known and studied halfway across the globe?  Wouldn’t it make sense to store it as simply a reference to the original with metadata about its unique features (the delta)?
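The "reference plus delta" idea can be sketched in a few lines of Python.  Everything here is hypothetical: the catalog, the id "work-001", and the attribute names are invented, and the sketch assumes a variant only changes or adds attributes (it does not handle removals).

```python
def make_delta(original, variant):
    """Keep only the attributes that are new or differ from the original."""
    return {k: v for k, v in variant.items()
            if k not in original or original[k] != v}

def store_variant(catalog, ref_id, variant):
    """Store a variant as a pointer to a known work plus its unique features."""
    return {"ref": ref_id, "delta": make_delta(catalog[ref_id], variant)}

def reconstitute(catalog, stored):
    """Rebuild the full description from the reference and the delta."""
    return {**catalog[stored["ref"]], **stored["delta"]}

# Hypothetical catalog of well-known works.
catalog = {
    "work-001": {"subject": "wave", "medium": "woodblock", "palette": "blue"},
}

# An artistic rendering recognized as a variation of work-001.
variant = {"subject": "wave", "medium": "woodblock",
           "palette": "red", "texture": "spray paint"}

stored = store_variant(catalog, "work-001", variant)
```

The hard part, of course, is the pattern analysis that recognizes the variant in the first place; this sketch only shows what could be persisted once that match is known.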

Also, back to the “knowledge” word example.  What if a commonly agreed-upon representation (a codification) of each word of the English language were in place, so that the code, rather than the individual letters, could be stored along with the metadata needed to capture the formatting (font and boldness)?  With the extra space we just saved, what if it were possible to infer the context that the word was used in and codify that as well?
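Here is a toy Python sketch of that word-level codification.  The three-word codebook is invented for the example, and the choice of a 2-byte code per word (room for 65,536 words) is simply an assumption; no such agreed standard is implied.

```python
import struct

# Hypothetical, tiny codebook; a word's position in the list is its code.
CODEBOOK = ["is", "knowledge", "power"]
WORD_TO_CODE = {w: i for i, w in enumerate(CODEBOOK)}

def encode(text):
    """Pack each word as one unsigned 2-byte big-endian integer."""
    codes = [WORD_TO_CODE[w] for w in text.lower().split()]
    return struct.pack(f">{len(codes)}H", *codes)

def decode(blob):
    """Unpack the 2-byte codes and look each word back up."""
    codes = struct.unpack(f">{len(blob) // 2}H", blob)
    return " ".join(CODEBOOK[c] for c in codes)

phrase = "knowledge is power"
packed = encode(phrase)

print(len(phrase.encode("ascii")))  # 18 bytes as individual characters
print(len(packed))                  # 6 bytes as word codes
print(decode(packed))               # knowledge is power
```

Capitalization, punctuation, and words missing from the codebook are all ignored here, and they are exactly the kind of detail the surrounding metadata would have to carry.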

The only purpose of this blog post is to introduce some entropic flow around the question of representation and storage and later transformation of information that I am playing with.  If it simply got you thinking of possibilities and alternatives, whether or not any of them makes sense at this moment, it will have served its purpose.  Ultimately, no strong model jumps out at me now so the standard model of storing text is predominant in my mind; however, I am hoping by cracking open this inquiry and pondering it in the back of my head, something innovative and useful may result – and then only if it has good pragmatic value will I consider something counter to the current trend.