NLP, Natural Language Processing, is all about machines trying to make sense of human-generated text. The reverse, NLG (Natural Language Generation), is also possible: the generation of human-readable text by machines.
Why Natural Language Generation
A large amount of structured data is being accumulated daily, often by machines: sports scores, weather reports, traffic conditions. Presenting these raw numbers in tabular or even graphical form has limited appeal. As humans we would also like a narrative woven around our numbers; to the imaginatively challenged, numbers often mask the vitality of life.
Two of the key players in this field are Automated Insights and Narrative Science. They generate text for everything from reports on financial filings to real-estate property descriptions.
However, the purpose of this post is different. We will venture into a little experiment with language generation using Markov Chains. While the Wikipedia definition is thorough, for our purposes think of a Markov Chain as a model that learns the probabilities with which certain words or characters follow other words or characters.
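To make that intuition concrete, here is a minimal word-level sketch in Python (not the implementation used in this post, just an illustration): the "chain" is simply a table mapping each word context to the list of words observed after it, and generation is a random walk over that table.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, seed, length=20):
    """Walk the chain from `seed`, picking each next word at random."""
    out = list(seed)
    for _ in range(length):
        key = tuple(out[-len(seed):])
        if key not in chain:
            break  # dead end: this context was never followed by anything
        out.append(random.choice(chain[key]))
    return " ".join(out)

# Toy corpus; duplicated words give the chain real choices to make.
text = "the cat sat on the mat and the cat ran"
chain = build_chain(text, order=1)
print(generate(chain, ("the",), length=8))
```

Because "the" was followed by "cat" twice and "mat" once in the toy corpus, the walk picks "cat" after "the" about two-thirds of the time; that frequency-weighted choice is all the "learning" there is.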
Now, as could be expected from minds like ours, we did not want to train this Markov Chain on examples from Western literature; we wanted to venture closer to home. What if we trained it on typical literature around Yoga, we wondered. The first name that popped up for us was The Life Divine by Sri Aurobindo.
While the scope of The Life Divine is slightly wider and deeper than what passes for Yoga in contemporary minds, we thought it would nevertheless be interesting to see what emerged from such training data.
Markov Chains on Life Divine
Here are the steps we took:
- Acquired a copy of The Life Divine from the Sri Aurobindo Ashram website
- Extracted text from the PDF, by simply copying the text from the PDF into a text file
- Wrote a simple script to strip out page headers, numbers, indexes, etc. from the text file
- Used a simple Markov Chain implementation from the web (apologies for not remembering the author, to whom entire credit for this code belongs)
- Trained the Markov Chain on the Life Divine text
- Generated sample output with various orders, seed words/characters, and iteration counts
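The steps above can be sketched roughly as follows. This is not the implementation we borrowed from the web; it is a character-level stand-in, and the filename `life_divine.txt` is hypothetical. The "order" is the number of context characters used to predict the next one, which is the parameter varied in the samples below.

```python
import random
import re
from collections import defaultdict

def clean(raw):
    """Rough stand-in for the cleanup script: drop lines that are bare page numbers."""
    lines = [ln for ln in raw.splitlines() if not re.fullmatch(r"\s*\d+\s*", ln)]
    return re.sub(r"\s+", " ", " ".join(lines)).strip()

def train(text, order=4):
    """Character-level chain: each `order`-length context maps to observed next chars."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def sample(chain, seed, n=200):
    """Generate up to n characters, starting from a seed of length `order`."""
    out = seed
    for _ in range(n):
        nxt = chain.get(out[-len(seed):])
        if not nxt:
            break  # unseen context: stop generating
        out += random.choice(nxt)
    return out

# Hypothetical file; substitute your own extracted, cleaned text.
# raw = open("life_divine.txt").read()
raw = "12\nAll life is Yoga.\n13\nAll life is Yoga."
text = clean(raw)
chain = train(text, order=4)
print(sample(chain, text[:4], n=40))
```

Low orders (1–2 characters of context) produce the pseudo-word soup seen in the first samples below; higher orders and longer training text yield the increasingly "sane" sentences.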
Here are some highlights from the output, generated with varying order, seed words, and iterations. Notice how much ‘saner’ it becomes as these parameters change.
- t s, e. anis sthanitioic an t chitstif f ioredearmunthe Devinfurtetes; bin pe mbuse mpetere thth- sut
- Dens s be Titans pel mof imocecacioke he t, ten me t imivinithedwhendels thitatherathe bl r. wenciof
- But sign am earriversal the good; foreoccult, the minable in it amony back of ider so life it is alway
- That whether of the ignorance, becoming only in the hidden pure amalgam of reality pure become by her th
- That while beginning to conscious in Natures the transforms of spiritual deligion has extensition of their motion is his For a modifies a truth, it is no repetities and excurs us existence mind which has
- Heaven and to ourselves. Finally, by the same times a stumble or even one practical cannot either with and efficiency in the mere systematisation in human sounding exceptions afflicting the action, not dependent which appeared; it puts these hidden operation of the discord between them together; there is a Consciousness gross beyond its experience from earth the existence.
- Divine Consciousness, but as they imply leading even to a world is the cast our being intelligent, is this limited ignorance as the Spirit, mind and a full strength of the Inconscient and control, not an artificially perceives to turn in the struggle unity of the individual and impotency, but a perfectly aware of its luminous dimness.
- Divine Being, the Ishwara. Or the nature of life and being which they are in their reality. It is the soul’s self-delight. The Infinite and Eternal, a Permanent, a Perfection to endure there must be an expression, Matter proving in the old poetic sense of identity with the ideal of human life.
And so, I will leave you to savour the last bullet point. When I saw the first few words emerge from this iteration, I shuddered at the words… This was no prophetic voice speaking, nor the utterance of a higher knowledge; this was a plain old algorithm which learnt probabilities and spewed out words it knew should follow another set of words.
I shall not get into the philosophical implications of whether we are merely complex Markov Chain machines, with Nature and Time feeding us signals which we learn from and react to based on our past lessons. The point is about the utility of science and technology to solve the specific problems which confront us.
You can find all the code and training content in the GitHub repo Markov Chain & Yoga. The repo is somewhat bare; I will add details around prerequisites when I get a moment.
PS: As a bonus, check out this visual explanation of Markov Chains; it makes it easier to understand what is happening in our example.