
Supervised Machine Learning

Introduction

When I was born, I was ignorant but very curious. I always asked
questions: "What? Where? Why? How?" And my parents guided me through
this jungle of knowledge, telling me about things that really mattered
and leaving what was complicated for the future.

As I grew up, I started learning about more complex things by
cross-referencing them with my existing knowledge. If something new
was not in my knowledge database, I queried books, articles and, more
recently, Google. Thus from an ignorant child with an empty knowledge
database, I grew up to become a knowledgeable person.

Let me take an example of how I learned English.

I am from India; English is neither my nor my parents' first language.
However, they really wanted me to learn how to read, write and speak
English. They couldn't teach me by changing their mother tongue. What
they did instead was teach me the alphabet, a few easy words and the
basic grammar constructs. Then they gave me a dictionary and a grammar
book. I looked up new words in the dictionary and new sentence
formations in the grammar book. If I still couldn't understand what
was written, I asked my teacher.

Now when I come across anything written in English, I cross-reference
it with the vocabulary and grammar that I have learned over the years.
If something new comes up, I look it up online or ask my peers. Thus I
build upon my knowledge of English in an ongoing, continuous process.

Implementation in Machine Learning

Now, building on this basic pattern of how a human child learns about
the physical world, we can model the learning process for machines.
This is a brief outline of the idea (here I take the example of the
English language, but it is equally applicable to any other natural
language or any other real-world entity):

"In the same way that I learned about the world, machines can also be
taught. Just as we teach a child how to read and understand the
English language, a computer can be taught to recognize patterns of
English letters as words and can look up their meanings in its
dictionary (which is built with human assistance). The machine can
gradually build up its vocabulary by reading more and finding the
meanings of new words and sentence constructs online. If it finds
something complex enough, it can ask its human tutor for the meaning.

Given the amount of cheap storage, memory and computing power we have
in our hands today, I think it is perfectly doable. I just need time
and resources to do it."
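To make that outline a little more concrete, here is a minimal Python sketch of the learning loop described above. Everything here is hypothetical — the `Learner` class, the `lookup_online` callback and the tutor queue are illustrative names of my own, not an existing library:

```python
class Learner:
    """A toy model of the read / look up / ask-the-tutor loop."""

    def __init__(self, seed_vocabulary):
        # Start with the "alphabet and easy words" a tutor provides.
        self.vocabulary = dict(seed_vocabulary)
        self.unknown = []  # words queued for the human tutor

    def read(self, text, lookup_online=None):
        """Process text word by word, growing the vocabulary."""
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word in self.vocabulary:
                continue
            # First try the "dictionary": an online lookup here.
            meaning = lookup_online(word) if lookup_online else None
            if meaning is not None:
                self.vocabulary[word] = meaning
            else:
                # Too complex: defer to the human tutor.
                self.unknown.append(word)

    def teach(self, word, meaning):
        """The human tutor resolves a queued word."""
        self.vocabulary[word] = meaning
        if word in self.unknown:
            self.unknown.remove(word)
```

The point of the sketch is only the shape of the loop: known words pass through, cheap lookups grow the vocabulary automatically, and everything else waits for human assistance.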

Value of this research

Why is it important for a machine to learn a natural language? The
answer is simple: if a machine learns a natural language, it can mine
the enormous amount of data on the internet to learn about any other
discipline. We will provide the world with the perfect student. It can
learn financial modelling from a human tutor and apply it to streaming
financial data from the internet to predict future trends, maybe even
produce its own models. It can learn biotechnology from its human
tutor and mine the huge amount of pathological data available on the
internet to help produce new medicines. The application possibilities
seem to be bounded only by human imagination.

Current Status of my Research

Right now, I am working on a search engine which crawls the
blogosphere and indexes blog posts according to their emotional
quotient. It's a small step in the final scheme of things, but it is
what I can do with the time and resources that I have (I have a '9 to
6' day job as a software engineer). I would love to devote more time
to this idea once the mundane activities (like earning my bread and
butter) are taken care of :-).

I am not sure if this idea has been implemented anywhere before. Also,
I acknowledge it is a very simple solution to a seemingly very complex
problem. However, I feel that it is the most suitable one. If we
humans can do it, then so can the machines.

Back after a long time...

Well... well... well... look who is back!!

Anyways, there are no readers to complain, so no apologies offered. Now to the main topic of today's post.

I am converting this blog to the chronicle of my quest for the Semantic Web.

The posts from now on will be related to Web 3.0 (the Semantic Web) and the various tools required to build a POC of my previous hypothesis: PHP, JavaScript, MySQL, CSS and Ajax.

Hope the world doesn't come apart before I complete my POC (What with the 2012 prophecy, you never know what's coming).

WEB 3.0 : The new frontier




I am an avid googler and swear by Wikipedia, but a few days ago I was let down big time by Google. I was to go to Ooty and was searching for suitable accommodation there. Can you believe Google gave me some 8470 results? I read the first 20 but still couldn't find what I wanted.

Anyways, I decided that these kinds of queries should be answered in person, and no Google should be given the authority to dictate the bed I sleep in. However, all the while I was driving to Ooty, one thought kept troubling me: why can't someone dig through all the wealth of information in the blogs and user reviews out there and provide me with a simple choice of no more than 10 hotels/homestays, each in a different price range and with the maximum number of favorable reviews in its range?
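That thought can actually be sketched in a few lines of code. Assuming the reviews were already scraped and tagged with a hotel, a price band and a favorable/unfavorable flag (a big assumption — the function and field names below are purely illustrative), picking the most-favored hotel in each price range is simple aggregation:

```python
from collections import defaultdict

def best_per_price_band(reviews):
    """reviews: list of (hotel, price_band, favorable) tuples.
    Returns {price_band: hotel with the most favorable reviews}."""
    favorable = defaultdict(int)  # favorable-review count per hotel
    band_of = {}                  # which price band each hotel is in
    for hotel, band, fav in reviews:
        band_of[hotel] = band
        if fav:
            favorable[hotel] += 1
    best = {}
    for hotel, count in favorable.items():
        band = band_of[hotel]
        if band not in best or count > favorable[best[band]]:
            best[band] = hotel
    return best
```

Of course, the hard part — turning free-text blog posts and reviews into those tagged tuples — is exactly the Web 3.0 problem this post is about; the aggregation at the end is the easy bit.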

This question is the ultimate frontier of internet searching. Providing relevant search results has been the nightmare of search-algorithm writers for over two decades now. Since the birth of the internet, people have been collecting data in the cloud. How nice it would be to go through this huge amount of data and come back with the most relevant piece of information.

All this and more spurred me to do some further googling on such searching, and voila! I came across a new piece of jargon: "WEB 3.0". Well, not actually new; I had heard a lot about semantic and vertical search, and had dirtied my hands trying web-page scraping, but I had never given serious thought to this so-called "natural language searching".

Now, let's take the most basic question: what is, or rather what will be, WEB 3.0? Everybody has his own opinion about what it will be, and I have mine. For me, WEB 3.0 will be a paradigm shift from what we know of the internet today. I mean, 10 years down the line there won't be websites as we see them today. What there will be is a huge repository of data, essentially user and community generated, and we will be able to access it in whatever format we like, without needing a conventional computer to do so (by then, PCs will have shrunk to the size of a laser device mounted on our ears, projecting images onto any surface and playing sounds through earbuds). This data will be rendered in whatever format we like, based on our previous preferences, and could be changed whenever we want. Hmmm! Quite futuristic, huh? Wait for another 10 years, sweetheart.

But before we leap 10 years through the time warp, let's think about whether we can do anything with what we have here today. Maybe, maybe not. Let's break this huge, insurmountable problem into somewhat smaller and more manageable issues. As we know, most of the searches of the future will be like:

a. I want to go to a happy place.
b. I would like to read a sad, romantic story.
c. I would like to have delicious, Chinese, home cooked food.

Now, all these questions today are answered by searching for keywords in existing pages. However, we are talking about a search in which the search engine crawls through blogs, forums and other community sites, gets the reactions of people to the various options possible, and then shows the results. Whew! That's a huge requirement in itself. Now let us break this into further smaller problems.

The biggest issue here is: how does the search engine understand the emotions portrayed by a web page? Most of these searches can be made easy if pages can be ranked according to their EQ (emotional quotient). To answer this, let me ask another question: how do we gauge the emotional state of something we read? Simple! By looking for the keywords which our parents and teachers taught us to identify with certain emotions. Similarly, if we keep a database of such keywords that the search engine can refer to, counting the number of sad or happy keywords on a page lets it gauge the page's overall emotional state.
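Here is a toy Python sketch of that keyword-counting idea. The word lists are tiny placeholders for the human-curated emotion database; a real system would need a far larger lexicon and some handling of negation:

```python
# Placeholder word lists standing in for the curated emotion database.
HAPPY_WORDS = {"happy", "joy", "wonderful", "delightful", "love"}
SAD_WORDS = {"sad", "gloomy", "tragic", "tears", "miserable"}

def emotional_quotient(text):
    """Score a page's text in [-1, 1]: positive = happy, negative = sad."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    happy = sum(1 for w in words if w in HAPPY_WORDS)
    sad = sum(1 for w in words if w in SAD_WORDS)
    total = happy + sad
    if total == 0:
        return 0.0  # no emotional keywords found
    return (happy - sad) / total
```

A crawler could compute this score for each blog post at index time, so that a query like "I want to go to a happy place" can filter pages by the sign and size of their EQ.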

So, one problem solved. Similarly, I will try to tackle the other issues as and when I get time, and in the end we will have the model of a basic WEB 3.0 search engine.