Friday, April 22, 2016

Results

Though preliminary, my experiments show promising results. I trained the model on the training split of Free917, then tested it against the test split:

Total:                276
Total not broken:     206
Total simple:         158
Total found:          97 (61% of simple)
Total not found:      61

I only count simple questions (easy object -> property questions) because it's too tedious to generate training data for non-simple sentences. It's important to note, however, that my method isn't bound to only simple questions. The real takeaway is that this method for extracting the main entity from the sentence works 61% of the time, which is higher than I expected.

Looking at the errors:
  • Question pattern lookup failure: the question pattern simply wasn't recorded in the training set. This is solved by adding a few more questions to the training set to cover the missed cases.
  • Freebase search failure: the words extracted from the KDG are insufficient to find the correct entity on Freebase. This problem is much harder to solve.
  • Further K-parser errors: this problem is out of my control.
I have two options now: figure out how to push the 61% as high as possible, or start working on the other half of the problem, getting the "property" out of the KDG. The latter is more interesting, but I'm not sure it's even possible.

Thursday, April 14, 2016

A more in-depth look at training

In the previous post, I explained the system I'm using to train a main-entity extraction model, but in very general terms. Here is a more detailed explanation.

The notion of "main entity" of a question is vague. Consider the following question: "How many benches are in Central Park?" Is the question asking about benches or Central Park? The answer comes from how information is structured in Freebase. If there were a topic in Freebase for benches and it had a property that represented the number of benches in Central Park, then it would be logical to call "benches" the main entity of the question. However, it is more likely that there is a topic in Freebase for Central Park and it has a property that represents the number of benches. In this (more plausible) case, the main entity is Central Park.

If you don't leverage some kind of training system and just read questions without any prior knowledge, you will run into this problem. I know because I did during the earlier stages of this project: I tried to just look at the KDG of the sentence and guess where the main entity was. This turned out to be terribly inaccurate. With training, however, my system gets an idea of where the main entity is located based on question structure, which reduces the amount of guessing it has to do.

"Training" is a vague term too, when it comes to computers. When I refer to training, I mean the following: feeding a system a number of (input, desired_output) pairs and getting back a model that can guess desired_output based on input that it hasn't already seen before. Here is a simple example. John is training Bob. Bob knows that when John says a number, he should say a number back; input is a number and desired_output is a number. John tells him that when he hears 2, he should say 4; when he hears 3, he should say 6; when he hears 4, he should hear 8; and continues this for a hundred more numbers or so. By now, Bob has a pretty good idea of what to do even if he doesn't know the exact desired output for the input he is given; just double the input. To test him, John tells him "1" and he correctly responds "2" even though he had never seen that example before. Based on training data supplied by John, Bob has created a model of the system John has designed.

In the scope of this project, input is the KDG of the question being asked and the desired output is the PATH to the main entity in the graph. Remember that since a KDG is a directed tree (a connected acyclic graph), there is exactly one path from the root node to every other node. I'll illustrate this with the example from my last post. The sentence is "How many schools are in the school district of Philadelphia?" and the KDG looks like this:
[KDG diagram of "How many schools are in the school district of Philadelphia?"; see the list-form version in the April 10 post below.]
The correct main entity of this question is the school district of Philadelphia because there happens to be a topic on Freebase for it with a property that lists the schools in it. The "district-8" node represents this entity, so the correct path to it from the root node ("are-4") is just "is_inside_location". If the main entity were, for example, Philadelphia, then the path would be "is_inside_location -> is_part_of".
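To make the path idea concrete, here is a minimal Python sketch (the nested-dict layout is my illustration, not my actual data structure) that finds the unique edge-label path from the root to a target node:

# The relevant subtree of the KDG above, rooted at "are-4";
# children map an edge label to a (node, children) pair.
kdg = {
    "are-4": {
        "is_inside_location": ("district-8", {
            "complement_word": ("school-7", {}),
            "is_part_of": ("philadelphia-10", {}),
        }),
    },
}

def path_to(children, target, prefix=()):
    """Return the tuple of edge labels leading from this node to target."""
    for label, (name, grandchildren) in children.items():
        if name == target:
            return prefix + (label,)
        found = path_to(grandchildren, target, prefix + (label,))
        if found:
            return found
    return None

print(path_to(kdg["are-4"], "district-8"))       # ('is_inside_location',)
print(path_to(kdg["are-4"], "philadelphia-10"))  # ('is_inside_location', 'is_part_of')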

The pair for this example would look like this: (KDG of "how many schools are in the school district of Philadelphia?", is_inside_location). Now imagine there are hundreds of these. How do you use all this information to predict the correct path given a NEW KDG? You compare it to all the existing ones and see if the STRUCTURE matches. If the structure matches the KDG in a pair, then there's a really good chance that the path listed in the pair is correct for the new KDG too. Structure in this sense means the rough shape of the graph: "does it have an agent edge?" or "does it have an is_inside_location edge?" and so on. You can't just check whether the KDG itself matches a known KDG, because you won't have seen it before. Back to our John and Bob example: if Bob hears 1, he can't recall what John said the correct answer for 1 was, because John never told him. Instead, he generalizes based on the patterns he saw.
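Here is a minimal sketch of that matching step, with one big simplification of my own: the set of edge labels stands in for "structure." The second question is made up to have the same shape as the Philadelphia one.

def signature(children):
    """Collect every edge label in the graph into a frozenset."""
    labels = set()
    for label, (_name, grandchildren) in children.items():
        labels.add(label)
        labels |= signature(grandchildren)
    return frozenset(labels)

# One training pair: the Philadelphia example (trimmed to the relevant
# subtree), stored as structure -> path.
philadelphia = {
    "is_inside_location": ("district-8", {
        "complement_word": ("school-7", {}),
        "is_part_of": ("philadelphia-10", {}),
    }),
}
training = {signature(philadelphia): ("is_inside_location",)}

# A new question with the same shape matches the stored structure,
# so we reuse the stored path even though the words are all different.
new_question = {
    "is_inside_location": ("city-6", {
        "complement_word": ("park-5", {}),
        "is_part_of": ("boston-8", {}),
    }),
}
print(training.get(signature(new_question)))  # ('is_inside_location',)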

That's the whole idea. I'd like to know if I explained this well, so leave a comment if I lost you somewhere and you still want to understand.

I've already implemented this system, and I'm preparing to run it on the test cases that Free917 provides to see how accurately it predicts the main entity of a question. Results from that should be one or two posts from now.

Sunday, April 10, 2016

Main object extraction

As I wrote earlier, the first step of answering a question is finding out what it's asking about. The question "How old do you have to be to play Monopoly?" is asking about the board game Monopoly. I've proposed a method for training a model to do this using K-parser.

I'll illustrate my process with a more complex example: "how many schools are in the school district of philadelphia?" The KDG looks like this (for brevity, I replaced every element with its class and stripped unimportant nodes):

root: be
  agent: schools
    trait: ?
      subclass_of: quantity
  is_inside_location: district
    complement_word: school
    is_part_of: philadelphia
(Take note that the ? with the subclass "quantity" represents the words "How many...?")

I have an operation that takes a node of the graph and returns a list of strings. It builds the list out of the node's name and the names of any "complement_word", "trait", "is_part_of", etc. nodes that come out of it. For example, the strings for "district" would be {"district", "school", "philadelphia"}. I can then concatenate them ("district school philadelphia") and search for that using Google's Freebase API.
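In Python, that operation might look like the sketch below (the tuple layout is my illustration, and DESCRIPTIVE_EDGES is just my name for the edge types listed above):

DESCRIPTIVE_EDGES = {"complement_word", "trait", "is_part_of"}

def entity_words(name, children):
    """Collect the node's word and the words of its descriptive children."""
    words = [name]
    for label, (child_name, grandchildren) in children.items():
        if label in DESCRIPTIVE_EDGES:
            words.extend(entity_words(child_name, grandchildren))
    return words

district = ("district", {
    "complement_word": ("school", {}),
    "is_part_of": ("philadelphia", {}),
})
print(" ".join(entity_words(*district)))  # district school philadelphia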

The trick here is finding the correct node to do this on. One node in the graph (along with the subgraph that stems from it) usually represents the main object of the question. I think a model can be trained with (graph structure, path to correct node) pairs. These pairs can be extracted from the training set by taking each KDG and using brute force to find the correct path, as sketched below. This works on most of the sentences that K-parser parses correctly; my experiments currently give me 75%, but I'm sure it can be improved to around 90%.
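Here is a sketch of that brute-force step, reusing the entity_words helper from the sketch above. Both freebase_search (a wrapper around the search API) and gold_mid (the known Freebase id of the question's answer entity) are hypothetical names of mine:

def find_correct_path(children, gold_mid, prefix=()):
    """Try every node; return the edge-label path whose search hits gold_mid."""
    for label, (name, grandchildren) in children.items():
        query = " ".join(entity_words(name, grandchildren))
        results = freebase_search(query)  # hypothetical API wrapper
        if results and results[0]["mid"] == gold_mid:
            return prefix + (label,)
        found = find_correct_path(grandchildren, gold_mid, prefix + (label,))
        if found:
            return found
    return None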

The crux of the idea is that the actual entities do not matter in finding the main entity; only the words between them do. With a single training example, "how many schools are in the school district of philadelphia," the model can extract the main object of any question of the form "how many X are in Y." With over 600 examples, this system should be able to generalize to all forms of questions. Whether that is truly the case remains to be seen, because I haven't implemented the system yet.

Saturday, April 2, 2016

Breaking the problem down further

This past week or two I've been classifying each training example from Free917 based on how "easy" it is to answer. The only criterion is whether outside information is needed to answer the question, and an example where this is evident is the old Monopoly question: "How old do you have to be to play Monopoly?" To answer it, the computer needs to know not only the meanings of all the words in the sentence, but also that "how old do you have to be" denotes that we are looking for some kind of "minimum age" of the Monopoly board game (information that does exist in Freebase). An example of an easy question would be something like "Who is the pitcher for the Chicago Cubs?" The only information needed is Chicago Cubs and pitcher, which are both readily available; no extra intelligence is needed.
In my labels, 1 means the question is easy and 0 means it is not.

My other task is determining how to answer the "easy" questions, and at this point it seems the simplistic system I described a couple of posts ago can be adequate, with some tuning.

Sunday, March 20, 2016

Reading papers

I've been pretty much dry on ideas for improving the model I described in the previous post, so on the advice of the Ph.D. student I'm working with, I've been reading papers that pertain to the same problem. Here are the main ones:

Large-scale Semantic Parsing via Schema Matching and Lexicon Extension: http://cis-linux1.temple.edu/~yates/papers/textual-schema-matching.pdf

Semantic Parsing via Paraphrasing: http://cs.stanford.edu/~pliang/papers/paraphrasing-acl2014.pdf

Semantic Parsing on Freebase from Question-Answer Pairs: http://cs.stanford.edu/~pliang/papers/freebase-emnlp2013.pdf

Enhancing Freebase Question Answering Using Textual Evidence (very recent): http://arxiv.org/abs/1603.00957

Other than that, there's really not much to talk about from the past week. I did spend a lot of time learning about machine learning (partly out of curiosity, but also because I may need it later in this project), specifically artificial neural networks (ANNs). On the surface, the idea sounds simple, but I found the details difficult to wrap my mind around even after reading quite a few introductory articles on them, including the Wikipedia page. The one that made it actually "click" was this one.

Friday, March 11, 2016

A simple plan of action

I think I've pretty much figured out what I'm going to do. An important observation when looking at the example questions provided in Free917 is that most end up simply asking for a property of an entity. For example, one of the questions asks "What was the cover price of X-Men issue 1?" All the computer needs to do here is find the entity that represents "X-Men issue 1" and access the property "cover price." Of course, translating the actual utterance "X-Men issue 1" into that entity is a whole other task, but once that's done, the rest is trivial. More than half of the questions in Free917 are similar to this. This means that answering these questions is really simple: extract the entity from the KDG, then extract whatever property is being asked for. Here's an example:

Here is the KDG (in list form) of the question "What type of tea is gunpowder tea?" (I actually have no idea what gunpowder tea is, but this question illustrates the idea well.)

root: E<is-5>
  agent: type-2
    product_of: tea-4
      instance_of: <tea>
        is_subclass_of: <food>
    trait: ?-1
      instance_of: <?>
        is_subclass_of: <object>
    instance_of: <type>
      is_subclass_of: <cognition>
  recipient: tea-7
    complement_word: gunpowder-6
      instance_of: <gunpowder>
    instance_of: <tea>
      is_subclass_of: <food>
  instance_of: <be>
    is_subclass_of: <stative>

From here, I can easily extract {tea, gunpowder} as the words describing the entity in question and {type, tea} as the property being requested of the entity. This can be done in the same fashion for all questions of this kind.
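In code, that extraction might look like this sketch (the layout is my simplification of the KDG above; I strip the "-N" indices and the "?" placeholder):

kdg = {
    "agent": ("type-2", {
        "product_of": ("tea-4", {}),
        "trait": ("?-1", {}),
    }),
    "recipient": ("tea-7", {
        "complement_word": ("gunpowder-6", {}),
    }),
}

def words(name, children):
    """The node's word plus all descendant words, minus indices and '?'."""
    base = name.split("-")[0]
    out = [] if base == "?" else [base]
    for _, (child, grandchildren) in children.items():
        out.extend(words(child, grandchildren))
    return out

print(words(*kdg["recipient"]))  # ['tea', 'gunpowder'] -> the entity
print(words(*kdg["agent"]))      # ['type', 'tea'] -> the property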

Of course, there are different kinds of questions, and some of them are too difficult to deal with. One other kind that I can handle, however, is questions that ask "How many ...?", which amount to querying for an entity and counting the number of results. There are still some questions, like "What is the average temperature in Sydney in August?", that cannot be answered in this way.
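For completeness, the counting case is tiny; in this sketch, freebase_query is a hypothetical wrapper that returns the list of entities matching a query:

def answer_how_many(query_words):
    """Answer a "How many ...?" question by counting the query's results."""
    results = freebase_query(" ".join(query_words))  # hypothetical wrapper
    return len(results)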

Update to the previous post about named-entity recognition:

I overlooked the actual Freebase website: http://www.freebase.com/. It provides a robust search feature that returns what you'd expect it to, and it also lists the properties of each entity nicely. The only question is whether it has a nice API to use. (API stands for Application Programming Interface; it's one way for computer programs to use services provided online.)
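Assuming it does have one, a call would look roughly like this sketch (the endpoint and response fields are as I understand the documentation; YOUR_KEY is a placeholder for an API key):

import requests

def freebase_search(query):
    """Return the raw search results for a query string."""
    resp = requests.get(
        "https://www.googleapis.com/freebase/v1/search",
        params={"query": query, "key": "YOUR_KEY"},
    )
    resp.raise_for_status()
    # Each result carries a name, a Freebase id ("mid"), and a score.
    return resp.json().get("result", [])

for r in freebase_search("first amendment")[:3]:
    print(r["name"], r["mid"], r["score"])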


Friday, March 4, 2016

The challenge of named-entity recognition

One of the biggest challenges that I face is named-entity recognition. I'll illustrate the meaning of the term with an example.

Imagine you have a list of directions written in a language you do not know. One of them reads:
Gok bok nok
You haven't the slightest clue as to what this means, until someone tells you "gok bok" roughly translates to "get." Now you just have to figure out what "nok" means and retrieve it, so you go to the library and ask the person working there for "nok." They reply with the question "Brok nok ork grok nok?" Your limited capabilities with this language tell you that the clerk is asking you to specify which kind of nok you want, as there are two different meanings of the word, one relating to "brok" and one to "grok."

You have no idea which one you need, so you go back to the person who gave you the instructions to get some context. However, he just gives you more gibberish that you have to translate and somehow relate to the words "grok" and "brok," the two different kinds of nok, to figure out which nok you need.

It should be clear where this is going; you are the computer trying to figure out what an English sentence means and you've been thrown an ambiguous word. In the question "Who is the pitcher for the Chicago Cubs?" the meaning of the word "pitcher" might seem obvious to someone who knows what the Chicago Cubs are, and it's possible for computers to deal with this kind of ambiguity (is it a tool for holding liquids or a baseball player?) but not without significant work.

However, in my case I face another challenge on top of this. In the above example, I assumed that the worker had access to a library that knew the meanings of the word "nok." I've been looking for one, and I've come across two serious candidates. My requirements are complete search results (I should find what I'm looking for) and that each result be linked to its counterpart in Freebase.

The first is Google's Knowledge Graph (wiki) which Google claims is the successor of Freebase (which I find dubious, for reasons outside the scope of this post). Put mildly, its searching capability is miserable. Here are the results obtained from searching for "First Amendment" (an entity I know for a fact exists in Freebase):
First amendment, Book by Ashley McConnell (score 142.049927)
First Amendment, Song by Silent Civilian (score 141.980042)
First Amendment, Musical Group (score 141.068527)
First Amendment, TV Episode (score 137.776077)
First Amendment, Song by Silent Civilian (score 126.532188)
Where's the actual amendment?! Unless there's something I'm completely missing, Google's Knowledge Graph does not contain the information I need. This is saddening, because it's the closest knowledge base to Freebase that is easy to use: each result I get is directly linked to the corresponding entity in Freebase. The other option I tried has a much better search yet lacks this link.

It's called "WordNet" and I am content with the amount of information it has. The search for "First Amendment" returns the result I'm looking for:
  • S: (n) First Amendment (an amendment to the Constitution of the United States guaranteeing the right of free expression; includes freedom of assembly and freedom of the press and freedom of religion and freedom of speech)
However, since I need to go from "First Amendment" to the correct entity in Freebase (something Google's product allows but WordNet does not), it will be difficult to use this service. Simply knowing the definition will not help.

There is a third option: perform a direct search on Freebase itself. But getting a copy of Freebase up and running is not something I can do on this laptop. Here is an excerpt from the instructions:
make sure you have at least 60GB of memory
Doesn't look like the 8GB I have here is going to cut it. I'm currently working on getting access to a computing cluster at ASU that will be able to host a copy of Freebase.