Friday, April 22, 2016

Results

Though preliminary, my experiments show promising results. I trained the model on the training split of Free917 and then tested it against the test split:

Total:                276
Total not broken:     206
Total simple:         158
Total found:          97 (61% simple)
Total not found:      61

I only count simple questions (easy object -> property questions) because it's too tedious to generate training data for non-simple sentences. It's important to note, however, that my method isn't bound to simple questions. The real takeaway is that this method for extracting the main entity from the sentence works 61% of the time, which is higher than I was expecting.

Looking at errors:
  • Question pattern lookup fail: The question pattern simply wasn't recorded in the training set. This is solved by adding a couple more questions to the training set to cover the missed cases.
  • Freebase search fail: The words extracted from the KDG are insufficient to find the correct entity on Freebase. This problem is much more difficult to solve.
  • Further K-parser errors: This problem is out of my control.
I have two options now: try to push that 61% as high as possible, or start working on the other half of the problem, extracting the "property" from the KDG. The latter is more interesting, but I'm not sure if it's even possible.

Thursday, April 14, 2016

A more in-depth look at training

In the previous post, I explained the system I'm using to train a main-entity extraction model, but in very general terms. Here is a more detailed explanation.

The notion of "main entity" of a question is vague. Consider the following question: "How many benches are in Central Park?" Is the question asking about benches or Central Park? The answer comes from how information is structured in Freebase. If there was a topic in Freebase for benches and it had a property that represented the number of benches in Central Park, then it would be logical to call "benches" the main entity of the question. However, it is more likely that there is a topic in Freebase for Central Park and it has a property that represents the number of benches. In this (more plausible) case, the main entity is Central Park.

If you don't leverage some kind of training system and just read questions without any prior knowledge, you will run into this problem. I know that because I did during the earlier stages of this project; I tried to just look at the KDG of the sentence and guess where the main entity was. This turned out to be a terribly inaccurate process. With training, however, my system gets an idea of where the main entity is located based on question structure, so it reduces the amount of "guessing" you have to do.

"Training" is a vague term too, when it comes to computers. When I refer to training, I mean the following: feeding a system a number of (input, desired_output) pairs and getting back a model that can guess desired_output based on input that it hasn't already seen before. Here is a simple example. John is training Bob. Bob knows that when John says a number, he should say a number back; input is a number and desired_output is a number. John tells him that when he hears 2, he should say 4; when he hears 3, he should say 6; when he hears 4, he should hear 8; and continues this for a hundred more numbers or so. By now, Bob has a pretty good idea of what to do even if he doesn't know the exact desired output for the input he is given; just double the input. To test him, John tells him "1" and he correctly responds "2" even though he had never seen that example before. Based on training data supplied by John, Bob has created a model of the system John has designed.

In the scope of this project, input is the KDG of the question being asked and the desired output is the PATH to the main entity in the graph. Remember that since a KDG is a directed tree (acyclic connected graph), there's exactly one path from the root node to every other node. I'll illustrate this with the example from my last post. The sentence is "How many schools are in the school district of Philadelphia" and the KDG looks like this:


The correct main entity of this question is the school district of Philadelphia because there happens to be a topic on Freebase for it with a property that lists the schools in it. The "district-8" node is the node which represents this entity, so the correct path to the node from the root node ("are-4") is just "is_inside_location". If the main entity was, for example, Philadelphia, then the path would be "is_inside_location -> is_part_of".

The pair for this example would look like this: (KDG of "how many schools are in the school district of Philadelphia?", is_inside_location). Now imagine there are hundreds of these. How do you use all this information to predict the correct path given a NEW KDG? You have to compare it to all the existing ones and see if the STRUCTURE matches. If the structure matches the KDG in the pair, then there's a really good chance that the path listed in the pair is correct for the new KDG too. Structure in this sense means the vague structure of the graph: "does it have an agent edge?" or "does it have an is_inside_location edge?" and so on. You can't just check if the KDG itself matches any known KDG because you won't have seen it before. Back to our John and Bob example: if Bob hears 1, he can't try to remember what John said the correct answer for 1 was, because he just doesn't know. Instead, he generalizes based on the patterns he saw.
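As a sketch of how that comparison could work, here is a short Python example. The data structures are hypothetical, and reducing "structure" to the set of edge labels leaving the root is a deliberate simplification, so read it as an illustration rather than my implementation.

# A KDG node here is a nested dict, e.g.
# {"name": "are-4", "edges": {"is_inside_location": {...}, "agent": {...}}}

def structure_signature(kdg):
    # The "structure" compared here is simply which edge labels leave the root.
    return frozenset(kdg["edges"].keys())

def train(pairs):
    # pairs: list of (kdg, path), e.g. (kdg_of_school_question, ["is_inside_location"])
    model = {}
    for kdg, path in pairs:
        model[structure_signature(kdg)] = path
    return model

def predict(model, new_kdg):
    # Reuse the path recorded for a training question with the same structure;
    # None means this question pattern was never seen during training.
    return model.get(structure_signature(new_kdg))

def follow(kdg, path):
    # Walk the predicted path from the root to reach the main-entity node.
    node = kdg
    for edge in path:
        node = node["edges"][edge]
    return node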

That's the whole idea. I'd like to know if I explained this well, so leave a comment if I lost you somewhere and you still want to understand.

I've already implemented this system, and I'm preparing to run it on the test cases that Free917 provides to see how accurate it is in predicting the main entity of a question. Results from that should be one or two posts from now.

Sunday, April 10, 2016

Main object extraction

As I wrote earlier, the first step of answering a question is finding out what it's asking about. The question "How old do you have to be to play Monopoly?" is asking about the board game Monopoly. I've proposed a method to train a model to be able to do this using K-parser.

I'll illustrate my process with a more complex example: "how many schools are in the school district of philadelphia?" The KDG looks like this (for brevity, I replaced every element with its class and stripped unimportant nodes):

root: be
  agent: schools
    trait: ?
      subclass_of: quantity
  is_inside_location: district
    complement_word: school
    is_part_of: philadelphia
(Take note that the ? with the subclass "quantity" represents the words "How many...?")

I have an operation that takes a node of the graph and returns a list of strings. It builds the list out of the node's name and the names of any "complement_word", "trait", "is_part_of", etc. nodes that come out of it. For example, the strings for "district" would be {"district", "school", "philadelphia"}. I can then concatenate them ("district school philadelphia") and search for that using Google's Freebase API.
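Here is a rough Python sketch of that operation together with the search call. The node representation is hypothetical, and the Freebase Search API endpoint and parameters are written from memory, so treat those details as assumptions.

import requests  # assumes the requests library is installed

DESCRIPTIVE_EDGES = {"complement_word", "trait", "is_part_of"}

def describing_words(node):
    # The node's own name plus the names of its descriptive children.
    words = [node["name"]]
    for edge, child in node.get("edges", {}).items():
        if edge in DESCRIPTIVE_EDGES:
            words.append(child["name"])
    return words

def freebase_search(words, api_key):
    # Endpoint and parameters as I recall the Freebase Search API (an assumption).
    resp = requests.get(
        "https://www.googleapis.com/freebase/v1/search",
        params={"query": " ".join(words), "key": api_key},
    )
    return resp.json().get("result", [])

district = {
    "name": "district",
    "edges": {
        "complement_word": {"name": "school"},
        "is_part_of": {"name": "philadelphia"},
    },
}
print(describing_words(district))  # ['district', 'school', 'philadelphia']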

The trick here is finding the correct node to do this on. One node in the graph (and the subgraph that stems from it) usually represents the main object of the question. I think a model can be trained with (graph structure, path to correct node) pairs. These pairs can be extracted from the training set by taking each KDG and using brute force to find the correct path. It works on most of the sentences that K-parser correctly parses; my experiments right now give me 75% but I'm sure it can be improved to around 90%.
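As a sketch of what that brute-force pass might look like (reusing the describing_words and freebase_search helpers from the previous snippet; matching on the top result's mid is my assumption, not a detail taken from the project):

def all_paths(node, path=()):
    # Enumerate every (path, node) pair reachable from the root.
    yield path, node
    for edge, child in node.get("edges", {}).items():
        yield from all_paths(child, path + (edge,))

def find_correct_path(kdg, known_entity_mid, api_key):
    # Brute force: try every node as the main-entity candidate and keep the
    # path whose top Freebase result is the entity already known to be the
    # answer for this training question.
    for path, node in all_paths(kdg):
        results = freebase_search(describing_words(node), api_key)
        if results and results[0].get("mid") == known_entity_mid:
            return path
    return None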

The crux of the idea is that the actual entities do not matter in finding the main entity, only the words between them. With a single training example: "how many schools are in the school district of philadelphia," the model can now extract the main object of any question of the form "how many X are in Y." With over 600 examples, this system should be able to generalize to all forms of questions. However, whether that is truly the case remains to be seen, because I haven't implemented this system yet.

Saturday, April 2, 2016

Breaking the problem down further

This past week or two I've been classifying each training example from Free917 based on how "easy" it is to answer. The only criterion is whether outside information is needed to answer the question, and an example where this is evident is the old Monopoly question: "How old do you have to be to play Monopoly?" To answer this question, not only does the computer need to know the meanings of all the words in this sentence, but also that "how old do you have to be" denotes that we are looking for some kind of "minimum age" of the Monopoly board game (information that does exist in Freebase). An example of an easy question would be something like "Who is the pitcher for the Chicago Cubs?" The only information needed is Chicago Cubs and pitcher, which are both readily available; no extra intelligence is needed.
The label 1 means the question is easy, and 0 means it is not.

My other task is determining how to answer the "easy" questions, and at this point it seems the simplistic system I described a couple of posts ago can be adequate, with some tuning.

Sunday, March 20, 2016

Reading papers

I've been pretty much dry on ideas for how to improve the model I described in the previous post, so, on the advice of the Ph.D. student I'm working with, I've been reading papers that pertain to the same problem. Here are the main ones:

Large-scale Semantic Parsing via Schema Matching and Lexicon Extension: http://cis-linux1.temple.edu/~yates/papers/textual-schema-matching.pdf

Semantic Parsing via Paraphrasing: http://cs.stanford.edu/~pliang/papers/paraphrasing-acl2014.pdf

Semantic Parsing on Freebase from Question-Answer Pairs: http://cs.stanford.edu/~pliang/papers/freebase-emnlp2013.pdf

Enhancing Freebase Question Answering Using Textual Evidence (very recent): http://arxiv.org/abs/1603.00957

Other than that, there's really not much to talk about from the past week. I did spend a lot of time learning about machine learning (partly out of curiosity, but also because I may need it later in this project), specifically artificial neural networks (ANNs). On the surface, the idea sounds simple, but I found the details difficult to wrap my mind around even after reading quite a few introductory articles on them, including the Wikipedia page. The one that made it actually "click" was this one.

Friday, March 11, 2016

A simple plan of action

I think I've pretty much figured out what I'm going to do. An important observation when looking at the example questions provided in Free917 is that most end up simply asking for a property of an entity. For example, one of the questions asks "What was the cover price of X-Men issue 1?" All the computer needs to do here is find the entity that represents "X-Men issue 1" and access the property "cover price." Of course, translating the actual utterance "X-Men issue 1" is a whole other task, but once that's done, the rest is trivial. More than half of the questions in Free917 are similar to this. This means that getting the answer to these questions is really simple: extract the entity from the KDG and then extract whatever property is being asked for. Here's an example:

Here is the KDG (in list-form) of the question "What type of tea is gunpowder tea?" (I actually have no idea what gunpowder tea is, but this question illustrates the example well.)

root: E<is-5>
  agent: type-2
    product_of: tea-4
      instance_of: <tea>
        is_subclass_of: <food>
    trait: ?-1
      instance_of: <?>
        is_subclass_of: <object>
    instance_of: <type>
      is_subclass_of: <cognition>
  recipient: tea-7
    complement_word: gunpowder-6
      instance_of: <gunpowder>
    instance_of: <tea>
      is_subclass_of: <food>
  instance_of: <be>
    is_subclass_of: <stative>

From here, I am easily able to extract {tea, gunpowder} as the words describing the entity in question and {type, tea} as the property being requested of the entity. This can be done in the same fashion for other questions of this form.
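To show what that extraction might look like in code, here is a small Python sketch. The node representation is hypothetical, and the assumption that the recipient subtree names the entity while the agent subtree names the requested property is mine, made only for this question shape.

def words_under(node, keep_edges=("complement_word", "product_of")):
    # Gather the node's own word plus selected descriptive children.
    words = [node["name"]]
    for edge, child in node.get("edges", {}).items():
        if edge in keep_edges:
            words.append(child["name"])
    return words

# A stripped-down version of the KDG above.
kdg = {
    "name": "is",
    "edges": {
        "agent": {"name": "type", "edges": {"product_of": {"name": "tea"}}},
        "recipient": {"name": "tea", "edges": {"complement_word": {"name": "gunpowder"}}},
    },
}

entity_words = words_under(kdg["edges"]["recipient"])   # ['tea', 'gunpowder']
property_words = words_under(kdg["edges"]["agent"])     # ['type', 'tea']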

Of course, there are different kinds of questions, and some of them are too difficult to deal with. One other kind that I can deal with, however, is questions that ask "How many ...?", which simply require querying for an entity and counting the number of results. There are some questions, though, like "What is the average temperature in Sydney in August?", that cannot be answered in this way.

Update to the previous post about named-entity recognition:

I overlooked the actual Freebase website: http://www.freebase.com/. It provides a robust search feature that returns what you'd expect it to, and it also lists properties of each entity nicely. The only question is whether it has a nice API to use. (API stands for Application Program Interface, and it's one way for computer programs to use services provided online.)


Friday, March 4, 2016

The challenge of named-entity recognition

One of the biggest challenges that I face is named-entity recognition. I'll illustrate the meaning of the term with an example.

Imagine you have a list of directions written in a language you do not know. One of them reads:
Gok bok nok
You haven't the slightest clue as to what this means, until someone tells you "gok bok" roughly translates to "get." Now you just have to figure out what "nok" means and retrieve it, so you go to the library and ask the person working there for "nok." They reply with the question "Brok nok ork grok nok?" Your limited capabilities with this language tell you that the clerk is asking you to specify which kind of nok you want, as there are two different meanings of the word, one relating to "brok" and one to "grok."

You have no idea which one you need, so you go back to the person who gave you the instructions to get some context. However, he just gives you some more gibberish that you have to translate and somehow relate to the words "grok" and "brok," the two different kinds of nok, to find out which nok you need.

It should be clear where this is going; you are the computer trying to figure out what an English sentence means and you've been thrown an ambiguous word. In the question "Who is the pitcher for the Chicago Cubs?" the meaning of the word "pitcher" might seem obvious to someone who knows what the Chicago Cubs are, and it's possible for computers to deal with this kind of ambiguity (is it a tool for holding liquids or a baseball player?) but not without significant work.

However, in my case I face another challenge on top of this. In the above example I assumed that the worker had access to a library that knew the meanings of the word "nok." I've been looking for one, and I've come across two serious candidates. My needs include complete search results (I should find what I'm looking for), and each result must be linked to its counterpart in Freebase.

The first is Google's Knowledge Graph (wiki) which Google claims is the successor of Freebase (which I find dubious, for reasons outside the scope of this post). Put mildly, its searching capability is miserable. Here are the results obtained from searching for "First Amendment" (an entity I know for a fact exists in Freebase):
  • First amendment, Book by Ashley McConnell (score 142.049927)
  • First Amendment, Song by Silent Civilian (score 141.980042)
  • First Amendment, Musical Group (score 141.068527)
  • First Amendment, TV Episode (score 137.776077)
  • First Amendment, Song by Silent Civilian (score 126.532188)
Where's the actual amendment?! Unless there's something I'm completely missing, Google's Knowledge Graph does not contain the information I need. This is saddening because this is the closest knowledge base to Freebase that is easy to use: each result that I get is directly linked to the corresponding entity in Freebase. The other option I tried has a much better search yet lacks this.

It's called "WordNet" and I am content with the amount of information it has. The search for "First Amendment" returns the result I'm looking for:
  • S: (n) First Amendment (an amendment to the Constitution of the United States guaranteeing the right of free expression; includes freedom of assembly and freedom of the press and freedom of religion and freedom of speech)
However, since I need to go from "First Amendment" to the correct entity in Freebase (something Google's product allows but this one does not), it will be difficult to use this service. Simply knowing the definition will not help.

There is a third option: perform a direct search on Freebase itself. But getting a copy of Freebase up and running is not something I can do on this laptop. Here is an excerpt from the instructions:
make sure you have at least 60GB of memory
Doesn't look like the 8GB I have here is going to cut it. I'm currently working to get access to a computing cluster at ASU that will be able to put up a copy of Freebase.

Friday, February 26, 2016

Agent-hoisting

Before I get to the main point of this post, here's a link to the GitHub page where I'll be putting up the code that I write for this project. For the unfamiliar, GitHub is a place to upload source code, and it is an integral component of the Open Source movement, a trend in the last decade of making source code public. I use GitHub frequently, as you can see if you look at my profile where I have listed several other projects.

"Hoisting" is a term I just invented to describe a transformation that, when used on the correct node of the knowledge description graph (KDG), I propose will simplify the task of converting between KDGs and other logical forms (and I'll get to this later). It applies to acyclic directed graphs, which KDGs are. Imagine the following directed graph:

A->B->E
|
v
C->D

This graph satisfies the acyclic condition because you can't start at any node and get back to it by following arrows. If we envision each arrow as a "step down," then A is the "top" node because everything is "down" from it. To "hoist" a node, we change the arrows to make it the top node.

For example, let's hoist E:

A<-B<-E
|
v
C->D

Now E is the "highest" node in the graph. However, while doing this one needs to remember that the arrows have names that will not be
the same when the arrow is reversed. Look at an example of a KDG (previous post) and you'll see things like "agent" and "instance_of." These describe the relation between nodes. For example, if node A (in our above example) is an event and B is the entity taking part in the event A, then the edge connecting A to be would be called "agent." This is simply convention for denoting the entity that is performing the action. The "instance_of" edge is used in classifying entities. Node E could be the general category (called a "Class") that B falls into, so the edge going from B to E would be an "instance_of" edge, telling us that entity B is an instance of class E. When reversing these arrows, we need to "invert" the meaning of the edge as well. If A to B is an agent edge, B to A would have to be something like a "performed_action" edge, telling us that B performed the action A.
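Here is a rough Python sketch of hoisting on a small edge-list version of the example. The data structures and helper names are hypothetical; only the agent/performed_action renaming convention comes from the discussion above, and the sketch assumes the node being hoisted is reachable from the old root.

INVERSE = {"agent": "performed_action"}  # how an edge label changes when reversed

def find_path(edges, src, dst, path=()):
    # The chain of edges leading from src down to dst in the directed tree.
    if src == dst:
        return path
    for (a, label, b) in edges:
        if a == src:
            found = find_path(edges, b, dst, path + ((a, label, b),))
            if found is not None:
                return found
    return None

def hoist(edges, old_root, new_root):
    on_path = set(find_path(edges, old_root, new_root))
    hoisted = []
    for (a, label, b) in edges:
        if (a, label, b) in on_path:
            # Reverse the edge and "invert" its meaning.
            hoisted.append((b, INVERSE.get(label, "inverse_of_" + label), a))
        else:
            hoisted.append((a, label, b))
    return hoisted

edges = [("killed-2", "agent", "?-1"), ("killed-2", "recipient", "Abraham_Lincoln-3")]
print(hoist(edges, "killed-2", "?-1"))
# [('?-1', 'performed_action', 'killed-2'), ('killed-2', 'recipient', 'Abraham_Lincoln-3')]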

The motivation for this transformation can be easily seen with an analogy. When translating between two spoken languages, one cannot simply map words from one language onto the other; sentence structure, among other factors, must be taken into consideration. When translating knowledge description graphs into Lambda DCS queries, there is no direct formula, so some steps must be taken to modify one side to make it easier to translate. I noticed that Lambda DCS queries focus more on the actor than on the event occurring, while KDGs focus on events; most graphs that are generated have events at the top. By hoisting the agent of an event to the top, I can make KDGs focus a little more on the objects that carry out the event rather than the event itself. It's a small step, but a necessary one.

Here is a real example of a KDG and its agent-hoisted version, both of the question "Who killed Abraham Lincoln?" A quick primer on notation: the data is presented in outline format, which should be simple to understand. Any "word-5" pretty much just means "word"; you can ignore the number. E<something> means "something" is an event. <something> means "something" is a category. Pretty much everything else is an entity. The question mark denotes what is unknown, and in this case, it's who killed Abraham Lincoln.

No hoisting:

root: E<killed-2>
  recipient: Abraham_Lincoln-3
    semantic_role: :corpse
    instance_of: <Abraham_Lincoln>
      is_subclass_of: <person>
  instance_of: <kill>
    is_subclass_of: <contact>
  agent: ?-1
    semantic_role: :killer
    instance_of: <?>
      is_subclass_of: <person>

Agent-hoisting:

root: ?-1
  semantic_role: :killer
  instance_of: <?>
    is_subclass_of: <person>
  performed_action: E<killed-2>
    recipient: Abraham_Lincoln-3
      semantic_role: :corpse
      instance_of: <Abraham_Lincoln>
        is_subclass_of: <person>
    instance_of: <kill>
      is_subclass_of: <contact>

In other news, I will be going to the lab on Monday to find out where I can set up a copy of Freebase, a task that requires far more computing resources than I have.

Friday, February 19, 2016

The advantage of using Kparser

To answer questions, you need to know what they mean. For example, to answer the question "Where did the battle of New Orleans start?" you need to know that "battle of New Orleans" represents a specific event, "start" represents a more specific event (a specific time during the battle), and "where" represents the location of the event. Kparser is a tool that gets this information out of the text, and it models it as a graph. Each "object" is a node and each relation between objects is a connection between nodes, otherwise called an edge. Here is how the graph of the question above would look:
You can use Kparser too: just go to http://kparser.org and put in a question and see what it gives you. It's not perfect yet; some things don't parse correctly, but really there's no perfect semantic parser yet.

The research I'm doing is motivated by Percy Liang's research into the same idea, answering questions using a semantic parser and Freebase. You can find his main paper on the topic here. At the bottom of this paper, in the section "Error analysis," he notes that the semantic parser he used has difficulty with certain phrases, and he gives the example I showed above as one of them. Kparser handles this sentence just fine, as it does other sentences that did not work with the style of semantic parser Liang used, and this is why Kparser might be better suited for this task; that is the motivation for my research.

Tuesday, February 9, 2016

Lambda DCS Logical Forms

As I dove into the problem, I needed to learn the basic concepts surrounding it, and this involved learning a way of expressing knowledge base queries: Lambda DCS. I'm converting plain-text questions ("How often does the Washington Post come out?") to a "targetFormula," which is a lambda DCS logical form (DCS stands for dependency-based compositional semantics) and looks like this:
(!fb:book.periodical_frequency.issues_per_year (!fb:book.periodical.frequency_or_issues_per_year fb:en.the_washington_post))

For some examples, this can get very confusing and isn't what it seems. It might look like simple function application but it is far from that, and I had been staring at these for too long before realizing this. This page explains them in full detail and makes them very easy to understand.

A single entity such as fb:en.the_washington_post is what it is: the Washington Post. More complicated are entities such as fb:book.periodical.frequency_or_issues_per_year, which store a set of key-value pairs; in this case, it stores each "periodical" along with its corresponding "frequency or issues per year," using the terminology provided. When they're put together inside the parentheses, the second part is matched against the right-hand part of the first, and any matching tuples are returned. In this case, because there is a ! before the first part, each tuple it represents is reversed. It's like saying "give me all the periodicals and their frequencies, and tell me the frequency of the one called 'the washington post'." Note that this is just the inner section of the logical form, but the outer section does essentially the same thing.
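A toy Python model of those mechanics may help; the relation contents below are made up purely for illustration, so the actual Freebase values would differ.

# A binary like fb:book.periodical.frequency_or_issues_per_year is treated as a
# set of (key, value) pairs; a unary like fb:en.the_washington_post is a set of entities.
frequency = {("the_washington_post", "daily"), ("the_economist", "weekly")}
issues_per_year = {("daily", 365), ("weekly", 52)}

def reverse(relation):
    # The "!" operator: flip every tuple in the relation.
    return {(v, k) for (k, v) in relation}

def join(relation, unary):
    # Putting two parts inside parentheses: keep the left side of every tuple
    # whose right side matches something in the second part.
    return {k for (k, v) in relation if v in unary}

inner = join(reverse(frequency), {"the_washington_post"})  # {'daily'}
outer = join(reverse(issues_per_year), inner)              # {365}
print(outer)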

Lambda DCS is an interesting tool to use because it handles data in a novel way, at least to me. I'm excited to see how well Kparser-generated graphs can map onto them.