Thursday, April 14, 2016

A more in-depth look at training

In the previous post, I explained the system I'm using to train a main-entity extraction model, but in very general terms. Here is a more detailed explanation.

The notion of the "main entity" of a question is vague. Consider the following question: "How many benches are in Central Park?" Is the question asking about benches or about Central Park? The answer comes from how information is structured in Freebase. If there were a topic in Freebase for benches and it had a property representing the number of benches in Central Park, then it would be logical to call "benches" the main entity of the question. However, it is more likely that there is a topic in Freebase for Central Park with a property representing the number of benches. In that (more plausible) case, the main entity is Central Park.

If you don't leverage some kind of training system and just read questions without any prior knowledge, you will run into this problem. I know because I did during the earlier stages of this project; I tried to just look at the KDG of the sentence and guess where the main entity was, and it turned out to be a terribly inaccurate process. With training, however, my system gets an idea of where the main entity is located based on question structure, which reduces the amount of "guessing" it has to do.

"Training" is a vague term too, when it comes to computers. When I refer to training, I mean the following: feeding a system a number of (input, desired_output) pairs and getting back a model that can guess desired_output based on input that it hasn't already seen before. Here is a simple example. John is training Bob. Bob knows that when John says a number, he should say a number back; input is a number and desired_output is a number. John tells him that when he hears 2, he should say 4; when he hears 3, he should say 6; when he hears 4, he should hear 8; and continues this for a hundred more numbers or so. By now, Bob has a pretty good idea of what to do even if he doesn't know the exact desired output for the input he is given; just double the input. To test him, John tells him "1" and he correctly responds "2" even though he had never seen that example before. Based on training data supplied by John, Bob has created a model of the system John has designed.

In the scope of this project, input is the KDG of the question being asked and the desired output is the PATH to the main entity in the graph. Remember that since a KDG is a directed tree (a connected, acyclic graph), there's exactly one path from the root node to every other node. I'll illustrate this with the example from my last post. The sentence is "How many schools are in the school district of Philadelphia?" and the KDG looks like this:

[KDG diagram of "How many schools are in the school district of Philadelphia?"]

The correct main entity of this question is the school district of Philadelphia, because there happens to be a topic on Freebase for it with a property that lists the schools in it. The "district-8" node represents this entity, so the correct path to it from the root node ("are-4") is just "is_inside_location". If the main entity were, for example, Philadelphia, then the path would be "is_inside_location -> is_part_of".
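As a rough sketch of how a path like that can be read out of a KDG, suppose the graph is stored as a simple adjacency map from each node to its labeled outgoing edges. The representation, the helper function, and the edge label on the "schools-3" branch are my own illustrative assumptions, not the actual data structures behind the diagram above.

```python
# A hypothetical KDG for "How many schools are in the school district of
# Philadelphia?", keyed by parent node, with (edge_label, child) pairs.
kdg = {
    "are-4": [("agent", "schools-3"), ("is_inside_location", "district-8")],
    "district-8": [("is_part_of", "Philadelphia-10")],
}

def path_to(graph, root, target, labels=()):
    """Return the list of edge labels from root to target, or None if absent."""
    if root == target:
        return list(labels)
    for label, child in graph.get(root, []):
        found = path_to(graph, child, target, labels + (label,))
        if found is not None:
            return found
    return None

print(path_to(kdg, "are-4", "district-8"))       # ['is_inside_location']
print(path_to(kdg, "are-4", "Philadelphia-10"))  # ['is_inside_location', 'is_part_of']
```

Because the graph is a tree, the search never has to worry about cycles or competing paths to the same node.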

The pair for this example would look like this: (KDG of "How many schools are in the school district of Philadelphia?", is_inside_location). Now imagine there are hundreds of these. How do you use all this information to predict the correct path given a NEW KDG? You have to compare it to all the existing ones and see if the STRUCTURE matches. If the structure matches the KDG in a pair, then there's a really good chance that the path listed in that pair is correct for the new KDG too. Structure in this sense means the rough shape of the graph: "does it have an agent edge?" or "does it have an is_inside_location edge?" and so on. You can't just check whether the KDG itself matches a known KDG, because you won't have seen it before. Back to our John and Bob example: if Bob hears 1, he can't just recall what John said the correct answer for 1 was, because John never told him. Instead, he generalizes based on the patterns he saw.
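Here is a sketch of how that comparison could look in code, under the assumption that each training pair is reduced to the set of edge labels it contains and that "matching structure" is scored by how much those label sets overlap (Jaccard similarity). The scoring rule and the helper names are illustrative choices, not the actual implementation.

```python
def edge_labels(graph):
    """Collect the set of edge labels appearing anywhere in a KDG."""
    return {label for edges in graph.values() for label, _ in edges}

# Hypothetical training pairs: (edge labels seen in the KDG, path to main entity).
training = [
    ({"agent", "is_inside_location", "is_part_of"}, ["is_inside_location"]),
    ({"agent", "has_property"}, ["agent"]),
    # ...hundreds more pairs in a real run
]

def predict_path(new_kdg):
    """Return the path from the training pair whose structure overlaps most."""
    new_labels = edge_labels(new_kdg)
    def score(pair):
        labels, _ = pair
        return len(labels & new_labels) / len(labels | new_labels)
    _, best_path = max(training, key=score)
    return best_path

# A new, unseen question ("How many benches are in Central Park?") with an
# assumed KDG structure reuses the path from the closest-matching pair.
new_kdg = {"are-4": [("agent", "benches-3"), ("is_inside_location", "Park-7")]}
print(predict_path(new_kdg))  # ['is_inside_location']
```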

That's the whole idea. I'd like to know if I explained this well, so leave a comment if I lost you somewhere and you still want to understand.

I've already implemented this system, and I'm preparing to run it on the test cases that Free917 provides to see how accurately it predicts the main entity of a question. Results from that should be one or two posts from now.
