Crash Course into AI

Subject: Crash Course into AI Mon Nov 26, 2012 7:33 pm

I'm writing this post to teach you about AI: how to implement it, and what it can and can't do.

First of all, let me define some terms I will use in this thread:

Agent: An agent is any entity that can make decisions. In our case, that is any organism that can perform actions. We differentiate two types of agents: the one(s) we control, and the rest, which we cannot control.
World: Everything around the agent. As with the world being simulated, we need to keep a balance between realism/accuracy and simplicity. For example, we probably don't care where the closest supernova is.
State: The configuration of the world at a given time: where and how everything is.
Action: In a given state, an agent can take actions that lead to a different state. However, by action we mean what the agent intends to do, not what it finally does, as the action can fail and lead to a different state than intended.
Policy (Behavior): What the agent prefers to do in a given state.
Utility: How much a state is worth to the agent. In our example, it is something like the sum of health, fatigue, hunger, comfort... (each one with its own coefficient), plus what we can expect from the future.

Basically, a rational agent (what AI aims for) is an agent that given a state, can decide what action to take in order to maximize its utility.

There are many, many algorithms that can be used to build AIs, but given our problem, only a few are suitable. Here, I will explain how to implement a model-free reinforcement learning algorithm.

We will control only one agent at a time, but multiple agents can be controlled just by taking turns. Just remember that every agent takes its own action and tries to maximize its own utility, without caring what the other agents' utilities are (societies are much harder).

Every time we take an action, we receive a reward. Most of the time the reward is negative (as hunger and fatigue increase), but sometimes we receive positive rewards (like when eating). As such, we need to look at the far future: in order to eat once, we will need to have walked a lot and maybe fought another agent, which are actions that don't seem very good at first. We could do this with a search tree (analyzing all the possible actions, predicting all possible resulting states, analyzing all the actions from those states... and taking the path whose total reward is biggest), but that only works in worlds with few possible actions (like board games). With this restriction, we will need to do a 1-step lookahead (working out which action leads to the best-looking immediate state). But now we need to know how good a state is, including how good the things we can expect from that state are.

Even worse, we have absolutely no idea how other agents will affect us or how we will interact with them. To solve this, I present you the features. Features are functions we can apply to a state, each returning a number that represents how strongly that particular thing is present in the state. We then weight those features depending on how important we think they are. That way, the expected value of a state can be calculated as V(s)=W1*F1(s)+W2*F2(s)...+Wn*Fn(s). Then, the only remaining step is deciding the weights of the features.

The features are something that we (the programmers) give to the agent, which doesn't know in advance how important they are. For example, if one feature is 1/(distance to a creature of species A), then, depending on what species A is, the agent will want to maximize it (if A is prey) or minimize it (if A is a predator). Basically, we are reducing the state to a set of functions.
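To make the formula concrete, here is a minimal Python sketch of V(s) as a weighted sum of features. The dict-based state layout and the two feature names are assumptions made up for this illustration; the thread does not prescribe any particular representation.

```python
from typing import Callable, Dict, List

# A state is modelled as a plain dict of distances. This layout is an
# assumption for the example only.
State = Dict[str, float]
Feature = Callable[[State], float]


def inverse_distance_to_food(state: State) -> float:
    return 1.0 / state["distance_to_food"]


def inverse_distance_to_species_a(state: State) -> float:
    return 1.0 / state["distance_to_species_a"]


def value(state: State, weights: List[float], features: List[Feature]) -> float:
    """V(s) = W1*F1(s) + W2*F2(s) + ... + Wn*Fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))


features = [inverse_distance_to_food, inverse_distance_to_species_a]
weights = [0.0, 0.0]  # the agent starts out not knowing what matters
state = {"distance_to_food": 4.0, "distance_to_species_a": 10.0}

print(value(state, weights, features))        # 0.0 until weights are learned
print(value(state, [2.0, -10.0], features))   # food is good, species A is bad
```

The programmer only supplies the feature functions; the sign and size of each weight is what the agent has to learn.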

In order to learn about the world, we need to update these W depending on what we experience. We do it through corrections. If the reward is better than we predicted, we increase each W in proportion to how big its feature was: being close to that freshly killed corpse you are feeding on is more important than that fruit tree 500 meters away. Wi = Wi + alpha*(correction)*Fi(s), with the correction being the difference between the actual reward and the predicted reward, and alpha being a factor that controls learning: the higher alpha is, the faster the agent learns, but the jumpier the learning will be.
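The update rule can be sketched in a few lines of Python. The feature values and the reward below are invented numbers for the corpse/fruit tree example, and the function name is hypothetical.

```python
from typing import List


def update_weights(weights: List[float], feature_values: List[float],
                   predicted_reward: float, actual_reward: float,
                   alpha: float) -> List[float]:
    """Apply Wi = Wi + alpha * correction * Fi to every weight."""
    correction = actual_reward - predicted_reward
    return [w + alpha * correction * f
            for w, f in zip(weights, feature_values)]


# Assumed feature values: the corpse is right here (1.0), while the fruit
# tree 500 meters away barely registers (1/500).
feature_values = [1.0, 1.0 / 500.0]
weights = [0.0, 0.0]

# The agent expected nothing and received a reward of 10 for eating.
new_weights = update_weights(weights, feature_values,
                             predicted_reward=0.0, actual_reward=10.0,
                             alpha=0.1)
print(new_weights)  # the corpse weight moves far more than the tree weight
```

Because each weight moves in proportion to its own feature, the credit for the meal goes almost entirely to the nearby corpse.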

If you got this far, either you skipped the whole post to the last paragraph or you read through all of it. Either way, thanks for reading. There are some lies in there, but I told them in order to make it easier to understand. Still, I will try to improve this and, in a later post, point out the lies and complete the information. Leave a note below about which parts you didn't understand or want me to explain further.

Subject: Re: Crash Course into AI Mon Nov 26, 2012 8:13 pm

Could you explain how you got the equation below, and maybe explain what the variables stand for?

Quote :: V(s)=W1*F1(s)+W2*F2(s)...+Wn*Fn(s)

Subject: Re: Crash Course into AI Mon Nov 26, 2012 8:34 pm

V(s) is how much the state s is worth.
The W are the weights of every feature. We learn them when trying to maximize rewards.
The F are the features themselves: the state broken into pieces.

We calculate V(s) by getting every feature from the state s, multiplying each by its weight, and then adding all those values up. The numbers (1, 2, ..., n, or i) are just there to differentiate between the features.

For example: we have two features, the inverse of the distance to food (F1) and the inverse of the distance to the closest predator (F2). Reinforcement learning taught the agent that F1 is worth 1 (being closer to food is relatively good) and that F2 is worth -20 (being closer to a predator is really bad). If every potential predator is far enough away, the agent will take actions that get it closer to food, but if any predator gets close, the much bigger magnitude of W2 will make the agent want to go straight away from the predator, not caring about the food.
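Here is that example as a small Python sketch, using the weights 1 and -20 from the post. The distances and the action names "approach" and "retreat" are made up for illustration; each action is simply mapped to the distances the agent would end up with.

```python
from typing import Dict, Tuple


def value(dist_food: float, dist_predator: float,
          w_food: float = 1.0, w_predator: float = -20.0) -> float:
    """V(s) = W1 * (1/distance to food) + W2 * (1/distance to predator)."""
    return w_food * (1.0 / dist_food) + w_predator * (1.0 / dist_predator)


def best_action(outcomes: Dict[str, Tuple[float, float]]) -> str:
    """outcomes maps an action to (distance to food, distance to predator)
    after taking it; pick the action leading to the most valuable state."""
    return max(outcomes, key=lambda action: value(*outcomes[action]))


# Food and predator lie in the same direction in both scenarios.
predator_far = {"approach": (4.0, 100.0), "retreat": (6.0, 102.0)}
predator_near = {"approach": (4.0, 2.0), "retreat": (6.0, 4.0)}

print(best_action(predator_far))   # approach
print(best_action(predator_near))  # retreat
```

With the predator 100 units away its term is tiny and food wins; at 2 units the -20 weight dominates and the agent backs off.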

What's more, we can share these weights across an entire species, as its members all face the same circumstances. It also makes sense in the long run, as it means that newborn creatures have a natural instinct to go towards food and away from predators, just as instincts are inherited.

Subject: Re: Crash Course into AI Mon Nov 26, 2012 8:39 pm

Ahh okay. That helps ease out any confusion I had. Also, your example with the predator and the food really displays this model in action well. I can see it is quite an ingenious model.

Newcomer Posts : 24 Reputation : 3 Join date : 2012-09-23

Would it be possible to tie different traits each species or agent has to influence some of the weights in the algorithm?

I.E. aggressiveness, socialization, etc.

I guess I know it would be possible. But in your opinion, what would be some of the better traits to use so that we don't end up having tons of them for every little detail?

Subject: Re: Crash Course into AI Tue Nov 27, 2012 3:14 am

You could tie them, but the algorithm is supposed to have control over them so that it acts as optimally as possible, and it will search for the optimal strategy to survive (eventually; it will need many deaths before it learns that suicide is bad). However, we can initialize the values to anything we want, so hopefully it learns that being aggressive is good; but if it is not, it will stop being aggressive. This will also speed up the learning process.

Sadly, choosing the features is the part left for us to do. They are specific to the problem, so there is no ready-made list to pick from. However, if we add a feature that is not useful to the agent (like distance to fruit for a carnivore), the algorithm is smart enough not to use it and will give it a W close to zero (unless it learns that being closer to fruit means being closer to more prey).

Newcomer Posts : 24 Reputation : 3 Join date : 2012-09-23

Ok, I understand. I guess my thoughts were to try and tie this into the auto-evo functionality for the behavioral side. (Because creatures with more "favorable" behavior are more likely to survive, driving evolution) Although the learning process sounds nice, and is fairly robust, it also seems like it will take a long time and have to go through a lot of trial and error to start establishing good realistic behaviors. This may or may not be true though.

This may be skipping ahead, but how would the weights be modified in the algorithm?

Subject: Re: Crash Course into AI Tue Nov 27, 2012 2:18 pm

gdt1320 wrote:: This may be skipping ahead, but how would the weights be modified in the algorithm?

Actually it is there, although I recognize it needs some cleaning up to make things clearer.

Once we decide on the action, we perform it, and after the action is complete, we see what happened. With the action, we receive a reward (which can be positive or negative). If our model was completely right, the reward will be the same as the one we expected, and we shouldn't make any corrections to the weights. If what we expected was off, we move the weights to compensate for that error.

Remember, weights are W and features are F

correction = received reward - expected reward

Then, we go through all the weights and apply the correction this way:

Wi = Wi + alpha*correction*Fi

We are simply adding to every weight the correction multiplied by some factors. We multiply by F because a feature with a big value has more impact on the state than one with a low value, and so its weight should be modified more. The alpha is there to control how fast the agent learns. With a high alpha, it will learn things fast, but also forget the past more easily. With a low alpha, it will learn more slowly, but retain what it learned for longer. Ideally, if the world doesn't change, we would want to lower alpha over time, but I think it should be a static value.
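The effect of alpha can be seen in a tiny Python experiment. Everything here is an invented toy setup: a single feature that is always 1, a reward that is 10 for five steps and then drops to 0 (the world changes), and two agents that differ only in alpha.

```python
from typing import List


def learn(rewards: List[float], alpha: float,
          weight: float = 0.0, feature: float = 1.0) -> List[float]:
    """Repeatedly apply W = W + alpha * correction * F and record W."""
    history = []
    for reward in rewards:
        expected = weight * feature
        correction = reward - expected
        weight += alpha * correction * feature
        history.append(weight)
    return history


rewards = [10.0] * 5 + [0.0] * 5  # the world changes halfway through

fast = learn(rewards, alpha=0.5)
slow = learn(rewards, alpha=0.1)

print("after 5 good steps:", fast[4], slow[4])
print("after 5 bad steps: ", fast[-1], slow[-1])
```

The high-alpha agent gets close to the true value of 10 quickly, but it also drops the old knowledge almost completely once the rewards stop, while the low-alpha agent changes its mind slowly in both directions.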

Usually the agents learn things quite fast. For example, in a Pac-Man learning agent I made, the first time it ate a dot, it learned that going towards dots was good, and immediately started going for all of them. The first time it saw a ghost, it tried to go through it, not knowing what would happen, and died. The second time, it preferred the part of the maze that had no ghosts. After one more death, it already knew that getting close to them was bad, and actively avoided them.

To speed the process up, we can initialize the W to some pre-calculated values depending on the niche the creature is in and on what the other creatures are. That will make sure they know the basics and only have to adjust the weights. With many features it may take longer to learn, as the agent doesn't know which feature was the good one, but we could also run some simulations beforehand with a high alpha in order to let them learn.

Subject: Re: Crash Course into AI Tue Nov 27, 2012 10:58 pm

This is an excellent thread, and you've explained the process here quite well. The next question is logically one of implementation: how do we go from this learned-behavior AI structure to a general innate structure which can be applied to any member of a species? Remember that our goal is to simulate ecological interactions, and many behaviors (most, for less intelligent organisms like the protists we're starting out with) are innate.

Subject: Re: Crash Course into AI Wed Nov 28, 2012 7:10 am

sciocont, I'm not sure if I understood you correctly, but I think you want some kind of racial behavior.

I mentioned that briefly a few posts above. Instead of each agent having its own W, we can share them between the entire species, with all agents modifying them as they take actions and all taking from the same numbers.

Another option would be to have two sets of W: one particular to the agent, and the other still shared by the whole species. Both act at the same time, and both are modified at the same time (of course, the W that is particular to the agent will change much faster), so we still have that racial behavior, but individual creatures have unique behavior depending on what each has lived through.
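One possible reading of the two-sets idea, sketched in Python. The thread does not say how the two sets combine, so adding them together, the class name, and the two learning rates are all assumptions of this sketch.

```python
from typing import List


class Agent:
    """An agent whose effective weight is species weight + own weight."""

    def __init__(self, species_weights: List[float]) -> None:
        self.species = species_weights            # shared list, not a copy
        self.own = [0.0] * len(species_weights)   # private to this agent

    def value(self, feature_values: List[float]) -> float:
        return sum((ws + wo) * f for ws, wo, f
                   in zip(self.species, self.own, feature_values))

    def learn(self, feature_values: List[float], reward: float,
              alpha_species: float = 0.01, alpha_own: float = 0.1) -> None:
        # Both sets receive the same correction; the private set uses a
        # larger alpha, so it changes much faster.
        correction = reward - self.value(feature_values)
        for i, f in enumerate(feature_values):
            self.species[i] += alpha_species * correction * f
            self.own[i] += alpha_own * correction * f


species_weights = [0.0]
first = Agent(species_weights)
second = Agent(species_weights)

first.learn([1.0], reward=10.0)
print(first.value([1.0]))   # strong personal experience plus species share
print(second.value([1.0]))  # only the small species-wide nudge
```

If agents are deleted when they leave the player's vicinity, only the shared list survives, which is the global-W option.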

If that wasn't your concern, let me know exactly what it was so we can try to solve it.

Subject: Re: Crash Course into AI Wed Nov 28, 2012 7:32 pm

Having a global W value solves it. We wouldn't really need individual W values, because organisms (well, animals; plants are a different story) that are not in your immediate vicinity don't technically need to exist: the game populates the space around you with instances of other organisms based on the probability of them being there, and once they pass out of that zone, the game forgets about the individual actually being there and will just pop a new one out if you pass by again and the odds are in favor of its existence. The problem is that specific individual interactions between any species are only going to happen when you're around, so there will need to be a significantly high degree of pre-programmed behavior.

Subject: Re: Crash Course into AI Wed Nov 28, 2012 8:11 pm

Yes, individual W values only work if we keep agents around; if we delete them, they don't make any sense.

About individual behavior: the player is only going to notice an interaction if it is common, and in that case the creatures will have had time to learn about it. If some interaction has a very low chance of happening, then the agents won't know much about it, but the player probably won't notice it anyway.

Subject: Re: Crash Course into AI Wed Nov 28, 2012 10:18 pm

So you'd agree that it's best to only use global W values?

Subject: Re: Crash Course into AI Thu Nov 29, 2012 5:19 am

~sciocont wrote:: So you'd agree that it's best to only use global W values?

Yes

Subject: Re: Crash Course into AI Sat Dec 08, 2012 1:58 am

I hate to say this after such a fruitful discussion on such a useful thread (great work Daniferrito!), but I recommend you either post the next episode to revive the discussion or someone writes down what was explained and agreed upon in this thread. I don't want to see this thread, like so many others, be a temporary burst of inspiration that disappears into the sea of new threads as discussion dies out. Is there somewhere where the concept on the AI is being stored or recorded? Is there a coder reading this and inputting it into the game engine?

Really, all I want to do is funnel this brilliance into developing the game. It is something I believe we need to strive to do more with all of our useful threads. Also, please don't let my post kill any discussion here. Please, this is a very good thread so don't be shy.

Subject: Re: Crash Course into AI Sat Dec 08, 2012 8:54 pm

Yes, sorry, I've been neglecting this. The problem I have is that I went way too fast in the first post, so I'm not really sure how to do this.

Anyway, here I go on the second part, where I improve a bit the way we calculate the best action to take. The third part will be about the problems I see we could have, and some possible options.

Last time, we had a state s, from which we took an action a to end up in state s' (the notation for the successor state). We then chose which action to take depending on the value we calculated for s'. The problem is that we need to know this s', which we can't fully simulate because of all the random and unknown variables involved. What we can do is move all the calculations "half a step backwards", to a point where we know all the variables. That point is still in state s (the one we are in), but at the moment where we have committed to taking a specific action. Then instead of V(s') (expected reward from state s'), we have Q(s,a) (expected reward from state s if we commit to taking action a).

This makes choosing the action easier, as we only have to pick the action with the biggest Q(s,a). But the equations change a bit. To calculate Q(s,a), we do Q(s,a) = W1*F1(s,a)+W2*F2(s,a)+...+Wn*Fn(s,a) (I would use the summation symbol, but I don't think it is supported here; I also think it could work better, although it would be harder to code, if the terms were not all simply added up). The other thing we need is to update the weights as we receive rewards. For each feature Fi(s,a), once we receive the reward r and land in the state s', we have: Wi = Wi + alpha*(correction)*Fi(s,a), with correction = (r+max(Q(s',a')))-Q(s,a). The correction is the difference between what we actually got, that is, the reward we just received plus the reward expected from future states starting at s' (r+max(Q(s',a'))), and the expected reward Q(s,a). max(Q(s',a')) means we plug in the maximum expected reward over all actions a' available from s', which assumes that we act optimally afterwards. The rest is the same as last time.
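A minimal Python sketch of this Q(s,a) update follows. The one-dimensional world (an agent position on a line, food at position 3), the single feature, and the reward of 1.0 are all invented for the example. The `discount` parameter is an extra that is not in the post; with its default of 1.0 the correction is exactly the formula above.

```python
from typing import Callable, List, Sequence

FOOD_POSITION = 3  # assumed toy world: positions on a line

Feature = Callable[[int, int], float]


def closeness_to_food(state: int, action: int) -> float:
    """1/(d+1), where d is the distance to food after taking the action."""
    distance = abs(FOOD_POSITION - (state + action))
    return 1.0 / (distance + 1.0)


def q_value(weights: Sequence[float], features: Sequence[Feature],
            state: int, action: int) -> float:
    """Q(s,a) = W1*F1(s,a) + ... + Wn*Fn(s,a)."""
    return sum(w * f(state, action) for w, f in zip(weights, features))


def q_update(weights: Sequence[float], features: Sequence[Feature],
             state: int, action: int, reward: float, next_state: int,
             next_actions: Sequence[int], alpha: float,
             discount: float = 1.0) -> List[float]:
    """Wi = Wi + alpha * ((r + max Q(s',a')) - Q(s,a)) * Fi(s,a)."""
    best_next = max((q_value(weights, features, next_state, a)
                     for a in next_actions), default=0.0)
    correction = (reward + discount * best_next) \
        - q_value(weights, features, state, action)
    return [w + alpha * correction * f(state, action)
            for w, f in zip(weights, features)]


features = [closeness_to_food]
weights = [0.0]

# The agent at position 1 steps right (+1), lands on 2, and gets reward 1.0.
weights = q_update(weights, features, state=1, action=+1, reward=1.0,
                   next_state=2, next_actions=[-1, +1], alpha=0.5)

# Choosing an action is now just picking the biggest Q(s,a).
best = max([-1, +1], key=lambda a: q_value(weights, features, 1, a))
print(weights, best)
```

Note that the features take the action as an argument, so the agent never has to predict the full successor state, only what the feature will look like once it commits to the action.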

Recapping, these changes allow us to run this algorithm without the agent knowing the exact outcome of its actions (although it still needs some knowledge about them). You may also note that this time I could write out the exact way of computing the correction to apply to the weights, which is harder when only states are used. Now we can also use the action we are taking in the features.

Again, thanks for reading, and please ask about whatever you didn't understand or want me to explain further.

Subject: Re: Crash Course into AI Tue Dec 11, 2012 12:05 am

Excellent post.

Subject: Re: Crash Course into AI Tue Dec 11, 2012 12:08 am

Great work Daniferrito. Unfortunately this is where my knowledge of maths ends, so I can't really comment, but wonderful job nonetheless.

Subject: Re: Crash Course into AI Tue Dec 11, 2012 12:03 pm

Ok, third part, and the first one I will need help on. We are going to talk about the functions that describe the state-action pair.

The basic gist of them is that the more important something should be to the agent, the bigger the function's value should be. It doesn't matter whether the thing is good or bad, or whether the value is big and positive or big and negative; the algorithm will handle that. But once the thing is no longer important, the value should go towards 0. The functions shouldn't go to infinity at any point either. (Actually, any function will work, and the agents will do their best to maximize rewards, but choosing good functions makes it easier for the algorithm to get closer to optimality.)

Let's look at an example to see what I mean. For this example we'll use the distance to something. The function will be referred to as F(s,a), and d will be the distance from the agent to the thing after the action has been applied. It is easy to see that the closer something is, the more important it is to us. The easiest thing to do would be to use F(s,a) = d. Let's see how that looks:

For all graphs, the X axis (horizontal) represents d, with the positive side to the right, and the Y axis (vertical) represents F(s,a), with the positive side going up. The axes are in black, and the values in blue. Where the axes cross is (0,0).

[Graph: F(s,a) = d]

As you can see, when the agent is close to the thing, the function has a low value, and it goes up as the agent gets further away. That isn't what we wanted, as it doesn't go to 0 when the thing is no longer important. Let's try F(s,a) = 1/d:

[Graph: F(s,a) = 1/d]

Now it looks better. When we get closer to the thing, the function goes up, meaning we should care more about it. However, as it goes to infinity when the distance goes to 0, the agent will prefer to get as close to the thing as possible (if it is maximizing it, as with food), overriding any other function. It would still work, but it would behave strangely when getting really close to the thing. We can solve this with F(s,a) = 1/(d+1):

[Graph: F(s,a) = 1/(d+1)]

Now we have a bigger F(s,a) when the thing is more meaningful to the agent, and we never reach infinity (at least in the range of d that is physically possible, as we can't have negative distances).

Not every function needs to be continuous. For example, we can have a function that encodes whether we are eating: it has a value of 1 if the agent is eating and 0 if it isn't.
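The three distance functions and the eating indicator are easy to compare side by side in Python. The function names are made up for this sketch; only the formulas come from the post.

```python
def raw_distance(d: float) -> float:
    """F = d: grows as the thing gets less relevant (not what we want)."""
    return d


def inverse_distance(d: float) -> float:
    """F = 1/d: fades with distance but blows up as d approaches 0."""
    return 1.0 / d


def bounded_inverse_distance(d: float) -> float:
    """F = 1/(d+1): fades with distance and never exceeds 1 for d >= 0."""
    return 1.0 / (d + 1.0)


def is_eating(eating: bool) -> float:
    """A non-continuous feature: 1 while eating, 0 otherwise."""
    return 1.0 if eating else 0.0


print(f"{'d':>8} {'d':>10} {'1/d':>10} {'1/(d+1)':>10}")
for d in (0.01, 0.5, 1.0, 10.0, 100.0):
    print(f"{d:>8} {raw_distance(d):>10.3f} "
          f"{inverse_distance(d):>10.3f} "
          f"{bounded_inverse_distance(d):>10.3f}")

# At d = 0 the bounded version is exactly 1, where 1/d would divide by zero.
print(bounded_inverse_distance(0.0), is_eating(True), is_eating(False))
```

Keeping every feature roughly within 0 to 1 also means no single feature can drown out the others just because of its scale.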

So what needs to be discussed is the set of functions we need to encode the state-action pair; that is, every meaningful aspect needs to have a function. I suggest we brainstorm them here.

Edit: Remember, we need to be able to code them.

Subject: Re: Crash Course into AI Tue Dec 11, 2012 5:37 pm

A word about distances: using the raw distance (a straight line) won't work too well with agents; we need to use the real walking distance (or an approximation of it). Let's see an example of what I mean:

This is a Pac-Man board (although it looks like a maze) that I made in Paint. The yellow dot at the bottom represents the food pellet we are trying to get to. If we use straight-line distance, the agent (Pac-Man) would prefer to go to the positions with lower numbers on them:

[Image: the Pac-Man board with each position labeled by its straight-line distance to the food pellet]

That means that from its starting position, it would go left or right, as both have a 2. From there, it would go back to its starting position, as it would prefer the 1 over the 3. And it would stay there forever.

However, if we use real maze distance, we would have this situation:

[Image: the same board with each position labeled by its real maze distance to the food pellet]

Now it has two possible places to go: an 8 and a 10. It would choose the 8, from there the 7, then the 6, ... until it finally gets to the food and wins the game.

Calculating the real distance is much harder than just using the straight-line distance, but it is mandatory if you don't want agents walking into a wall because there is something they are interested in on the other side.
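On a grid, one common way to get the real walking distance is a breadth-first search. The sketch below is not the board from the images; it is a small made-up maze ('#' is a wall, 'P' the agent, 'F' the food) where the food sits right behind a wall.

```python
import math
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)

MAZE = [
    "#######",
    "#..P..#",
    "#.###.#",
    "#..F..#",
    "#######",
]


def find(maze: List[str], symbol: str) -> Cell:
    for row_index, row in enumerate(maze):
        column = row.find(symbol)
        if column != -1:
            return (row_index, column)
    raise ValueError(f"{symbol!r} not found in maze")


def straight_distance(a: Cell, b: Cell) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def maze_distance(maze: List[str], start: Cell, goal: Cell) -> Optional[int]:
    """Number of steps on the shortest walkable path, or None if blocked."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (row, col), steps = queue.popleft()
        if (row, col) == goal:
            return steps
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            inside = (0 <= nxt[0] < len(maze)
                      and 0 <= nxt[1] < len(maze[nxt[0]]))
            if inside and maze[nxt[0]][nxt[1]] != "#" and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None


agent, food = find(MAZE, "P"), find(MAZE, "F")
print(straight_distance(agent, food))    # 2.0: looks very close
print(maze_distance(MAZE, agent, food))  # 6: has to walk around the wall
```

Feeding the maze distance into a feature such as 1/(d+1) gives the agent a gradient it can actually follow, instead of one that pushes it against the wall.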

Subject: Re: Crash Course into AI Wed Jan 16, 2013 2:13 am

How would we define the paths that the agent takes in a dynamic environment?

Also, what else is there to cover in terms of AI? I want to try to revive this thread, as it is very useful and seems to have died down recently.

Subject: Re: Crash Course into AI Wed Jan 16, 2013 7:55 am

For pathfinding, I was thinking of something similar to this video, at least for the 3D world. For microbe, nothing is needed.

The second problem is what exact path to go. In a grid world, we only have 4 possible directions. In 2d, we have the whole circumference. We have to give the AI agent a few options to choose from, so we have to limit its valid options.
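One simple way to limit the options in 2D is to sample a fixed number of evenly spaced headings around the circle. This is only a sketch of that idea; the function name and the choice of 8 directions are assumptions, not something decided in the thread.

```python
import math
from typing import List, Tuple


def candidate_directions(count: int) -> List[Tuple[float, float]]:
    """Return `count` unit vectors evenly spaced around the full circle."""
    if count <= 0:
        raise ValueError("count must be positive")
    step = 2.0 * math.pi / count
    return [(math.cos(k * step), math.sin(k * step)) for k in range(count)]


# With 4 directions we are back to the grid-world case (right, up, left,
# down); 8 adds the diagonals.
for dx, dy in candidate_directions(8):
    print(f"{dx:+.3f} {dy:+.3f}")
```

Each candidate direction then becomes one action a, and the agent evaluates Q(s,a) for each of them as usual.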

Other than that, the other thing I can think of is the functions they need to maximize, as I explained a few posts before this one. Or at least, what things the AI should care about.

Subject: Re: Crash Course into AI Wed Jan 16, 2013 10:05 am

Are you sure we need to use that much math? For example, we can use simple stuff like a radius: we give the cell a certain radius, and then we just do something like the following (but much more complex!) (pseudo-code):

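Code::
#include <cstdlib>  // rand()

// Placeholder behaviours; a real game would drive the cell from these.
void CellOmNomNoms_Object() {}  // eat the nearby object
void CellNaaa2() {}             // food is in range, but the cell passes
void CellNaaaa1() {}            // food is out of range

void DecideWhetherToEat(double cellDistanceToFood)
{
    if (cellDistanceToFood < 6)
    {
        // Roll 1..100; only rolls above 75 eat, i.e. 25% of the time.
        int eatOrNot = rand() % 100 + 1;

        if (eatOrNot > 75)
        {
            CellOmNomNoms_Object();
        }
        else
        {
            CellNaaa2();
        }
    }
    else
    {
        CellNaaaa1();
    }
}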

Subject: Re: Crash Course into AI Wed Jan 16, 2013 11:54 am

We could use that for simple scenarios, but it would fall flat when dealing with multiple things at the same time. And yes, we need to use maths.

The problem with your way is that you can't account for anything you don't know at programming time. And as we are aiming to procedurally generate everything, there is very little we know at programming time.

Subject: Re: Crash Course into AI Wed Jan 16, 2013 12:07 pm

First, that could be used in a prototype.
Second, I don't know much about AOP, but I believe we can use it for more than one thing, like you plan to use the "Eat Or Not" function. Basically, I believe that that function and my if/else do basically the same thing.

By saying that I don't think we need too much math, I meant that I don't think we need mathematical functions here. Your function basically says that the closer the cell gets to the food, the bigger the chance that it will eat it. If we really think about it, our ideas are more or less the same: after all, your function will probably use a bunch of if/else or switch/case statements and will act as an agent.

Also, can you be more specific about the last sentence?
