Quick and sparse notes on discriminative vs generative classifiers

  • Discriminative classifier are more easily biased, thus regularization (Bouchard and Triggs, 2004)
  • Generative models are converging faster while discrimative models can converger closer to the “real” limit/boundary (Ng and Jordan, 2001)
  • Discriminative models whose structure does not match/fit the problem perfectly are more easily biased (Lafferty et al., 2001)

In the case of sparseness or scarcity of data or structure, generative models have to be considered seriously (thus, compressive sensing).

The Trade-off Between Generative and Discriminative Classifiers, Bouchard and Triggs, 2004
On Discriminative vs. Generative Classifiers: A comparison of Logistic Regression and Naive Bayes, Ng and Jordan, 2001
Conditional Random Field: Probabilistic Models for Segmenting and Labeling Sequence Data, Lafferty et al., 2001

Two Amusing Side Channel Attacks

Side channel attacks usually call up timing attacks and electromagnetic (TEMPEST) attacks. But there are different, less and more exotic, forms. I recount two amusing stories that Adi Shamir told during an invited talk in early 2011 at the Computer Security course at Collège de France (Paris).

1) The first story was about ultrasonic waves. Adi and one of his student bought an ultrasonic microphone, like the ones used to study bats. They recorded the sonic spectrum up to 48Khz near a computer performing RSA encryption and looked at graphs like that GNU PG
sonic spectrum. They were amazed by the fact that patterns appeared, but they wanted be sure that they were picking real ultrasonic sound and not electromagnetic noise of the PC (or surrounding equipement) and so they tested against the muffled (soundproofed) microphone. Indeed it was real ultrasonic noise. Ultrasonic sound was not particularly disturbed by the PC fans and background sound. So they wrote an assembly test program and were able to match ultrasonic patterns with actual CPU operations: CPU operations vs ultrasonic
spectrum From there, they were able to break RSA by knowing the CPU operations performed during encryption. So the last question was: where does this ultrasonic noise, highly correlated with CPU operations, comes from? They found it came from (1500 μF) capacitors designed to smooth the power delivered to the CPU which two plates were vibrating (Piezo effect) differently depending on the instant power consumption of the CPU. How did they found that? By applying “Quick-Freeze” (-48ºC) spray to various part of the motherboard/CPU! So it all boiled down to a power analysis attack.

2) The second story was about USB devices. Basically, they plugged a very precise voltmeter into an USB port and started recording the very small variations between 4.999V and 5V. With the same assembly-test-program-pattern-matching approach, they broke RSA again. Better yet, they cut off the USB power from the OS USB controls, and they were able to perform exactly the same side channel attack through residual power in the USB port. It seems the only solution is either not have USB ports, or physically disable their +5V power output by short-circuiting them and so blowing their specific fuses.

References:
– Invited talk of Adi Shamir part of the Computer Security 2011 course by Martin Abadi at Collège de France (Paris)
http://tau.ac.il/~tromer/acoustic/
http://www.lsec.be/upload_directories/documents/AdiShamir.pdf
– it all boils down to http://en.wikipedia.org/wiki/Power_analysis

Cross-post from #AltDevBlogADay

I totally agree with the Three Laws of (Game) AI here: http://altdevblogaday.com/2011/07/14/an-ai-curriculum/ and I mentioned:

IMHO, the targets for funnier game AI, in:
- Quake-like games: more teamplay and longer time goals/planning ("I will do A because it will allow me to do B next"), less aim/scout/hp cheats.
- RTS games: more strategy and tactical thinking (it's _hard_), dynamic adaptation to what the (human) player does and more human like playstyle entailed by less econ/scout cheats.
- MMO games: pets that are more than 3-states-machines, mobs that cooperate (group aggro is not cooperation), NPCs (allies or foes) with character, and (long-term :)) persistent players characters with bots that play like their human master*.
- Infiltration games: more sensing, more human-like behavior, more terrain interaction, less cheating, less scripting.

All in all, it's about more skill with less cheats, more replayability and more fun!

StarCraft Opening Prediction: Preliminary Results

Today, I completed the first step of validation of my StarCraft: Broodwar opening prediction model. It uses knowledge of enemy buildings seen as observations (at some time) to infer possible tech trees and then the distribution on openings. This probabilistic model is used to infer which opening (reaver drop, fast expand, fast dark templar, etc.) the opponent is performing when playing StarCraft, as a bot of course! It can have many other uses: infer probabilities of tech trees given observations and time, commentary assistant with probabilities of openings in real-time, and perhaps even infer probabilities of tech trees given opening(s) and time, so that my bot can infer which build order to follow to perform an opening at a given time. For those who want to learn more, here is the basic model, and its update with opening filtering. The others can skip to the pretty pictures. Even if it is a (very) ill-posed problem (for two reasons going with the dataset and inference), I am going to give a number: it has an overall ~77% recognition rate, on 7 different openings, in full information, and with 10-fold cross validation at the end of opening inference (I set it to 10 minutes for the time being, then I switch to tech tree prediction) and it can deal with absence of observations or noise (no scouting / late scouting respectively). More on that later (see “preliminary” in the title).

Here is the basis vector corresponding to the “ReaverDrop” opening in Protoss versus Zerg in the openings vector space in Ben Weber’s dataset semantics. I am not so happy with this dataset for reasons I will explain in a future post, but it’s part of fine tuning to modify / change this dataset. It was a handy bootstrap, thanks and kudos to Ben! 3D Time and TechTree knowing
ReaverDrop

And with a black and white, log-scaled heatmap: BW heat map Time and TechTree knowing
ReaverDrop

Each “TechTree” line corresponds to a bell-shape (discrete normal distribution) like this one: Some TechTree knowing
ReaverDrop

Inference, with filtering of openings over the buildings that we have seen / observations (at each time), in a PvZ (we are Zerg against Protoss) match-up, for a given game (replay): PvZ openings replay
14

Without filtering: PvZ openings replay
14

Stay tuned.

Of course, as usual, all source code is available on my github.

Ripples during sleep help consolidate memories by replaying activity

I thought about making that post reading this one. I am wondering about the level of correlation of this research with the following.

I am hosted in a neurobio / cognitive laboratory, and a Ph.D student there works on ripples (high frequency field oscillations) in the hippocampus that replay previous awake activity in a temporally compressed manner. She showed that selectively suppressing them in rats “resulted in performance impairment in rats trained on a hippocampus-dependent spatial memory task” (Selective suppression of hippocampal ripples impairs spatial memory, Nature neuro.). She has done further experiments to show that “ ripples and associated neuronal reactivations play a causal role in memory consolidation during sleep and rest” (Hippocampal ripples and memory consolidation, Curr Opin Neurobiol.).

Two definitions of a kernel: why "kernel trick"?

Kernel has so many meanings (Wikipedia). I will not speak about the core of your OS, nor about the atomic nucleus or the fictional characters.

We will try to grasp the general concept of “what a kernel is”, and “why is it called like that” in machine learning. Let’s examine the two “large meanings” of kernel in mathematics).

First meaning: a measure of injectiveness

In algebra (kernel of a linear transform, of a homomorphism, of a matrix), a kernel is the set of elements that map to the neutral element. T : V → W,
ker(T) = {v ∈ V | T.v = 0}
Edit: for the readers not familial with algebra, I made a picture.

In category theory (that studies categories of mathematical objects and morphism), the kernel of a morphism f is more general: it is the most general morphism k such that: k∘f return the neutral element for whatever input.

In set theory, the kernel of a function f is the partition D¹…Dⁿ of the domain D such that:
∀ (x, y) ∈ D ∈ {D¹…Dⁿ} : f(x) = f(y), so: ker(f) = {(x,y) | f(x) = f(y)}

All these bound the kernel to a measure of the degree to which the transform/morphism/function fails to be injective (how much elements from the domain doesn’t have separate images).

Second meaning: defining an integral transform

An integral kernel, or kernel function, defines an integral transform, such as the function k in:
(T∘f)(x) = ∫k(x,x’).f(x’).dx
If we take
k : x ↦ exp(-i.u.x)/√(2π), we have a Fourier transform, whereas k : x ↦ exp(-u.x)* leads to a Laplace transform.

When an integral kernel has the property of:
∫k(x).dx = 1 and ∀ x, k(x) = k(-x) this kernel can be used for density estimation of a random variable. Perhaps the most known kernel in this regard is the Gaussian kernel: k : x ↦ exp(-x²/2)/√(2π), but there are many others (commons)).

When the integral kernel depends on the difference between its arguments, we have a convolution kernel:
*(T∘f)(x) = ∫k(x – x’).f(x’).dx
As the probability distribution of the sum of two independent random variables is the convolution of their individual distributions, kernels are central for probabilistic models.

So, why “kernel machines/methods” for SVM and Gaussian processes (and PCA etc.)? Because kernels are central to such methods. An SVM is basically a maximum margin linear classifier. Linear classifier means that, given 2 sets of points in a p-dimensional space, it just build an hyperplane (dimension p-1) to separate them. Maximum margin means that it maximizes the margin between the points of the 2 sets closest to the hyperplane (the support vectors). So, how can SVM separate clusters that are not linearly separable? Using a kernel trick.

Kernel trick

A kernel trick is a way to increase the dimensionality of the observations so that, in this higher dimension space, they are linearly separable. For that, we use a kernel, as defined in the above part, as an inner product of the higher dimension space. Observations from S are mapped to V by ϕ : S → V, the kernel used is k : x,y ↦ ⟨ϕ(x),ϕ(y)⟩, the inner product of V. And we don’t need to know ϕ, it suffices to know that k in a inner product in V and that k follows the Mercer’s condition.

There is a cool video to vizualize that (forgive them the Comic Sans) on YouTube.

To conclude, the name “kernel machines” comes from the fact that without kernel tricks, SVM and other linear classifiers would be very limited. It enables them for non-linear classification.

StarCraft AI introduction

I am writing this because I feel like addressing some misconceptions seen here. If it stimulates people to get their hands dirty working on a StarCraft AI, it’s icing on the cake. I am the author of BroodwarBotQ, a shitty and bugged bot with adequate micro-management.

For those of you wanting to develop a Starcraft AI and who are not very familiar with StarCraft: Brood War, I recommend Liquipedia and the advanced guide on BWAPI’s wiki.

How do we interact with the game?

BWAPI, BroodWar API, is the standard. You can read the state of the game through mainly:

  • Units that you see and that I’m sure you’re gonna keep track…
  • Game the game information: static map, players…
  • Events all the possible events you would have to react to as a player.

In short, BWAPI works by injecting your DLL + BWAPI’s in StarCraft: it reads memory from the game, has its own data structures to present it to you, and allows you to perform the same actions as the human player could do. You have mainly 2 types of entry points to get your code running: callbacks from events and onFrame() (update()) called 24 times per second.

StarCraft is an interesting problem

Real time strategy (RTS) games in the large are interesting problems because they yield problems of all sorts:

  • pathfinding: it is not a solved problem. Overmind did threat aware A*, I did local perceptions (with threat as one of them) modifications, and there are yet many problems to be solved there. Consider a 1-(walk)tile wide corridor with a 2-tiles wide unit in front and a 1-tile unit wanting to go through. Supreme Commander and StarCraft 2 solved this by having the units push themselves as in fluid mechanics. A marine can push away a tank. :) Consider an oscillating door, you find a path through it, it closes (same as facing someone in the street and stepping aside to dodge on the same side at the same time). I know there are many ways to deal with it, but we, as humans, solve it differently.
  • hierarchical reasoning (and so, to me, modeling): you can’t consider atomic unit movements as global strategy, you need filtering and hierarchy. For instance, for Overmind:

    • Strategy: win by harassing with mutalisks and taking the economical lead
    • Tactics: find interesting enemy positions and weak spots
    • Micro-management: retreat units from high damages zones, flee when needed, focus fire on important targets. Emphasis learned from reinforcement learning.
  • planning: planning in StarCraft is hard (see the number of possible tech trees). It involves many parameters and have to be done under uncertainty about the opponent. Also, computing all the possibles productions, positioning and moves is intractable. So we have either to prune them a lot of to have a hierarchical approach.

  • a lot more

And you get to study those not in your own lab grown simulator, but on an existing simulation from which you can’t control the implementation and parameters. Better, you can even be confronted to real human players!

Starcraft really is a canonical RTS, and particularly a fast paced / tactical one. StarCraft gameplay is often described in the combination of macro-management (economy, production, technology) and micro-management (optimal units control). It can also be seen as the combination of strategy (long term goals, army compositions) and tactics (army positioning, micro-management gimmicks). RTS are games that diverges, fast. The first minutes is mostly always the same, but by the 8th minute, each race could have taken between 5 (plausible) to 15 (some ridiculous) tech paths. And that’s just a “human player gross estimation”. Rigorously, there are:

  • Terran, 2212 tech trees
  • Protoss, 480 tech trees
  • Zerg, 720 tech trees

Considering only one building for each type (code here). (Terran tech trees count is so high because of add-ons + half of the tech trees without a supply depot, ridiculous.) In addition to the “classic RTS”, StarCraft brings:

  • a balanced gameplay human-wise.
  • competitive play and the best players (the best StarCraft players are also among the best RTS players).
  • (a huge load of) replays on which to apply data-mining / machine learning.
  • other robotic opponents to fight against (IMHO, that’s important).

Bots fighting each others

This was the goal of StarCraft AI Competition (AIIDE 2010) and will be in the upcoming competition(s) (AIIDE 2011 at least). Bots completely shift the gameplay. As we saw during the last competition, Mutalisks total ubiquity (macro/micro) + 6k actions per minutes (APM) really is very powerful: what could have an other Terran (robotic or human) player done in place of krasi0 bot? But the game is fair in mirror matches. And even with all it’s abuse, the game is still playable because of all the possibilities. What if your bot makes Overmind too comfortable and rush it when he least expects it, just before Mutalisks? What if the pressure is so intense his Mutalisks are forced back? Good human players have the same problem, and pro-gamers still compete, not just betting on a micro/control mistake of their opponent, we can’t blame a perfect control for crushing totally the gameplay. And who said Overmind did no mistakes? It was programmed by humans.

Bots fighting humans?

There are 2 possible approaches to that:

  • forcing the AI on equal footing control-wise: the best human players reach around 300 APM. Try and fight with very limited APM (for a bot), like (at least) 600 APM for instance. We have to take into account that selecting 12 units and ordering them to go somewhere is 2 actions for a human and 12x2 for a bot (no group selection). To win against a good player (ICCUP rank > C), the AI would need high level strategy and concepts as well as efficient control, it would mean that we would have groked / broken RTS AI.
  • unleashed (non cheating) AI: Kasparov has less computational power than Deep Blue, and Deep Blue isn’t restraining its computational power. If Deep Blue had access to raw computational power, min-max (alpha-beta) and heuristics, why can’t a StarCraft bot? Because it breaks the gameplay? The foundations of the gameplay are not only micro-management, they are strategy, prediction, adaptability, surprise. Raw computational power and 10k APM won’t make the game boring. These perks can save 12 annoying mutalisks from one storm, they can’t save them from 5 storms, and good army positioning.

It’s 5am here.