Architecture
Architecture
The following figure illustrates the software architecture of a trading agent for the workshop:
The Control module combines functionality from all other modules to implement a complete trading agent. The Control module interacts with the game server and provides the other modules intra-game information such as configuration parameters and daily reports. The Control module also allows training of the other modules by providing inter-game information in the form of past-games logs.
The functionality, interfaces, and suggested internal structure of each module is detailed in the following sections. The sample agent includes a baseline implementation of each module and may be used as a starting point.
Model
This module' task is to provide for every query and every possible bid an estimate of the resulting cost per click and number of impressions. For the former, the module has to learn the competitors' behavior (since the resulting cost depends on the bids of the competitors) and provide the resulting cost and position for a given query and bid - this may be done using reinforcement learning (to 'reverse engineer' the competitor's actions in ) and by using the position information as provided in the game daily reports. For the latter, the module has to model the users' population distribution over states (since the number of impressions depends on the size of the population in 'transacting' states) - this may be done using the method of particle filtering.
Estimation
This module' task is to provide an estimate of profit and conversions (sales) given a bid set for all the queries and a baseline capacity level (the amount of capacity that was consumed up to the given time of estimation). To fulfill its task, the module has to use the Model and estimate the click-through-rate and the conversion rate - this may be done using statistical regression over current and past games information. Also, burst notification (identified by the users model) may be used to adjust the estimated conversion rate. The estimation module also gives access to direct estimtes of click-through-rate and conversion rate of every query for a given bid and baseline capacity.
Optimization
This module provides the bid set for the next day of the game. To optimize for the next day only, the 'singleDay' component chooses the best bid set to achieve (no more than) a predefined number of sales given the current level of capacity used. However, since the target is to optimize profits over the whole duration of the game, the 'multiDay' component has to find the optimal sequence of bid sets (and the bid set for the next day is the fist bid set of that best sequence found).
There are many implementation alternatives for the singleDay component. One team of researchers suggests (among others) a 'Multiple Choice Knapsack' formulation that may be solved using a greedy algorithm, while others suggest dynamic programming (or a greedy approximation for faster computation) to solve a very similar setting, or a simpler heuristic that iteratively adjusts a target ROI (such that the ROI of all queries is equal - by the equimarginal principle, the marginal ROI of all queries is the same at an optimal bid), and bids for each query at the cost that would result in the targeted ROI.
The 'multiDay' optimization may be done using dynamic programming (as mentioned, but not implemented due to time constraints in both papers referenced above, or by successively applying the 'singleDay' optimizer as part of a 'gradient ascent' search.