Artificial intelligence (AI) is one of the leading trends in high tech and among the most cutting-edge technologies. It is already present in numerous aspects of our lives and promises further opportunities in the near future. Constant developments in the field of AI and neural networks show how far machine capabilities may advance relative to human ones.
Today’s world is the world of big data. Thanks to AI and neural networks, we can make data-driven decisions and extract insights from vast volumes of information in a matter of seconds. With the knowledge delivered by AI, development in numerous fields is accelerating. At WEBSENSA we work with AI, both machine learning and deep learning. We implement AI tech in various areas, including projects for clients as well as our own.
ARTIFICIAL NEURAL NETWORKS
Neural networks, more properly referred to as artificial neural networks (ANN), are inspired by the way biological neural networks operate. ANNs consist of elements called neurons, each of which performs a simple mathematical operation.
Neural networks are a mathematical construct: an algorithm that consists of layers and, given an input, produces an output. They are applied to various problems, including pattern recognition, optimization, associative memory, and prediction. This text provides a short explanation of two types of ANNs and the layers they consist of.
Convolutional neural network
A convolutional neural network (CNN) specializes in recognizing and classifying images and in other computer vision tasks, and can also be used for NLP tasks. CNNs loosely mirror the way in which the human brain processes visual information. Put simply, a CNN takes advantage of the fact that an image is composed of elements. It creates a mechanism for analysing each of the elements on its own so that it can classify the image as a whole.
3 layers of CNN:
- Convolutional layer – an image is analysed a few pixels at a time. This analysis creates a feature map that indicates where each learned feature appears in the image.
- Pooling layer – it creates down-sampled (pooled) feature maps by reducing the volume of information in each feature map while still keeping the most crucial information. Convolutional and pooling layers usually alternate.
- Fully connected layer – this layer drives the final classification decision of an image.
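To make the convolutional layer more concrete, here is a minimal sketch in plain Python of how a small kernel slides over an image "a few pixels at a time" and produces a feature map. The 4x4 image, the `vertical_edge` kernel and the function name `convolve2d` are assumptions made for illustration; in a real CNN the kernel values are learned during training, and an optimized library would be used instead of nested loops.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most deep learning libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the small patch currently under the kernel.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the edge is detected in the middle column
```

The feature map is smaller than the image (3x3 instead of 4x4) and is non-zero only where the pattern encoded by the kernel is present.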
The pooling layer, also called the down-sampling layer, acts on each feature map separately to create a new set of the same number of pooled feature maps. The pooling window must be smaller than the feature map so that the operation reduces the map’s size (the number of pixels/values). The down-sampled maps are useful because the feature maps produced by a convolutional layer are sensitive to the exact location of a feature in the input; pooling makes them more resilient to changes in the position of the feature in the image.
The fully connected layer is, as mentioned before, an important part of a CNN (and is present in many other types of ANNs). It performs the final classification of images in computer vision: its purpose is to assign the image to a label.
After the convolutional and pooling layers have done their job, the output is flattened into a single vector of values. The fully connected layer turns this vector into one score per label, which can be read as the probability that the image belongs to that label. During training, the backpropagation process is responsible for computing the gradient for the fully connected layer.
Backpropagation is an algorithm that computes how much each weight contributed to the error, so that the error function can be brought to a minimum by searching for the optimal weight values. It helps the output come closer to the known correct result. During this process, each neuron receives weights that prioritize the most appropriate label. The neurons can then come to a classification decision.
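The effect of pooling can be illustrated with a short sketch. It assumes max pooling with a 2x2 window, which is one common choice (the text above does not say which pooling operation is meant), and the two toy feature maps are invented for the example.

```python
def max_pool2d(feature_map, size=2):
    """Down-sample by keeping the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# The same activations appear one pixel apart in the two feature maps.
feature_a = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
feature_b = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

pooled_a = max_pool2d(feature_a)
pooled_b = max_pool2d(feature_b)
print(pooled_a)              # [[9, 0], [0, 1]]
print(pooled_a == pooled_b)  # True: the small shift disappears after pooling
```

Each 4x4 map shrinks to 2x2, and both inputs give the same pooled result, which is the resilience to small position changes described above.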
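The flattening and classification step might look roughly like the sketch below. The pooled maps, the weights and the labels "burger" and "fries" are hypothetical values chosen for illustration, and the use of softmax to turn scores into probabilities is an assumption (it is a common choice, not something stated above).

```python
import math


def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def fully_connected(vector, weights, biases):
    """One score per label: dot product of the inputs with that label's weights, plus a bias."""
    return [
        sum(w * x for w, x in zip(label_weights, vector)) + b
        for label_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2x2 pooled feature maps.
pooled_maps = [
    [[9, 0], [0, 1]],
    [[0, 2], [3, 0]],
]
vector = flatten(pooled_maps)  # 8 values

# Hypothetical weights for two labels; a trained network would have learned these.
labels = ["burger", "fries"]
weights = [
    [0.5, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.2, 0.0],
]
biases = [0.0, 0.0]

probabilities = softmax(fully_connected(vector, weights, biases))
predicted = labels[probabilities.index(max(probabilities))]
print(predicted)  # burger
```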
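As a rough illustration of backpropagation, the sketch below trains a deliberately tiny network (one hidden neuron and one output neuron) with plain gradient descent. The network shape, the toy data, the starting weights and the learning rate are all assumptions made for this example; real networks have many more weights and rely on a framework to compute the gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Tiny network: one tanh hidden neuron feeding one sigmoid output neuron."""
    h = math.tanh(params["w1"] * x + params["b1"])
    y = sigmoid(params["w2"] * h + params["b2"])
    return h, y


def mean_loss(params, data):
    """Average binary cross-entropy between predictions and known labels."""
    total = 0.0
    for x, target in data:
        _, y = forward(params, x)
        total -= target * math.log(y) + (1 - target) * math.log(1 - y)
    return total / len(data)


def train_step(params, x, target, learning_rate):
    """Backpropagate the error for one example, then take a gradient descent step."""
    h, y = forward(params, x)
    # Output layer: for sigmoid + cross-entropy the error signal is (y - target).
    d_z2 = y - target
    grads = {"w2": d_z2 * h, "b2": d_z2}
    # Propagate the error backwards through w2 and the tanh non-linearity.
    d_z1 = d_z2 * params["w2"] * (1.0 - h * h)
    grads["w1"] = d_z1 * x
    grads["b1"] = d_z1
    # Move every weight a small step against its gradient.
    for name in params:
        params[name] -= learning_rate * grads[name]


# Toy data (an assumption): negative inputs are class 0, positive inputs are class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
params = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}

initial_loss = mean_loss(params, data)
for _ in range(200):
    for x, target in data:
        train_step(params, x, target, learning_rate=0.1)
final_loss = mean_loss(params, data)

print(final_loss < initial_loss)  # True: the error shrinks as the weights are adjusted
```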
Recurrent neural network
A recurrent neural network (RNN) is a multi-layer neural network that applies deep learning to language and other sequential data in order to classify or predict. RNNs process series and search for dependencies. Unlike in feedforward neural networks, the output of some layers is fed back into the input of a previous layer. This addition allows the analysis of sequential data, which a traditional neural network cannot do.
Applications of RNN:
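The feedback loop can be shown with a minimal single-neuron sketch. The weights `w_x`, `w_h` and `b` and the two toy sequences are assumptions for illustration only; the point is that the hidden state from the previous step is fed back in at every step, so the order of the inputs matters.

```python
import math


def rnn_forward(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Run a one-neuron recurrent layer over a sequence and return every hidden state."""
    h = 0.0  # hidden state before any input has been seen
    states = []
    for x in sequence:
        # The previous hidden state h is fed back in together with the new input x.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The same values in a different order lead to different final states.
states_early = rnn_forward([1.0, 0.0, 0.0])  # the signal arrives first and then fades
states_late = rnn_forward([0.0, 0.0, 1.0])   # the signal arrives last

print(states_early[-1] < states_late[-1])  # True
```

A feedforward layer with no memory would treat each input in isolation, so it could not tell these two sequences apart from their last element alone.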
- temporal analysis – e.g. detecting time-series anomalies
- natural language processing – e.g. text generation, language modeling, speech recognition and machine translation
- computer vision – e.g. image description generation
Drawbacks of neural networks
Even though neural networks provide incredible opportunities and open many doors for us, they have certain limits. What might cause issues are problems with data. When there is not enough data, or when it is difficult to interpret, the neural network cannot draw any conclusions. It can’t learn from the provided dataset when it cannot find a pattern or a correlation.
Another issue might come up when the neural network is very large. The bigger it is, the more data is needed. To sum up, neural networks require training before they can operate, ‘correct’ input data, and, for large networks, a lot of processing time.
OUR PROJECTS BASED ON AI
1. WEBSENSA Product Photo Recognition System for McDonald’s
Together with the OMD Poland Media Agency and McDonald’s Poland, we built an image recognition system. Its purpose was to detect which McDonald’s products were presented in Instagram photos with the hashtag #mamsmakanamaka.
We trained neural networks to recognise and classify McDonald’s products in the photos. We were able to tag and distinguish burgers, fries, coffees, cold drinks, ice creams and desserts.
As an outcome, we were able to see which types of food or drinks people were most eager to take pictures with, and which were not so popular. McDonald’s used this data to optimize their marketing communication so that it was better aligned with customers’ expectations.
For this project, we received a bronze medal at the 2018 Innovation contest in the category of Media, algorithms and optimisation tools.
2. WEBSENSA OpenRTB Bidder
The product is part of our AdTech OpenRTB (real-time bidding) product suite. Its goal is to choose and serve the most suitable ad for a particular user visiting a particular page, from the set of all active ad campaigns.
The complexity of OpenRTB Bidder comes from the following:
- There are tens of thousands of active ad campaigns and creatives to choose from.
- Users are described by hundreds of features and segments.
- The chosen ad must be suitable for the page on which it is displayed.
- There are thousands of bid requests per second that we have to serve.
- Each bid request (decision and serving of the ad) must be completed in less than 50 milliseconds. Only then can it be taken into consideration by the Bid Exchange.
- The chosen ad must be the one with the highest predicted CTR (click-through rate), predicted probability of conversion, or predicted value of conversion, depending on the type of campaign.
In practice, the last problem can be solved only by using machine learning algorithms or neural networks. Over more than a year of constant development and testing of newer versions of WEBSENSA OpenRTB Bidder, we went through a handful of implementations of machine learning algorithms and neural network architectures.
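The selection step itself can be sketched in a few lines once a model has produced a prediction for every candidate ad. Everything in the example below is hypothetical: the `Candidate` fields, the page categories and the numbers are invented for illustration and do not describe the actual WEBSENSA implementation, and the model that produces `predicted_value` is not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    allowed_categories: frozenset  # page categories the ad may appear on
    predicted_value: float         # e.g. predicted CTR from some trained model


def choose_ad(candidates, page_category):
    """Return the suitable candidate with the highest predicted value, or None."""
    suitable = [c for c in candidates if page_category in c.allowed_categories]
    if not suitable:
        return None  # no bid for this request
    return max(suitable, key=lambda c: c.predicted_value)


candidates = [
    Candidate("shoes", frozenset({"sports", "fashion"}), 0.031),
    Candidate("cars", frozenset({"automotive"}), 0.054),
    Candidate("drinks", frozenset({"sports", "food"}), 0.018),
]

winner = choose_ad(candidates, page_category="sports")
print(winner.ad_id)  # shoes: "cars" scores higher but is not suitable for a sports page
```

The hard part in a real bidder is producing good predictions for tens of thousands of candidates within the time budget, which is where the models discussed here come in.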
Technologies used in OpenRTB Bidder
We started with logistic regression, but it didn’t give us results much better than simple heuristics. Then we started exploring neural networks: we tested a factorisation machine based on neural networks with fully connected layers. Next, we tried recurrent neural networks and started to include users’ session history, which improved the performance of the bidder a lot. Finally, we moved to a special kind of RNN – the long short-term memory (LSTM) network, which helped even more.
Currently, WEBSENSA OpenRTB Bidder uses custom-designed neural networks based on our previous experience. Thanks to them we are able to surpass the competition in the realm of OpenRTB. Our Bidder is under constant development. We hope that testing new NN architectures will give us even better results in the near future.
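For readers unfamiliar with LSTM, the sketch below shows one step of a single-unit LSTM cell in its standard textbook form. It is not the architecture used in the Bidder; the parameter names and values are assumptions chosen to show the idea that gates let the cell keep information over many steps, which is what makes this kind of network attractive for long session histories.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell (scalar input, scalar hidden and cell state)."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: how much to write
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: how much to keep
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: how much to expose
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value to write
    c = f * c_prev + i * g   # new cell state (long-term memory)
    h = o * math.tanh(c)     # new hidden state (what the next layer sees)
    return h, c


# Hypothetical parameters: the forget gate is almost fully open (bias +10)
# and the input gate is almost closed (bias -10), so the cell mostly remembers.
params = {name: 0.0 for name in
          ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]}
params["bf"] = 10.0
params["bi"] = -10.0

h, c = 0.0, 0.8  # start with something already stored in the cell state
for _ in range(20):
    h, c = lstm_step(0.0, h, c, params)

print(abs(c - 0.8) < 0.01)  # True: the stored value survives 20 steps almost unchanged
```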
3. WEBSENSA Freeeze App
It’s a project that we took on as part of our research and development work, to try something completely different. Freeeze is an application with the sole purpose of entertainment. It’s supposed to imitate bullet-time photography (otherwise known as a frozen moment or time slice).
Freeeze is an app that is supposed to imitate bullet-time photography using only 2 smartphones.
Achieving a good bullet-time effect normally takes 24 to 150+ reflex cameras placed on rails around the object that’s being photographed. Since such a setup is hardly accessible to the majority of us, in Freeeze 2 phones with the app installed are enough to create the effect of a time slice. The application synchronises the 2 photos taken by the phones and uses AI to generate the remaining frames and get our frozen moment. Generating those in-between photos has been the biggest challenge, and that’s where neural networks proved irreplaceable.
To make this possible, we had to first obtain a huge amount of training data – photos representing different scenes seen from many, specially selected viewing perspectives. Then we needed to train neural networks to reproduce the scene from any other perspective, based on the 2 given photos.
Collecting data in Freeeze
The data set we needed is a compilation of pictures that present bullet-time photography. It was essential to gather numerous example photos so that the data set would be big enough for the neural networks to generate satisfactory results. To collect the needed data, we used two methods.
In the first one, we took the photos ourselves. We installed three cameras on a tripod and placed them apart from each other, all facing the same spot in the middle. With this setup, we took videos of objects and then cut them into frames to get pictures at different angles from different positions. We collected this material at our office and at different locations around the city to gather a lot of diversified data.
Our second method was using videos that show the bullet-time effect and cutting them into frames. With both methods, we collected enough material to train the neural networks.
Neural networks in Freeeze
We used two types of neural networks. The first is a convolutional neural network, the architecture that underpins much of modern computer vision with deep learning. A CNN is a deep learning algorithm that takes in an input image, assigns importance to various objects in the image (or its aspects), and is able to differentiate one from the other.
The other type of neural network we used is a generative adversarial network (GAN). It belongs to the family of generative models, which means that this network can generate new content (in our case – images). A GAN has two sub-models: a generator and a discriminator. The generator, as its name suggests, generates new examples. The discriminator tries to distinguish between real examples (from the dataset) and fake ones (generated).
There are two competing neural networks in the Freeeze project, working against each other and learning simultaneously.
One network generates the in-between pictures (output) from the data that we have provided (input). The other has to distinguish between real images and those generated by the first neural network. A GAN trains the two networks against each other, which means they learn simultaneously. As one becomes better at detecting fake pictures, the other needs to create more realistic photos to avoid detection.
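The competition between the two networks can be expressed through their loss functions. The sketch below uses the standard binary cross-entropy formulation of GAN losses; it is an assumption that something similar is used in Freeeze, and the discriminator scores are made-up numbers (a score is the discriminator’s estimated probability that an image is real).

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one prediction in (0, 1) and a target of 0 or 1."""
    eps = 1e-12  # keeps log() away from 0
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(scores_real, scores_fake):
    """The discriminator wants real images scored as 1 and generated images as 0."""
    real_part = sum(bce(s, 1) for s in scores_real) / len(scores_real)
    fake_part = sum(bce(s, 0) for s in scores_fake) / len(scores_fake)
    return real_part + fake_part


def generator_loss(scores_fake):
    """The generator wants its images scored as 1, i.e. mistaken for real ones."""
    return sum(bce(s, 1) for s in scores_fake) / len(scores_fake)


scores_real = [0.9, 0.8]        # discriminator is fairly sure these are real
scores_fake_early = [0.1, 0.2]  # early in training: fakes are easy to spot
scores_fake_later = [0.6, 0.7]  # later: fakes start to fool the discriminator

# As the generator improves, its loss falls and the discriminator's loss rises.
print(generator_loss(scores_fake_later) < generator_loss(scores_fake_early))  # True
print(discriminator_loss(scores_real, scores_fake_later)
      > discriminator_loss(scores_real, scores_fake_early))                   # True
```

During training, each network is updated to lower its own loss, which pushes the other network’s loss up. This is the sense in which the two work against each other.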
Current development of Freeeze
At the time of writing this article, the project is still under development. We are able to generate images of slightly lower quality than the quality of the input images. This is not satisfactory for us, so we are looking for better methods to solve this problem.