Google And Machine Learning – Bond Towards Safety.

The introduction of the Internet set off a wave of change across the whole world. Over the last three to four decades, the Internet has come to dominate the business world, so much so that it has nurtured the hopes of many aspiring young people by creating employment opportunities in abundance. However, the Internet has not been only about landing the best job. It has been about data and information in general. With Google and the Internet forming an indispensable association, a sea of knowledge and information is at your doorstep, just a click away!

Having said that, every activity we perform, whether physical or logical, comes with challenges and threats, and the digital world is no exception. As much as the Internet can be seen as a boon, it can well turn out to be a bane too. This is because potentially harmful applications can pose a grave threat to the very hardware and software infrastructure that helps you reach the data and information you are seeking. However, for every problem there is a solution in sight. In this case, it is machine learning techniques that address these critical issues. How do they do it? This article focuses on exactly that. Read on.

Potentially Harmful Applications (PHA)

Before we get into our core subject of how Google addresses potentially harmful applications, let us get a very brief understanding of what PHAs actually are. In simple terms, any Android developer would refer to potentially harmful applications as malware. However, over the years PHAs have changed with the ecosystem and are expected to continue to change. Here’s a look at some types of PHAs:

  1. Software that allows a mobile device or smartphone to be operated from a remote location
  2. Commercial spyware that transmits sensitive information from a device without the user’s consent
  3. Applications that may harm devices running platforms other than Android. These are also referred to as non-Android threats.
  4. Malicious software such as Trojans
  5. Ransomware that can take either partial or total control of a device

Influence of Machine Learning

The introduction of machine learning and artificial intelligence has allowed a good share of the issues related to data security and integrity to be addressed effectively. One of the earliest security problems to which machine learning was applied was spam detection. Before ML came along, email providers relied on rule-based techniques to filter out spam. With ML-based spam filters, providers can now more easily identify phishing messages, junk email, and viruses across a huge network of computers. Machine learning continues to bring innovative methods for identifying and classifying potentially harmful applications. Technology giants like Google have endorsed the significance of machine learning techniques in classifying these PHAs. Let’s now get back to our core subject of how this happens.

Google and Machine Learning Techniques

Google has rightly been seen as a revelation. It has made life easier for existing and upcoming commercial activities across sectors. However, PHAs continue to pose threats, and even Google finds it a challenge to detect and classify them. With costly devices such as smartphones at increased risk, Android developers have felt the need for additional resources to tackle these threats.

Hence, Android engineers at Google analyze complex signals to find potentially harmful software. Google Play Protect, Google’s built-in malware protection for Android, does this to a commendable extent.

Using Machine Learning helps in detecting PHAs faster and at a larger scale.

Google Play Protect’s machine learning resources can be grouped into two parts: data sources and models.

Data sources

With data being crucial to the successful functioning of any system, Google’s multi-dimensional system draws on two kinds of data to detect and classify PHAs.

  1. Analyzing app data

Google Play Protect analyzes every app it can find on the Internet. As part of this process, it studies the particular features and behaviors of each app that could be relevant to the PHA categories in scope (for example, SMS fraud, phishing, or privilege escalation). This analysis produces information about the app’s characteristics, which serves as a fundamental data source for the machine learning algorithms.

  2. Analyzing data on users’ experience with the app

End users’ feedback and opinions are of paramount importance, and Google, as an organization of international repute, leaves no stone unturned in earning customer trust. User feedback collected from Google Play (such as the number of installs and uninstalls, user ratings, and comments) helps greatly in identifying problematic apps. At the same time, information about the developer (such as the certificates they use) contributes valuable knowledge that can be used to identify PHAs. Together, this information helps Google understand the quality, behavior, and purpose of an app so that new PHA behaviors can be identified, detected, and classified.
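To make the two data sources concrete, here is a minimal Python sketch of how app behaviors and store feedback might be combined into one numeric feature vector for a model. The flag names, the `AppRecord` class, and the `to_feature_vector` function are all hypothetical and chosen only for illustration; the signals Google actually uses are not described in this article.

```python
from dataclasses import dataclass, field

# Hypothetical behaviour flags, used only for illustration.
BEHAVIOUR_FLAGS = ["sends_sms", "requests_device_admin", "loads_remote_code"]


@dataclass
class AppRecord:
    behaviours: set   # behaviours observed while analysing the app itself
    installs: int     # store feedback: number of installs
    uninstalls: int   # store feedback: number of uninstalls
    ratings: list = field(default_factory=list)  # star ratings from 1 to 5


def to_feature_vector(app):
    """Combine app data and user feedback into one fixed-order numeric vector."""
    behaviour_bits = [1.0 if flag in app.behaviours else 0.0 for flag in BEHAVIOUR_FLAGS]
    uninstall_rate = app.uninstalls / app.installs if app.installs else 0.0
    mean_rating = sum(app.ratings) / len(app.ratings) if app.ratings else 0.0
    return behaviour_bits + [uninstall_rate, mean_rating]


app = AppRecord({"sends_sms", "loads_remote_code"}, installs=1000, uninstalls=650, ratings=[1, 2, 1, 5])
print(to_feature_vector(app))  # [1.0, 0.0, 1.0, 0.65, 2.25]
```

Keeping the vector in a fixed order means every app is described by the same columns, which is what most learning algorithms expect as input.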

Efficient Model

Arriving at the best model to tackle PHAs is a monumental task, and even Google acknowledges this. Where Google is well ahead of the rest is in possessing good datasets and features, which are the most important pillars of machine learning. Equally important is a good algorithm that can analyze and detect the behavior patterns of these PHAs effectively. Google uses a diverse range of modeling techniques, both supervised and unsupervised.

Google has a few good techniques in its armory that are widely accepted across the industry. One proven technique is logistic regression, which has a simple structure and can be trained very quickly. It helps in clearly analyzing the different potentially harmful applications that are active.
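As an illustration of why logistic regression is considered simple and quick to train, the sketch below fits one in plain Python with batch gradient descent. The two features, the toy labels, and the function names are made up for this example and do not represent Google’s actual model or data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, epochs=2000, learning_rate=0.5):
    """Fit weights and a bias with plain batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example drives the gradient.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy data (invented): [sends_sms, requests_device_admin]; label 1 = harmful.
rows = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]]
labels = [0, 0, 1, 1, 1, 0]
weights, bias = train_logistic_regression(rows, labels)
print(predict_probability(weights, bias, [1, 0]))  # well above 0.5
print(predict_probability(weights, bias, [0, 1]))  # well below 0.5
```

The whole model is one weight per feature plus a bias, so each weight can be read directly as how strongly a feature pushes an app towards the harmful class.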

The second widely used technique is deep learning, which is applied to complex cases that require capturing complicated interactions between different features.
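A small sketch can show what an interaction between features means. Suppose, purely as an assumption for this example, that an app is suspicious when what it declares and what it does disagree. A single linear layer cannot express this "exactly one of the two" pattern, but a network with one hidden layer can. The weights below are hand-picked rather than trained, and the feature names are hypothetical.

```python
def relu(value):
    return max(0.0, value)


def tiny_network(declared_messaging, sends_sms):
    """Two hidden ReLU units with hand-picked weights (not trained)."""
    total = declared_messaging + sends_sms
    hidden_a = relu(total)        # active when at least one input is set
    hidden_b = relu(total - 1.0)  # active only when both inputs are set
    return hidden_a - 2.0 * hidden_b  # 1.0 on a mismatch, 0.0 otherwise


for declared in (0, 1):
    for observed in (0, 1):
        print(declared, observed, tiny_network(declared, observed))
```

The hidden layer is what lets the output depend on the combination of inputs rather than on each input separately. Real deep learning models learn such weights from data and use many more units.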

In addition to the above techniques, Google uses unsupervised machine learning methods. Many PHAs are similar in character and activity, and as such look almost identical to each other. An unsupervised approach helps define groups of applications that look or behave similarly, which allows Google to detect and classify PHAs more effectively.
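The idea of grouping look-alike apps without labels can be sketched with a very simple method: measure how much two apps’ behavior sets overlap (Jaccard similarity) and place similar apps in the same group. This greedy grouping is a stand-in chosen for brevity, not the algorithm Google uses, and the app names and behaviors are invented.

```python
def jaccard(a, b):
    """Share of behaviours two apps have in common (1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar_apps(apps, threshold=0.6):
    """Greedy single pass: join the first group whose seed app is similar enough."""
    groups = []  # each group is a list of app names; the first entry is the seed
    for name, behaviours in apps.items():
        for group in groups:
            if jaccard(behaviours, apps[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([name])
    return groups


apps = {
    "app_a": {"sends_sms", "hides_icon", "loads_remote_code"},
    "app_b": {"sends_sms", "hides_icon", "loads_remote_code", "reads_contacts"},
    "app_c": {"shows_ads"},
}
print(group_similar_apps(apps))  # [['app_a', 'app_b'], ['app_c']]
```

Once one member of a group is confirmed as harmful, the rest of the group becomes an obvious candidate for closer review.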


PHAs are constantly evolving, so Google’s models need constant updating and monitoring. To keep up with current trends, the production models are fed with data from the most recent apps, which helps them stay relevant. New behaviors that were previously unseen and unknown must also be continuously detected and fed into Google’s machine learning models so that they can catch new PHAs.

Google Play Protect has been a conscious answer to malware. Recent studies indicate that close to 70% of potentially harmful application threats were identified by Google Play Protect alone, with the help of machine learning algorithms. The potential of machine learning is so great that it allows Play Protect to scan close to 50 billion applications per day. However, this is a continuous cycle of model creation and updating, which also requires upgrades to keep the precision and coverage of the system as a whole on track with Google’s detection goals.
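Precision and coverage can be computed with a few lines of Python. The sketch below assumes that "coverage" means what is usually called recall, that is, the share of truly harmful apps the system manages to flag; the article does not define the term, and the app names are invented.

```python
def precision_and_coverage(predicted_harmful, actually_harmful):
    """Precision: share of flagged apps that are truly harmful.
    Coverage (assumed to mean recall): share of truly harmful apps that were flagged."""
    true_positives = len(predicted_harmful & actually_harmful)
    precision = true_positives / len(predicted_harmful) if predicted_harmful else 0.0
    coverage = true_positives / len(actually_harmful) if actually_harmful else 0.0
    return precision, coverage


flagged = {"app_a", "app_b", "app_d"}
harmful = {"app_a", "app_b", "app_c", "app_e"}
precision, coverage = precision_and_coverage(flagged, harmful)
print(round(precision, 2), coverage)  # 0.67 0.5
```

Tracking both numbers after each model update shows whether a new model flags fewer harmless apps (higher precision), catches more harmful ones (higher coverage), or trades one for the other.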

This article was originally published at ello.
