Monitoring network behaviors of mobile applications, controlling their resource access and detecting potentially harmful apps are becoming increasingly important for the security protection within todays organizational, ISP and carriers. For this purpose, apps need to be identied from their communication, based upon their individual traffic signatures (called imprints in our research). Creating imprints for a large number of apps is nontrivial, due to the challenges in comprehensively analyzing their network activities at a large scale, for millions of apps on todays rapidly-growing app marketplaces. Prior research relies on automatic exploration of an apps user interfaces (UIs) to trigger its network activities, which is less likely to scale given the cost of the operation (at least 5 minutes per app) and its effectiveness (limited coverage of an apps behaviors). In this paper, we present Tiger (Traffic Imprint Generator), a novel technique that makes comprehensive app imprint generation possible in a massive scale. At the center of Tiger is a unique instantiated slicing technique, which aggressively prunes the program slice extracted from the apps network-related code by evaluating each variables impact on possible network invariants, and removing those unlikely to contribute through assigning them concrete values. In this way, Tiger avoids exploring a large number of program paths unrelated to the apps identiable traffic, thereby reducing the cost of the code analysis by more than one order of magnitude, in comparison with the conventional slicing and execution approach. Our experiments show that Tiger is capable of recovering an apps full network activities within 18 seconds, achieving over 98% coverage of its identiable packets and 0.742% false detection rate on app identication. Further running the technique on over 200,000 real-world Android apps (including 78.23% potentially harmful apps) leads to the discovery of surprising new types of traffic invariants, including fake device information, hardcoded time values, session IDs and credentials, as well as complicated trigger conditions for an apps network activities, such as human involvement, Intent trigger and server-side instructions. Our findings demonstrate that many network activities cannot easily be invoked through automatic UI exploration and code-analysis based approaches present a promising alternative.
More detail can easily be written here using Markdown and $\rm \LaTeX$ math code.