iOS Privacy Manifest: Data Collection and Usage in Top Free iOS Apps

Since May 2024, an iOS Privacy Manifest has been required for app submissions that newly add certain third-party SDKs. We present first results on data collection practices, compliance issues with Apple’s guidelines, and privacy risks posed by SDK providers.

Apple mandates that all app submissions with specific newly added third-party SDKs include a comprehensive iOS Privacy Manifest. This manifest outlines the data collection and usage practices of apps, providing users with transparency and empowering them to make informed decisions about their digital privacy. As outlined in our previous blog post, Appicaptor correlates the privacy manifest contents with additional privacy findings through static analysis.

In our evaluation, we analyzed the iOS Privacy Manifest definitions of our app sample set, which consists of the 2,000 most popular free iOS apps available on the German Apple App Store in June 2024.

Apple requires apps that newly add one of 86 listed third-party software development kits (SDKs) to include an iOS Privacy Manifest (see Apple Developer Support). Our analysis of the app sample set found each of the 86 SDKs in at least one app. We then evaluated whether apps integrating these SDKs adhered to the privacy manifest requirement. Only 47 of the 86 SDKs were associated with at least one app that defined a privacy manifest. The remaining 39 SDKs were each used in at least one app of the sample set, but no app employing them defined a privacy manifest. This discrepancy may have several causes: the SDK developers may not have provided a privacy manifest template for their SDKs; the app developers may not have integrated SDK versions that include the privacy manifest template, or may have explicitly deleted the manifest entry; or the app update did not newly add a third-party SDK from the list. Our analysis can therefore only investigate the Privacy Manifest declarations of about half of the libraries that Apple has selected. However, these first results already provide interesting insights into the data analytics industry.
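The split into SDKs with and without manifest data can be expressed as simple set arithmetic. The following is only a minimal sketch under assumptions: the record layout, the field names `sdks` and `sdks_with_manifest`, and the SDK names are hypothetical and do not describe how Appicaptor stores its results.

```python
# Hypothetical SDK names and app records, for illustration only.
LISTED_SDKS = {"SdkA", "SdkB", "SdkC"}

APPS = [
    {"sdks": {"SdkA", "SdkB"}, "sdks_with_manifest": {"SdkA"}},
    {"sdks": {"SdkB", "SdkC"}, "sdks_with_manifest": set()},
    {"sdks": {"SdkA"}, "sdks_with_manifest": set()},
]


def sdk_coverage(apps, listed_sdks):
    """Split the listed SDKs found in the sample by manifest availability.

    Returns (covered, uncovered): SDKs with a manifest in at least one app,
    and SDKs that occur in the sample but never together with a manifest.
    """
    present = set()
    covered = set()
    for app in apps:
        present |= app["sdks"] & listed_sdks
        covered |= app["sdks_with_manifest"] & listed_sdks
    return covered, present - covered


covered, uncovered = sdk_coverage(APPS, LISTED_SDKS)
print(sorted(covered), sorted(uncovered))
```

Applied to the real sample, the same split would correspond to the 47 covered and 39 uncovered SDKs reported above.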

What data is requested in real apps and why?

As a first evaluation of the manifest content, we examined the data types declared within the iOS Privacy Manifests of our app sample set. Our analysis revealed that 50% of the apps declare in their manifests that they collect data types such as OtherDiagnosticData, CrashData, and DeviceID. The data type DeviceID is especially privacy-sensitive, as it makes it possible to correlate user actions across the sandbox boundaries of individual apps.
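To illustrate what such a declaration looks like, the sketch below reads the declared data types from a privacy manifest, which is a property list file (typically named PrivacyInfo.xcprivacy). The sample manifest is made up for this example, and the helper names are hypothetical; this is not the extraction code used in our analysis.

```python
import plistlib

TYPES_KEY = "NSPrivacyCollectedDataTypes"
TYPE_KEY = "NSPrivacyCollectedDataType"

# Made-up manifest content, for illustration only.
SAMPLE_MANIFEST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>NSPrivacyCollectedDataTypes</key>
  <array>
    <dict>
      <key>NSPrivacyCollectedDataType</key>
      <string>NSPrivacyCollectedDataTypeCrashData</string>
      <key>NSPrivacyCollectedDataTypeLinked</key>
      <false/>
      <key>NSPrivacyCollectedDataTypeTracking</key>
      <false/>
      <key>NSPrivacyCollectedDataTypePurposes</key>
      <array>
        <string>NSPrivacyCollectedDataTypePurposeAnalytics</string>
      </array>
    </dict>
    <dict>
      <key>NSPrivacyCollectedDataType</key>
      <string>NSPrivacyCollectedDataTypeDeviceID</string>
      <key>NSPrivacyCollectedDataTypeLinked</key>
      <true/>
      <key>NSPrivacyCollectedDataTypeTracking</key>
      <false/>
      <key>NSPrivacyCollectedDataTypePurposes</key>
      <array>
        <string>NSPrivacyCollectedDataTypePurposeAppFunctionality</string>
      </array>
    </dict>
  </array>
</dict>
</plist>
"""


def short_name(identifier):
    """Strip the common prefix, e.g. '...DataTypeCrashData' -> 'CrashData'."""
    if identifier.startswith(TYPE_KEY):
        return identifier[len(TYPE_KEY):]
    return identifier


def declared_data_types(manifest_bytes):
    """Return the set of data type names declared in one manifest."""
    manifest = plistlib.loads(manifest_bytes)
    entries = manifest.get(TYPES_KEY, [])
    return {short_name(entry.get(TYPE_KEY, "")) for entry in entries}


print(sorted(declared_data_types(SAMPLE_MANIFEST)))
```

Counting, per data type, the apps whose manifests declare it would then give the kind of per-app figures discussed here.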

The following graph gives the full list of collected data types and the number of apps that collect each of them according to the privacy manifest. A wide range of data types is declared in the app set, including very sensitive data types such as Health (health and medical data), PhysicalAddress, PreciseLocation and SensitiveInfo (racial or ethnic data, sexual orientation, pregnancy or childbirth information, disability, religious or philosophical beliefs, trade union membership, political opinion, genetic information, or biometric data).

Although Apple clearly specifies how data type classes should be defined in the iOS Privacy Manifest, we found 205 apps within our app sample set that included malformed or self-defined data types in their iOS Privacy Manifests. These entries do not align with Apple’s specified requirements, which highlights issues in the implementation of and adherence to Apple’s guidelines.
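A check for malformed or self-defined data types can be sketched as a comparison against the set of values Apple expects. The list below is deliberately incomplete and contains only the data types named in this article; a real check would need the full list from Apple’s documentation. The function names are hypothetical.

```python
DATA_TYPE_PREFIX = "NSPrivacyCollectedDataType"

# Deliberately incomplete: only the data types named in this article.
# A real check needs the complete list from Apple's documentation.
KNOWN_DATA_TYPES = {
    DATA_TYPE_PREFIX + name
    for name in (
        "OtherDiagnosticData",
        "CrashData",
        "DeviceID",
        "Health",
        "PhysicalAddress",
        "PreciseLocation",
        "SensitiveInfo",
    )
}


def is_known_data_type(value):
    """True only for a string that exactly matches an expected identifier."""
    return isinstance(value, str) and value in KNOWN_DATA_TYPES


def has_malformed_data_type(manifest):
    """True if any collected-data entry uses an unexpected data type value."""
    entries = manifest.get("NSPrivacyCollectedDataTypes", [])
    return any(
        not is_known_data_type(entry.get("NSPrivacyCollectedDataType"))
        for entry in entries
    )


# Made-up parsed manifests, for illustration only.
good = {"NSPrivacyCollectedDataTypes": [
    {"NSPrivacyCollectedDataType": "NSPrivacyCollectedDataTypeCrashData"},
]}
missing_prefix = {"NSPrivacyCollectedDataTypes": [
    {"NSPrivacyCollectedDataType": "CrashData"},
]}
self_defined = {"NSPrivacyCollectedDataTypes": [
    {"NSPrivacyCollectedDataType": "com.example.CustomType"},
]}

print(has_malformed_data_type(good))
print(has_malformed_data_type(missing_prefix))
print(has_malformed_data_type(self_defined))
```

The same pattern applies to purposes, with the expected purpose identifiers in place of the data type identifiers.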

Following the evaluation of data types, we proceeded to assess the given purposes. The next graph presents the distribution of purposes defined in the iOS Privacy Manifests per app. Our analysis indicates that data collection is declared for the purposes AppFunctionality or Analytics in over 60% of the apps in the sample set. Consistent with our findings in the data type evaluation, a substantial number of apps (235) included purpose definitions that deviate from Apple’s expected values. These are summarized in the category Malformed / Self-defined Purpose in the chart below.

Do the declared data types and their purposes align?

Further analysis determined which data types are collected for which purposes, and in how many apps of the sample set this occurs. For this evaluation, we analyzed the iOS Privacy Manifest definitions of the 2,000 apps in our sample set, focusing on tuples of data types and associated purposes. The following graphic illustrates which data types are collected for which purposes. The size of each circle corresponds to the number of apps in which that specific data type-purpose tuple was defined in the iOS Privacy Manifest. Notably, the collection of certain sensitive data types such as SensitiveInfo, PhysicalAddress, PreciseLocation and Health is declared for purposes other than AppFunctionality, and almost all data types are also collected for the purpose of Analytics, which could affect users’ privacy.
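The per-app counting of data type-purpose tuples can be sketched as follows. This is a minimal illustration, assuming the manifest entries of each app have already been parsed into dictionaries; the app identifiers and entries are made up, and the function name is hypothetical.

```python
from collections import Counter

TYPE_KEY = "NSPrivacyCollectedDataType"
PURPOSES_KEY = "NSPrivacyCollectedDataTypePurposes"


def count_type_purpose_tuples(entries_by_app):
    """Count, for each (data type, purpose) tuple, the apps declaring it.

    A tuple is counted once per app, even if several manifests bundled
    in the same app declare it.
    """
    counts = Counter()
    for entries in entries_by_app.values():
        seen = set()
        for entry in entries:
            for purpose in entry.get(PURPOSES_KEY, []):
                seen.add((entry.get(TYPE_KEY), purpose))
        counts.update(seen)
    return counts


# Made-up sample data, for illustration only.
DEVICE_ID = "NSPrivacyCollectedDataTypeDeviceID"
ANALYTICS = "NSPrivacyCollectedDataTypePurposeAnalytics"
APP_FUNCTIONALITY = "NSPrivacyCollectedDataTypePurposeAppFunctionality"

entries_by_app = {
    "app1": [
        {TYPE_KEY: DEVICE_ID, PURPOSES_KEY: [ANALYTICS]},
        {TYPE_KEY: DEVICE_ID, PURPOSES_KEY: [ANALYTICS]},  # duplicate declaration
    ],
    "app2": [
        {TYPE_KEY: DEVICE_ID, PURPOSES_KEY: [ANALYTICS, APP_FUNCTIONALITY]},
    ],
}

tuple_counts = count_type_purpose_tuples(entries_by_app)
for (data_type, purpose), apps in sorted(tuple_counts.items()):
    print(data_type, purpose, apps)
```

In a bubble chart like the one described here, each count would determine the size of one circle.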

Are there component providers that stand out in terms of data type and usage?

Data types and purposes are defined for specific components in the iOS Privacy Manifest. Our next focus was therefore to investigate whether particular purposes for collected data types are specific to certain component providers. The following analysis examines the purposes for collected data types, grouped by component provider within the iOS Privacy Manifest, and the following graph highlights the relationship between purposes and SDK providers. To do so, we mapped the SDKs to their providers and extracted the 20 SDK providers that occur most frequently in the Privacy Manifests of the evaluated app set. We then counted in how many apps each purpose is defined for these top 20 providers. The diagram shows that certain providers, such as Firebase, concentrate on a few specific and targeted purposes. In contrast, others, like Adjust, request data for a wide array of purposes. As the purposes relate to the activities of the SDK providers, this can be seen as a summary of the functionality provided by the SDK components and their providers.
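The grouping by provider can be sketched as below. The input format, a list of (app, provider, purpose) records, is an assumption made for this example, as are the provider names; how SDKs are mapped to providers is not shown. Ties in the ranking are broken alphabetically, which is a choice of this sketch.

```python
from collections import defaultdict


def purposes_for_top_providers(records, top_n):
    """For the top_n providers by app count, count apps per declared purpose.

    records: iterable of (app_id, provider, purpose) tuples.
    """
    apps_per_provider = defaultdict(set)
    apps_per_pair = defaultdict(set)
    for app_id, provider, purpose in records:
        apps_per_provider[provider].add(app_id)
        apps_per_pair[(provider, purpose)].add(app_id)

    # Rank providers by number of apps, most frequent first.
    ranking = sorted(
        apps_per_provider,
        key=lambda provider: (-len(apps_per_provider[provider]), provider),
    )
    result = {}
    for provider in ranking[:top_n]:
        result[provider] = {
            purpose: len(apps)
            for (name, purpose), apps in apps_per_pair.items()
            if name == provider
        }
    return result


# Made-up records with hypothetical provider names, for illustration only.
RECORDS = [
    ("app1", "ProviderA", "Analytics"),
    ("app2", "ProviderA", "Analytics"),
    ("app2", "ProviderA", "AppFunctionality"),
    ("app3", "ProviderA", "Analytics"),
    ("app1", "ProviderB", "ThirdPartyAdvertising"),
    ("app3", "ProviderB", "Analytics"),
    ("app1", "ProviderC", "AppFunctionality"),
]

top_two = purposes_for_top_providers(RECORDS, top_n=2)
print(top_two)
```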

Similar to the evaluation of purposes per component provider, we expanded the view to cluster the information by the data types collected. We again took the 20 SDK component providers mentioned most frequently in the Privacy Manifests of the evaluated app set and counted how many apps declare each data type and purpose tuple for these providers. In the following graph, the size of each circle corresponds to the number of apps in which that specific data type-purpose tuple was declared in the iOS Privacy Manifest. Like the previous evaluation, this graph shows that certain component providers (like Firebase) focus on certain data type and purpose tuples, whereas other component providers define various data type and purpose tuples (e.g., Google and Facebook).

The data type definition may additionally contain boolean processing flags that specify external usage of the data. The processing flag linked specifies that the data type is linked to the user’s identity, whereas tracked specifies that the data type is used to track users. Our final evaluation of the Privacy Manifest data of the app sample set focuses on whether certain providers specify data types with these processing flags.

Data types flagged with tracked can be shared with a data broker. If the processing flag tracked or linked/tracked is set, this may therefore threaten the user’s privacy significantly. For this reason, we examined which processing flags different component providers set for the requested data types. We took the ten SDK component providers mentioned most frequently in the Privacy Manifests of the evaluated app set and analyzed how many apps declare each data type, processing flag and purpose tuple for these top ten component providers. The size of each circle in the following graph corresponds to the number of apps in which that specific data type-processing flag-purpose tuple was defined in the iOS Privacy Manifest. The graph arranges the data types into circle groups by requested purpose. The label of a circle group states which processing flags are set to true: if the circle group is labeled linked, the data type is only linked to the user’s identity. If the circle group is labeled linked/tracked, the data type is linked to the user’s identity and is used to track users. Elements of a circle group without a label are neither linked nor tracked.
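The labels used for the circle groups can be derived from the two boolean flags of a data type entry, as in the following minimal sketch. The function name is hypothetical, and treating a missing or non-boolean flag as false is an assumption of this example.

```python
LINKED_KEY = "NSPrivacyCollectedDataTypeLinked"
TRACKING_KEY = "NSPrivacyCollectedDataTypeTracking"


def flag_label(entry):
    """Return 'linked', 'tracked', 'linked/tracked', or '' for no flag set.

    Assumption: a missing or non-boolean flag counts as not set.
    """
    flags = (
        ("linked", entry.get(LINKED_KEY) is True),
        ("tracked", entry.get(TRACKING_KEY) is True),
    )
    return "/".join(name for name, is_set in flags if is_set)


# Made-up entries, for illustration only.
print(flag_label({LINKED_KEY: True, TRACKING_KEY: False}))
print(flag_label({LINKED_KEY: True, TRACKING_KEY: True}))
print(repr(flag_label({LINKED_KEY: False, TRACKING_KEY: False})))
```

Adding this label to the data type-purpose tuple gives the data type-processing flag-purpose tuple counted per app in the evaluation.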

A significant difference in processing flag usage can be seen when comparing the results for Google and Facebook. Google defines the processing flag linked or no processing flag for most data types and purposes. In contrast, Facebook sets the most far-reaching combination, linked/tracked, for most data types and purposes.

Conclusion

The analysis of iOS Privacy Manifests for the 2,000 most popular free iOS apps on the German Apple App Store in June 2024 reveals several insights about data collection practices and compliance with Apple’s guidelines.

About half of the apps declare in their Privacy Manifests that they collect the data types OtherDiagnosticData, CrashData or DeviceID, and various sensitive data types are collected as well. However, the presence of apps with malformed or self-defined data types indicates inconsistencies in adhering to Apple’s guidelines. In addition, a substantial number of apps contain entries for data collection purposes that deviate from Apple’s specification. This undermines the effectiveness of the privacy manifest and will hopefully be addressed by checks during Apple’s app review process.

However, even in this preliminary analysis, in which manifest data is still missing for many SDKs, the benefit of the Privacy Manifests can be seen. They make it possible to inspect the relations between collected data types, purposes and components, showing that in some apps sensitive data types are collected for purposes not related to the app’s functionality.

The examination of specific data type-purpose tuples and their association with component providers revealed that certain SDK providers focus on targeted purposes, while others request data for a broader range of purposes. Notably, the processing flags for data types, particularly tracked, indicate significant privacy risks. The contrast between providers that primarily use the linked flag and those that extensively use the linked/tracked combination underscores the varying levels of privacy impact across different SDK providers.