VU#253266: Keras 2 Lambda Layers Allow Arbitrary Code Injection in TensorFlow Models

Overview

Lambda layers in third-party, TensorFlow-based Keras models allow attackers to inject arbitrary code into models built with versions of Keras prior to 2.13; that code may then run, unsafely, with the same permissions as the application that loads the model. For example, an attacker could use this feature to trojanize a popular model, save it, and redistribute it, tainting the supply chain of dependent AI/ML applications.

Description

TensorFlow is a widely-used open-source software library for building machine learning and artificial intelligence applications. The Keras framework, implemented in Python, is a high-level interface to TensorFlow that provides a wide variety of features for the design, training, validation and packaging of ML models. Keras provides an API for building neural networks from building blocks called Layers. One such Layer type is a Lambda layer that allows a developer to add arbitrary Python code to a model in the form of a lambda function (an anonymous, unnamed function). Using the Model.save() or save_model() method, a developer can then save a model that includes this code.
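
To illustrate how such code travels with a model, the following sketch builds a small model containing a Lambda layer and saves it in the native Keras v3 format. The layer contents and file name are illustrative examples, not taken from this advisory.

    import tensorflow as tf

    # A trivial model whose Lambda layer carries arbitrary Python code; the
    # lambda body is serialized along with the architecture and weights.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Lambda(lambda x: x * 2.0),  # arbitrary code embedded here
        tf.keras.layers.Dense(1),
    ])

    # Saving to a .keras path uses the native version 3 serialization format.
    model.save("example_model.keras")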

The Keras 2 documentation for the load_model() function describes the safe_mode argument, a mechanism for disallowing the loading of a native version 3 Keras model (.keras file) that includes a Lambda layer:

safe_mode: Boolean, whether to disallow unsafe lambda deserialization. When safe_mode=False, loading an object has the potential to trigger arbitrary code execution. This argument is only applicable to the TF-Keras v3 model format. Defaults to True.

This is the behavior of version 2.13 and later of the Keras API: unless safe_mode is explicitly set to False, an exception is raised when a program attempts to load a model containing Lambda layers that was stored in version 3 of the format. This check, however, does not exist in prior versions of the API, nor is it performed on models stored using earlier versions of the Keras serialization format (i.e., v2 SavedModel, legacy H5).
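
As a rough sketch (assuming a .keras file containing a Lambda layer, such as the example above), the safe_mode check surfaces as an exception at load time in Keras 2.13 and later; in current releases it is a ValueError.

    import tensorflow as tf

    try:
        # safe_mode defaults to True, so lambda deserialization is refused.
        model = tf.keras.models.load_model("example_model.keras")
    except ValueError as err:
        print("Refused to load model with embedded lambda code:", err)

    # Passing safe_mode=False re-enables lambda deserialization, and with it
    # the possibility of arbitrary code execution. Models stored in the older
    # formats (SavedModel, legacy H5) are loaded without this check at all.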

This means that systems incorporating versions of Keras prior to 2.13 may be susceptible to running arbitrary code when loading TensorFlow-based models.

Similarity to other frameworks with code injection vulnerabilities

The code injection vulnerability in the Keras 2 API is an example of a common security weakness in systems that provide a mechanism for packaging data together with code. For example, the security issues associated with the Pickle mechanism in the standard Python library are well documented, and arise because the Pickle format includes a mechanism for serializing code inline with its data.
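
The following short illustration (not taken from this advisory) shows why pickle-style deserialization of untrusted data is dangerous: an object can arrange for an arbitrary callable to be invoked during unpickling.

    import os
    import pickle

    class Malicious:
        # __reduce__ tells pickle how to reconstruct the object; here it asks
        # pickle to call os.system with an attacker-chosen command instead.
        def __reduce__(self):
            return (os.system, ("echo code ran during unpickling",))

    payload = pickle.dumps(Malicious())
    pickle.loads(payload)  # executes the embedded command on deserialization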

Explicit versus implicit security policy

The TensorFlow security documentation (https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) includes a specific warning about the fact that models are not just data, and makes a statement about the expectations of developers in the TensorFlow development community:

Since models are practically programs that TensorFlow executes, using untrusted models or graphs is equivalent to running untrusted code. (emphasis in earlier version)

The implications of that statement are not necessarily widely understood by all developers of TensorFlow-based systems. The last few years have seen rapid growth in the community of developers building AI/ML-based systems and publishing pretrained models through community hubs like Hugging Face (https://huggingface.co/) and Kaggle (https://www.kaggle.com). It is not clear that all members of this new community understand the potential risk posed by a third-party model; they may (incorrectly) trust that a model loaded using a trusted library will only execute code that is included in that library. Moreover, a user may also assume that a pretrained model, once loaded, will only execute code whose purpose is to compute a prediction, without side effects beyond those required for that calculation (e.g., that a model will not include code that communicates over a network).

To the degree possible, AI/ML framework developers and model distributors should strive to align the explicit security policy and the corresponding implementation to be consistent with the implicit security policy implied by these assumptions.

Impact

Loading third-party models built using Keras could result in arbitrary untrusted code running at the privilege level of the ML application environment.

Solution

Upgrade to Keras 2.13 or later. When loading models, ensure the safe_mode parameter is not set to False (per https://keras.io/api/models/model_saving_apis/model_saving_and_loading, it is True by default). Note that upgrading Keras may also require upgrading its dependencies; see https://keras.io/getting_started/ for details.
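
A minimal sanity check along these lines (the file name below is illustrative): confirm the installed Keras version, and leave safe_mode at, or explicitly set it to, True when loading.

    import keras

    # Should report 2.13.x or later (or 3.x).
    print(keras.__version__)

    # safe_mode already defaults to True; setting it explicitly guards against
    # a caller elsewhere passing safe_mode=False.
    model = keras.models.load_model("downloaded_model.keras", safe_mode=True)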

If pre-2.13 applications must be run, run them in a sandbox and ensure that no assets of value are within the scope of the running application, to minimize the potential for data exfiltration.

Advice for Model Users

Model users should only use models developed and distributed by trusted sources, and should always verify the behavior of models before deployment. They should apply the same development and deployment best practices to applications that integrate ML models as they would to any application incorporating a third-party component. Developers should upgrade to the latest practical version of the Keras package (v2.13+ or v3.0+), and use version 3 of the Keras serialization format both to load third-party models and to save any subsequent modifications.

Advice for Model Aggregators

Model aggregators should distribute models based on the latest, safe model formats when possible, and should incorporate scanning and introspection features to identify models that include unsafe-to-deserialize features, either preventing such models from being uploaded or flagging them so that model users can perform additional due diligence.
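
One possible shape for such a scanner, assuming the native Keras v3 format (a zip archive whose config.json describes every layer); the function name and the flag-for-review policy are illustrative, not part of any existing tool:

    import json
    import zipfile

    def references_lambda_layer(path):
        """Return True if a .keras archive's configuration mentions a Lambda layer."""
        with zipfile.ZipFile(path) as archive:
            config = json.loads(archive.read("config.json"))
        # Walk the nested configuration looking for Lambda class names.
        stack = [config]
        while stack:
            node = stack.pop()
            if isinstance(node, dict):
                if node.get("class_name") == "Lambda":
                    return True
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
        return False

    if references_lambda_layer("uploaded_model.keras"):
        print("Model embeds a Lambda layer; flag it for additional review.")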

Advice for Model Creators

Model creators should upgrade to the latest versions of the Keras package (v2.13+ or v3.0+). They should avoid unsafe-to-deserialize features, both to avoid inadvertently introducing security vulnerabilities and to encourage the adoption of standards that are less susceptible to exploitation by malicious actors. Model creators should save models using the latest format version (Keras v3 in the case of the Keras package), and, when possible, give preference to formats that disallow the serialization of arbitrary code (i.e., code that the user has not explicitly imported into the environment). Model developers should reuse third-party base models with care, building only on models from trusted sources.
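
One way to avoid a Lambda layer is to express the same computation as a registered custom layer, so the code lives in the creator's own package rather than being serialized into the model file. The sketch below assumes Keras 2.13+/3.x (keras.saving.register_keras_serializable); the class and package names are illustrative.

    import keras

    @keras.saving.register_keras_serializable(package="my_package")
    class Scale(keras.layers.Layer):
        """Multiplies inputs by a fixed factor; replaces Lambda(lambda x: x * factor)."""

        def __init__(self, factor=2.0, **kwargs):
            super().__init__(**kwargs)
            self.factor = factor

        def call(self, inputs):
            return inputs * self.factor

        def get_config(self):
            # Only plain configuration values are serialized, not code.
            return {**super().get_config(), "factor": self.factor}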

General Advice for Framework Developers

AI/ML-framework developers should avoid the use of naïve language-native serialization facilities (e.g., the Python pickle package has well-established security weaknesses, and should not be used in sensitive applications).

In cases where it’s desirable to include a mechanism for embedding code, restrict the code that can be executed by, for example:

disallowing certain language features (e.g., exec);
explicitly allowing only a "safe" language subset (a toy sketch of this idea follows this list); or
providing a sandboxing mechanism (e.g., to prevent network access) to minimize potential threats.
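
As a toy sketch of the "safe subset" idea referenced above (an illustration of the principle only, not a vetted sandbox), a framework could parse a serialized expression and reject anything outside a small allowlist of syntax nodes:

    import ast

    # Only simple arithmetic expressions over named variables are permitted.
    ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                     ast.Name, ast.Load, ast.Add, ast.Sub, ast.Mult, ast.Div,
                     ast.USub)

    def is_safe_expression(source):
        try:
            tree = ast.parse(source, mode="eval")
        except SyntaxError:
            return False
        return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))

    print(is_safe_expression("x * 2 + 1"))                      # True
    print(is_safe_expression("__import__('os').system('id')"))  # False: calls are rejected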

Acknowledgements

This document was written by Jeffrey Havrilla, Allen Householder, Andrew Kompanek, and Ben Koo.