Eager execution is highly promoted in TF 2. It makes coding and debugging easier, but it is not necessarily the mode that TF suggests we run in production. In this article, we talk about eager execution mode and graph mode, as well as their pros and cons.
By default, TF operations in 2.x run in eager execution mode. For example, tf.matmul (matrix multiplication) below executes immediately and returns a tf.Tensor object containing the result [[4.]]. This is what we usually expect when running Python code: statements are executed line by line, with computation results returned immediately.
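A minimal sketch of eager execution (the operands are illustrative, chosen so the result is [[4.]]):

```python
import tensorflow as tf

x = tf.matmul([[2.0]], [[2.0]])  # runs immediately in eager mode
print(x)                         # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)
```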
Graph mode, however, paints a different picture: tf.matmul returns a symbolic handle to a node in a computational graph, and the actual multiplication is deferred.
Eager Execution Disadvantage
In graph mode, tf.matmul adds computation node(s) (tf.Operation) to a computational graph (tf.Graph). In the TF v1 API, we call session.run later to compile and execute the computation graph. This deferred execution allows the TF Grappler to run automatically in the background and apply a long list of graph optimizations to improve execution performance. For example, node operations can be combined or removed for efficiency. To take advantage of these optimizations in 2.x, we need to run the code in graph mode instead of eager execution. TF's internal benchmarks indicate a 15% performance improvement on average. For computation-heavy models like ResNet50, eager execution with a GPU is comparable to graph mode. The gap widens when there are many small operations and narrows when the model is dominated by a few expensive operations such as convolutions. In reality, your mileage will vary depending on your model.
It is pretty easy to change the code to run in graph mode. We simply annotate a function with @tf.function so that the whole function is compiled, optimized, and run as a single computational graph. No extra code is needed. And @tf.function extends its coverage to all the methods it calls, creating a single graph.
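A minimal sketch (the function names and bodies are illustrative):

```python
import tensorflow as tf

@tf.function
def dense_layer(x, w, b):
    # The whole function, including the inner call, is compiled into one graph.
    return add_bias(tf.matmul(x, w), b)

def add_bias(y, b):
    # Called from a @tf.function, so it is traced into the same graph.
    return y + b

x = tf.ones([1, 3])
w = tf.ones([3, 2])
b = tf.zeros([2])
print(dense_layer(x, w, b))  # [[3. 3.]]
```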
Graph mode creates a dataflow graph out of the Python code, and this graph is a portable solution. The model can be restored without the original Python code, or deployed onto a device without a Python environment. Indeed, this is required for saving models to files with SavedModel. This portability is a great advantage in production deployment. By exporting a SavedModel that includes data preprocessing, we eliminate possible mistakes in re-creating the preprocessing logic in production. Such preprocessing logic may be sensitive to the training data and can be error-prone to reproduce during deployment, in particular for NLP problems.
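A minimal sketch of exporting and restoring a traced graph with SavedModel (the module, the toy "preprocessing" and the export path are illustrative):

```python
import tensorflow as tf

class Scaler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        # Toy preprocessing baked into the exported graph.
        return (x - 1.0) / 2.0

module = Scaler()
tf.saved_model.save(module, "/tmp/scaler")      # exports the traced graph
restored = tf.saved_model.load("/tmp/scaler")   # no original Python code needed
print(restored(tf.constant([3.0, 5.0])))        # [1. 2.]
```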
However, there is a major catch with graph mode. The key reason for making eager execution the default in TF 2 is to make coding and debugging easier; TF 1 APIs are tedious and hard to debug. In graph mode, tf.matmul adds node(s) to the computational graph rather than returning the computation results immediately. In TF 2.x graph mode, the debugger cannot stop at a breakpoint where tf.matmul actually executes, which makes the code hard to trace.
So during early development and debugging, we may comment out the annotation temporarily. Or we can use tf.config.run_functions_eagerly(True) to turn on eager execution. By setting it to True before the square function below and back to False afterward, we can break inside the @tf.function method.
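A minimal sketch (the square function is illustrative):

```python
import tensorflow as tf

@tf.function
def square(x):
    return x * x          # a debugger breakpoint here is reachable only in eager mode

tf.config.run_functions_eagerly(True)   # run @tf.function bodies eagerly
square(tf.constant(2.0))                # the breakpoint inside square is hit
tf.config.run_functions_eagerly(False)  # restore graph execution
```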
In eager execution, we can use regular Python constructs to program our logic. This makes the code Python-friendly and easier to write and debug. But in graph execution mode, we want these constructs to be part of the graph so we can use the same code for both modes. AutoGraph (discussed later) helps by transforming some Python flow controls into TF operations automatically. That said, there are other irregularities with no easy-to-understand rationale or rules. These irregularities may result in an exception, or the constructs may simply be ignored; otherwise, they may create unexpected side effects. In the next few sections, we will go through issues that we may encounter in graph mode.
Some Python constructs are simply not supported in the computational graph. For example, a Python "assert" within a @tf.function function will throw an exception. Use tf.debugging.assert_{condition} instead, which works in both modes.
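For example (a minimal sketch; the function body is illustrative):

```python
import tensorflow as tf

@tf.function
def safe_sqrt(x):
    # assert x >= 0                      # Python assert is not supported in a traced graph
    tf.debugging.assert_non_negative(x)  # TF assert works in both eager and graph mode
    return tf.sqrt(x)

print(safe_sqrt(tf.constant(4.0)))       # tf.Tensor(2.0, ...)
```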
But how are Python and TF code transformed into a graph in graph mode? When an annotated tf.function method is first invoked, it is traced to convert the function into a computation graph. In this process, TF operations are converted into nodes in the graph. Once the graph is built, it is executed automatically. Let's illustrate this with a very simple example first.
Python "print" prints its parameters to the console. But when it is inside a @tf.function method, it executes in the tracing phase only (the graph-generating phase). It does not add any node to the graph and is simply absent in graph mode. Therefore, this operation is referred to as a Python side effect. In contrast, tf.print is a TF operation: inside a @tf.function method, it adds a node to the graph during tracing, and the intended print does not execute until the graph is executed. For this reason, these two print operations are useful for telling the trace apart from the execution.
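A minimal sketch of the method f discussed below (the messages are illustrative; the markers ①, ② and ③ label the console output lines):

```python
import tensorflow as tf

@tf.function
def f(x):
    print("Tracing f with", x)       # Python side effect: runs only during tracing (①)
    tf.print("Executing f with", x)  # TF op: runs every time the graph executes (②, ③)

f(1)  # first call: trace (prints ①), then graph execution (prints ②)
f(1)  # second call: the cached graph is reused, so only tf.print runs (prints ③)
```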
We can view the trace as an execution of the method f in which we categorize its operations as Python operations and TF operations. Of course, every statement is executed by Python, but the TF operations do not perform the real computation; they simply add nodes to the graph. Thanks to the @tf.function annotation, once the trace is done, the graph is executed.
(I run the code in a debugger line by line, so the display follows the chronological order of the code. Otherwise, the output statements can be displayed slightly out of order.)
When f(1) is called for the first time, it traces the method to build a graph. During tracing, print outputs ① and tf.print just adds a node to the graph with no printout. Once the graph is created, it is executed: "print" is absent from the graph and tf.print outputs ②. So the first call produces the first two lines in the console.
When we call f(1) again, the method has already been traced and the graph can be reused. Therefore, it goes to graph execution directly, with tf.print producing output ③.
So how does TF handle Python constructs like the "if" statement and the "for" loop inside the trace?
Eager execution allows us to use Python control flow such as "while", "for", "if", "break" and "continue". To make this work in graph mode, AutoGraph converts some of these Python flow controls into TF operations automatically, so they are treated as TF operations instead of Python operations. This allows us to reuse the more natural Python control-flow syntax for both modes, and the code is much easier to read. Below are examples of Python flow controls converted to TF operations by AutoGraph.
AutoGraph also converts the iterations of the dataset to TF operations.
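A minimal sketch of both conversions (the function bodies are illustrative; tf.autograph.to_code is used only to peek at the generated code):

```python
import tensorflow as tf

@tf.function
def relu(x):
    # Tensor condition: AutoGraph rewrites this "if" into tf.cond.
    if x > 0:
        return x
    else:
        return tf.zeros_like(x)

@tf.function
def sum_dataset(ds):
    total = tf.constant(0, dtype=tf.int64)
    # Iterating a tf.data.Dataset is converted into TF dataset ops.
    for value in ds:
        total += value
    return total

print(relu(tf.constant(-2.0)))                # 0.0
print(sum_dataset(tf.data.Dataset.range(5)))  # 10
# Peek at the AutoGraph-generated code for relu:
print(tf.autograph.to_code(relu.python_function))
```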
trace (“if”)
These conversions are made if the condition in "while" or "if" is a Tensor. But what happens during the trace below may surprise you: n is a Tensor, so the statement "if n == 0" is converted into the equivalent TF operation tf.cond.
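A minimal sketch of the kind of function discussed (the branch bodies and the "trace value" printouts are illustrative):

```python
import tensorflow as tf

@tf.function
def f(n):
    if n == 0:
        print("trace value 0")
        return tf.constant(0)
    elif n == 1:
        print("trace value 1")
        return tf.constant(1)
    else:
        print("trace value n*n")
        return n * n

f(tf.constant(1))  # tracing prints all three "trace value ..." lines
f(tf.constant(2))  # the same graph handles any n, so no new trace output
```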
But why are there three trace printouts above ("trace value …")? For the computational graph to work with different values of the input Tensor, TF actually traces all the branches. Therefore, all three branches are called and each branch prints one line. This mechanism helps to reduce retracing: when we make the second call on f, no trace is required since the graph already created can handle all values of n.
What will happen when the input n is a scalar, as in f(1)? The Python “if” will not be transformed into tf.cond. The trace runs the “if” statement as-is. Therefore, only the branch “elif n==1:” is traced, and no “if” operations are added to the graph.
Therefore, the method is simply traced as:
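(A sketch of the effective trace, continuing the illustrative f above.)

```python
@tf.function
def f(n):        # as traced for the Python scalar n == 1
    print("trace value 1")
    return tf.constant(1)
```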
So what will happen when we call f(2) which needs code in another branch? Fortunately, the code will be traced again to create another graph for f(2). And we will discuss when a method will be traced again in a later section.
trace (“while” and “for”)
Let's repeat this with the "while" loop. If a Tensor is used in the condition, the loop is converted into tf.while_loop and its body is traced once. If it is not a Tensor, it runs as a Python "while" loop. As shown in the example below, it loops 3 times during the trace, and each time it adds the body's operations to the graph.
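A minimal sketch (the loop body is illustrative):

```python
import tensorflow as tf

@tf.function
def f(n):
    x = tf.constant(0)
    # n is a Python int, so this runs as a plain Python loop during tracing:
    # the body is unrolled and its ops are added to the graph n times.
    while n > 0:
        x += n
        n -= 1
    return x

print(f(3))  # tf.Tensor(6, ...): the body was traced three times
```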
So effectively, it traces the following method. If it is invoked with f(4), f will be re-traced.
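(A sketch of the unrolled trace, continuing the illustrative f above.)

```python
@tf.function
def f(n):      # as traced for the Python scalar n == 3
    x = tf.constant(0)
    x += 3
    x += 2
    x += 1
    return x
```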
Again, we should expect the same behavior in the “for” statement. When the expression in “for i in expression” is evaluated to be a Tensor, it will be replaced by tf.while_loop. tf.range returns a Tensor and therefore, the for loop below will be replaced.
The code below shows how the trace is done differently for a Tensor and a scalar expression.
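A minimal sketch of both cases (the function bodies are illustrative):

```python
import tensorflow as tf

@tf.function
def sum_tensor(n):
    s = tf.constant(0)
    # tf.range returns a Tensor, so AutoGraph converts the loop to tf.while_loop;
    # the body is traced once and iterated at graph-execution time.
    for i in tf.range(n):
        s += i
    return s

@tf.function
def sum_python(n):
    s = tf.constant(0)
    # range() yields Python ints, so the loop is unrolled during tracing;
    # the body's ops are added to the graph n times.
    for i in range(n):
        s += i
    return s

print(sum_tensor(tf.constant(5)))  # 10
print(sum_python(5))               # 10
```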
If the training procedure is tf.function-ized, like the one below, it is important that the "in" expression of the "for" loop be a dataset (tf.data.Dataset), not a Python or NumPy structure. In the latter case, every iteration adds nodes to the graph during the trace, so hundreds of thousands of nodes may be added. If a dataset is used instead, a combination of tf.data.Dataset ops is added to the graph only once, not once per iteration. The TensorFlow loop traces the body of the loop once and dynamically selects how many iterations to run at execution time; the loop-body operations appear only once in the generated graph.
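A minimal sketch of a tf.function-ized training loop over a tf.data.Dataset (the model, optimizer, loss and random data are illustrative placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build((None, 4))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([256, 4]), tf.random.normal([256, 1]))).batch(32)

@tf.function
def train_epoch(ds):
    total = tf.constant(0.0)
    # Iterating the dataset adds the loop to the graph once;
    # the number of iterations is decided at execution time.
    for x, y in ds:
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        total += loss
    return total

print(train_epoch(dataset))
```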
Python lists are poorly supported in graph mode, in particular when the list is modified inside or outside the @tf.function method. I have run into so many gotchas that I would suggest not using a Python list within the annotated method.
For example, an l.append operation is handled by the Python runtime and does not create any node in the graph. This is one of the Python constructs that is silently ignored in graph execution, with unexpected behavior during tracing. If you need a list-like data structure that adds items at runtime, use tf.TensorArray instead. This is particularly common in RNNs, where we may accumulate the hidden states for every timestep.
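A minimal sketch of the TensorArray pattern (the accumulated values are illustrative):

```python
import tensorflow as tf

@tf.function
def accumulate(steps):
    # tf.TensorArray is the graph-friendly replacement for list.append.
    ta = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    for i in tf.range(steps):
        ta = ta.write(i, tf.cast(i, tf.float32) * 2.0)
    return ta.stack()

print(accumulate(tf.constant(4)))  # [0. 2. 4. 6.]
```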
Python is a dynamically-typed language: it permits method parameters to have different types in different calls, and it is up to the callee to handle them. TensorFlow, on the other hand, is quite static. Parameters' datatype and shape information is required to build the graph. Indeed, TF builds a different graph when a function is invoked with parameters of different data types or shapes, for more efficient execution.
f.get_concrete_function returns a ConcreteFunction, a wrapper around the tf.Graph that represents the computational graph. As shown below, f1 is not the same as f2 because f1's parameters have different shapes than f2's. They are two different graphs wrapped by two different ConcreteFunctions. Fortunately, these are managed by a Function (python.eager.def_function.Function) that maintains a cache of ConcreteFunctions. Callers work with the Function object, and the internal differences are hidden from them.
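A minimal sketch (the function body is illustrative):

```python
import tensorflow as tf

@tf.function
def f(x):
    return x + 1

# Different input shapes yield different ConcreteFunctions (different graphs).
f1 = f.get_concrete_function(tf.constant([1.0]))          # shape (1,)
f2 = f.get_concrete_function(tf.constant([[1.0, 2.0]]))   # shape (1, 2)
print(f1 is f2)  # False: two graphs cached inside the same Function
```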
If we want to force them to use the same graph, we can add an input_signature with a TensorSpec that has a more general shape. For example, by specifying the shape as None, the same graph can be used for both the vector and the matrix below, though the graph may be less efficient.
A None dimension below is a wildcard that allows Functions to reuse traces for variably-sized input.
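A minimal sketch (the function body is illustrative; shape=None leaves the shape fully unspecified):

```python
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def g(x):
    return x * 2.0

print(g(tf.constant([1.0, 2.0])))    # vector
print(g(tf.constant([[1.0, 2.0]])))  # matrix: same graph, no retrace
```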
When a method is retraced because the parameters have not-yet-encountered data types or shapes, it adds overhead. This is a particular concern when an input parameter is a Python scalar: TF triggers a retrace whenever the scalar value is different. As shown below, f3 has a different scalar input, so the method is retraced and its graph is different from f1's and f2's. Ironically, this mechanism is what allows TF to handle the scalar conditions in the "if" and "while" statements discussed before. To avoid the overhead, design the method carefully. For example, developers may pass in a scalar parameter for the training step number; this can trigger many retraces and slow down performance.
To avoid that, we can use Tensor objects instead of a scalar.
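A minimal sketch (the step function is illustrative):

```python
import tensorflow as tf

@tf.function
def train_step(step):
    print("Tracing for step", step)   # printed once per trace
    tf.print("Running step", step)

for i in range(3):
    train_step(i)               # each new Python scalar triggers a retrace: 3 traces

for i in range(3):
    train_step(tf.constant(i))  # Tensors of the same dtype/shape reuse one graph: 1 trace
```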
We take a snapshot of the function when it is traced to create the graph. Hence, even if the list l below is changed before calling f a second time, f still sees the old values of l. (But take my advice: avoid using a Python list.)
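A minimal sketch (the list contents are illustrative):

```python
import tensorflow as tf

l = [1.0, 2.0]

@tf.function
def f(x):
    return x + l[0]   # l[0] is baked into the graph as a constant during tracing

print(f(tf.constant(1.0)))  # 2.0
l[0] = 10.0                 # modifying the list after tracing has no effect
print(f(tf.constant(1.0)))  # still 2.0: the graph kept the snapshotted value
```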
Many Python features, such as generators and iterators, rely on the Python runtime to keep track of state. When run in graph mode, this becomes unreliable. As shown below, the iterator does not advance over multiple calls.
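A minimal sketch (the iterator contents are illustrative):

```python
import tensorflow as tf

it = iter([1, 2, 3])

@tf.function
def f():
    # next(it) runs only during tracing; its value is frozen into the graph.
    return tf.constant(next(it))

print(f())  # 1
print(f())  # still 1: the iterator did not advance
```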
We can create tf.Variable objects inside a @tf.function only during the first invocation (the first trace). Without a guard that checks whether the variable already exists, a later call would create another variable and raise an exception: that would modify the graph after it has been created, and the TF graph is intended to be quite static. Instead, if applicable, create non-model variables outside the function and pass them in as parameters.
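A minimal sketch of the guard pattern (the names are illustrative):

```python
import tensorflow as tf

v = None

@tf.function
def scale(x):
    global v
    if v is None:              # guard: create the variable only on the first trace
        v = tf.Variable(2.0)
    return v * x

print(scale(tf.constant(3.0)))  # 6.0
print(scale(tf.constant(4.0)))  # 8.0: no new variable is created
```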
We can also turn eager execution on or off in model.compile (with the run_eagerly argument). With it turned off, when model.fit runs, the model is traced and executed in graph mode.
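A minimal sketch (the model and data are illustrative placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
# run_eagerly=False (the default) traces the model and runs it in graph mode;
# set it to True to debug layer code eagerly.
model.compile(optimizer="sgd", loss="mse", run_eagerly=False)

x = tf.random.normal([32, 4])
y = tf.random.normal([32, 1])
model.fit(x, y, epochs=1, verbose=0)
```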
With eager execution, Numpy operations can take tf.Tensor as parameters.
Vice versa, tf.math operations convert Python objects and NumPy arrays to tf.Tensor objects. To convert tf.Tensor objects into Numpy ndarray explicitly, use numpy().
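A minimal sketch of the interoperability in both directions:

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(np.square(t))                       # NumPy ops accept tf.Tensor in eager mode
print(tf.math.add(np.ones((2, 2)), 1.0))  # TF ops convert ndarrays and Python values to Tensors
print(t.numpy())                          # explicit conversion to a NumPy ndarray
```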
A Python function can also be executed as a graph without the annotation: tf_function below, created by calling tf.function directly, is a python.eager.def_function.Function, the same class discussed for the @tf.function annotation.
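A minimal sketch (the function body is illustrative; the printed class path may differ between TF versions):

```python
import tensorflow as tf

def square(x):
    return x * x

tf_function = tf.function(square)      # same effect as the @tf.function annotation
print(type(tf_function))               # e.g. ...python.eager.def_function.Function
print(tf_function(tf.constant(3.0)))   # executed as a graph: 9.0
```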
In graph mode, we want to transform all operations into a Python-independent graph for execution. But in case we need to execute Python code inside the graph, we can use tf.py_function as a workaround. However, the portability benefit of the graph is lost, and it does not work well with distributed multi-GPU setups. To work with the graph, tf.py_function casts all inputs/outputs to tensors. Here is the Python-list code that did not work previously, now routed through tf.py_function.
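A minimal sketch (the list and function names are illustrative):

```python
import tensorflow as tf

l = []

def append_to_list(x):
    # Plain Python, executed by the Python runtime via tf.py_function.
    l.append(x.numpy())
    return x

@tf.function
def f(x):
    return tf.py_function(append_to_list, inp=[x], Tout=tf.float32)

f(tf.constant(1.0))
f(tf.constant(2.0))
print(l)  # [1.0, 2.0]: the list really is updated at execution time
```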
While we should avoid its use, the most common use case is data augmentation of images using an external library like scipy.ndimage.
Here, we use the arbitrary rotation in scipy.ndimage to augment the data.
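A minimal sketch (the image size and angle range are illustrative):

```python
import numpy as np
import tensorflow as tf
from scipy import ndimage

def random_rotate(image):
    # Runs eagerly inside tf.py_function, so calling SciPy is allowed.
    angle = np.random.uniform(-30, 30)
    return ndimage.rotate(image.numpy(), angle, reshape=False)

@tf.function
def augment(image):
    rotated = tf.py_function(random_rotate, inp=[image], Tout=tf.float32)
    rotated.set_shape(image.shape)  # py_function drops static shape information
    return rotated

img = tf.random.uniform([28, 28], dtype=tf.float32)
print(augment(img).shape)  # (28, 28)
```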
There is also an experimental feature that can reduce retracing for scalar inputs: an input argument annotated with tf.Tensor is converted to a Tensor even when a non-Tensor value is passed in, so calling the function with different Python scalars does not trigger a retrace. As shown below, f_with_hints(2) will not trigger a retracing.
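A sketch of this feature as it appeared in earlier TF 2.x releases (the experimental_follow_type_hints flag has since been deprecated in newer versions, so check the current documentation):

```python
import tensorflow as tf

@tf.function(experimental_follow_type_hints=True)
def f_with_hints(x: tf.Tensor):   # the annotation tells TF to convert x to a Tensor
    print("Tracing")
    tf.print("Executing with", x)

f_with_hints(1)  # traces once
f_with_hints(2)  # no retrace: both 1 and 2 are converted to Tensors
```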
Many issues discussed in this article can be classified as current TF implementation limitations rather than golden design rules to be followed; for example, different input scalar values triggering a retrace. Sometimes it is hard to justify or explain why something is done in a particular way. Nevertheless, because TF changes constantly, check the latest documentation when implementing the code, in particular for areas that feel odd. But most model code is much simpler, and you will likely not deal with these nasty issues.
Introduction to graphs and tf.function