What are Iterators, and a few interesting things about them

We use iterators almost every day, but do we understand them well enough?

Tue, 06 Aug 2019

Story behind this post

Recently I started reading “C# in Depth” by Jon Skeet. While reading the section about iterators, I realized that although I had used iterators almost every day in my programming life, the book still taught me a thing or two that I did not know about them. Joseph Joubert once said, “To teach is to learn twice.” So I decided to write a blog post about what I learned, with the hope that someone, someday, will benefit from it.

Iterators - Introduction

Iterators are something we use almost every day as C# programmers. A very simple example is using foreach over a list of items. They have some special features and behaviors that we are going to discuss in this post. Let’s start with an attempt to explain what an iterator is.

In layman’s terms, an iterator is just a block of code that produces a sequence that can be enumerated using a C# construct like foreach, or manually with a while loop.

Writing an iterator in C# is very simple: it is a block of code (typically a method body) that contains yield return or yield break statements.

yield return produces the next value of the sequence, whereas yield break signals the end of the sequence.
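To make the difference between the two statements concrete, here is a minimal sketch of an iterator that uses both. The class name YieldBreakDemo and the method TakeWhilePositive are made up for this illustration; they are not from the book.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Hypothetical helper: yields numbers until the first non-positive one.
    public static IEnumerable<int> TakeWhilePositive(IEnumerable<int> source)
    {
        foreach (int value in source)
        {
            if (value <= 0)
            {
                yield break; // ends the sequence; no later values are produced
            }
            yield return value; // hands one value to the caller
        }
    }
}
```

Iterating over YieldBreakDemo.TakeWhilePositive(new[] { 3, 5, 0, 7 }) yields 3 and 5. The 7 is never reached, because yield break ended the sequence at 0.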

Below is a very simple example of an iterator. This example is from the book “C# in Depth”.

static IEnumerable<int> CreateSimpleIterator()
{
    yield return 10;
    for (int i = 0; i < 3; i++)
    {
        yield return i;
    }
    yield return 20;
}

If we iterate over this iterator using foreach, it produces the following output.

foreach (int value in CreateSimpleIterator())
{
    Console.Write($"{value}, ");
}
//will produce
//10, 0, 1, 2, 20,

If we have to consume the above iterator without using foreach, it is a two-step process.

  • Get an enumerator for the iterator.
  • Keep calling MoveNext() until it returns false, reading the enumerator’s Current value after each successful call.

MoveNext() returning false indicates the end of the sequence returned by the iterator.

IEnumerable<int> enumerable = CreateSimpleIterator();//Create Iterator
using (IEnumerator<int> enumerator = enumerable.GetEnumerator())// Get Enumerator
{
    while (enumerator.MoveNext())//Keep calling MoveNext until it return false
    {
        int value = enumerator.Current;//Use the current value
        Console.WriteLine(value);
    }
}

Notice that we wrapped the enumerator in a using block. Why it is important to always dispose an enumerator is something we will discuss later in the post.

At this point iterators may look unremarkable: we could have created a List, added all the items to it, returned it, and produced the same output. But iterators have something special about them: it’s called lazy execution.

Iterators - Lazy Execution

In this example we create a method that raises the first parameter to successive powers, up to the exponent given by the second parameter.

This is a fairly common scenario when writing applications: a method that returns a collection. In the first sample we use a List to collect the results and then return the list.

[TestMethod]
public void TestWithoutYield()
{
    // Display powers of 2 up to the exponent of 8:
    foreach (int i in PowerWithoutYield(2, 8))
    {
        // Debug.Write has no format-string overload, so use interpolation
        Debug.Write($"{i} ");
    }
}

public static IEnumerable<int> PowerWithoutYield(int number, int exponent)
{
    Console.WriteLine("Power Without Yield Called");
    int result = 1;
    var results = new List<int>();
    for (int i = 0; i < exponent; i++)
    {
        Console.WriteLine("Calculating an Exponent");
        result = result * number;
        results.Add(result);
    }
    return results;
}

If we refactor the above snippet to use an iterator instead of building a collection and returning all the results at once, it changes to something like this.

[TestMethod]
public void TestWithYield()
{
    // Display powers of 2 up to the exponent of 8:
    foreach (int i in PowerWithYield(2, 8))
    {
        // Debug.Write has no format-string overload, so use interpolation
        Debug.Write($"{i} ");
    }
}

public static IEnumerable<int> PowerWithYield(int number, int exponent)
{
    Console.WriteLine("Power With Yield Called");
    int result = 1;
    for (int i = 0; i < exponent; i++)
    {
        Console.WriteLine("Calculating an Exponent");
        result = result * number;
        yield return result;
    }
}

Both pieces of code functionally do exactly the same thing, but let’s see how using an iterator changes the flow of execution.

Let’s change our test to something like the code below.

foreach (int i in PowerWithYield(2, 8))
{
    Debug.Write($"{i} ");
    if (i > 20) break;
}

foreach (int i in PowerWithoutYield(2, 8))
{
    Debug.Write($"{i} ");
    if (i > 20) break;
}

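Now let’s try to predict: how many times will Console.WriteLine("Calculating an Exponent") be executed for PowerWithYield and for PowerWithoutYield?

This is what the results of the execution look like: PowerWithoutYield prints “Calculating an Exponent” 8 times before the loop receives its first value, while PowerWithYield prints it only 5 times (once each for 2, 4, 8, 16 and 32), because the loop breaks as soon as it sees 32.

So lazy execution means values are produced only when the calling code needs to consume them.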
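Laziness goes one step further than the loop above suggests: calling the iterator method does not run any of its code, and neither does GetEnumerator(). The body only starts on the first MoveNext(). Here is a small sketch that makes this measurable with a counter instead of console output. The names LazyDemo, Powers, Calculations and Run are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many times the loop body of the iterator has run.
    public static int Calculations;

    public static IEnumerable<int> Powers(int number, int exponent)
    {
        int result = 1;
        for (int i = 0; i < exponent; i++)
        {
            Calculations++;
            result = result * number;
            yield return result;
        }
    }

    // Returns the counter value at three points in time.
    public static int[] Run()
    {
        Calculations = 0;
        IEnumerable<int> sequence = Powers(2, 8); // no iterator code has run yet
        int afterCall = Calculations;

        using (IEnumerator<int> enumerator = sequence.GetEnumerator())
        {
            int afterGetEnumerator = Calculations; // still nothing has run
            enumerator.MoveNext();                 // runs up to the first yield return
            int afterFirstMoveNext = Calculations;
            return new[] { afterCall, afterGetEnumerator, afterFirstMoveNext };
        }
    }
}
```

LazyDemo.Run() returns { 0, 0, 1 }: the counter stays at zero until the first MoveNext(), and then only one value is calculated.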

A very interesting thing about iterators, as the results above show, is that the execution of the iterator code is paused after each yield return until the consumer requests the next value from the sequence by calling MoveNext().

**This raises another question: will the code stay paused inside the iterator after the yield statement forever if the consumer never asks for the next value?**

Let’s try to answer that question in the next section, which looks at the flow of execution for an iterator.

Flow of execution for Iterator Functions

When an iterator block starts executing, it goes only as far as it needs to. It stops executing when any of these occurs:

  • An exception is thrown
  • It reaches the end of the function
  • A yield break is encountered
  • A yield return is encountered and a value is ready to be returned

Let’s look at the piece of code below.

public static IEnumerable<int> PowerWithYield(int number, int exponent)
{
    Console.WriteLine("Power With Yield Called");
    int result = 1;
    for (int i = 0; i < exponent; i++)
    {
        Console.WriteLine("Calculating an Exponent");
        result = result * number;
        yield return result;
    }
}

The first time the foreach asks for a value, execution of the function goes like this.

  1. Write “Power With Yield Called” to the console
  2. Declare the local variable result
  3. Go into the for loop
  4. Write “Calculating an Exponent” to the console
  5. Calculate the new value of result
  6. Reach yield return, return the value of result and pause execution

Now when the foreach wants to consume the next value, the function does not start from scratch. It resumes right after the yield return, so the second iteration looks like this:

  1. Increment i and check the loop condition
  2. Write “Calculating an Exponent” to the console
  3. Calculate the new value of result
  4. Reach yield return, return the value of result and pause execution
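The back-and-forth between the iterator and its caller can be recorded in a list to see the order of events. This is only a sketch: FlowDemo, Powers and Trace are hypothetical names, and the log parameter exists purely for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class FlowDemo
{
    public static IEnumerable<int> Powers(int number, int exponent, List<string> log)
    {
        log.Add("iterator: started");
        int result = 1;
        for (int i = 0; i < exponent; i++)
        {
            result = result * number;
            log.Add("iterator: yielding " + result);
            yield return result; // execution pauses here until the next MoveNext()
        }
        log.Add("iterator: finished");
    }

    // Consumes two values and returns the combined log of both sides.
    public static List<string> Trace()
    {
        var log = new List<string>();
        foreach (int value in Powers(2, 2, log))
        {
            log.Add("caller: got " + value);
        }
        return log;
    }
}
```

The list returned by FlowDemo.Trace() alternates between the two sides: the "started" entry appears once, then each "yielding" entry is followed by the caller’s "got" entry, and "finished" is added only after the caller asks for a third value that does not exist.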

An iterator function can pause and resume from where it left off. Behind the scenes the C# compiler generates code that stores the state at the point where execution paused and resumes execution from that point. The compiler achieves this using a state machine. The details of the state machine are beyond the scope of this post, but I can write a post about it if there is interest from readers. This state machine works in a manner very similar to the one behind the async/await feature of C# 5. In the past I wrote a post about the state machine for async/await; interested readers can have a read.
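To give a rough feel for the idea, here is a hand-written enumerator that behaves like PowerWithYield without the console output. It is a simplified sketch and not the code the compiler actually emits: the class name PowerEnumerator, the field names and the state numbers are all made up for this illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written stand-in for a compiler-generated state machine.
public sealed class PowerEnumerator : IEnumerator<int>
{
    private readonly int _number;
    private readonly int _exponent;
    private int _state;   // 0 = not started, 1 = paused inside the loop, -1 = finished
    private int _result;  // former local variable "result", now a field
    private int _i;       // former loop variable "i", now a field

    public PowerEnumerator(int number, int exponent)
    {
        _number = number;
        _exponent = exponent;
    }

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:          // first call: run the code before the loop
                _result = 1;
                _i = 0;
                break;
            case 1:          // resuming after a yield return: finish that iteration
                _i++;
                break;
            default:         // already finished
                return false;
        }

        if (_i < _exponent)
        {
            _result = _result * _number;
            Current = _result;
            _state = 1;      // remember that we are paused inside the loop
            return true;
        }

        _state = -1;
        return false;
    }

    public void Reset() => throw new NotSupportedException();

    public void Dispose() { _state = -1; }
}
```

The key point is that the locals of the original method become fields, and a state field records where to continue on the next MoveNext() call.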

Let’s come back to the question: what happens if the consumer never asks for the next value? Will the iterator stay paused forever?

Let’s try to answer this question with an example. We change PowerWithYield a little by wrapping its body in a try/finally block, and then consume it in two ways:

  • Case 1: Consume it using a foreach loop but exit early, without letting the function finish.
  • Case 2: Consume the same function manually using a while loop and exit early, without letting the function finish.

public static IEnumerable<int> PowerWithYield(int number, int exponent)
{
    try
    {
        Console.WriteLine("Power With Yield Called");
        int result = 1;
        for (int i = 0; i < exponent; i++)
        {
            Console.WriteLine("Calculating an Exponent");
            result = result * number;
            yield return result;
        }
    }
    finally
    {
        Console.WriteLine("Finally is executed");
    }
}

//CASE 1: Consume the method using foreach but exit early, don't let it finish
public static void Case1()
{
    foreach (int i in PowerWithYield(2, 8))
    {
        Console.Write("{0} ", i);
        if (i > 20) break;
    }
}


//CASE 2: Manually consume the method using a while loop and exit early
public static void Case2()
{
    IEnumerable<int> enumerable = PowerWithYield(2, 8);//Create Iterator
    IEnumerator<int> enumerator = enumerable.GetEnumerator();// Get Enumerator, we are not disposing it yet
    while (enumerator.MoveNext())//Keep calling MoveNext until it returns false
    {
        int value = enumerator.Current;//Use the current value
        Console.Write("{0} ", value);
        if (value > 20) break;
    }
}
//Other code 

What do you think: will the finally block be executed in both Case 1 and Case 2, in only one of them, or in neither? And when exactly will it be executed? Try to think about it before you proceed further with this post.

Link to GIST for IL code generated for the Case1 and Case2

Let’s first start with Case 1 and see what code the compiler generates for it.

//This is the code generated by the compiler for the Case1 method (the IL, decompiled back to C#)
public static void Case1()
{
    IEnumerator<int> enumerator = PowerWithYield(2, 8).GetEnumerator();
    try
    {
        while (enumerator.MoveNext())
        {
            int current = enumerator.Current;
            Console.Write("{0} ", current);
            if (current > 20)
            {
                break;
            }
        }
    }
    finally //This block will be called when we hit break
    {
        if (enumerator != null)
        {
            enumerator.Dispose();// Trigger the dispose on Enumerator that was generated above
        }
    }
}

The code generated for Case1(), where we used foreach, is very similar to what we wrote by hand for Case 2. The only difference is that the compiler wrapped the MoveNext() loop in a try/finally block, so when we break out of the foreach before the sequence is complete, the enumerator is still disposed. So the execution flow looks like this:

  • The compiler wraps the MoveNext() loop in try/finally
  • When we hit break, the finally block is executed
  • In the finally block, Dispose() is called on the enumerator

But how does the finally block inside the iterator get called? If we look at the state machine generated for PowerWithYield, it has the code below for its Dispose method.

//This method will be called when the Enumerator is disposed
[DebuggerHidden]
void IDisposable.Dispose()
{
    int num = <>1__state;
    if (num == -3 || num == 1)
    {
        try
        {
        }
        finally
        {
            <>m__Finally1();
        }
    }
}

//and <>m__Finally1() looks like this
private void <>m__Finally1()
{
    <>1__state = -1;
    Console.WriteLine("Finally is executed");
}

So the execution flow will be:

  • Case1() calls Dispose() on the enumerator
  • The Dispose() method inside the state machine calls <>m__Finally1()
  • <>m__Finally1() contains the code from the finally block of the iterator we wrote

As we can see above, the compiler generates code to make sure that even if you stop enumerating an iterator early and do not consume all the values, the finally block of the iterator function still runs, provided the enumerator is disposed.

A full discussion of the state machine is out of the scope of this post, but for interested readers here is the Gist for the code above with the IL generated by the compiler.

So the key is that the enumerator must be disposed for the iterator’s finally block to run. In Case 2 as originally written, nothing disposes the enumerator, so the finally block never runs after the early break. When we enumerate an iterator manually instead of using foreach, we have to dispose the enumerator ourselves, so to fix Case 2 we can wrap the enumerator in a using statement, which calls Dispose for us.

public static void Case2()
{
    //CASE 2: Manually consume the method using a while loop and exit early
    IEnumerable<int> enumerable = PowerWithYield(2, 8);//Create Iterator
    //Added using which will internally call Dispose on the Enumerator
    using (IEnumerator<int> enumerator = enumerable.GetEnumerator())
    {
        while (enumerator.MoveNext())//Keep calling MoveNext until it returns false
        {
            int value = enumerator.Current;//Use the current value
            Console.Write("{0} ", value);
            if (value > 20) break;
        }
    }
}

This may seem like a subtle thing, but it is really important that the iterator’s finally block is called at the right time. If you open I/O connections or handles inside an iterator and release them in its finally block, a finally block that is not executed can result in a resource leak, or at least a delayed release of high-demand resources.
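Here is a small sketch of that situation. TrackedResource is a hypothetical stand-in for a file handle or connection that simply remembers whether it was disposed, and ResourceDemo and its methods are likewise made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a file handle or connection.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class ResourceDemo
{
    public static IEnumerable<int> ReadValues(TrackedResource resource)
    {
        try
        {
            for (int i = 1; i <= 5; i++)
            {
                yield return i; // pretend each value is read from the resource
            }
        }
        finally
        {
            // Runs when the loop ends, or when the enumerator is disposed while paused above.
            resource.Dispose();
        }
    }

    // foreach disposes the enumerator, so the resource is released on break.
    public static bool ReleasedWithForeach()
    {
        var resource = new TrackedResource();
        foreach (int value in ReadValues(resource))
        {
            if (value == 2) break;
        }
        return resource.IsDisposed;
    }

    // Manual enumeration without Dispose leaves the iterator paused and the resource open.
    public static bool ReleasedWithoutDispose()
    {
        var resource = new TrackedResource();
        IEnumerator<int> enumerator = ReadValues(resource).GetEnumerator();
        enumerator.MoveNext(); // take one value and then abandon the enumerator
        return resource.IsDisposed;
    }
}
```

ReleasedWithForeach() returns true and ReleasedWithoutDispose() returns false, which mirrors the difference between Case 1 and the original Case 2.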

Summary

In this post I tried to share what I know about iterators. I do not expect this post to be completely accurate, as I am learning every day, but it is definitely to the best of my knowledge. A lot of the information in this post was inspired by the book C# in Depth by Jon Skeet; it’s a great book and I highly recommend it. We did not discuss the state machine much in this post because that topic deserves a post of its own. If there is enough interest from readers, I can try to blog about that as well. It’s time for me to say goodbye; see you soon with another post. Please let me know in the comments if this post helped you at all, or if there are things I can improve on. Enjoy your life and be humble :)

Ranjeet Singh

Ranjeet Singh Software Developer (.NET, ReactJS, Redux, Azure) You can find me on twitter @NotRanjeet or on LinkedIn at LinkedIn