What are Iterators, and a few interesting things about them
We use iterators almost every day, but do we understand them well enough?
Story behind this post
Recently I started reading the book “C# in Depth” by Jon Skeet. While reading the section about iterators, I realized that although I have used iterators almost every day of my programming life, the book taught me a thing or two that I did not know about them. Joseph Joubert once said, “To teach is to learn twice,” so I decided to write a blog post about what I learned, in the hope that someone will benefit from it someday.
Iterators - Introduction
Iterators are something we use almost every day as C# programmers. A very simple example is using foreach over a list of items. They have some special behaviors that we are going to discuss in this post. Let’s start with an attempt to explain what an iterator is.
In layman’s terms, an iterator is just a block of code that produces a sequence, which can then be enumerated using a C# construct such as foreach or a while loop.
Writing an iterator in C# is very simple: it is a block of code containing yield return or yield break statements.
yield return produces the next value of the sequence, whereas yield break signals the end of the sequence.
Below is a very simple example of an iterator, taken from the book “C# in Depth”.
static IEnumerable<int> CreateSimpleIterator()
{
yield return 10;
for (int i = 0; i < 3; i++)
{
yield return i;
}
yield return 20;
}
If we iterate over this iterator using foreach, it produces the following output.
foreach (int value in CreateSimpleIterator())
{
Console.Write($"{value}, ");
}
//will produce
//10, 0, 1, 2, 20,
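The example above never uses yield break, so here is a minimal sketch of it. TakeUntilNegative is a hypothetical helper made up for illustration (it is not from the book): it passes values through until it sees a negative one, and then ends the sequence early.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Hypothetical helper: yields values from the input until the first negative one.
    public static IEnumerable<int> TakeUntilNegative(IEnumerable<int> source)
    {
        foreach (int value in source)
        {
            if (value < 0)
            {
                yield break; // ends the sequence; the remaining input is never yielded
            }
            yield return value;
        }
    }

    public static void Main()
    {
        var input = new[] { 3, 7, -1, 9 };
        Console.WriteLine(string.Join(", ", TakeUntilNegative(input)));
        // prints: 3, 7
    }
}
```

Note that 9 is never produced, even though it comes after the negative value in the input.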
If we have to consume the above iterator without using foreach, it is a two-step process.
- Get an enumerator for the iterator
- Keep calling MoveNext() until it returns false, using the enumerator’s Current value each time.
MoveNext() returning false indicates the end of the sequence returned by the iterator.
IEnumerable<int> enumerable = CreateSimpleIterator();//Create Iterator
using (IEnumerator<int> enumerator = enumerable.GetEnumerator())// Get Enumerator
{
while (enumerator.MoveNext())//Keep calling MoveNext until it return false
{
int value = enumerator.Current;//Use the current value
Console.WriteLine(value);
}
}
Notice that we called GetEnumerator inside a using statement. Why it is important to always wrap the enumerator in using is something we will discuss later in the post.
At this point this may all look very obvious: we could have created a List, added all the items to it, returned the List, and produced the same output. But iterators have something special about them, called lazy execution.
Iterators - Lazy Execution
In this example we create a method that returns the successive powers of a number, from the first power up to the given exponent. A method that returns a collection like this is a fairly common scenario when writing applications. In the first sample we use a List to collect the results and then return the list.
[TestMethod]
public void TestWithoutYield()
{
// Display powers of 2 up to the exponent of 8:
foreach (int i in PowerWithoutYield(2, 8))
{
Debug.Write($"{i} ");
}
}
public static IEnumerable<int> PowerWithoutYield(int number, int exponent)
{
Console.WriteLine("Power Without Yield Called");
int result = 1;
var results = new List<int>();
for (int i = 0; i < exponent; i++)
{
Console.WriteLine("Calculating an Exponent");
result = result * number;
results.Add(result);
}
return results;
}
If we refactor the above snippet to use an iterator instead of building a collection and returning all the results at once, the code changes to something like the following.
[TestMethod]
public void TestWithYield()
{
// Display powers of 2 up to the exponent of 8:
foreach (int i in PowerWithYield(2, 8))
{
Debug.Write($"{i} ");
}
}
public static IEnumerable<int> PowerWithYield(int number, int exponent)
{
Console.WriteLine("Power With Yield Called");
int result = 1;
for (int i = 0; i < exponent; i++)
{
Console.WriteLine("Calculating an Exponent");
result = result * number;
yield return result;
}
}
Both pieces of code functionally do exactly the same thing, but let’s see how using iterators can change the flow of execution.
Let’s change our test function to something like the following.
foreach (int i in PowerWithYield(2, 8))
{
Debug.Write($"{i} ");
if (i > 20) break;
}
foreach (int i in PowerWithoutYield(2, 8))
{
Debug.Write($"{i} ");
if (i > 20) break;
}
Now let’s try to predict: how many times will Console.WriteLine("Calculating an Exponent") be executed for PowerWithYield and for PowerWithoutYield?
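The results of the execution look like this: PowerWithYield prints the message 5 times (for 2, 4, 8, 16 and 32, after which the loop breaks), whereas PowerWithoutYield prints it all 8 times before the loop receives its first value.
So lazy execution means values are produced only when the calling code needs to consume them.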
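To check the prediction without reading console output, here is a small sketch that counts the calculations instead of printing them. The names PowerEager, PowerLazy, Calculations and CountCalculations are my own for this illustration; the two Power methods mirror PowerWithoutYield and PowerWithYield, with the Console.WriteLine calls replaced by a counter.

```csharp
using System;
using System.Collections.Generic;

public static class LazyCountDemo
{
    public static int Calculations; // incremented each time a power is computed

    // Mirrors PowerWithoutYield: computes everything up front.
    public static IEnumerable<int> PowerEager(int number, int exponent)
    {
        int result = 1;
        var results = new List<int>();
        for (int i = 0; i < exponent; i++)
        {
            Calculations++;
            result *= number;
            results.Add(result);
        }
        return results;
    }

    // Mirrors PowerWithYield: computes one value per request.
    public static IEnumerable<int> PowerLazy(int number, int exponent)
    {
        int result = 1;
        for (int i = 0; i < exponent; i++)
        {
            Calculations++;
            result *= number;
            yield return result;
        }
    }

    // Consumes the sequence until a value above 20 appears, then reports the work done.
    public static int CountCalculations(Func<IEnumerable<int>> createSequence)
    {
        Calculations = 0;
        foreach (int value in createSequence())
        {
            if (value > 20) break;
        }
        return Calculations;
    }

    public static void Main()
    {
        Console.WriteLine(CountCalculations(() => PowerEager(2, 8))); // 8
        Console.WriteLine(CountCalculations(() => PowerLazy(2, 8)));  // 5
    }
}
```

The eager version does all 8 calculations no matter how early the caller stops, while the lazy version does only as many as the caller asked for.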
A very interesting thing about iterators, which you will notice if you look at the output above, is that execution of the iterator’s code is paused after each yield return until the consumer requests the next value from the sequence.
This raises another question: will the code stay paused inside the iterator after the yield statement forever if the enumerator never asks for the next value?
Let’s try to answer that question in the next section, which looks at the flow of execution for an iterator.
Flow of execution for Iterator Functions
When an iterator block starts executing, it goes only as far as it needs to. It stops executing when any of these occurs:
- An exception is thrown
- It reaches the end of the function
- A yield break is encountered
- A yield return is encountered and a value is ready to be returned
If we look at the piece of code below.
public static IEnumerable<int> PowerWithYield(int number, int exponent)
{
Console.WriteLine("Power With Yield Called");
int result = 1;
for (int i = 0; i < exponent; i++)
{
Console.WriteLine("Calculating an Exponent");
result = result * number;
yield return result;
}
}
The execution of the function goes like this.
- Write to console “Power With Yield Called”
- Declare the local variable result
- Go into the for loop
- Write to console “Calculating an Exponent”
- Calculate the new value of result
- Go to yield return, pause the execution and return the result value.
Now when the foreach wants to consume the next value, the function does not start executing from scratch. It resumes right after the yield return, so the second iteration looks like this:
- Increment i and check the loop condition
- Write to console “Calculating an Exponent”
- Calculate the new value of result
- Go to yield return, pause the execution and return the result value.
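The back-and-forth between the iterator and its consumer can be made visible with a small sketch that records each step in a list. The Log list and its messages are assumptions made for this illustration; the Power method is a shortened version of PowerWithYield.

```csharp
using System;
using System.Collections.Generic;

public static class InterleavingDemo
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Power(int number, int exponent)
    {
        Log.Add("iterator: started");
        int result = 1;
        for (int i = 0; i < exponent; i++)
        {
            result *= number;
            Log.Add($"iterator: yielding {result}");
            yield return result;
        }
        Log.Add("iterator: finished");
    }

    public static void Run()
    {
        Log.Clear();
        IEnumerable<int> sequence = Power(2, 2);
        Log.Add("caller: sequence created"); // no iterator code has run yet
        foreach (int value in sequence)
        {
            Log.Add($"caller: received {value}");
        }
    }

    public static void Main()
    {
        Run();
        Log.ForEach(Console.WriteLine);
        // caller: sequence created
        // iterator: started
        // iterator: yielding 2
        // caller: received 2
        // iterator: yielding 4
        // caller: received 4
        // iterator: finished
    }
}
```

One detail worth noticing in the log: calling Power(2, 2) does not run any of the iterator’s code by itself. The first line of the iterator only runs when foreach asks for the first value.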
An iterator function can pause and resume from where it left off. Behind the scenes, the C# compiler generates code that keeps track of where execution stopped and resumes it from that point. The compiler achieves this using something called a state machine. The state machine is beyond the scope of this post, but I can write a post about it if there is interest from readers. It works in a very similar manner to the async/await feature of C# 5. In the past I wrote a post about the state machine for async/await, which interested readers can have a read of.
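To give a rough feel for the idea without going into the real thing, here is a hand-written enumerator that produces the same values as PowerWithYield by remembering its position in a state field. This is only a simplified sketch: the class name, the field names and the state numbers are my own, and the code the compiler actually generates looks different.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written, simplified stand-in for what an iterator's state machine does.
// This is NOT the compiler's actual output; names and states are illustrative.
public sealed class PowerEnumerator : IEnumerator<int>
{
    private readonly int _number;
    private readonly int _exponent;
    private int _state;   // 0 = not started, 1 = paused after a yield, -1 = finished
    private int _result;
    private int _i;

    public PowerEnumerator(int number, int exponent)
    {
        _number = number;
        _exponent = exponent;
    }

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:              // first call: run the code before the loop
                _result = 1;
                _i = 0;
                break;
            case 1:              // resuming: finish the loop iteration we paused in
                _i++;
                break;
            default:             // already finished
                return false;
        }

        if (_i < _exponent)
        {
            _result *= _number;
            Current = _result;   // equivalent of "yield return result"
            _state = 1;
            return true;
        }

        _state = -1;
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}

public static class StateMachineDemo
{
    public static void Main()
    {
        using (var e = new PowerEnumerator(2, 4))
        {
            while (e.MoveNext()) Console.Write($"{e.Current} ");
        }
        // prints: 2 4 8 16
    }
}
```

The local variables of the iterator (result and i) become fields, so their values survive between calls to MoveNext().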
Let’s come back to the question: if the enumerator never asks for the next value, will the iterator stay paused forever?
Let’s try to answer this question with an example.
Let’s change our PowerWithYield function a little bit by wrapping its body in try/finally, and then consume it in two ways:
- Case 1: Consume it using a foreach loop, but exit early without letting the function finish.
- Case 2: Consume the same function manually using a while loop, and again exit early without letting the function finish.
public static IEnumerable<int> PowerWithYield(int number, int exponent)
{
try
{
Console.WriteLine("Power With Yield Called");
int result = 1;
for (int i = 0; i < exponent; i++)
{
Console.WriteLine("Calculating an Exponent");
result = result * number;
yield return result;
}
}
finally
{
Console.WriteLine("Finally is executed");
}
}
//CASE 1: Consume the method but exit early don't let it finish using foreach
public static void Case1()
{
//CASE 1: Consume the method but exit early don't let it finish using foreach
foreach (int i in PowerWithYield(2,8))
{
Console.Write("{0} ", i);
if(i> 20) break;
}
}
//CASE 2: Manually consume the method using a while loop and finish early
public static void Case2()
{
//CASE 2: Manually consume the method using a while loop and finish early
IEnumerable<int> enumerable = PowerWithYield(2, 8);//Create Iterator
IEnumerator<int> enumerator = enumerable.GetEnumerator();// Get Enumerator we are not disposing it yet
while (enumerator.MoveNext())//Keep calling MoveNext until it return false
{
int value = enumerator.Current;//Use the current value
Console.Write("{0} ", value);
if(value>20) break;
}
}
//Other code
What do you think: will the finally block be executed in both Case 1 and Case 2, only in Case 1, or only in Case 2? And when will the finally block be executed? Have a think about it before you proceed further with this post.
Link to GIST for IL code generated for the Case1 and Case2
Let’s start with Case1 and see what code the compiler generates for it.
//This is the code generated by the compiler for the Case1 method, decompiled from Intermediate Language back to C#
public static void Case1()
{
IEnumerator<int> enumerator = PowerWithYield(2, 8).GetEnumerator();
try
{
while (enumerator.MoveNext())
{
int current = enumerator.Current;
Console.Write("{0} ", current);
if (current > 20)
{
break;
}
}
}
finally //This block will be called when we hit Break
{
if (enumerator != null)
{
enumerator.Dispose();// Trigger the dispose on Enumerator that was generated above
}
}
}
The code generated for Case1(), where we used foreach, is very similar to what we wrote by hand for Case 2. The only difference is that the compiler wrapped the MoveNext() loop in try/finally, so when we break out of the foreach before the iterator sequence is complete, the generated enumerator is disposed. So the execution flow looks like this:
- The compiler wraps the MoveNext() loop in try/finally
- When we hit break; the finally block is executed
- In finally, Dispose is called on the enumerator
But how does the iterator’s own finally block get called? If we look at the state machine generated for PowerWithYield, it has the code below for its Dispose method.
//This method will be called when the Enumerator is disposed
[DebuggerHidden]
void IDisposable.Dispose()
{
int num = <>1__state;
if (num == -3 || num == 1)
{
try
{
}
finally
{
<>m__Finally1();
}
}
}
//and <>m_finally1() look like below
private void <>m__Finally1()
{
<>1__state = -1;
Console.WriteLine("Finally is executed");
}
So the execution flow will be:
- Case1() calls Dispose() on the enumerator
- The Dispose() method inside the state machine calls <>m__Finally1()
- <>m__Finally1() contains the code from the finally block of the iterator we created.
As we can see above, the compiler generates code to make sure that even if you stop enumerating an iterator early and do not consume all of its values, the finally block of the iterator function can still be called.
A full discussion of the state machine is out of the scope of this post, but for interested readers here is the Gist for the code above with the IL generated by the compiler.
So the key point is that when we are not using foreach and are enumerating an iterator sequence manually, we have to dispose of the enumerator correctly for the finally block to be called. To fix Case2, we can add a using statement on the enumerator, which will call Dispose.
public static void Case2()
{
//CASE 2: Manually consume the method using a while loop and finish early
IEnumerable<int> enumerable = PowerWithYield(2, 8);//Create Iterator
//Added using which will internally call dispose on Enumerator
using (IEnumerator<int> enumerator = enumerable.GetEnumerator())
{
while (enumerator.MoveNext())//Keep calling MoveNext until it return false
{
int value = enumerator.Current;//Use the current value
Console.Write("{0} ", value);
if (value > 20) break;
}
}
}
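The difference between the two versions of Case2 can also be checked with a counter instead of console output. In this sketch the Numbers iterator, the FinallyRuns counter and the two StopEarly methods are hypothetical names made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    public static int FinallyRuns; // how many times the iterator's finally block ran

    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            FinallyRuns++;
        }
    }

    // Stops after the first value and never disposes the enumerator.
    public static int StopEarlyWithoutDispose()
    {
        FinallyRuns = 0;
        IEnumerator<int> enumerator = Numbers().GetEnumerator();
        enumerator.MoveNext();
        return FinallyRuns;
    }

    // Stops after the first value; the using statement disposes the enumerator.
    public static int StopEarlyWithDispose()
    {
        FinallyRuns = 0;
        using (IEnumerator<int> enumerator = Numbers().GetEnumerator())
        {
            enumerator.MoveNext();
        }
        return FinallyRuns;
    }

    public static void Main()
    {
        Console.WriteLine(StopEarlyWithoutDispose()); // 0
        Console.WriteLine(StopEarlyWithDispose());    // 1
    }
}
```

Without Dispose the iterator simply stays paused inside its try block and the finally block never runs; with using it runs exactly once.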
This may seem like a subtle thing, but it is really important that the iterator’s finally block is called at the right time. If you open IO connections or handles in an iterator and release them in its finally block, then a finally block that is not executed correctly can result in a resource leak, or at least in the delayed release of high-demand resources.
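As a sketch of that scenario, the iterator below reads lines from a reader and releases the reader in its finally block. TrackingReader is a hypothetical class written only for this example so that we can observe the disposal; in real code the resource would be something like a file or a network stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader that records whether it has been disposed.
public sealed class TrackingReader : StringReader
{
    public TrackingReader(string text) : base(text) { }
    public bool IsDisposed { get; private set; }

    protected override void Dispose(bool disposing)
    {
        IsDisposed = true;
        base.Dispose(disposing);
    }
}

public static class ResourceDemo
{
    // The iterator releases the reader in its finally block.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        try
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
        finally
        {
            reader.Dispose();
        }
    }

    public static void Main()
    {
        var reader = new TrackingReader("first\nsecond\nthird");
        foreach (string line in ReadLines(reader))
        {
            Console.WriteLine(line);
            break; // stop after the first line
        }
        Console.WriteLine(reader.IsDisposed); // True, because foreach disposed the enumerator
    }
}
```

Even though only the first line is consumed, the reader is released as soon as the foreach exits. Enumerating the same iterator by hand without disposing the enumerator would leave the reader open.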
Summary
In this post I tried to share what I know about iterators. I do not expect this post to be completely accurate, as I am learning every day, but it is definitely to the best of my knowledge. A lot of the information in this post was inspired by the book “C# in Depth” by Jon Skeet; it’s a great book and I highly recommend it. We did not discuss the state machine much, because that topic deserves a post of its own. If there is enough interest from readers, I can try to blog about that as well. It’s time for me to say goodbye; I will see you soon with another post. Please let me know in the comments if this post helped you at all, or if there are things I can improve on. Enjoy your life and be humble :)