
The PHP CachingIterator

Dear Reader,

(Sample code for those too dang lazy to cut ‘n paste)

How I got here

In the course of writing my next book, “Iterating PHP Iterators”, I found something very interesting.

I have a short chapter on the CachingIterator. One of the flags in the CachingIterator is FULL_CACHE. It was during my experiments with that flag that I found…an anomaly.

Note: As of yet, I have not reported this as a bug in PHP because it may just be a situation of “I’m doing it wrong”. I’m putting this out here mainly so someone can point me in the right direction. If no one can, then I’ll file a bug.

The proof of error code

The example I am using in my book is the 7 Dwarfs. Here is the code.

<?php
$dwarves = [1=>'Grumpy',
            2=>'Happy',
            3=>'Sleepy', 
            4=>'Bashful', 
            5=>'Sneezy', 
            6=>'Dopey', 
            7=>'Doc'];
$it      = new CachingIterator(new ArrayIterator($dwarves), 
                               CachingIterator::FULL_CACHE);
foreach($it as $v);
 
$it->offsetUnset(4);
$it->offsetSet('Cal','Kathy');
$it[5]='Surly'; 
 
foreach($it as $offset=>$value) {
	echo 'Original: '.$offset.' == '.$value."\n";
}

That code actually works, even if it doesn’t work the way I would expect it to. I would expect that iterating over $it would give me the modified version, not the original “cached” version. Note that Bashful is still in the list and Kathy is not. It is the original list as we loaded it into the ArrayIterator. Also, line 11 (the empty foreach) is very important, if a bit silly. Yes, you have to spin through the entire array if you pass it in on the constructor; otherwise, the cache doesn’t get loaded.
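To see the cache-loading step in isolation, here is a minimal sketch. The three-element array and the variable names are made up for illustration; the only point is that getCache() is empty until the iterator has been walked once.

```php
<?php
// Illustrative data; any Iterator can serve as the inner iterator.
$it = new CachingIterator(
    new ArrayIterator(['a' => 'one', 'b' => 'two', 'c' => 'three']),
    CachingIterator::FULL_CACHE
);

// Nothing has been iterated yet, so the cache is still empty.
$before = count($it->getCache());

// An empty loop walks the inner iterator and fills the cache as it goes.
foreach ($it as $value);

$after = count($it->getCache());

echo "Cached before the loop: $before\n";
echo "Cached after the loop:  $after\n";
```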

Now let’s add a little more to it.

foreach($it->getCache() as $offset=>$value) {
	echo 'Cache: '.$offset.' == '.$value."\n";
}

This now outputs:

$ php ../examples/test.php 
Original: 1 == Grumpy
Original: 2 == Happy
Original: 3 == Sleepy
Original: 4 == Bashful
Original: 5 == Sneezy
Original: 6 == Dopey
Original: 7 == Doc
Cache: 1 == Grumpy
Cache: 2 == Happy
Cache: 3 == Sleepy
Cache: 4 == Bashful
Cache: 5 == Sneezy
Cache: 6 == Dopey
Cache: 7 == Doc

Ok, so now, even when we pull the cache, we still get the original list. I’m not sure how that is right, ever. I know a few of you are saying “but Cal, you have to rewind().” It is to those of you who I say “read my book”. :) But just for grins and giggles, let’s rewind the iterator.

<?php
 
$it = null;
 
$dwarves = [1=>'Grumpy',
            2=>'Happy',
            3=>'Sleepy', 
            4=>'Bashful', 
            5=>'Sneezy', 
            6=>'Dopey', 
            7=>'Doc'];
 
$it      = new CachingIterator(new ArrayIterator($dwarves), 
                               CachingIterator::FULL_CACHE);
foreach($it as $v);
 
$it->offsetUnset(4);
$it->offsetSet('Cal','Kathy');
$it[5]='Surly'; 
 
foreach($it as $offset=>$value) {
	echo 'Original: '.$offset.' == '.$value."\n";
}
 
$it->rewind();
 
foreach($it->getCache() as $offset=>$value) {
	echo 'Cache: '.$offset.' == '.$value."\n";
}

Now when we run it we get this:

$ php ../examples/test.php 
Original: 1 == Grumpy
Original: 2 == Happy
Original: 3 == Sleepy
Original: 4 == Bashful
Original: 5 == Sneezy
Original: 6 == Dopey
Original: 7 == Doc
Cache: 1 == Grumpy

Hmmm…well that ain’t right.

Here is what DID work. I am not entirely sure why at this point, I’m still investigating.

<?php
$dwarves = [1=>'Grumpy',
            2=>'Happy',
            3=>'Sleepy', 
            4=>'Bashful', 
            5=>'Sneezy', 
            6=>'Dopey', 
            7=>'Doc'];
 
$it      = new CachingIterator(new ArrayIterator($dwarves), 
                                   CachingIterator::FULL_CACHE);
foreach($it as $v);
 
$it->offsetUnset(4);
$it->offsetSet('Cal','Kathy');
$it[5]='Surly'; 
 
foreach($it->getCache() as $offset=>$value) {
	echo 'Cache: '.$offset.' == '.$value."\n";
}
 
foreach($it as $offset=>$value) {
	echo 'Original: '.$offset.' == '.$value."\n";
}

Now we are through the looking glass. The order in which the loops appear in your code makes a difference? Technically, this code outputs the list correctly if you ignore the fact that the cache version should be the immutable one and that $it itself should reflect the changes.

$ php ../examples/test.php 
Cache: 1 == Grumpy
Cache: 2 == Happy
Cache: 3 == Sleepy
Cache: 5 == Surly
Cache: 6 == Dopey
Cache: 7 == Doc
Cache: Cal == Kathy
Original: 1 == Grumpy
Original: 2 == Happy
Original: 3 == Sleepy
Original: 4 == Bashful
Original: 5 == Sneezy
Original: 6 == Dopey
Original: 7 == Doc

BONUS ROUND:

Take the above code, now swap the two foreach statements. See what I mean? The order that the foreach statements are executed in should have absolutely no effect on the output. If this is expected behavior then we kinda need to put it in the manual.

Sooooo…TIL: don’t use the FULL_CACHE flag on the CachingIterator. I am not sure what the FULL_CACHE flag is supposed to do, but it doesn’t seem to do anything useful at the moment.

Summary:

So today I learned, don’t use the FULL_CACHE flag on the CachingIterator. I am not sure what the FULL_CACHE flag is supposed to do, but it doesn’t seem to do anything useful at the moment. Also, it can screw things up for you.

Here are 3 takeaways.

  1. The ‘cached’ version of the iterator should be the one that does NOT change. The iterator itself should reflect the changes made.
  2. Calling rewind() should never cause the cache to forget everything except the last element.
  3. If you pass in the ArrayIterator in the constructor, it does not get loaded into the cache; you have to put an empty foreach loop in your code to load the cache.

I hope this helps someone along the way.

Until next time,
I <3 |<
=C=

6 thoughts on “The PHP CachingIterator”

  1. Hi Cal,

    Thanks for your blogpost about the CachingIterator. I think the SPL is still a very much uncharted territory in the PHP world and the more explorers we have, the better.

    First of all, what you are experiencing is not a bug, but it’s also not the behavior you might be expecting from something called a CachingIterator. I’ve written a book about the SPL, [Editors Notes: “Mastering the SPL”] and I do talk about the FULL_CACHE flag in it, but let me explain here in a bit more detail what’s going on.

    It will make your life (and everybody else’s) much easier if we think of the CachingIterator not so much as an iterator that caches, but as a “look ahead” iterator. That is its primary purpose; the caching part dangles on the side a bit, which causes a lot of confusion.

    The lookahead part is awesome, because it means that we have a hasNext() method. Many times you are looping over data and need to handle the last (and/or first) item as a special case. With hasNext() you get a much cleaner way to handle those cases. (mental note: don’t run in combination with the InfiniteIterator, for obvious reasons :) ).
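A minimal sketch of the last-item special case using hasNext(). The names and the comma-joining task are only an illustration, and the iterator is built with its default flags since no cache is needed here.

```php
<?php
$names = ['Grumpy', 'Happy', 'Doc'];
$it    = new CachingIterator(new ArrayIterator($names));

$line = '';
foreach ($it as $name) {
    // hasNext() reports whether the inner iterator still has an element
    // after the one currently being handled.
    $line .= $name . ($it->hasNext() ? ', ' : '.');
}

echo $line, "\n"; // Grumpy, Happy, Doc.
```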

    Anyway, I’ll try to explain what’s going on, following the text in your blogpost.

    First, you create a CachingIterator (did you read it as LookAheadIterator in your mind yet?), which encapsulates an ArrayIterator full of dwarfs. You provide the FULL_CACHE flag, which allows you to use the getCache() method (if you DON’T set that flag, the getCache() method will throw an exception).
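A short sketch of what happens without the flag, assuming a current PHP version where the exception raised is a BadMethodCallException. The two-element array is illustrative.

```php
<?php
// Default flags (CALL_TOSTRING), so FULL_CACHE is NOT set.
$it = new CachingIterator(new ArrayIterator(['Grumpy', 'Happy']));
foreach ($it as $value);

try {
    $it->getCache();
    $threw = false;
} catch (BadMethodCallException $e) {
    // Raised because the iterator was not created with FULL_CACHE.
    $threw = true;
}

echo $threw ? "getCache() refused\n" : "getCache() worked\n";
```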

    What many people miss (luckily you didn’t fall into that trap) is that the CachingIterator fills its internal cache DURING the iteration process. This means that when the iterator is initialized there is no cache (actually, just an empty cache). This is why you need to iterate over the data first, which you do at line 11 (foreach ($it as $v);). Even though the loop body doesn’t do anything, you are internally filling the cache. This cache can be seen as (or actually IS) just an array() located inside the iterator.

    A detail that most people find surprising (that is, until you stop thinking about it as a caching iterator) is the fact that during the NEXT iteration of the iterator, it WON’T use the cache itself. You are reading this right: the CachingIterator doesn’t use its own cache. Its “current” method uses the same code as many other iterators. There is no special case that checks whether the current item is found in the cache and, if so, uses that one.

    So, after your iteration has filled the cache, you are doing some setting and unsetting of “something”. As the CachingIterator implements the ArrayAccess interface, you can do things like offsetSet|Get and use the standard indexing [] notation, but it’s up to the CachingIterator to implement this functionality.

    The offsetGet|offsetSet and offsetUnset methods of the CachingIterator actually read and update the values directly in the iterator’s internal cache-array. This is what you are doing on lines 13-15: changing values in the internal cache-array.

    Let’s take a look at what you are doing in the next lines (17-19): another iteration over the elements. I said before that this CachingIterator doesn’t use its own cache when iterating data, so in this case it makes sense that you don’t see the updated cache-values. However, there is a side-effect when iterating: it will RESTOCK the cache again. Meaning all your cached data will be gone (this is a side-effect of the rewind(), which I will talk about later).
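The three points above (array access touches only the cache, iteration reads the wrapped iterator, and a new iteration restocks the cache) can be sketched in a few lines. The shortened dwarf list and the variable names are illustrative.

```php
<?php
$dwarves = [4 => 'Bashful', 5 => 'Sneezy', 6 => 'Dopey'];
$it      = new CachingIterator(new ArrayIterator($dwarves),
                               CachingIterator::FULL_CACHE);
foreach ($it as $v);                    // fill the cache

$it[5]     = 'Surly';                   // writes to the internal cache only
$fromCache = $it[5];                    // offsetGet() reads the cache

// Iterating reads from the wrapped ArrayIterator, not from the cache...
$fromLoop  = iterator_to_array($it)[5];

// ...and that iteration rebuilt the cache, so the edit is gone.
$afterLoop = $it[5];

echo "From cache: $fromCache\n";        // Surly
echo "From loop:  $fromLoop\n";         // Sneezy
echo "After loop: $afterLoop\n";        // Sneezy
```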

    Because you are looking at your cache on the next lines (the foreach() block over $it->getCache()), you are fetching a freshly generated cache WITHOUT your Surly and Kathy values.

    If you would have placed this block at line 16 (after your cache modification and before the second iteration over the array), you WOULD have seen the changes.

    In your next codeblock, you are rewinding the iterator, and now things REALLY are falling apart.
    Here you are doing the same thing, but are rewind()ing the iterator in between. Sometimes, people think that rewinding is a solution for every problem, and in many cases they are right but alas, not in this one.

    So, what exactly is happening during a CachingIterator rewind? First of all, it does what it says: it rewinds the iterator (the so-called “spl_dual_it_rewind”, if you are into that internals kind of thing), but it does 2 other things:
    Remember that I told you before that it would restock the cache during each iteration? Actually, what it does is just throw away the whole cache during a rewind. Literally: as if you had done unset($it->cache) or something. It’s gone. Never coming back. This way, iteration must be done in order to fill up the cache again.
    But why are you seeing one item inside your cache after the rewind()? This is because the CachingIterator is a LookAheadIterator, and even this is sort of a misnomer: the iterator will rewind(), and automatically do a “next()”. The current() you are fetching every time is not so much the current element the iterator is on now, but the PREVIOUS current. It’s not so much a lookAhead as it is a returnBehind iterator. But for the sake of a sane mind, lookAhead will do.

    Thus: a rewind() is doing an automatic next() call, which is exactly the place where the cache is being filled! This is why you are seeing one, and only one element in your cache. It’s still building up the cache!
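A small sketch of the rewind() behavior just described, with an illustrative three-element list. The counts are taken immediately so each one reflects the cache at that moment.

```php
<?php
$dwarves = [1 => 'Grumpy', 2 => 'Happy', 3 => 'Sleepy'];
$it      = new CachingIterator(new ArrayIterator($dwarves),
                               CachingIterator::FULL_CACHE);

foreach ($it as $v);                         // full pass: cache holds everything
$countAfterLoop = count($it->getCache());

$it->rewind();                               // cache emptied, then one automatic step
$cacheAfterRewind = $it->getCache();
$current          = $it->current();          // the element fetched by that step
$hasNext          = $it->hasNext();          // inner iterator already sits on 'Happy'

echo "After loop:   $countAfterLoop cached\n";
echo "After rewind: ", count($cacheAfterRewind), " cached\n";
echo "current():    $current\n";
```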

    In your next example block, you are actually foreach()ing the cached version first (lines 18-20) and doing another iteration second (22-24). By now you should understand that doing a cached foreach() again would show that Kathy and Surly have been cleared again.

    Bonus round:
    Now you should understand why it’s doing the things it’s doing (even though you might not agree with it). There are no bugs here, but merely bad/incomplete documentation.

    The FULL_CACHE *IS* important if you want to do ANYTHING with caching. If you don’t want to cache, don’t use FULL_CACHE, and treat the CachingIterator as a plain LookAheadIterator, without the additional cache storage. If you NEED the cache, by all means, use it BUT beware that on the next rewind of the iterator, your cache will be gone unless you save it by yourself.
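One way to “save it yourself” is to copy the cache into your own array before anything rewinds the iterator. This is only a sketch: the entry-by-entry copy is a deliberately defensive choice so that $saved is a separate array no matter what getCache() hands back, and the variable names are illustrative.

```php
<?php
$it = new CachingIterator(new ArrayIterator([1 => 'Grumpy', 2 => 'Happy']),
                          CachingIterator::FULL_CACHE);
foreach ($it as $v);                     // fill the cache

$it->offsetSet('Cal', 'Kathy');          // edit the cache

// Keep our own copy before the next rewind throws the cache away.
$saved = [];
foreach ($it->getCache() as $key => $value) {
    $saved[$key] = $value;
}

foreach ($it as $v);                     // new iteration: cache is rebuilt

$stillInCache = $it->offsetExists('Cal');
$stillInCopy  = array_key_exists('Cal', $saved);

echo 'In cache: ', var_export($stillInCache, true), "\n";
echo 'In copy:  ', var_export($stillInCopy, true), "\n";
```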

    Summary:
    I think you could answer your 3 summary conclusions by now:
    1. The cache DOES change when you start a new iteration on the iterator.
    2. Rewind DOES forget everything. But the lookahead functionality automatically loads the first element and stores it in the cache.
    3. The CachingIterator fills the cache during iteration, not during construction. This is also the only way to do it: if you cached data during construction, you would need to prefetch ALL the data from the underlying iterators at that point. Maybe doable with ArrayIterators, but not really viable when you are encapsulating database cursors or heavy calculations.
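The third point can be made visible by wrapping a lazy source. In this sketch a generator stands in for a database cursor or a heavy calculation, and the $pulled counter is a made-up probe that records how many items the source has produced.

```php
<?php
$pulled = 0;

// Stand-in for an expensive source: counts each item it hands out.
$source = (function () use (&$pulled) {
    foreach (['a', 'b', 'c'] as $item) {
        $pulled++;
        yield $item;
    }
})();

$it = new CachingIterator($source, CachingIterator::FULL_CACHE);
$pulledAfterConstruct = $pulled;   // construction has not touched the source

foreach ($it as $v);               // iteration pulls the items and caches them
$pulledAfterLoop = $pulled;
$cache           = $it->getCache();

echo "Pulled after construction: $pulledAfterConstruct\n";
echo "Pulled after the loop:     $pulledAfterLoop\n";
```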

    In SPL land, there is a big gap over what is to be expected, and what is really happening. Does it make sense? Yes, in a way it does, but the lack of documentation and strange naming of things doesn’t help making it obvious for everybody.

  2. Hi Joshua!

    Thank you for the very detailed explanation, I’m sure my readers appreciate the inside scoop. :) I took the liberty of adding a link to your book so that others can find it and buy it.

    You are right, I now see how the CachingIterator acts. I actually had a line in my book about the fact that it should have been named LookAheadIterator. :) That is the functionality that most people will be using.

    Two things.

    Now that I understand it better, I feel even more strongly that it is broken. How it accomplishes what it does is wrong. The easiest case in point is the rewind(). Understanding why it does that does not change my belief that it is bad behavior.

    Second, again, it makes no sense that changes made to the object itself only get applied to the cache. The cache should be held in reserve as the original. Changes should be made to the object itself.

    Yeah, I figured out the part about having to iterate over the object to pre-load the cache. That’s just nuts. :)

    Thank you again!

    Cheers!
    =C=

Comments are closed.