Splitting a Python list into sublists

[Edited Dec 6, 2010 to mention another solution based on zip and iter.]

Suppose you want to divide a Python list into sublists of approximately equal size. Since the number of desired sublists may not evenly divide the length of the original list, this task is a tad more complicated than one might at first assume. One Python Cookbook entry is:

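def slice_it(li, cols=2):
    # Yield `cols` contiguous slices of `li`, sized as evenly as possible.
    start = 0
    for i in range(cols):
        # li[i::cols] takes every cols-th item starting at i; its length
        # is the length the i-th piece should have.
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop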

which gives the exact number of subsequences, while varying the length of the subsequences a bit if necessary. It uses Python’s extended slicing, len(li[i::cols]), to work out how long each piece should be.
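As a quick check of that claim, here is a small sketch that runs the generator on a ten-item list and asks for four pieces. The function is repeated so the snippet stands alone; I've used range so that it runs on Python 3 as well, and the names seq and pieces are just for illustration.

```python
def slice_it(li, cols=2):
    """Yield `cols` contiguous slices of `li`, sized as evenly as possible."""
    start = 0
    for i in range(cols):
        # li[i::cols] has exactly the length the i-th piece should have
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop


seq = list(range(1, 11))          # ten items: 1..10
pieces = list(slice_it(seq, 4))   # ask for four pieces

print(pieces)                     # [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
print([len(p) for p in pieces])   # [3, 3, 2, 2]
```

The pieces stay contiguous and in order, and their lengths differ by at most one.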

That was written in response to an earlier cookbook entry which had the following one-liner:

[seq[i:i+size] for i in range(0, len(seq), size)]

I like that it’s a one-liner but don’t like a couple of things about it. If your goal isn’t a particular sublist length but rather a particular number of pieces, you need another line to compute the size. And even then the result can be off. Suppose you want to divide a list of length 10 into 4 sublists:

>>> size = 10 // 4
>>> size
2
>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> [seq[i:i+size] for i in range(0, len(seq), size)]
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

This leaves us with one sublist more than desired.

Try setting size to 3 to get fewer sublists:

>>> size=3
>>> [seq[i:i+size] for i in range(0, len(seq), size)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

This leaves us with dissimilar lengths.
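To see that no other choice of size does better here, one can tabulate the chunk lengths for each size. This is only a sketch; the helper name chunk_lengths is my own and not part of either cookbook entry.

```python
def chunk_lengths(n, size):
    """Lengths of the chunks produced by fixed-size slicing of n items."""
    seq = list(range(n))
    return [len(seq[i:i + size]) for i in range(0, n, size)]


# Print: size, number of chunks, chunk lengths
for size in range(1, 6):
    lengths = chunk_lengths(10, size)
    print(size, len(lengths), lengths)
```

For ten items, size 3 is the only size that yields exactly four chunks, and its chunk lengths are 3, 3, 3 and 1.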

Here’s a briefer one-liner using the slice idea. It doesn’t require you to compute the length in advance, it gives exactly the number of subsequences you ask for (num), and it spreads the elements across them more evenly:

[seq[i::num] for i in range(num)]

The drawback here is that the pieces are not contiguous slices of seq; each one takes every num-th element, so seq is sliced and diced. But in many situations that doesn’t matter. In any case, all the elements are in the output and the lengths of the pieces differ by at most one:

>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> num = 4
>>> [seq[i::num] for i in range(num)]
[[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]
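Both properties of the strided one-liner (nothing is lost, and the lengths are as even as they can be) are easy to check. A minimal sketch, with the one-liner wrapped in a function whose name, stride_split, is my own:

```python
def stride_split(seq, num):
    """Deal seq into num pieces round-robin using strided slices."""
    return [seq[i::num] for i in range(num)]


seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pieces = stride_split(seq, 4)
lengths = [len(p) for p in pieces]

print(pieces)                        # [[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]
print(max(lengths) - min(lengths))   # 1
# Every element of seq appears exactly once across the pieces
print(sorted(x for p in pieces for x in p) == sorted(seq))  # True
```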