Python string += string
I use string += all the time in my Python programs, sometimes accumulating a few MB of data in memory before flushing it out to disk.
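I just learnt that it can be a very slow operation. In Python, strings are immutable, so += cannot modify the existing string: it builds a new string object and copies the old contents into it on every call. CPython can sometimes avoid the copy, but that is an implementation detail you cannot rely on. Imagine doing this a few hundred thousand times in each program, with the string growing on every step.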
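To make the point concrete, here is a small sketch of what immutability means for +=, next to the usual alternative of collecting pieces in a list and joining them once. The function names build_with_concat and build_with_join are made up for this illustration, and the sketch shows only that both approaches produce the same result; it does not measure timing, which depends on the interpreter and version.

```python
# Strings are immutable: += rebinds the name to a new object
# and leaves the original string untouched.
s = 'abc'
t = s            # t refers to the same string object as s
s += 'd'         # s now names a different, new string
# t is still 'abc'; the original object was never modified.


def build_with_concat(pieces):
    """Repeated +=: conceptually creates a new string on every step."""
    out = ''
    for piece in pieces:
        out += piece
    return out


def build_with_join(pieces):
    """Collect the pieces in a list and build the string once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return ''.join(parts)


pieces = [str(i) + '\n' for i in range(1000)]
same = build_with_concat(pieces) == build_with_join(pieces)
```

Both functions return identical text. The join version avoids depending on whether the interpreter happens to optimize += in place.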
Today, when a simple loop seemed to take forever, I was forced to investigate, and sure enough someone had explained it in this thread on the Python forum.
But I cannot keep invoking file I/O for each append operation either. Even though file writes are already buffered, I like to explicitly hold data in memory for a fixed number of string appends and then flush it to disk. This matters if you want to monitor the progress of your program through these logs deterministically, say every 1000 steps of the loop. I wrote this simple class that makes the task very easy.
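class hugeFileWrite:
    """Buffer strings in memory and append them to a file every `step` calls."""

    def __init__(self, fname, step=100):
        self.sout = ''
        self.step = step
        self.fname = fname
        self.count = 0
        # Create the file, or truncate it if it already exists.
        with open(fname, 'w') as f:
            f.write('')

    def addString(self, smore):
        self.sout += smore
        self.count += 1
        # Flush once `step` strings have been buffered
        # (with `>` the flush happened one call late).
        if self.count >= self.step:
            self.flush()

    # Make sure you call flush() after your last addString
    def flush(self):
        with open(self.fname, 'a') as f:
            f.write(self.sout)
        self.sout = ''
        self.count = 0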
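For comparison, here is a rough sketch of the same idea with two changes: the buffer is a list that gets joined at flush time, so the class no longer relies on += itself, and it works as a context manager so the final flush cannot be forgotten. The names BufferedLineWriter, add_string and demo, and the file name progress.log, are invented for this example and are not part of the class above.

```python
import os
import tempfile


class BufferedLineWriter:
    """Hypothetical variant of hugeFileWrite that buffers pieces in a list."""

    def __init__(self, fname, step=100):
        self.fname = fname
        self.step = step
        self.parts = []
        # Create the file, or truncate it if it already exists.
        with open(fname, 'w'):
            pass

    def add_string(self, smore):
        self.parts.append(smore)
        # Flush once `step` strings have been buffered.
        if len(self.parts) >= self.step:
            self.flush()

    def flush(self):
        if self.parts:
            with open(self.fname, 'a') as f:
                f.write(''.join(self.parts))
            self.parts = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Write whatever is left when the `with` block ends.
        self.flush()
        return False


def demo():
    """Write 2500 lines, flushing every 1000, and read them back."""
    path = os.path.join(tempfile.mkdtemp(), 'progress.log')
    with BufferedLineWriter(path, step=1000) as log:
        for i in range(2500):
            log.add_string('step %d\n' % i)
    with open(path) as f:
        return f.read().splitlines()
```

In demo(), the file is written after steps 1000 and 2000, and the remaining 500 lines are written when the with block exits, so no explicit flush() call is needed at the end.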