Getting all mtimes in a directory recursively with Python
I need to get the mtimes for all files in a directory quickly, so I can add a good caching layer to Heron.
There's a lot of bad or beginner-level advice on the internet about how to do this, so I had to try it out myself. Here are four ways to do it:
import glob
import os
import timeit
from pathlib import Path

test_dir = Path('.')

def method1():
    # os.walk, joining paths and calling os.path.getmtime for each file
    total = 0
    for dirpath, dirnames, filenames in os.walk(test_dir):
        for filename in filenames:
            total += os.path.getmtime(os.path.join(dirpath, filename))
    return total

def method2_1():
    # os.fwalk, stat'ing each file relative to the directory file descriptor
    total = 0
    for dirpath, dirnames, filenames, dir_fd in os.fwalk(test_dir):
        for filename in filenames:
            total += os.stat(filename, dir_fd=dir_fd).st_mtime
    return total

def method2_2():
    # os.fwalk, but ignoring dir_fd and joining paths as in method1
    total = 0
    for dirpath, dirnames, filenames, dir_fd in os.fwalk(test_dir):
        for filename in filenames:
            total += os.path.getmtime(os.path.join(dirpath, filename))
    return total

def method3():
    # glob; recursive=True is required for "**" to actually recurse
    total = 0
    file_paths = glob.glob("**/*", root_dir=test_dir, recursive=True)
    for file_path in file_paths:
        total += os.path.getmtime(test_dir / file_path)
    return total
print(" 1", timeit.timeit(lambda: method1(), number=1000))
print("2_1", timeit.timeit(lambda: method2_1(), number=1000))
print("2_2", timeit.timeit(lambda: method2_2(), number=1000))
print(" 3", timeit.timeit(lambda: method3(), number=1000))
Here's the output for my specific test directory on a Linux computer:
1 2.029968542000006
2_1 1.4166910230001122
2_2 1.9437335370002984
3 3.277476520000164
Note that these are for 1000 runs each, so even the slowest method works out to roughly 3 ms per walk; any of these methods will be quick enough in my case.
Using the glob-style method can be much faster when you can discard many files up front, i.e. if you only need all image files or some such (see the sketch below). I need all files here, so this does not apply to me.
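For illustration, here's a minimal sketch of that kind of filtering, reusing test_dir from above (the *.png pattern is just an example I made up):

image_paths = glob.glob("**/*.png", root_dir=test_dir, recursive=True)
# Only the matching files ever get stat'ed; everything else is skipped.
image_mtimes = {path: os.path.getmtime(test_dir / path) for path in image_paths}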
I guess I should implement a version using scandir as well, but I don't expect a lot of speedup over method 2_1, since on Linux each mtime lookup needs a stat syscall anyway. So I'll go with 2_1 for the moment.
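For reference, here's a rough sketch of what such a scandir version might look like (my guess at the shape of it, not benchmarked). On Unix, DirEntry.stat() needs a real stat syscall per file; only Windows gets the stat info for free from the directory listing, which is why I don't expect much of a win here.

def method4():
    # Iterative recursion with os.scandir; entry.stat() still costs one
    # stat syscall per file on Linux.
    total = 0
    stack = [test_dir]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    total += entry.stat().st_mtime
    return total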