Background#
A few days ago, I downloaded a 100 GB dataset containing tens of thousands of files. When I extracted it, I found that every file had a duplicate whose name started with `._`: for example, a file named `sub_12345` was accompanied by a `._sub_12345` file. These duplicates (typically AppleDouble metadata files left behind by macOS) are useless, but they are visible in Windows; they clutter the directory and also interfere with programs that read the files later.
Python script for batch deletion#
The core is the os.walk function, which recursively traverses a directory tree:
```python
import os

data_dir = './test/'
# os.walk yields (root, subdirectories, filenames) for each directory it
# visits; filenames is a *list* of names, so iterate over it before testing
# each name, then join it with root to get the full path for os.remove.
for root, subdirs, filenames in os.walk(data_dir, topdown=False):
    for filename in filenames:
        if filename.startswith('._'):
            os.remove(os.path.join(root, filename))
```
That is the whole script. Because os.walk handles the recursive traversal of nested folders for us, the task becomes very simple.
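As a side note, the same cleanup can be written with pathlib, whose rglob method matches a filename pattern recursively. This is just an alternative sketch, not the original script; the `./test/` directory is the same assumed location as above:

```python
from pathlib import Path

data_dir = Path('./test/')
# rglob('._*') recursively yields every path whose final name starts
# with '._'; unlink() deletes the file. The is_file() guard skips any
# directory that happens to match the pattern.
for junk in data_dir.rglob('._*'):
    if junk.is_file():
        junk.unlink()
```

One pattern expresses both the recursion and the name filter, so there is no need to join `root` and `filename` by hand.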