Background#
A few days ago, I downloaded a 100 GB dataset containing tens of thousands of files. When I extracted it, I found that every file had a duplicate whose name started with `._`: for example, a file named `sub_12345` was accompanied by a `._sub_12345` file. These duplicates (typically AppleDouble metadata files left behind by macOS) are useless, but they are visible in Windows; they clutter the directory and also interfere with programs that read the files later.
Python script for batch deletion#
The core is the os.walk function, which recursively traverses a directory tree:
```python
import os

data_dir = './test/'
# os.walk yields (root, subdirectories, filenames) for each directory it
# visits; filenames is a *list* of names, so iterate over it before testing
# each name, then join it with root to get the full path for os.remove.
for root, subdirs, filenames in os.walk(data_dir, topdown=False):
    for filename in filenames:
        if filename.startswith('._'):
            os.remove(os.path.join(root, filename))
```
That is the whole script. Because os.walk handles the recursive traversal of nested folders for us, the task becomes very simple.
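As a side note, the same cleanup can be written with pathlib, whose rglob method matches a filename pattern recursively. This is just an alternative sketch, not the original script; the `./test/` directory is the same assumed location as above:

```python
from pathlib import Path

data_dir = Path('./test/')
# rglob('._*') recursively yields every path whose final name starts
# with '._'; unlink() deletes the file. The is_file() guard skips any
# directory that happens to match the pattern.
for junk in data_dir.rglob('._*'):
    if junk.is_file():
        junk.unlink()
```

One pattern expresses both the recursion and the name filter, so there is no need to join `root` and `filename` by hand.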