Redacting many Microsoft Word files quickly

#Windows #VBScript #Sharepoint

Recently, my organization was cleaning out their files on SharePoint and moving them to be archived. The main concern with this was privacy as many of the documents in the collection have sensitive information. All of these files were stored in SharePoint, making it really difficult to work with them (download, edit, save, upload, repeat).

So, here's what I did to redact the 30,000 files. I'm using Windows 7 and the most up to date software as of 2016.

First of all, to make sure these documents are completely rid of sensitive information, you're going to need to do several things to make the process easier and to make sure you find and redact everything you're looking for.

Mounting the SharePoint folder

Before we begin working on the files, we need to mount the SharePoint folder to your computer so that Windows treats it like a local folder on your hard drive.

  1. Navigate to Start > Computer. In the top toolbar, click Map network drive. Mounting a SharePoint drive in Windows 7

  2. Now, open your browser and navigate to the folder location you want on SharePoint.

  3. Copy and paste that address into the Folder location in the Map network drive window. The drive letter doesn't matter, so you can change it to whatever is available in the list. Check the box to enable Reconnect at logon and then, click Finish. Mounting a SharePoint drive in Windows 7 - wizard

Now, you can work with the files on the SharePoint server without having to download/upload every time you edit them.

Renaming files

Many of the files in the Duke Archive had donor names in the filennames. One single file is easy to rename, but when we're dealing with a lot, it will take you forever if you rename each file one by one. We're going to use two programs to rename files in bulk.

Bulk Rename Utility

This is a slow application, but it gets the job done.

  1. Download the program and install it.

  2. Open the program. There are TONS of options and features that you can use, but we're only going to use a few features. Locate your mounted SharePoint drive in the top left pane and navigate to the folder with all of the targeted files. Bulk Rename Utility_main window screenshot

  3. When you've located your files, make sure that the “Subfolder” checkbox is checked in box 12 titled Filters. This will make the program search within the folders you've selected and rename the nested files. The program will immediately start working to populate the list of files. This process may take up to 30 minutes, so be patient! Bulk Rename Utility_main window screenshot with an arrow

  4. When the program is finished building the list, you can begin to rename the filenames. In box 3, type the name of the word you want to redact in the Replace field. In the Duke archives, I'm going to use a donor name. In the With field, I'm going to type “XXX DONOR”. Then, click in the file list pane at the top right of the window and press Ctrl + a on the keyboard to select all of the files. This process might take some time, so be patient. You can check the status of the selection in the bottom of the window. Bulk Rename Utility_main window screenshot indicating the Replace field

  5. When the process is complete, click the **Rename **button at the bottom right of the window. The popup will give you a summary of the changes you will make to the files. Click **OK **if everything looks right. Bulk Rename Utility screenshot of completed window

This process will take a long time, maybe over an hour. Minimize the program and ignore it for a while. A dialog window will popup with the results of the rename process when it is complete. The program may say “Not Responding” in the title bar, but rest assured that it's still running in the background. Ignore the warnings and just let it run.

Total Commander file manager

While Bulk Rename Tool does a great job, we're going to use a second program to cover all of our bases. Total Commander is an old (and ugly) application, but its rename tool is very powerful.

  1. Download the application and install it. Open the program when the install process is complete.

  2. Total Commander is ugly and can be very confusing to use. Don't worry! We're just going to focus on the leftmost navigation pane.

  3. Click on the address bar at the top of the left pane. Navigate to your mounted SharePoint drive. Total Commander main window screenshot

  4. Click inside the left pane with the file contents and press Ctrl + b on the keyboard to begin locating all of the subfolders and files. The program might take a few minutes, but it's working hard locating all of the files in the directory. Total Commander rename in progress screenshot

  5. When the process is complete, press Ctrl + A on the keyboard to select all of these subfolders and files. This process might take a few minutes. When the program is finished, all of your files in the left pane should be highlighted red. Total Commander rename red screenshot

  6. Now, press Ctrl + M on the keyboard to open up the renaming tool. Leave the **Rename mask **and the Extension box at their default values. Change the fields in Search & Replace to what you're looking to redact. In the case of the Duke Archives, I'll type a donor name in the **Search for **field and type “XXX DONOR” in the **Replace with **field. You can scroll through the file list at the bottom of the window to see if the file names look correct. The left column has the original name and the one to the right gives a preview of what the file will be named after the process is complete. Total Commander search and replace screenshot

  7. Click the Start! button and let it do its thing.

Total Commander is faster than the Bulk Rename Utility, but it still might take up to an hour to rename all of the files if you have a lot. When the program is finished with the renaming process, it will give you a confirmation message.

Once the program is finished renaming everything, you should be finished with everything you need to do as far as renaming goes. If you're satisfied, move on to configure Word.

Setting up Microsoft Word

If you want, you can open up each file in Word, look for every donor name, and try to redact everything yourself. That will take too long.

First, we have to tell Word to loosen up on security so that we can open up everything on our SharePoint folder.

  1. Open Word. Create a blank document and click on File in the Ribbon. Click **Options **and click on Trust Center. Then, click on the Trust Center Settings... button. Microsoft Word Trust Center screenshot

  2. From here, we're going to open up some of the security settings. Just trust me on these options. It'll save you headaches later.

Once you're finished changing those options, click OK and return to the main document view.

Now, you're going to want to add a macro to Word so that we can automate the redaction process.

  1. First, copy this macro file:
Sub AutoOpen()
'
' AutoOpen Macro
'
'

' Tim D'Annecy made this Word macro in 2016. It's designed to quickly redact text from documents 
' and remove any sensitive information. If you have questions, email me at <tdannecy@gmail.com> and 
' I'll help you, even if I don't work at Ipas anymore. Honestly, I don't know if this script is 
' efficient, but it gets the job done!

' If you name this macro "AutoOpen", Word will automatically run it whenever you open a file. This 
' comes in handy when you're redacting a lot of files and don't want to manually start the macro, but 
' it can be really annoying if you're just opening a regular Word document . To avoid running the 
' macro, hold down the Shift key to start Word without running the macro. When you're finished with 
' the redaction project, you should probably just delete the file out of the Word macro folder.

' First of all, we have to accept all of the Tracked Changes in the document and turn it off. I did 
' this because Word saves the original text after redacting. It defeats the purpose if someone could 
' just go into the file history and revert the changes to see the donor name, so we have to accept 
' the changes in the current document, turn off Tracked Changes, and replace the donor name without 
' keeping the original text.
    ActiveDocument.AcceptAllRevisions
    ActiveDocument.TrackRevisions = False

' This is the meat of the macro. You can use this template to find and replace any text in your 
' document. The .Text field is what you're finding. The .Replacement.Text is what you're replacing. 
' Make sure you're surrounding your text with quotation marks or it won't work correctly. To add 
' names, just copy and paste the whole block of information below and change those two fields. I had 
' over 200 of these sections in my macro when I was doing the Duke Redactions. 
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "donor name"
        .Replacement.Text = "XXX DONOR"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    
' This section removes all of the headers and footers in the document. Word is a pain and won't 
' search through headers and footers when running "find and replace" so this section is a way around 
' this limitation. I'm playing it safe because I found that a lot of donor or employee information 
' was stored in the header or footer and it could be revealing. I'd recommend deleting these and not 
' worrying about missing names later on.
    Dim oSec As Section
    Dim oHead As HeaderFooter
    Dim oFoot As HeaderFooter
    For Each oSec In ActiveDocument.Sections
        For Each oHead In oSec.Headers
            If oHead.Exists Then oHead.Range.Delete
        Next oHead

        For Each oFoot In oSec.Footers
            If oFoot.Exists Then oFoot.Range.Delete
        Next oFoot
    Next oSec

' Use the following line your own risk!! I would recommend giving each document a once-over after 
' document when the macro completes, so make sure you have your "find and replace" words correct. 
' the macro is run to make sure everything looks right. This line automatically closes and saves the 
' Luckily, SharePoint saves versions, so you can revert the file you've made a mistake, but still. 
' It can be a pain. Just make sure you're comfortable with the macro and the outcomes you get before 
' you enable this line. Delete the apostrophe and save the file if you want the macro to 
' automatically save and close each document.

' ActiveDocument.Close SaveChanges:=wdSaveChanges
    
End Sub
  1. Open the file in Notepad or another file editor like Notepad++. Inside that file, there are instructions on how to edit and tweak the macro to your needs. Make sure you follow the instructions in the file and edit the code before moving on to the next step. When you're finished making changes, copy the entire file contents of the Redaction macro text file to the clipboard. (Just press Ctrl + c on the keyboard)

  2. Open Word and navigate to View > Macros on the ribbon. The macro window should pop up. Type “AutoOpen” in the Macro name field and then click on the Create button. Microsoft Word macros screenshot

  3. A new Visual Basic window will popup. In the whitespace, press Ctrl + A on the keyboard to select all of the text and then press Ctrl + V to paste in your edited macro. Check to make sure everything looks right and click the Save icon in the toolbar. Microsoft Word Visual Basic editor

Sometimes, documents want to remove your macros or replace the ones you've saved. You want to make the Normal macro file Read Only to prevent this from happening.

  1. Exit Word and navigate to Start > Computer and paste %userprofile%\AppData\Roaming\Microsoft\Templates in the location bar. Right click on the file titled Normal.dotm. Microsoft Word template location in Windows Explorer

  2. Check the Read-only checkbox and click OK. Screenshot of files properties in Windows Explorer

Searching through file contents

To find the names within the document text, you'll need to build an index and search through it. To do this, you'll need a program called DocFetcher.

  1. Download the application and install it. Open the program when the install process is complete.

  2. Before we search through the contents, we have to build an index. Right click on the bottom left “Search Scope” pane and selecting Create Index From > Folder... Screenshot of the DocFetcher main window

  3. Navigate to your mounted folder and click OK.

  4. Leave the indexing options on the default settings and click Run to begin building the index. Screenshot of the DocFetcher indexing queue

  5. Click on the Minimize button within DocFetcher to hide the window. minimize.PNG

  6. Select the scope in the bottom left “Search Scope” pane and you can begin searching through the documents. For this example, I'm going to find the word “Ipas” in every file.

  7. In the bottom left “Search Scope” pane, make sure that you have added a checkmark to the folders you want to search.

  8. At the top of the window, type your search term in the white search bar and click on “Search”.

  9. The results will populate in the main window pane. You can sort these using the column headers. For this example, we're going to redact information from Word documents (DOC or DOCX files). Sorting by filetype will be the most useful.

  10. You can click on the arrow on the bottom of the screen to open up a preview of the text within the file. It will highlight your search term. You can click on the up and down arrows at the top of the pane to turn the page. Screenshot of the DocFetcher preview pane

When you locate the file you want to redact, double-click on the file name to open it up in the default application. You can open up 10 files at a time. If you've opened up a Word document, you can make changes and redact the sensitive information by hand. If you're using the macro, it will automatically remove all of the text that you've specified. Just be sure to go through the document to make sure your macro removed everything correctly. When you save the file, it will automatically create a new version on SharePoint, so you don't have to worry about re-uploading or any of that SharePoint nonsense.