
Data Processing and File Organization Guidelines

For bank and credit card statements, we have a well-defined process for organizing the original source data (PDF statements) and converting it to Excel format. For other data types, the process is less well-defined, because the original source data can vary widely. The following general guidelines should be followed when organizing and processing any type of PDF data.

  1. Never rename or alter files in [Client Docs] in any way. If you copy them into a subfolder within [Analysis], however, you are welcome to rename, combine, and reorganize the copies as needed; in fact, this is encouraged.

  2. When renaming PDF files for organization, always use the format YYYY-MMDD for dates in filenames (e.g. 2023-0131), since this puts the files in chronological order when sorted by name. More generally, consider how filenames will sort alphanumerically when deciding how to construct them.

  3. Excel schedules should always include a line number field. This comes in handy later in the analysis, when we need to trace a record back to its original row after performing joins and/or appends in IDEA.

  4. Excel schedules should always include a source file field identifying the original source file (filename, plus subfolder/path if relevant) for each record. This facilitates review and makes it easier to locate the original data when needed.

  5. Whoever completes the data processing should add check figures so the reviewer doesn't have to. When setting up check figures, consider where data entry errors could occur and how you would detect them. If the original PDF data includes a total, your check figures should compare that total to a total calculated from your schedule. For some types of data, it may also be useful to include a column that checks individual records. For example, if the scheduled data includes hourly rate, number of hours, and total amount for each record, you could add a field that recalculates the total amount from the scheduled hourly rate and number of hours and compares it to the scheduled total amount.
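To illustrate why guideline 2 calls for a YYYY-MMDD date prefix, here is a small Python sketch that compares it with a US-style M-D-YYYY prefix when sorting by name. The helper `statement_filename` and the sample account descriptions are hypothetical, used only for illustration; Python's `sorted` stands in for sorting by name in a file browser.

```python
from datetime import date


def statement_filename(stmt_date: date, description: str) -> str:
    """Hypothetical helper: prefix a filename with a YYYY-MMDD date."""
    # Zero-padded year, month, and day, so alphanumeric order matches chronological order.
    return f"{stmt_date:%Y-%m%d} {description}.pdf"


files = [
    statement_filename(date(2023, 11, 30), "Checking 1234"),
    statement_filename(date(2023, 2, 28), "Checking 1234"),
    statement_filename(date(2022, 12, 31), "Checking 1234"),
]

# For contrast: a US-style M-D-YYYY prefix sorts by month first, not by date.
us_style = [
    "11-30-2023 Checking 1234.pdf",
    "2-28-2023 Checking 1234.pdf",
    "12-31-2022 Checking 1234.pdf",
]

print(sorted(files))     # chronological: Dec 2022, Feb 2023, Nov 2023
print(sorted(us_style))  # out of order: the 2022 file lands in the middle
```

The US-style names sort as 11-30-2023, 12-31-2022, 2-28-2023, so a December 2022 statement ends up between two 2023 statements. The YYYY-MMDD names always sort chronologically.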
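Guidelines 3 and 4 (a line number field and a source file field on every schedule row) can be sketched in Python as follows. This is an analogue of the Excel/IDEA workflow, not IDEA itself: `ScheduleRow`, `build_schedule`, the field names, and the sample rows are all hypothetical. The point is that line numbers are assigned once, in original order, so a row can still be traced back after later steps reorder or combine the data.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ScheduleRow:
    line_no: int       # sequential line number, assigned once and never changed
    source_file: str   # subfolder/filename of the original PDF (hypothetical paths)
    txn_date: str
    description: str
    amount: Decimal


def build_schedule(extracted):
    """Hypothetical helper: number rows in their original order.

    `extracted` is a list of (source_file, txn_date, description, amount) tuples.
    """
    return [
        ScheduleRow(i, src, txn_date, desc, Decimal(amt))
        for i, (src, txn_date, desc, amt) in enumerate(extracted, start=1)
    ]


schedule = build_schedule([
    ("Bank/2023-0131 Checking 1234.pdf", "2023-01-05", "Deposit", "1500.00"),
    ("Bank/2023-0131 Checking 1234.pdf", "2023-01-12", "Check 101", "-250.00"),
    ("Bank/2023-0228 Checking 1234.pdf", "2023-02-03", "Check 102", "-75.25"),
])

# Reorder the rows (as a sort, join, or append might) and trace one back to its origin.
reordered = sorted(schedule, key=lambda r: r.amount)
first = reordered[0]
print(first.line_no, first.source_file)
```

After sorting by amount, the first row is Check 101; its `line_no` (2) and `source_file` still point straight back to the original statement, regardless of where it now sits.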
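The two kinds of check figures described in guideline 5 are a total check against the PDF's stated total and a per-record recalculation of rate times hours. A minimal Python sketch of both follows, assuming a schedule with `line_no`, `rate`, `hours`, and `amount` fields. The function names and sample data are hypothetical. In Excel, the same checks would typically be a SUM compared to the PDF total and a per-row formula column. `Decimal` is used so that currency amounts compare exactly.

```python
from decimal import Decimal


def total_check(rows, pdf_total):
    """Hypothetical helper: scheduled total minus the total stated in the PDF (should be 0)."""
    scheduled = sum((r["amount"] for r in rows), Decimal("0"))
    return scheduled - Decimal(pdf_total)


def row_checks(rows):
    """Hypothetical helper: recalculate each amount from rate x hours; map line_no -> difference."""
    return {r["line_no"]: r["rate"] * r["hours"] - r["amount"] for r in rows}


rows = [
    {"line_no": 1, "rate": Decimal("45.00"), "hours": Decimal("8"), "amount": Decimal("360.00")},
    {"line_no": 2, "rate": Decimal("45.00"), "hours": Decimal("6.5"), "amount": Decimal("292.50")},
    # Simulated keying error: 52.00 x 4 = 208.00, but 280.00 was entered.
    {"line_no": 3, "rate": Decimal("52.00"), "hours": Decimal("4"), "amount": Decimal("280.00")},
]

print(total_check(rows, "860.50"))                           # nonzero, so something is off
print({k: v for k, v in row_checks(rows).items() if v != 0})  # pinpoints the line
```

The total check shows that an error exists (a 72.00 difference), and the row-level check identifies which line caused it (line 3).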
