Although data submitted to our systems is not used for GPT model training, we still strongly recommend that you redact or anonymize sensitive company or client-specific information in all training data you upload to our Platform. Data in upload files should also be well structured so that the AI model can interpret it as easily and effectively as possible. We've included a few pointers on how best to achieve that below:

Data Preparation Basics for AI Training

Preparing your data properly is one of the most important steps before using it to train any AI model. Think of it as cleaning and organizing ingredients before cooking: you’ll get a much better result if you start with the right materials in the right condition.

1. Follow Good Data Housekeeping

Messy data leads to messy results. Look for and fix common issues such as:

Duplicates: Remove repeated entries so the AI model doesn’t “over-learn” from them.
Errors: Correct typos or mistakes that could lead to confusion in interpretation.
Missing pieces: Decide whether to fill in missing information, or remove the data point(s) altogether.
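
As a rough illustration of the checks above, here is a minimal Python sketch. It assumes your data has already been loaded as a list of dictionaries (for example with csv.DictReader); the function name clean_records and the field names are hypothetical, and dropping incomplete rows is only one of the two options mentioned (you may prefer to fill the gaps instead).

```python
def clean_records(records, required_fields):
    """Drop exact duplicates and rows that are missing a required field."""
    seen = set()
    cleaned = []
    for record in records:
        # Trim stray whitespace so "Acme " and "Acme" count as the same value.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Assumption: incomplete rows are removed rather than filled in.
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue
        # Use the sorted field/value pairs as a fingerprint for duplicates.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


rows = [
    {"company": "Acme ", "answer": "Yes"},
    {"company": "Acme", "answer": "Yes"},   # duplicate once whitespace is trimmed
    {"company": "Beta", "answer": ""},      # missing a required value
]
cleaned_rows = clean_records(rows, required_fields=["company", "answer"])
```

Typos and other content errors usually still need a human review; a script like this only catches the mechanical issues.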

2. Stay Consistent with Data Formatting and Presentation

AI models perform better when the data follows a clear, consistent pattern. For example:

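Dates should all be written in the same format (e.g., always 01/12/2025 rather than a mix of 01/12/2025, 1 Dec 25, Dec 1 2025, etc.). Whichever format you choose, make sure the day/month order is unambiguous across the whole file.
Boolean categories (like “Yes/No” or “True/False”) should use the same wording every time.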
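
The sketch below shows one way to bring mixed dates and Yes/No values into a single pattern before upload. It is only an illustration: the list of accepted input formats, the day-first DD/MM/YYYY output, and the helper names normalize_date and normalize_bool are assumptions to adapt to your own data. Month names are parsed as English under Python's default locale.

```python
from datetime import datetime

# Assumed input formats; extend this list to match what appears in your files.
DATE_FORMATS = ["%d/%m/%Y", "%d %b %y", "%b %d %Y", "%Y-%m-%d"]
TRUE_VALUES = {"yes", "y", "true", "1"}
FALSE_VALUES = {"no", "n", "false", "0"}


def normalize_date(text):
    """Return the date as DD/MM/YYYY, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue
    # Fail loudly so unexpected formats get reviewed instead of guessed.
    raise ValueError(f"Unrecognized date format: {text!r}")


def normalize_bool(text):
    """Map common true/false spellings onto a single 'Yes'/'No' wording."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return "Yes"
    if value in FALSE_VALUES:
        return "No"
    raise ValueError(f"Unrecognized yes/no value: {text!r}")


dates = [normalize_date(d) for d in ["01/12/2025", "1 Dec 25", "Dec 1 2025"]]
flags = [normalize_bool(b) for b in ["TRUE", " no ", "Y"]]
```

Raising an error on unknown values is a deliberate choice here: silently guessing a format is how inconsistent data slips through.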

3. Balance the Data

If one type of example heavily outweighs others, the model might become biased.

For example:
Imagine you’re training Productised for an AI business diagnostic tool that classifies companies as either “Well-Organized” or “Needs Improvement” based on their answers to a set of form questions.

If 85% of your training responses are labeled “Well-Organized” and only 15% are “Needs Improvement”, the model may learn to assume that almost every company is “Well-Organized,” even when that’s not true.

To avoid this, try to:

Collect more “Needs Improvement” or equivalent examples so both categories are better represented.
Or, use a balanced sample when feeding the data into the model.

This helps the AI give fairer, more accurate results instead of leaning too heavily toward one outcome.
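
If collecting more examples isn't practical, a balanced sample can be drawn by trimming each category down to the size of the smallest one. The sketch below assumes each example is a dictionary with a "label" field; that field name and the function balanced_sample are hypothetical. Note that this approach discards data from the larger category, so it suits cases where you have examples to spare.

```python
import random
from collections import defaultdict


def balanced_sample(examples, label_key="label", seed=0):
    """Downsample every category to the size of the smallest category."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[label_key]].append(example)
    if not groups:
        return []
    smallest = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = []
    for group in groups.values():
        sample.extend(rng.sample(group, smallest))
    rng.shuffle(sample)  # avoid all examples of one label appearing together
    return sample


# The 85% / 15% split from the example above.
responses = (
    [{"label": "Well-Organized"} for _ in range(85)]
    + [{"label": "Needs Improvement"} for _ in range(15)]
)
balanced = balanced_sample(responses)
```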

4. Organize Data into Distinct Sets

It’s good practice to divide your data into at least two groups:

Training data: The examples the model learns from.
Testing data: The examples the model hasn’t seen before, used to check how well it performs.

This lets you check that the AI model has learned patterns it can apply to new situations, rather than simply memorizing the training examples.
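
A simple way to create the two groups is to shuffle the examples and set a portion aside. The sketch below is a minimal illustration; the 80/20 ratio, the fixed seed, and the function name train_test_split are assumptions, not requirements of the Platform.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold back a fraction for testing."""
    shuffled = list(examples)  # copy so the original order is left untouched
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    testing = shuffled[:test_size]
    training = shuffled[test_size:]
    return training, testing


examples = [{"id": i} for i in range(100)]
training, testing = train_test_split(examples)
```

Remove duplicates before splitting; otherwise the same example can land in both groups and make the test results look better than they are.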

Key Takeaway

Good data preparation is about quality, consistency, and fairness. If your data is clear, balanced, and relevant, your AI model has a much better chance of producing accurate and reliable results.

Best Data Practices for File Uploads

September 12, 2025