Data Setup

Learn how to properly organize your data files for The Forge's automated processing pipeline. Follow these guidelines to ensure your data is processed correctly.

File Organization

The Forge needs your data organized in a specific way so that it can process and label your datasets automatically. Every dataset comes with two standard columns: filename and label.

How It Works

  • Filename Column - Contains the names of individual files as they appear in your zip file
  • Label Column - Automatically populated based on folder structure
  • Folder Names - Become the labels for all files stored in that folder
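
The mapping above can be sketched in a few lines of Python. This is only an illustration of the folder-to-label rule, not The Forge's actual implementation; the function name `labels_from_zip` is hypothetical, and skipping files that sit at the top level of the zip is an assumption.

```python
import zipfile
from pathlib import PurePosixPath


def labels_from_zip(zip_path):
    """Return (filename, label) rows, using each file's parent folder as its label.

    Hypothetical helper: it mirrors the rule described above, nothing more.
    """
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folder entries carry no data of their own
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # assumption: a top-level file has no folder to take a label from
            rows.append((path.name, path.parent.name))
    return rows
```

Running it on a zip that contains Fire/fire1.png and NoFire/forest1.png would give one row per file, with Fire and NoFire in the label position.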

Zip File Structure

Image Classification Example

Imagine you're building a fire detection model that identifies whether an image contains a fire. You would organize your images like this:

ForestFireImages.zip/
├── Fire/
│   ├── fire1.png
│   ├── fire2.jpg
│   ├── fire3.png
│   └── fire4.jpg
└── NoFire/
    ├── forest1.png
    ├── forest2.jpg
    ├── trees1.png
    └── trees2.jpg

Text Classification Example

For a sentiment analysis model, organize your text files like this:

SentimentData.zip/
├── Positive/
│   ├── review1.txt
│   ├── review2.txt
│   └── review3.txt
└── Negative/
    ├── complaint1.txt
    ├── complaint2.txt
    └── complaint3.txt

Supported File Types

Image Files

  • JPG/JPEG - Standard image format
  • PNG - Lossless images with optional transparency
  • GIF - Animated or static images

Text Files

  • TXT - Plain text files
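
Before zipping, it can help to flag files whose extension is not in the lists above. The sketch below is one way to do that; `SUPPORTED_EXTENSIONS` and `unsupported_files` are hypothetical names, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath

# Extensions taken from the lists above.
# Assumption: matching is case-insensitive (".PNG" is treated like ".png").
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".txt"}


def unsupported_files(filenames):
    """Return the names whose extension is not in SUPPORTED_EXTENSIONS."""
    return [
        name
        for name in filenames
        if PurePosixPath(name).suffix.lower() not in SUPPORTED_EXTENSIONS
    ]
```

An empty result means every file name passed the extension check; it does not verify that the file contents match the extension.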

Best Practices

File Organization

  • Consistent Naming - Use clear, descriptive file names
  • Proper Folders - Create separate folders for each category
  • Clean Structure - Keep files directly inside their category folder and avoid nested subfolders
  • File Formats - Use standard, supported file formats
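
The "no nested subfolders" guideline can be checked mechanically. The following is a minimal sketch that assumes category folders sit at the top level of the zip, as in the examples above; `find_nested_files` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def find_nested_files(filenames):
    """Return paths that sit deeper than <category folder>/<file>.

    Assumption: category folders are at the top level of the archive,
    so any path with more than two parts is nested.
    """
    return [name for name in filenames if len(PurePosixPath(name).parts) > 2]
```

The input can be any list of archive paths, for example the result of `zipfile.ZipFile(path).namelist()`.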

Data Quality

  • High Quality - Use clear, high-resolution images
  • Consistent Size - Keep similar file sizes when possible
  • Clean Data - Remove corrupted or duplicate files
  • Balanced Classes - Ensure roughly equal numbers of files in each category
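
Two of these checks, class balance and duplicate files, are easy to script before upload. The sketch below assumes the data is still an unzipped local directory with one folder per category; `class_counts` and `duplicate_groups` are hypothetical helpers, and "duplicate" here means byte-identical content only, so resized or re-encoded copies are not detected.

```python
import hashlib
from collections import Counter, defaultdict
from pathlib import Path


def class_counts(data_dir):
    """Count files per category folder directly under data_dir."""
    return Counter(
        path.parent.name
        for path in Path(data_dir).glob("*/*")
        if path.is_file()
    )


def duplicate_groups(data_dir):
    """Group files with byte-identical content, keyed by SHA-256 digest."""
    by_digest = defaultdict(list)
    for path in sorted(Path(data_dir).glob("*/*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.relative_to(data_dir).as_posix())
    # Only digests shared by more than one file indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Comparing the values returned by `class_counts` shows how far the categories are from balanced, and each list from `duplicate_groups` names files that could be reduced to a single copy.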

Security

  • Data Privacy - Ensure sensitive data is properly handled
  • Access Control - Limit access to authorized users only
  • Backup - Keep copies of your original data
  • Compliance - Follow data protection regulations

Getting Started

Ready to process your data? Follow these steps:

  1. Organize Your Data - Create folders for each category
  2. Name Your Files - Use clear, descriptive names
  3. Create Zip File - Compress your organized folders
  4. Upload to The Forge - Let automated processing begin

Pro Tip

The more organized your data is, the better The Forge can process it. Take time to properly structure your folders and files before uploading.
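
Step 3 under Getting Started can be scripted once the category folders exist. This is a minimal sketch using Python's standard `zipfile` module; `zip_dataset` is a hypothetical helper, and it assumes files sit directly inside their category folders. The upload in step 4 is not shown, since it happens in The Forge itself.

```python
import zipfile
from pathlib import Path


def zip_dataset(data_dir, zip_path):
    """Write every file in data_dir's category folders into zip_path."""
    data_dir = Path(data_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(data_dir.glob("*/*")):
            if path.is_file():
                # Store paths relative to data_dir so that the category
                # folders sit at the top level of the archive.
                archive.write(path, arcname=path.relative_to(data_dir).as_posix())
    return zip_path
```

For example, `zip_dataset("ForestFireImages", "ForestFireImages.zip")` would produce an archive whose entries look like Fire/fire1.png and NoFire/forest1.png, matching the structure shown earlier.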