# PIIDetectionService - Usage Guide

## 📖 Quick Reference

### **Main Method: `processDocument(string $filePath)`**

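This is the primary method for processing documents. It accepts a **file path** (string) and returns an array containing the PII analysis, or an error message if processing fails (see **Result Structure** and **Error Handling**).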

---

## 🎯 **Usage Examples**

### **1. Process a Local File**

```php
require_once 'src/classes/autoload.php';
use Redact\Classes\PIIDetectionService;

$awsCredentials = getConfig('AWS_Credentials');
$piiService = new PIIDetectionService($awsCredentials);

// Process document by file path
$result = $piiService->processDocument('C:/path/to/document.pdf');

if ($result['success']) {
    echo "✅ Found {$result['total_pii_instances']} PII instances\n";
    echo "⏱️  Processing time: {$result['processing_time']}\n";
    echo "📄 Pages: {$result['total_pages']}\n";
    echo "📊 Unique words: {$result['unique_words']}\n";
    echo "🔒 PII words: {$result['pii_words']}\n";
} else {
    echo "❌ Error: {$result['error']}\n";
}
```

---

### **2. Process Uploaded File from Form**

```php
// In your upload handler (e.g., process_upload.php)
require_once 'src/classes/autoload.php';

// `use` must sit at the top level of the file, not inside an if block
use Redact\Classes\PIIDetectionService;

if (isset($_FILES['document'])) {
    $awsCredentials = getConfig('AWS_Credentials');
    $piiService = new PIIDetectionService($awsCredentials);
    
    // Use processUploadedFile() for $_FILES arrays
    $result = $piiService->processUploadedFile($_FILES['document']);
    
    header('Content-Type: application/json');
    echo json_encode($result);
}
```

---

### **3. CLI Script**

```php
#!/usr/bin/env php
<?php
require_once __DIR__ . '/src/classes/autoload.php';
use Redact\Classes\PIIDetectionService;

if ($argc < 2) {
    echo "Usage: php script.php <file_path>\n";
    exit(1);
}

$filePath = $argv[1];

$awsCredentials = getConfig('AWS_Credentials');
$piiService = new PIIDetectionService($awsCredentials);

echo "🔄 Processing: $filePath\n";

$result = $piiService->processDocument($filePath);

if ($result['success']) {
    echo "✅ Success!\n";
    echo "   PII Instances: {$result['total_pii_instances']}\n";
    echo "   Processing Time: {$result['processing_time']}\n";
} else {
    echo "❌ Failed: {$result['error']}\n";
    exit(1);
}
```

---

### **4. Custom Configuration**

```php
$piiService = new PIIDetectionService($awsCredentials, [
    'region' => 'us-east-1',           // AWS region
    'max_file_size' => 10 * 1024 * 1024 // 10MB limit
]);

$result = $piiService->processDocument('/path/to/document.pdf');
```

---

## 📊 **Result Structure**

```php
[
    'success' => true,
    'processing_time' => '45.32s',
    'total_pages' => 3,
    'layout_count' => 156,
    'unique_words' => 523,
    'pii_words' => 18,
    'total_pii_instances' => 156,
    'comprehend_calls' => 65,
    'layouts_skipped' => 0,
    'optimization_rate' => 0,
    'pages' => [
        [
            'page_number' => 1,
            'layouts' => [...],        // Layout blocks
            'word_blocks' => [...],    // All words
            'pii_blocks' => [...],     // PII entities
            'image_data' => '...'      // Base64 image
        ],
        // ... more pages
    ]
]
```
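
As a minimal sketch of how this structure might be consumed, the helper below counts PII blocks per page. `summarizePiiByPage()` is a hypothetical helper, not part of `PIIDetectionService`, and it assumes `pii_blocks` is a plain array with one entry per detected entity. The `text` key in the sample data is illustrative only.

```php
<?php

/**
 * Hypothetical helper (not part of PIIDetectionService): maps each page
 * number to the number of entries in that page's 'pii_blocks' array.
 * Assumes the result follows the structure documented above.
 */
function summarizePiiByPage(array $result): array
{
    $summary = [];
    if (empty($result['success'])) {
        return $summary; // failed results carry no page data
    }
    foreach ($result['pages'] ?? [] as $page) {
        $summary[$page['page_number']] = count($page['pii_blocks'] ?? []);
    }
    return $summary;
}

// Illustrative data only; real results come from processDocument().
$sample = [
    'success' => true,
    'pages' => [
        ['page_number' => 1, 'pii_blocks' => [['text' => 'a'], ['text' => 'b']]],
        ['page_number' => 2, 'pii_blocks' => []],
    ],
];

foreach (summarizePiiByPage($sample) as $pageNumber => $count) {
    echo "Page {$pageNumber}: {$count} PII block(s)\n";
}
```

Checking `success` first keeps the helper safe to call on failed results, which do not include `pages`.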

---

## 🔧 **Two Methods Available**

### **Method 1: `processDocument(string $filePath)` ⭐ RECOMMENDED**

**Use for:**
- ✅ Local files
- ✅ CLI scripts
- ✅ Direct file processing
- ✅ Unit tests

**Example:**
```php
$result = $piiService->processDocument('/path/to/file.pdf');
```

---

### **Method 2: `processUploadedFile(array $uploadedFile)`**

**Use for:**
- ✅ Web uploads via `$_FILES`
- ✅ Form submissions
- ✅ AJAX uploads

**Example:**
```php
$result = $piiService->processUploadedFile($_FILES['document']);
```

**Note:** This method internally validates the upload and then calls `processDocument()` with the temporary file path.
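
The guide does not show what that validation involves. As a rough sketch under that caveat, a pre-check on a single `$_FILES` entry could look like the following. `checkUploadEntry()` and its messages are hypothetical and are not the library's actual code; only the `$_FILES` keys and the `UPLOAD_ERR_*` constants come from PHP itself.

```php
<?php

/**
 * Hypothetical pre-check for a single $_FILES entry. Returns null when the
 * entry looks usable, or an error message otherwise. This is an assumption
 * about what upload validation might involve, not the service's real logic.
 */
function checkUploadEntry(array $file, int $maxBytes): ?string
{
    if (!isset($file['error'], $file['tmp_name'], $file['size'])) {
        return 'Malformed upload entry';
    }
    if ($file['error'] !== UPLOAD_ERR_OK) {
        return "Upload failed with PHP error code {$file['error']}";
    }
    if ($file['size'] > $maxBytes) {
        return "Upload is larger than {$maxBytes} bytes";
    }
    return null; // entry passed all checks
}

// Illustrative entries shaped like $_FILES['document'].
$ok = ['error' => UPLOAD_ERR_OK, 'tmp_name' => '/tmp/phpABC', 'size' => 2048];
$missing = ['error' => UPLOAD_ERR_NO_FILE, 'tmp_name' => '', 'size' => 0];

var_dump(checkUploadEntry($ok, 10 * 1024 * 1024));
var_dump(checkUploadEntry($missing, 10 * 1024 * 1024));
```

In a real request handler, PHP's `is_uploaded_file()` is also worth calling on `tmp_name` before the path is handed to any file-processing code.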

---

## ✅ **Supported File Types**

- **PDF** (`.pdf`) - Automatically converted to images using Imagick
- **JPEG** (`.jpg`, `.jpeg`) - Processed directly
- **PNG** (`.png`) - Processed directly

### 🔒 **Smart File Detection**

The system uses **MIME type detection** to identify file types from their content, not just from their file extensions. This means:

- ✅ **Catches renamed files** (e.g., `virus.exe` renamed to `document.pdf` is still identified by its actual content)
- ✅ **Automatically detects** whether Imagick is needed
- ✅ **More secure** than extension-only checking
- ✅ **Validates actual file content**

**Example:**
```php
// Even if a PDF is renamed to .jpg, it will still be detected as PDF
$result = $piiService->processDocument('fake_image.jpg'); // Actually a PDF
// ✅ System detects MIME type: application/pdf
// ✅ Automatically uses Imagick for conversion
```
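
For readers who want to see the idea in isolation, here is a minimal sketch of content-based detection using PHP's bundled `fileinfo` extension. It is not taken from `PIIDetectionService`: `detectMimeType()`, `isSupportedType()`, and the `SUPPORTED_MIME_TYPES` allow-list are hypothetical names, and the allow-list is an assumption that mirrors the supported types listed above.

```php
<?php

/**
 * Returns the MIME type inferred from the file's content, ignoring its name.
 * Relies on PHP's fileinfo extension (the finfo class).
 */
function detectMimeType(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    if ($mime === false) {
        throw new RuntimeException("Could not inspect file: {$path}");
    }
    return $mime;
}

// Assumed allow-list mirroring the supported types listed above.
const SUPPORTED_MIME_TYPES = ['application/pdf', 'image/jpeg', 'image/png'];

function isSupportedType(string $path): bool
{
    return in_array(detectMimeType($path), SUPPORTED_MIME_TYPES, true);
}

// Demo: PDF-like content saved under a misleading .jpg name.
$path = sys_get_temp_dir() . '/fake_image_' . uniqid() . '.jpg';
file_put_contents($path, "%PDF-1.4\n%%EOF\n");
echo detectMimeType($path), "\n";
echo isSupportedType($path) ? "supported\n" : "unsupported\n";
unlink($path);
```

Content sniffing reads leading bytes, so it is a stronger check than the extension alone but not a substitute for fully parsing the file.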

---

## ⚠️ **Error Handling**

```php
$result = $piiService->processDocument($filePath);

if (!$result['success']) {
    $error = $result['error'];

    // Messages carry details after a fixed prefix, so match on the prefix
    // rather than comparing the whole string.
    if (strpos($error, 'File not found') === 0) {
        // Handle missing file
    } elseif (strpos($error, 'Unsupported file type') === 0) {
        // Handle invalid file type
    } elseif (strpos($error, 'File size exceeds') === 0) {
        // Handle file too large
    } else {
        // Handle other errors
    }
}
```

---

## 🚀 **Best Practices**

1. **Always check `$result['success']`** before accessing data
2. **Use `processDocument()` for direct file paths** (simpler)
3. **Use `processUploadedFile()` for `$_FILES` arrays** (handles validation)
4. **Set appropriate `max_file_size`** based on your needs
5. **Log errors** for debugging
6. **Handle timeouts** for large documents (increase `max_execution_time`)
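
Practices 1, 5, and 6 can be combined in a small wrapper. The sketch below is one possible shape, not part of the library: `processWithLogging()` is a hypothetical name, the 300-second default is an arbitrary assumption, and the stub callable stands in for something like `[$piiService, 'processDocument']` so the example runs without AWS access.

```php
<?php

/**
 * Hypothetical wrapper: raises the execution time limit, runs a processing
 * callable, and logs failures. The callable is expected to return an array
 * with a 'success' key, as processDocument() does.
 */
function processWithLogging(callable $process, string $filePath, int $timeLimitSeconds = 300): array
{
    set_time_limit($timeLimitSeconds); // applies to the current request or script only

    try {
        $result = $process($filePath);
    } catch (Throwable $e) {
        // Convert unexpected exceptions into the same failure shape
        $result = ['success' => false, 'error' => $e->getMessage()];
    }

    if (empty($result['success'])) {
        $message = $result['error'] ?? 'unknown error';
        error_log("PII processing failed for {$filePath}: {$message}");
    }
    return $result;
}

// Stub processor so the sketch runs without AWS credentials.
$stub = function (string $path): array {
    return ['success' => false, 'error' => 'stub failure'];
};

$result = processWithLogging($stub, '/path/to/document.pdf');
echo $result['success'] ? "ok\n" : "failed: {$result['error']}\n";
```

Note that `set_time_limit()` only affects PHP's own limit; web server or proxy timeouts are configured separately.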

---

## 📝 **Complete Web Example**

**HTML Form:**
```html
<form id="uploadForm" enctype="multipart/form-data">
    <input type="file" name="document" accept=".pdf,.jpg,.jpeg,.png" required>
    <button type="submit">Analyze Document</button>
</form>
```

**JavaScript:**
```javascript
$('#uploadForm').on('submit', function(e) {
    e.preventDefault();
    
    const formData = new FormData(this);
    
    $.ajax({
        url: 'process_upload.php',
        type: 'POST',
        data: formData,
        processData: false,
        contentType: false,
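        success: function(result) {
            if (result.success) {
                console.log('PII found:', result.total_pii_instances);
            } else {
                alert('Error: ' + result.error);
            }
        },
        // Network failures and non-2xx responses never reach `success`
        error: function(xhr, status) {
            alert('Request failed: ' + status);
        }
    });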
});
```

**PHP Backend (process_upload.php):**
```php
<?php
require_once 'src/classes/autoload.php';
use Redact\Classes\PIIDetectionService;

header('Content-Type: application/json');

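try {
    // Guard against requests that arrive without the expected file field
    if (!isset($_FILES['document'])) {
        throw new RuntimeException('No file was uploaded in the "document" field');
    }

    $awsCredentials = getConfig('AWS_Credentials');
    $piiService = new PIIDetectionService($awsCredentials);
    
    $result = $piiService->processUploadedFile($_FILES['document']);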
    
    echo json_encode($result);
    
} catch (Exception $e) {
    echo json_encode([
        'success' => false,
        'error' => $e->getMessage()
    ]);
}
```

---

## 🧪 **Testing**

```bash
# Run all unit tests
php src/classes/unit/TestRunner.php

# Test with sample file
php -r "
require_once 'src/classes/autoload.php';
use Redact\Classes\PIIDetectionService;
\$service = new PIIDetectionService(getConfig('AWS_Credentials'));
\$result = \$service->processDocument('src/classes/unit/BeytekinS Payslips.pdf');
echo json_encode(\$result, JSON_PRETTY_PRINT);
"
```

---

## 📚 **See Also**

- **Full Documentation:** `testing/layouts/README_CLASSES.md`
- **Unit Tests:** `src/classes/unit/`
- **Example Implementation:** `testing/layouts/process_layout_registry_v2.php`

---

**Need help? Check the README files or run the unit tests!** 🎉

