Top 10 Categorized Prompts for Data Cleaning & Preparation

Data cleaning and preparation are essential steps in data analysis and machine learning projects. Well-chosen prompts can streamline this work and make the resulting data more reliable and easier to analyze. The prompts below are grouped into 10 categories covering common data cleaning and preparation tasks.

1. Data Validation Prompts

  • Identify missing or null values in the dataset.
  • Detect duplicate records and suggest removal strategies.
  • Validate data types for each column (e.g., numeric, date, text).
  • Check for inconsistent data entries or formatting issues.
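
As a rough illustration of the first two checks above, here is a minimal sketch using only the Python standard library. The sample rows, the set of "missing" markers, and the function names are assumptions made for this example; a real dataset would need its own definitions.

```python
from collections import Counter

# Hypothetical sample rows; in practice these might come from csv.DictReader.
rows = [
    {"id": "1", "age": "34", "city": "Paris"},
    {"id": "2", "age": "", "city": "paris"},
    {"id": "3", "age": "n/a", "city": None},
    {"id": "1", "age": "34", "city": "Paris"},
]

# Assumed markers that count as "missing" in this example.
MISSING = {None, "", "n/a", "null"}

def count_missing(rows):
    """Return {column: number of missing values}."""
    counts = Counter()
    for row in rows:
        for col, val in row.items():
            key = val.strip().lower() if isinstance(val, str) else val
            if key in MISSING:
                counts[col] += 1
    return dict(counts)

def find_duplicates(rows):
    """Return the indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes

missing = count_missing(rows)      # {"age": 2, "city": 1}
duplicates = find_duplicates(rows) # [3]
```

Note that "Paris" and "paris" are not flagged here; catching that kind of inconsistency needs the text standardization covered in the next category.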

2. Data Cleaning Prompts

  • Standardize text data by converting to lowercase or uppercase.
  • Remove special characters or unwanted symbols from text fields.
  • Fill missing values using mean, median, mode, or interpolation.
  • Drop duplicate or irrelevant records to clean the dataset.
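
The text standardization and missing-value prompts above could be sketched as follows. This is a simplified example: `clean_text` and `fill_with_median` are hypothetical helpers, and the character filter assumes ASCII text, which would strip accented letters.

```python
import re
import statistics

def clean_text(value):
    """Lowercase, trim, and drop anything other than a-z, 0-9, and spaces."""
    value = value.strip().lower()
    value = re.sub(r"[^a-z0-9 ]+", "", value)  # assumes ASCII-only text
    return re.sub(r"\s+", " ", value).strip()  # collapse repeated spaces

def fill_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

cleaned = clean_text("  Hello,   World! ")    # "hello world"
filled = fill_with_median([1, None, 3, 10])   # [1, 3, 3, 10]
```

The median is used here because it is less sensitive to extreme values than the mean; which fill strategy is appropriate depends on the column.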

3. Data Transformation Prompts

  • Create new features based on existing data (feature engineering).
  • Normalize or scale numerical data for better model performance.
  • Convert categorical variables into dummy/indicator variables.
  • Aggregate data at different levels (e.g., daily to monthly).
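
To make the scaling, encoding, and aggregation prompts more concrete, here is a small standard-library sketch. The function names and sample values are illustrative assumptions, and the aggregation assumes dates are already ISO-formatted strings (YYYY-MM-DD).

```python
from collections import defaultdict

def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Turn a categorical column into a list of indicator dicts."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

def monthly_totals(daily):
    """Sum (ISO date string, amount) pairs by month (YYYY-MM)."""
    totals = defaultdict(float)
    for day, amount in daily:
        totals[day[:7]] += amount
    return dict(totals)

scaled = min_max_scale([10, 20, 30])       # [0.0, 0.5, 1.0]
encoded = one_hot(["red", "blue", "red"])
by_month = monthly_totals([
    ("2024-01-05", 2.0),
    ("2024-01-20", 3.0),
    ("2024-02-01", 1.5),
])                                          # {"2024-01": 5.0, "2024-02": 1.5}
```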

4. Data Formatting Prompts

  • Convert date formats to a standard format (e.g., YYYY-MM-DD).
  • Ensure consistent units of measurement across data.
  • Format numerical data to a fixed number of decimal places.
  • Align column names and ordering across datasets for easier analysis and visualization.
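
A minimal sketch of the date, unit, and decimal formatting prompts might look like this. The list of accepted input formats and the conversion table are assumptions for the example; ambiguous inputs such as 03/04/2024 need a project-level decision about day-first versus month-first order.

```python
from datetime import datetime

# Assumed input formats, tried in order; day-first is assumed for slashes.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def to_iso_date(text):
    """Parse a date in one of the known formats and return YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

# Hypothetical conversion table: factor to multiply by to get kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kg(value, unit):
    """Convert a weight to kilograms, rounded to three decimal places."""
    return round(value * TO_KG[unit], 3)

iso_a = to_iso_date("31/12/2023")   # "2023-12-31"
iso_b = to_iso_date("05.03.2024")   # "2024-03-05"
weight = to_kg(2, "lb")             # 0.907
price = f"{3.14159:.2f}"            # "3.14"
```

Raising an error on unknown formats, rather than guessing, keeps silently misparsed dates out of the cleaned data.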

5. Outlier Detection Prompts

  • Identify outliers using statistical methods (e.g., z-score, IQR).
  • Visualize data to spot anomalies and outliers.
  • Decide whether to remove or transform outliers for analysis.
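
The two statistical methods named above can be sketched with the standard `statistics` module. The sample values and thresholds are assumptions; in particular, quartile definitions vary between tools, so results near the boundary may differ from other libraries.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: no outliers by this definition
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sensor readings
by_iqr = iqr_outliers(readings)                       # [95]
by_z = zscore_outliers(readings, threshold=2.0)       # [95]
```

With only seven points, the z-score of the extreme value stays below 3, so the conventional threshold of 3 would flag nothing here; the IQR rule tends to behave better on small samples.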

6. Data Integration Prompts

  • Merge datasets from different sources with common keys.
  • Handle conflicting data during integration.
  • Align data schemas for seamless merging.
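
One way to merge on a common key while surfacing conflicts is sketched below. `merge_on_key`, its `prefer` parameter, and the sample records are hypothetical; the sketch assumes each key appears at most once per source.

```python
def merge_on_key(left, right, key, prefer="left"):
    """Outer-join two lists of dicts on `key` and report conflicting fields.

    Returns (merged_rows, conflicts), where each conflict is a tuple of
    (key value, column, left value, right value).
    """
    merged = {row[key]: dict(row) for row in left}
    conflicts = []
    for row in right:
        target = merged.setdefault(row[key], {})
        for col, val in row.items():
            if col in target and target[col] != val:
                conflicts.append((row[key], col, target[col], val))
                if prefer == "right":
                    target[col] = val
            else:
                target[col] = val
    return list(merged.values()), conflicts

crm = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": "b@x.com"}]
billing = [
    {"id": 2, "email": "b@y.com", "plan": "pro"},
    {"id": 3, "email": "c@x.com", "plan": "free"},
]
merged, conflicts = merge_on_key(crm, billing, "id")
# conflicts == [(2, "email", "b@x.com", "b@y.com")]
```

Returning the conflicts instead of silently overwriting lets a person, or a later rule, decide which source to trust.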

7. Data Sampling Prompts

  • Create representative samples for testing and validation.
  • Balance datasets by oversampling or undersampling.
  • Randomly select subsets for quick analysis.
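
Random subsetting and simple oversampling can be sketched with the `random` module. The function names, the fixed seed, and the sample labels are assumptions for illustration; the oversampling shown just duplicates minority rows at random, which is the simplest of several possible approaches.

```python
import random
from collections import Counter, defaultdict

def random_subset(rows, k, seed=0):
    """Pick k rows without replacement, reproducibly for a given seed."""
    return random.Random(seed).sample(rows, k)

def oversample(rows, label_key, seed=0):
    """Duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

rows = [{"label": "ok"}] * 8 + [{"label": "fraud"}] * 2
subset = random_subset(rows, 3)
balanced = oversample(rows, "label")
counts = Counter(row["label"] for row in balanced)  # 8 of each label
```

Oversampling is usually applied to training data only, after the test split, so that duplicated rows do not leak into evaluation.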

8. Data Quality Assessment Prompts

  • Calculate data quality metrics (e.g., completeness, accuracy).
  • Identify inconsistent or unreliable data points.
  • Generate reports on data quality issues.
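
Completeness is the easiest of these metrics to compute directly; accuracy generally needs a trusted reference to compare against, so it is left out of this sketch. The function names, the 0.9 threshold, and the sample rows are assumptions.

```python
def completeness(rows, columns):
    """Return {column: fraction of rows with a non-empty value}."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if row.get(col) not in (None, "")) / total
        for col in columns
    }

def quality_report(rows, columns, threshold=0.9):
    """Return one line of text per column that falls below the threshold."""
    scores = completeness(rows, columns)
    return [
        f"{col}: {score:.0%} complete (below {threshold:.0%})"
        for col, score in scores.items()
        if score < threshold
    ]

rows = [
    {"name": "Ann", "email": "a@x.com"},
    {"name": "Bob", "email": ""},
    {"name": None, "email": "c@x.com"},
    {"name": "Dee", "email": None},
]
scores = completeness(rows, ["name", "email"])  # {"name": 0.75, "email": 0.5}
report = quality_report(rows, ["name", "email"])
```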

9. Automation and Scripting Prompts

  • Write scripts to automate repetitive cleaning tasks.
  • Schedule regular data updates and cleaning processes.
  • Use macros or functions for common transformations.
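
A repetitive cleaning routine can be expressed as a list of small functions applied in order, as in the sketch below. The step functions and `run_pipeline` are hypothetical names; scheduling is not shown and would typically be handled by an external tool such as a job scheduler.

```python
def strip_whitespace(rows):
    """Trim leading and trailing whitespace from every string value."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]

def drop_empty_rows(rows):
    """Remove rows in which every value is None or an empty string."""
    return [
        row for row in rows
        if any(v not in (None, "") for v in row.values())
    ]

def run_pipeline(rows, steps):
    """Apply each cleaning step in order and return the final rows."""
    for step in steps:
        rows = step(rows)
    return rows

raw = [{"name": " Ann "}, {"name": "   "}]
clean = run_pipeline(raw, [strip_whitespace, drop_empty_rows])
# clean == [{"name": "Ann"}]
```

Keeping each step as a separate function makes the order explicit and lets individual steps be tested or reused across datasets.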

10. Documentation and Metadata Prompts

  • Document data cleaning steps for reproducibility.
  • Maintain metadata about data sources and transformations.
  • Create data dictionaries for dataset clarity.
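
A basic data dictionary can be generated from the data itself and then extended by hand with descriptions and sources. The sketch below is one possible starting point; the `data_dictionary` function, its output fields, and the sample rows are assumptions.

```python
def data_dictionary(rows):
    """Summarize each column: observed types, non-null count, one example."""
    summary = {}
    for row in rows:
        for col, val in row.items():
            entry = summary.setdefault(
                col, {"types": set(), "non_null": 0, "example": None}
            )
            if val is not None:
                entry["types"].add(type(val).__name__)
                entry["non_null"] += 1
                if entry["example"] is None:
                    entry["example"] = val
    for entry in summary.values():
        entry["types"] = sorted(entry["types"])  # stable, serializable order
    return summary

rows = [
    {"id": 1, "signup": "2024-01-05"},
    {"id": 2, "signup": None},
]
dictionary = data_dictionary(rows)
# dictionary["signup"] == {"types": ["str"], "non_null": 1, "example": "2024-01-05"}
```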