跳转至

加餐03 SQL实战:保险场景的数据查询与优化

你好,我是陈博士,今天我们来探讨一下保险续保业务的数据分析与优化。

续保业务是保险公司持续稳定发展的重要组成部分。通过深入的数据查询和分析,我们可以评估客户的续保倾向,识别潜在的流失风险,并为后续的续保策略提供决策支持。

针对该场景,我整理了5张数据表以及对应的查询问题。针对这些查询问题,你可以了解到这些SQL该如何撰写。

数据表

  • 客户信息表 customers

  • 保险产品表 insurance_products

  • 保单表 policies

  • 续保记录表 renewal_records

  • 续保提醒记录表 renewal_reminders

问题设定

我从客户分析、产品分析、保单分析、续保分析、异常分析、提醒效果分析、交叉分析等维度设置了一些常见的查询问题。

1. 客户分析

  • 查询购买保单数量最多的前10名客户及其保单数量
SELECT
    c.customer_id,
    c.name,
    c.phone,
    COUNT(p.policy_id) as policy_count
FROM customers c
JOIN policies p ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.name, c.phone
ORDER BY policy_count DESC
LIMIT 10;
 
// 说明:这里使用JOIN连接客户和保单表,GROUP BY按客户分组并统计保单数量
  • 统计每个年龄段的客户数量及平均保费
WITH customer_age AS (
    SELECT
        customer_id,
        FLOOR((YEAR(CURRENT_DATE) - YEAR(STR_TO_DATE(birth_date, '%Y-%m-%d')))/10)*10 as age_group
    FROM customers
)
SELECT
    ca.age_group as age_range,
    CONCAT(ca.age_group, '-', ca.age_group + 9) as age_range_display,
    COUNT(DISTINCT ca.customer_id) as customer_count,
    ROUND(AVG(p.premium_amount), 2) as avg_premium
FROM customer_age ca
JOIN policies p ON ca.customer_id = p.customer_id
GROUP BY ca.age_group
ORDER BY ca.age_group;
 
// 说明:WITH 关键字用于定义公用表表达式(Common Table Expressions,简称CTE),这是一种临时结果集,可以在查询中引用一次或多次。CTE 提高了查询的可读性和维护性,尤其是在复杂的查询中需要重复使用相同子查询的情况下。

2. 产品分析

  • 统计各类保险产品的销售份数和保费收入
SELECT
    ip.product_id,
    ip.product_name,
    ip.product_type,
    COUNT(p.policy_id) as sales_count,
    SUM(p.premium_amount) as total_premium
FROM insurance_products ip
LEFT JOIN policies p ON ip.product_id = p.product_id
GROUP BY ip.product_id, ip.product_name, ip.product_type
ORDER BY total_premium DESC;
 
// 说明:使用LEFT JOIN连接保险产品和保单表,确保包括所有保险产品,GROUP BY按产品分组统计销售份数和保费收入,ORDER BY按总保费降序排列。
  • 查询续保率最高的前5个产品
WITH renewal_stats AS (
    SELECT
        ip.product_id,
        ip.product_name,
        COUNT(DISTINCT p.policy_id) as total_policies,
        COUNT(DISTINCT CASE WHEN rr.renewal_status = '成功' THEN rr.renewal_id END) as successful_renewals
    FROM insurance_products ip
    JOIN policies p ON ip.product_id = p.product_id
    LEFT JOIN renewal_records rr ON p.policy_id = rr.policy_id
    GROUP BY ip.product_id, ip.product_name
)
SELECT
    product_id,
    product_name,
    total_policies,
    successful_renewals,
    ROUND(CAST(successful_renewals AS DECIMAL(10,2)) / NULLIF(total_policies, 0) * 100, 2) as renewal_rate
FROM renewal_stats
WHERE total_policies > 0
ORDER BY renewal_rate DESC
LIMIT 5;


// 说明:CAST 和 NULLIF 是SQL中用于数据类型转换和条件表达式的两个函数。在这个查询中,它们一起用于确保续保率计算的准确性,并处理可能的除以零的异常。

3. 保单分析

  • 查询即将到期(30 天内)的保单清单
SELECT
    p.policy_id,
    p.policy_number,
    c.name as customer_name,
    c.phone,
    ip.product_name,
    p.end_date,
    p.premium_amount
FROM policies p
JOIN customers c ON p.customer_id = c.customer_id
JOIN insurance_products ip ON p.product_id = ip.product_id
WHERE p.status = '有效'
AND STR_TO_DATE(p.end_date, '%Y-%m-%d')
    BETWEEN CURRENT_DATE AND DATE_ADD(CURRENT_DATE, INTERVAL 30 DAY)
ORDER BY p.end_date;
 
// 说明:STR_TO_DATE、CURRENT_DATE 和 DATE_ADD 是SQL中用于日期处理的函数。在这个查询中用于筛选即将到期的保单。
 
STR_TO_DATE 函数用于将字符串按照指定的格式转换为日期类型。
 
CURRENT_DATE 函数返回当前日期,不包含时间部分。它通常用于获取今天的日期,并且在不同的数据库系统中有类似的实现方式(例如,在MySQL中就是 CURRENT_DATE() 或 CURDATE())。
 
DATE_ADD 函数用于向日期添加一个时间间隔。
  • 分析保单状态分布
SELECT
    status,
    COUNT(*) as count,
    ROUND(CAST(COUNT(*) AS DECIMAL(10,2)) / SUM(COUNT(*)) OVER() * 100, 2) as percentage
FROM policies
GROUP BY status
ORDER BY count DESC;
 
// 说明:使用GROUP BY按保单状态分组统计数量,通过窗口函数SUM OVER()计算总数,CAST转换数据类型并计算各状态的百分比,ORDER BY按数量降序排列。

4. 续保分析

  • 查询连续续保 3 次以上的客户
WITH consecutive_renewals AS (
    SELECT
        p.customer_id,
        COUNT(rr.renewal_id) as renewal_count
    FROM policies p
    JOIN renewal_records rr ON p.policy_id = rr.policy_id
    WHERE rr.renewal_status = '成功'
    GROUP BY p.customer_id
    HAVING COUNT(rr.renewal_id) >= 3
)
SELECT
    c.customer_id,
    c.name,
    c.phone,
    cr.renewal_count
FROM consecutive_renewals cr
JOIN customers c ON cr.customer_id = c.customer_id
ORDER BY cr.renewal_count DESC;
 
// 说明:这里使用CTE筛选出成功续保3次及以上的客户,通过GROUP BY和HAVING条件聚合续保记录,最后JOIN客户表获取详细信息并按续保次数排序。

5. 异常分析

  • 查询保费异常的保单(超过平均值 2 个标准差)
WITH premium_stats AS (
    SELECT
        AVG(premium_amount) as avg_premium,
        STDDEV(premium_amount) as stddev_premium
    FROM policies
)
SELECT
    p.policy_id,
    p.policy_number,
    c.name as customer_name,
    ip.product_name,
    p.premium_amount,
    ps.avg_premium,
    ps.stddev_premium
FROM policies p
JOIN customers c ON p.customer_id = c.customer_id
JOIN insurance_products ip ON p.product_id = ip.product_id
CROSS JOIN premium_stats ps
WHERE p.premium_amount > ps.avg_premium + (2 * ps.stddev_premium)
   OR p.premium_amount < ps.avg_premium - (2 * ps.stddev_premium)
ORDER BY p.premium_amount DESC;
 
// 说明:使用CTE计算保费的平均值和标准差,通过CROSS JOIN将统计结果应用到每条保单记录,筛选出保费超过平均值±2个标准差的异常保单。
 
CROSS JOIN 是SQL中的一种连接类型,它返回两个表的笛卡尔积,即第一个表中的每一行与第二个表中的每一行组合。结果集中行的数量等于两个表行数的乘积。

6. 提醒效果分析

  • 统计不同提醒方式的效果(续保成功率)
WITH reminder_stats AS (
    SELECT
        rm.remind_method,
        COUNT(DISTINCT rm.policy_id) as total_reminders,
        COUNT(DISTINCT CASE WHEN rr.renewal_status = '成功' THEN rr.renewal_id END) as successful_renewals
    FROM renewal_reminders rm
    LEFT JOIN renewal_records rr ON rm.policy_id = rr.policy_id
    WHERE rm.remind_status = '成功'
    GROUP BY rm.remind_method
)
SELECT
    remind_method,
    total_reminders,
    successful_renewals,
    ROUND(CAST(successful_renewals AS DECIMAL(10,2)) / NULLIF(total_reminders, 0) * 100, 2) as success_rate
FROM reminder_stats
ORDER BY success_rate DESC;
 
// 说明:使用CTE统计每种提醒方式的总提醒数和成功续保数,通过GROUP BY分组,计算续保成功率并按成功率降序排列,确保分母不为零。
  • 分析提醒后一周内完成续保的比例
WITH reminder_conversion AS (
    SELECT
        COUNT(DISTINCT rr.renewal_id) as total_renewals,
        COUNT(DISTINCT CASE
            WHEN STR_TO_DATE(rr.renewal_date, '%Y-%m-%d') <=
                 DATE_ADD(STR_TO_DATE(rm.remind_time, '%Y-%m-%d %H:%i:%s'), INTERVAL 7 DAY)
            AND rr.renewal_status = '成功'
            THEN rr.renewal_id
        END) as converted_renewals
    FROM renewal_reminders rm
    LEFT JOIN renewal_records rr ON rm.policy_id = rr.policy_id
    WHERE rm.remind_status = '成功'
)
SELECT
    total_renewals,
    converted_renewals,
    ROUND(CAST(converted_renewals AS DECIMAL(10,2)) / NULLIF(total_renewals, 0) * 100, 2) as conversion_rate
FROM reminder_conversion;
 
// 说明:使用CTE统计提醒后一周内成功续保的数量,通过条件判断筛选符合条件的续保记录,计算转换率并确保分母不为零,最终输出总续保数、转换续保数及比率。

7. 交叉分析

  • 分析客户年龄与产品选择的关系
WITH customer_age_group AS (
    SELECT
        customer_id,
        FLOOR((YEAR(CURRENT_DATE) - YEAR(STR_TO_DATE(birth_date, '%Y-%m-%d')))/10)*10 as age_group
    FROM customers
)
SELECT
    cag.age_group,
    CONCAT(cag.age_group, '-', cag.age_group + 9) as age_range,
    ip.product_type,
    COUNT(DISTINCT p.policy_id) as policy_count,
    ROUND(AVG(p.premium_amount), 2) as avg_premium
FROM customer_age_group cag
JOIN policies p ON cag.customer_id = p.customer_id
JOIN insurance_products ip ON p.product_id = ip.product_id
GROUP BY cag.age_group, ip.product_type
ORDER BY cag.age_group, policy_count DESC;
 
// 说明:使用CTE计算客户年龄组,通过JOIN连接保单和产品表,按年龄组和产品类型分组统计保单数量及平均保费,ORDER BY按年龄组和保单数量排序展示结果。
 
CONCAT(cag.age_group, '-', cag.age_group + 9) as age_range 这段代码使用了SQL的 CONCAT 函数来创建一个更具可读性的年龄范围字符串。

如果你是保险行业的从业人员,这节课将对你非常有帮助。期待你的转发,我们下节课再见!