# College Student Spending Analysis (大学生消费分析)

**Repository Path**: goodyer123/daxuesxiaofei

## Basic Information

- **Project Name**: College Student Spending Analysis (大学生消费分析)
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-28
- **Last Updated**: 2025-03-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Streamlit makes it quick to turn the student spending data you analyzed in Hive into an interactive, visual web report.

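### 🧰 Tool choice: why Streamlit?

- Builds data visualization web apps quickly
- Works with Pandas, Matplotlib, Seaborn, Plotly, and similar libraries
- Requires no front-end knowledge
- Can be deployed as a standalone web page
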
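### 1. Install Streamlit and the dependencies

```bash
pip install streamlit pyhive pandas matplotlib seaborn
```

PyHive's Hive client also depends on Thrift and SASL packages. If `from pyhive import hive` fails on import, installing the `hive` extra (`pip install "pyhive[hive]"`) usually pulls them in.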

### 2. Create the Streamlit app (save as `app.py`)

```python
# app.py
import streamlit as st
from pyhive import hive
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Use a font with CJK glyphs so Chinese labels coming from the data render
# correctly (assumes SimHei is installed on the machine running the app)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Streamlit page settings (must be the first Streamlit call)
st.set_page_config(page_title="Student Spending Data Report", layout="wide")

st.title("📊 Student Spending Data Visualization Report")
st.markdown("Data source: Hive data warehouse `xiaofei_db`")

# Connect to Hive and run a query; results are cached for 10 minutes
@st.cache_data(ttl=600)
def load_data(query):
    conn = hive.Connection(
        host='your_hive_host',
        port=10000,
        username='your_user',
        database='xiaofei_db'
    )
    try:
        return pd.read_sql(query, conn)
    finally:
        # Close the connection even if the query fails
        conn.close()


# -------------------------------
# 1. Monthly spending trend
# -------------------------------
st.header("📈 Monthly Spending Trend")

df_monthly = load_data("SELECT year, month, total_amount FROM monthly_trend ORDER BY year, month")
# Build a zero-padded "YYYY-MM" label for the x axis
df_monthly['period'] = df_monthly['year'].astype(str) + '-' + df_monthly['month'].astype(str).str.zfill(2)

fig1, ax1 = plt.subplots(figsize=(10,4))
sns.lineplot(data=df_monthly, x='period', y='total_amount', marker='o', ax=ax1)
ax1.set_title("Monthly Total Spending")
ax1.set_xlabel("Month")
ax1.set_ylabel("Total amount")
ax1.tick_params(axis='x', rotation=45)
st.pyplot(fig1)


# -------------------------------
# 2. Spending comparison by department
# -------------------------------
st.header("🏫 Spending by Department")
df_dept = load_data("SELECT department, avg_consume, total_consume FROM department_consume_compare")

fig2, ax2 = plt.subplots(figsize=(10,5))
sns.barplot(data=df_dept, y='department', x='total_consume', palette='Blues_d', ax=ax2)
ax2.set_title("Total Spending by Department")
ax2.set_xlabel("Total amount")
ax2.set_ylabel("Department")
st.pyplot(fig2)


# -------------------------------
# 3. Spending type distribution (pie chart)
# -------------------------------
st.header("🍱 Share of Spending by Type")
df_type = load_data("SELECT consume_type, total_amount FROM consume_type_distribution")

fig3, ax3 = plt.subplots(figsize=(6,6))
ax3.pie(df_type['total_amount'], labels=df_type['consume_type'], autopct='%1.1f%%', startangle=140)
ax3.set_title("Share of Spending by Type")
st.pyplot(fig3)


# -------------------------------
# 4. Spending level distribution
# -------------------------------
st.header("💰 Spending Level Distribution")
df_level = load_data("SELECT consume_level, student_count FROM consume_level_distribution")

fig4, ax4 = plt.subplots(figsize=(6,4))
sns.barplot(data=df_level, x='consume_level', y='student_count', palette='Set2', ax=ax4)
ax4.set_title("Students per Spending Level")
ax4.set_xlabel("Spending level")
ax4.set_ylabel("Number of students")
st.pyplot(fig4)


# -------------------------------
# 5. Spending by time of day
# -------------------------------
st.header("🕒 Spending by Time of Day")
df_time = load_data("""
    SELECT time_period, COUNT(*) AS total_times, ROUND(AVG(avg_amount),2) AS avg_amount
    FROM time_based_behavior
    GROUP BY time_period
""")

fig5, ax5 = plt.subplots(figsize=(6,4))
sns.barplot(data=df_time, x='time_period', y='avg_amount', palette='Oranges', ax=ax5)
ax5.set_title("Average Spending per Time Period")
ax5.set_xlabel("Time period")
ax5.set_ylabel("Average amount")
st.pyplot(fig5)


# -------------------------------
# 6. Popular spending locations
# -------------------------------
st.header("📍 Popular Spending Locations")
# ORDER BY makes LIMIT 10 return the ten most-visited locations;
# without it Hive may return any ten rows
df_location = load_data(
    "SELECT location, visit_count, total_amount FROM location_hotspot "
    "ORDER BY visit_count DESC LIMIT 10"
)

fig6, ax6 = plt.subplots(figsize=(10,5))
sns.barplot(data=df_location, y='location', x='visit_count', palette='coolwarm', ax=ax6)
ax6.set_title("Most Visited Spending Locations")
ax6.set_xlabel("Visits")
ax6.set_ylabel("Location")
st.pyplot(fig6)


# -------------------------------
# Raw data tables (optional)
# -------------------------------
st.header("📋 Raw Data (Partial)")
if st.checkbox("Show monthly spending data"):
    st.dataframe(df_monthly.head(20))

if st.checkbox("Show department spending data"):
    st.dataframe(df_dept.head(20))

st.markdown("---")
st.caption("© 2025 Student Spending Analysis System | Data source: Hive data warehouse")
```

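
`load_data` takes a finished SQL string, which is fine for the fixed queries in `app.py`. If a query ever needs values chosen in the UI (for example, a list of departments), one option is to build the SQL with placeholders and pass the values separately instead of pasting them into the string. The sketch below reuses the `department_consume_compare` columns from `app.py`; `build_department_query` is a hypothetical helper, and the `%s` placeholder style is an assumption based on PyHive's `pyformat` paramstyle.

```python
from typing import Sequence, Tuple

# Same columns and table as the department query in app.py
BASE_QUERY = (
    "SELECT department, avg_consume, total_consume "
    "FROM department_consume_compare"
)


def build_department_query(departments: Sequence[str]) -> Tuple[str, Tuple[str, ...]]:
    """Hypothetical helper: return (sql, params) limited to the given departments.

    An empty selection returns the unfiltered query with no parameters.
    """
    if not departments:
        return BASE_QUERY, ()
    # One %s placeholder per selected department (pyformat style, assumed)
    placeholders = ", ".join(["%s"] * len(departments))
    sql = f"{BASE_QUERY} WHERE department IN ({placeholders})"
    return sql, tuple(departments)


sql, params = build_department_query(["A", "B"])
```

To use it, `load_data` would need an extra `params` argument forwarded as `pd.read_sql(query, conn, params=params)`. That change is an assumption and is not part of the app as written.
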
### 3. Run the Streamlit app

Run from the command line:

```bash
streamlit run app.py
```

By default, the app opens in your browser at http://localhost:8501

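### 4. Results overview (screenshot examples)

| Chart area | Content |
| --- | --- |
| 📈 Line chart | Monthly spending trend |
| 📊 Bar chart | Spending comparison by department |
| 🥧 Pie chart | Spending type distribution |
| 💰 Bar chart | Spending level distribution |
| 🕒 Bar chart | Spending by time of day |
| 📍 Bar chart | Popular spending locations |
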
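### 5. Suggested advanced features (optional)

| Feature | Description |
| --- | --- |
| 🔎 Filters | Filter by department, gender, or time |
| 📥 Data export | Export to CSV or Excel |
| 📈 Plotly | More advanced interactive charts |
| 🌐 Online deployment | Deploy with Streamlit Cloud, Docker, or an intranet server |
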
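
The filter and export ideas can be sketched with two small pandas helpers. This is a minimal illustration, assuming a DataFrame shaped like `df_dept` from `app.py` (a `department` column plus numeric columns). The helper names and the placeholder rows are made up for the example and are not real results.

```python
import pandas as pd


def filter_departments(df: pd.DataFrame, selected: list) -> pd.DataFrame:
    """Keep rows whose 'department' is in `selected`; an empty selection keeps all rows."""
    if not selected:
        return df
    return df[df["department"].isin(selected)].reset_index(drop=True)


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize to CSV as UTF-8 with a BOM so Excel detects the encoding of Chinese text."""
    return df.to_csv(index=False).encode("utf-8-sig")


# Placeholder rows for illustration only (hypothetical values)
demo = pd.DataFrame({
    "department": ["A", "B", "C"],
    "avg_consume": [1.0, 2.0, 3.0],
    "total_consume": [10.0, 20.0, 30.0],
})

subset = filter_departments(demo, ["A", "C"])
csv_bytes = to_csv_bytes(subset)
```

In the app, the selection could come from `st.multiselect("Department", sorted(df_dept["department"].unique()))`, and the bytes could be offered through `st.download_button("Download CSV", data=csv_bytes, file_name="department_consume.csv", mime="text/csv")`. The widget labels and the file name here are placeholders.
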
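### 🔚 Summary

You now have a complete, Hive-driven web visualization and reporting system. It can:

- Show student spending trends and preferences
- Uncover spending behavior patterns
- Support school administration and data-driven decisions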