# data_engineering

**Repository Path**: lundechen/data_engineering

## Basic Information

- **Project Name**: data_engineering
- **Description**: Shanghai University, Data Engineering course (for master students)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 4
- **Forks**: 1
- **Created**: 2022-11-06
- **Last Updated**: 2025-11-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Data Engineering (Master, UTSEUS, Shanghai University)

WeChat Group:

![](img/wechat.png)

## Where :school:

> **Attention: Bring your PC and headphones.**

On site (by default):

- UTSEUS Building 406

If online:

- Tencent Meeting (VooV Meeting)
- Room ID: 958 9491 5777

## When :clock8:

Each Monday, from 8:00 to 11:40

## Week 1 - Week 4

### Machine Learning Web Application

Self-paced learning.

Tutorial:

- https://gitee.com/lundechen/machine_learning_web_app

Corresponding videos:

- https://space.bilibili.com/472463946/channel/collectiondetail?sid=211561

| Time | Videos | Comments |
|--------|---------------------|----------------------------------------|
| Week 1 | Video 1 - Video 3 | First session. Mainly environment setup. You might need to pip install some packages, e.g.<br>- `pip install streamlit streamlit_drawable_canvas tensorflow opencv-python python-multipart`<br>- [click here for a full list of packages to install](https://gitee.com/lundechen/machine_learning_web_app/blob/master/requirements.txt) |
| Week 2 | Video 4 - Video 7 | Each student should be able to deploy the app locally. |
| Week 3 | Video 8 - Video 10 | Students form groups of 2-3 and deploy the app on the cloud together.<br>Buy a cloud VM:<br>- https://cloud.tencent.com/act/campus (A Lighthouse VM (轻量服务器, lightweight server) is more than enough.)<br>- https://cloud.tencent.com/act<br>- Other alternatives: Huawei/Ali/Qingyun Cloud<br><br>*When you finish your cloud deployment, send your web app link to the WeChat group, e.g. http://30.42.91.34:8501* |
| Week 4 | Video 11 - Video 12 | Students can optionally follow Video 13 and Video 14. |

### FAQ for ML Web App

| Question | Answer |
|--------|---------------------|
| Can I open Jupyter Notebooks directly in VS Code or PyCharm, instead of in a web browser? | Yes, absolutely. |
| I already have some experience with Streamlit. Should I still follow those basics? | In that case, you could use Flask instead of Streamlit, or propose an app that uses advanced Streamlit features. |
| Which terminal software is used in the videos? | Windows Terminal. And yes, it's a good one.<br>https://github.com/microsoft/terminal |

## Week 5 - Week 7

### Static Website with Go Hugo (Video 1-5)

And yes, you will have your own personal website, which could serve as an online resume/CV!

Tutorial:

- https://gitee.com/lundechen/static_website_with_go_hugo

Corresponding videos (Video 1-5):

- https://space.bilibili.com/472463946/channel/collectiondetail?sid=419963

### GitHub Pull Request

https://space.bilibili.com/472463946/channel/collectiondetail?sid=917876

### Reveal.js Personal CV hosted on GitHub Pages

Video:

- https://www.bilibili.com/video/BV1os421M7WN/

Tutorial:

- https://gitee.com/lundechen/revealjs_cv

### Gallery

- https://maximemet.github.io
- https://1456382895.github.io
- https://gongleiaz.github.io
- https://36884522.github.io
- https://ggzzzmmm.github.io
- https://noemie0105.github.io
- https://pagemlgohugo.github.io
- https://whq19991.github.io
- https://pageshuzyx.github.io
- https://xinxu11.github.io
- https://walnut8pro.github.io
- https://whqwhqwhq.github.io
- https://erasme153.github.io

By default your website/CV will be included in the gallery. If you don't want that, tell me in private chat.

## Week 8 - Week 10

From RNN to Attention to Transformer to LLM

## Your Final Project

### Expectations for Your `ML App`:

- **Engagement**: Your app should be interesting and engaging.
- **Originality**: While you can draw inspiration from existing projects online (be sure to include references in your documentation/slides), your project should demonstrate noticeable originality (aim for at least a 50% unique contribution).
- **Technical Overview**: Provide a brief overview of the technical aspects of your app, including but not limited to:
  - Recommended Python packages (e.g., `tensorflow`, `streamlit`, `transformers`, `torch`)
  - Whether to use a cloud VM
  - Integration with AI services (Google, Tencent, Baidu, Alibaba, etc.)
  - GPU usage considerations
  - The role of GitHub Actions in your project
- **Non-Technical Considerations**: Equally important are the non-technical aspects, such as:
  - Commercialization strategies
  - User promotion methods (advertising, etc.)
  - Social impact considerations
  - Potential ethical issues
- **Visual Aids**: Include drawings (hand-drawn or created using online/software tools like `draw.io`, `figma`, etc.) to help illustrate your ideas.
- **Documentation Requirements**:
  - If submitting a document, it should be at least 5 pages long.
  - If creating slides (e.g., using reveal.js), they should consist of at least 10 pages.

By default, your ML app will be included in the gallery. If you prefer not to be included, please inform me in a private chat.

### Project Presentation:

- **Due Date**: Week 10

### Group Formation:

-

## Notes :100:

- **Project Scope**: Conceive and develop a Machine Learning/Data Engineering/Artificial Intelligence Web/Desktop App.
- **Team Structure**: Each group should consist of two students.
- **Deliverables**: Submit your project in one of the following formats: docx, pdf, or as a reveal.js/Hugo website.
- **GPU Access**: You may need access to a GPU server. UTSEUS has one available; let me know if you need assistance connecting to it.

## Misc

### Prerequisites for Taking This Course

- Basic exposure to machine learning is required.
- Knowledge of Python is preferred, or a willingness to learn.

### Asking Questions :question:

#### Ask Questions via **[Gitee Issue](https://gitee.com/lundechen/data_engineering/issues)**

You should primarily ask questions via **[Gitee Issue](https://gitee.com/lundechen/data_engineering/issues)**. Here’s how:

- [Video Guide](https://www.bilibili.com/video/BV1364y1h7sb/)

#### Principle

Follow this principle when asking questions:

> **Google/AI First, Peers Second, Profs Last.**

You are encouraged to use **[Gitee Issue](https://gitee.com/lundechen/data_engineering/issues)** for questions. As a secondary option, you may ask in the WeChat group, but this is less preferred.

> Why Gitee/GitHub Issue? It is more **professional** and effective.

Questions in Gitee Issue and the WeChat group will be answered selectively. Questions will not be answered if:

- They can be easily resolved with a Google/AI search.
- They are outside the course scope.
- They are asked well ahead of the course progress.
- Professors deem them uninteresting for discussion.

#### Regarding Private WeChat Chats:

- **Questions in private WeChat chats will NOT be answered.**
- **Messages sent after 21:00 in private chats are discouraged.**

#### Office Visits

Office visits are not welcome unless you make an appointment at least one day in advance.