Pitfalls encountered when setting up an LLM runtime environment

Asked by user 在路上 197897 | Asked at: 2023-11-01 14:20:39 | Views: 97

1. bitsandbytes reports "CUDA Setup failed despite GPU being available."

When the environment is managed with conda, bitsandbytes can fail to detect CUDA while a large model is being loaded.
On Windows:

pip install bitsandbytes-windows

On Linux, downgrade bitsandbytes to version 0.39.0:

pip install bitsandbytes==0.39.0
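
The two cases above can be folded into one small helper that prints the suggested command for a given platform. This is only a sketch: `bnb_install_cmd` is a hypothetical name, and the mapping from `uname -s` output to "Windows" assumes a Unix-like shell on Windows such as Git Bash, MSYS or Cygwin.

```shell
# Hypothetical helper: print the pip command suggested in this post for a platform.
# The argument is a kernel name as printed by `uname -s`.
bnb_install_cmd() {
    case "$1" in
        Linux*)
            # Linux: pin the older release mentioned above
            echo "pip install bitsandbytes==0.39.0"
            ;;
        MINGW*|MSYS*|CYGWIN*)
            # Unix-like shells on Windows report names such as MINGW64_NT-10.0
            echo "pip install bitsandbytes-windows"
            ;;
        *)
            echo "unsupported platform: $1" >&2
            return 1
            ;;
    esac
}

# Usage: show the suggested command for the current machine without running it
# bnb_install_cmd "$(uname -s)"
```

The helper only prints the command, so it can be reviewed before anything is installed into the conda environment.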

2. Installing deepspeed fails with "can not find CUDA_HOME"

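When conda manages the environment, installing PyTorch also pulls in a set of basic CUDA packages, which show up in the environment as /anaconda/env/xxx/lib/libcudart11…so. deepspeed does not recognize these, so the NVIDIA CUDA Toolkit has to be installed separately. Its version must match the one used in your virtual environment; for example, both should be CUDA 11.3 (cuda113).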
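
To pick a matching installer, it can help to first check which CUDA version the PyTorch build in the environment was compiled against. A minimal sketch, assuming PyTorch is importable through `python` in the active environment; `torch_cuda_version` and `cuda_tag` are hypothetical helper names, not part of any library:

```shell
# Print the CUDA version the installed PyTorch build was compiled against, e.g. "11.3".
# Requires PyTorch in the active environment; a CPU-only build prints "None".
torch_cuda_version() {
    python -c "import torch; print(torch.version.cuda)"
}

# Hypothetical helper: turn a version such as "11.3" into the short tag "cuda113"
# by removing the dots and adding the "cuda" prefix.
cuda_tag() {
    printf 'cuda%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Usage (inside the conda environment):
# cuda_tag "$(torch_cuda_version)"
```

The printed version is the one to look for on the NVIDIA download page.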
For example, for CUDA 11.3:

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run

In the installer menu, select only the CUDA toolkit; do not install the driver or the other packages.
Then set the CUDA_HOME variable:

export CUDA_HOME=/usr/local/cuda-xxx

Alternatively, write this line directly into your bash startup file.
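
Before making the setting permanent, it is worth confirming that the directory really contains the toolkit. The sketch below assumes that a toolkit root has an executable `bin/nvcc`; `is_cuda_home` and `persist_cuda_home` are hypothetical helper names, and `cuda-xxx` stays a placeholder for whatever directory the installer created.

```shell
# Hypothetical helper: succeed only if the directory looks like a CUDA toolkit root,
# i.e. it contains an executable bin/nvcc.
is_cuda_home() {
    [ -x "$1/bin/nvcc" ]
}

# Hypothetical helper: append the export line to a shell rc file,
# unless exactly that line is already present.
persist_cuda_home() {
    cuda_dir=$1
    rc_file=$2
    line="export CUDA_HOME=$cuda_dir"
    grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}

# Usage (replace cuda-xxx with the directory the installer actually created):
# is_cuda_home /usr/local/cuda-xxx && persist_cuda_home /usr/local/cuda-xxx "$HOME/.bashrc"
```

Checking for the line first keeps the rc file from collecting duplicate exports when the command is run more than once.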
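Installing deepspeed again may still fail at this point, with an error roughly like "file does not belong to current user". This happens because the steps above install CUDA with root privileges, whereas on our own machines we usually run jobs under a non-root account. In that case, change the ownership of the CUDA files: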

sudo chown -R xxxx /usr/local/cuda-xxx

After that, the installation completes normally.
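
To confirm that the ownership change took effect, one option is to list anything under the toolkit directory that still belongs to another user. A small sketch; `not_owned_by_me` is a hypothetical helper name, and `cuda-xxx` is the same placeholder as above.

```shell
# Hypothetical helper: list every entry under a directory that is NOT owned
# by the user running the command. Empty output means everything is owned by you.
not_owned_by_me() {
    find "$1" ! -user "$(id -u)"
}

# Usage (replace cuda-xxx with the real directory name):
# not_owned_by_me /usr/local/cuda-xxx | head
```

If the command prints nothing, the ownership error should no longer come from this directory.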

3. UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES

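My guess is that having two CUDA toolkits installed (the one from conda and the one installed above) caused this; the warning did not appear before.
It can be resolved as follows: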

sudo apt-add-repository multiverse
sudo apt update
sudo apt install nvidia-modprobe
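
nvidia-modprobe is a small utility that loads the NVIDIA kernel modules and creates the /dev/nvidia* device files. Whether missing device files are the actual cause here is an assumption, but a quick check like the sketch below can show whether they exist before and after installing it. `missing_nodes` is a hypothetical helper name, and the device paths in the usage line are the usual ones on Linux; they may differ on your system.

```shell
# Hypothetical helper: print each of the given paths that does not exist.
missing_nodes() {
    for node in "$@"; do
        [ -e "$node" ] || printf '%s\n' "$node"
    done
}

# Usage: no output means all listed device files are present
# missing_nodes /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0
```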

4. Error no file named pytorch_model.bin, tf_model.h5, model.ckpt

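When loading of the large model starts, the model weights cannot be found.
I was trying Qwen; installing the library it uses to store its weights was enough: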

pip install safetensors
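
The file names in the error message are the older weight formats, so a checkpoint directory that ships only .safetensors files would be consistent with this error when the safetensors package is missing. That reading is an assumption; the sketch below simply reports which formats a local model directory contains. `weight_formats` is a hypothetical helper name and the path in the usage line is a placeholder.

```shell
# Hypothetical helper: report which weight file formats exist in a model directory.
# Prints "safetensors" and/or "bin", one per line; prints nothing if neither is found.
weight_formats() {
    for f in "$1"/*.safetensors; do
        # An unmatched glob stays literal, so confirm the file really exists
        if [ -e "$f" ]; then echo "safetensors"; break; fi
    done
    for f in "$1"/*.bin; do
        if [ -e "$f" ]; then echo "bin"; break; fi
    done
}

# Usage (the path is a placeholder for your local checkpoint directory):
# weight_formats /path/to/model
# python -c "import safetensors"   # fails if the package is not installed
```

If only "safetensors" is printed and the import fails, installing the package as shown above is the likely fix.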

Source: http://eshow365.cn/6-29395-0.html (content collected from the internet; readers should judge its accuracy themselves).