MindSpore / mindspore

2021-09-25 16:31
jjfeing

MindSpore 1.5.0

MindSpore 1.5.0 Release Notes

Major Features and Improvements

NewModels

  • [STABLE] Add CV model on Ascend: Fast-SCNN
  • [BETA] Add CV models on Ascend: midas_V2, attgan, FairMOT, CenterNet_resnet101, SEResNext, YOLOV3-tiny, RetinaFace
  • [STABLE] Add CV models on GPU: ssd_mobilenetv1_fpn, shufflenetv1, tinyDarkNet, CNN-CTC, unet++, DeepText, SqueezeNet
  • [STABLE] Add NLP models on GPU: GRU, GNMT2, Bert-Squad
  • [STABLE] Add recommendation models on GPU: NCF
  • [BETA] Add CV models on GPU: FaceAttribute, FaceDetection, FaceRecognition, SENet
  • [BETA] Add Audio models on GPU: DeepSpeech2
  • [STABLE] model_zoo has been separated into an individual repository named models

FrontEnd

  • [STABLE] Support while, break and continue statements in training networks in GRAPH_MODE.
  • [BETA] Support exporting a MindIR file after model training on the cloud side, and evaluating on the edge side by importing the MindIR file.
  • [STABLE] Support forward mode auto-diff interface Jvp(Jacobian-Vector-Product).
  • [STABLE] Support backward mode auto-diff interface Vjp(Vector-Jacobian-Product).
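Jvp and Vjp work on the same Jacobian J from opposite directions. Jvp pushes a tangent vector v forward to get J·v, and Vjp pulls a cotangent u backward to get uᵀ·J. The plain-Python sketch below does not use MindSpore's Jvp/Vjp interfaces. It assumes a hypothetical function f(x, y) = (x·y, x + y), computes the JVP with dual numbers (forward mode), and writes the VJP for this particular f by hand from its Jacobian [[y, x], [1, 1]].

```python
class Dual:
    """Dual number primal + tangent*eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, primal, tangent=0.0):
        self.primal = primal
        self.tangent = tangent

    @staticmethod
    def _lift(value):
        # Plain numbers are constants, so their tangent is zero.
        return value if isinstance(value, Dual) else Dual(value)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: d(a*b) = a*db + da*b
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def f(x, y):
    """Hypothetical example function f(x, y) = (x*y, x + y)."""
    return x * y, x + y


def jvp(fn, primals, tangents):
    """Forward mode: returns (fn(primals), J @ tangents) in a single pass."""
    outs = fn(*(Dual(p, t) for p, t in zip(primals, tangents)))
    return [o.primal for o in outs], [o.tangent for o in outs]


def vjp_f(x, y, cotangent):
    """Backward mode for f only: returns u^T @ J with J = [[y, x], [1, 1]]."""
    u0, u1 = cotangent
    return [u0 * y + u1, u0 * x + u1]


print(jvp(f, [2.0, 3.0], [1.0, 0.0]))  # ([6.0, 5.0], [3.0, 1.0])
print(vjp_f(2.0, 3.0, (1.0, 1.0)))      # [4.0, 3.0]
```

The forward pass yields one column combination of J per call. The backward pass yields one row combination. This is why forward mode suits functions with few inputs and backward mode suits functions with few outputs, such as scalar losses.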

Auto Parallel

  • [STABLE] Support distributed pipeline inference.
  • [STABLE] Add implementation of the sparse attention and its distributed operator.
  • [STABLE] Add implementations of the distributed operators Conv2d/Conv2dTranspose/Conv2dBackpropInput/Maxpool/Avgpool/Batchnorm/GatherD.
  • [STABLE] Support configuring the dataset strategy in distributed training and inference modes.
  • [STABLE] Add a high-level API for the Transformer module.

Executor

  • [STABLE] Support AlltoAll operator.
  • [STABLE] Improve the performance of the CPU Adam operator by 50%.
  • [BETA] Support the Adam offload feature, which reduces the static memory usage of the Pangu large model by 50%.
  • [STABLE] MindSpore Ascend backend supports configuring the cache path for operator generation and loading.
  • [STABLE] MindSpore Ascend backend supports lazy build in PyNative mode, improving compilation performance by 10 times.
  • [STABLE] The function or Cell decorated by ms_function supports gradient calculation in PyNative mode.
  • [STABLE] The outermost network supports non-Tensor parameters in PyNative mode.

DataSet

  • [BETA] Add a new method to class Model to support automatic data preprocessing in Ascend 310 inference scenarios.
  • [STABLE] Add a new drawing tool to visualize detection/segmentation datasets.
  • [STABLE] Support a new tensor operation named ConvertColor for color space transformation of images.
  • [STABLE] Enhance the following tensor operations to handle multiple columns simultaneously: RandomCrop, RandomHorizontalFlip, RandomResize, RandomResizedCrop, RandomVerticalFlip.
  • [STABLE] Support electromagnetic simulation dataset loading and data augmentation.
  • [STABLE] Optimize the error logs of Dataset to make them more user-friendly.

Federated Learning

Running Data Recorder

  • [STABLE] RDR saves collected data files within directories named by Rank ID on distributed training on Ascend, GPU and CPU.

GraphKernel Fusion

API Change

Backwards Incompatible Change

Python API
New Recomputation Configuration for AutoParallel and SemiAutoParallel Scenarios

Users can configure recomputation of the communication operations generated by model parallelism and optimizer parallelism to save memory on the devices. Pass mp_comm_recompute and parallel_optimizer_comm_recompute to enable recomputation of these communication operations.

Bug fixes

FrontEnd

Executor

Dataset

MindSpore Lite

Major Features and Improvements

Converter and runtime

  1. Optimize TDNN-like streaming models by reusing the result of the last inference.
  2. Support dynamic filter Convolution.
  3. Support serializing float32 weights into float16 weights to reduce model file size.
  4. Provide a unified runtime API so that developers can reuse their code between the cloud side and the device side.
  5. Developers can now configure built-in passes as custom passes.
  6. Users can now specify the format and shape of model inputs when converting a model.
  7. Support inference on multiple devices, including CPU, NPU and GPU. Users can set the devices in mindspore::Context.
  8. Support mixed precision inference. Users can set the inference precision through the LoadConfig API.
  9. Support custom operator registration and enable inference on third-party hardware.
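Item 3 trades precision for size: each weight drops from 4 bytes to 2. The sketch below shows this effect on a hypothetical weight list using the half-precision format code ('e') of Python's struct module. It is not the converter's serialization code, and it does not model the MindSpore Lite file format.

```python
import struct


def serialize_weights(weights, as_float16):
    """Pack a flat list of floats little-endian as float32 ('f') or float16 ('e')."""
    code = "e" if as_float16 else "f"
    return struct.pack("<%d%s" % (len(weights), code), *weights)


def deserialize_weights(blob, as_float16):
    """Unpack a blob produced by serialize_weights back into Python floats."""
    code = "e" if as_float16 else "f"
    size = struct.calcsize("<" + code)
    return list(struct.unpack("<%d%s" % (len(blob) // size, code), blob))


weights = [0.1, -1.5, 3.25, 1000.0]  # hypothetical weights
fp32 = serialize_weights(weights, as_float16=False)
fp16 = serialize_weights(weights, as_float16=True)
print(len(fp32), len(fp16))                       # 16 8
print(deserialize_weights(fp16, as_float16=True))  # [0.0999755859375, -1.5, 3.25, 1000.0]
```

Values that float16 can represent exactly survive the round trip unchanged. Others, such as 0.1, are rounded to the nearest half-precision value, and anything above 65504 cannot be stored at all.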

ARM backend optimization

  1. Support the NCHW data format for some operators, such as Conv and InstanceNorm. The performance of some models converted from ONNX and Caffe is greatly improved.
  2. Fix memory leak bugs on NPU.

Post quantization

  1. Weight quantization supports mixed bit quantization.
  2. Full quantization supports data pre-processing.
  3. Move the quantization parameters from the command line to the configuration file.

Training on Device

  1. Unify the Lite external API with MindSpore.
  2. Implement a static memory allocator and a common workspace for TOD, saving 10-20% memory.
  3. Provide getgradients and setgradients interfaces, as well as interfaces to get and set optimizer parameters, to support MoE models.
  4. Support user-specified output nodes when exporting an IOD model.
  5. Support more text networks (tinybert,albert) and operators.

Codegen

  1. Support kernel registration for custom operators. Third-party hardware such as NNIE can be accessed through it.

API Change

API Incompatible Change

C++ API
2021-08-05 20:10
jjfeing

MindSpore 1.4.0

MindSpore 1.4.0 Release Notes

Major Features and Improvements

NewModels

FrontEnd

Auto Parallel

  • Add distributed operators: Conv2D/Conv2DTranspose/Conv2DBackpropInput/MaxPool/AvgPool/BatchNorm/GatherD
  • Support configuring the shard strategy for datasets

Executor

DataSet

FederatedLearning

Running Data Recorder

GraphKernel Fusion

Profiler

  • [STABLE] Support MS_DIAGNOSTIC_DATA_PATH for profiler feature.(Ascend/GPU)

Dump

  • [STABLE] Support MS_DIAGNOSTIC_DATA_PATH for dump feature.(Ascend/GPU/CPU)

API Change

Backwards Incompatible Change

Python API
Command Line Interface
Dump Config

Previously, the dump path had to be set in the dump config file. To make the dump feature easier to use in the cloud, we support a new environment variable, MS_DIAGNOSTIC_DATA_PATH. See the New Dump Tutorial.

1.3.0: path is a mandatory field.
1.4.0: path is an optional field. If path is not provided or is an empty string, MS_DIAGNOSTIC_DATA_PATH must be set in the environment.
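The 1.4.0 rule amounts to a fallback: a non-empty path wins, otherwise MS_DIAGNOSTIC_DATA_PATH must supply the location. The sketch below illustrates only that decision. It is not MindSpore's implementation. The function name is hypothetical, and it does not model any subdirectory MindSpore may create under the chosen location.

```python
import os


def resolve_dump_path(dump_settings, environ=None):
    """Choose the dump location per the 1.4.0 rule (illustrative sketch only).

    dump_settings: dict holding the optional "path" field from the dump config.
    environ: mapping of environment variables (defaults to os.environ).
    """
    if environ is None:
        environ = os.environ
    # A non-empty "path" in the config is used as-is.
    path = dump_settings.get("path", "")
    if path:
        return path
    # Otherwise the environment variable has to be set.
    fallback = environ.get("MS_DIAGNOSTIC_DATA_PATH", "")
    if fallback:
        return fallback
    raise ValueError("path is empty and MS_DIAGNOSTIC_DATA_PATH is not set")


print(resolve_dump_path({"path": ""}, {"MS_DIAGNOSTIC_DATA_PATH": "/data/diag"}))  # /data/diag
```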

Bug fixes

FrontEnd

Executor

Dataset

MindSpore Lite

Major Features and Improvements

Converter and runtime

x86 backend optimization

ARM backend optimization

Cuda backend optimization

OpenCL backend

Post quantization

Training on Device

Codegen

API Change

API Incompatible Change

C++ API

New features

Java API

Bug fixes

Deprecations

Contributors

Thanks go to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, , Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, , Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiao Tianci, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, 
zhaozhenlong, zhengjun10, Zhenglong Li, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, wangfengwfwf, zymaa, gerayking.

Contributions of any kind are welcome!

2021-07-15 09:37
liucunwei

MindSpore 1.3.0

MindSpore 1.3.0 Release Notes

Major Features and Improvements

NewModels

  • [STABLE] Add CV models on Ascend: CPM, FCN8s, SSD-ResNet50-FPN, EAST, AdvancedEast.
  • [STABLE] Add NLP models on Ascend: DGU, TextCNN, SentimentNet(LSTM).
  • [STABLE] Add CV models on GPU: Faster-RCNN, FCN8s, CycleGAN, AdvancedEast.
  • [BETA] Add CV models on Ascend: CycleGAN, PoseNet, SimCLR.
  • [BETA] Add NLP models on Ascend: DGU, EmoTect, Senta, KT-Net.
  • [BETA] Add NLP models on GPU: DGU, EmoTect.
  • [BETA] Add EPP-MVSNet, a novel deep learning network for 3D reconstruction from multi-view stereo, which won first place on the Tanks & Temples leaderboard (as of April 1, 2021). (GPU)

FrontEnd

  • [STABLE] The default running mode of MindSpore is changed to Graph mode.
  • [STABLE] Support the run_check interface to check whether MindSpore is working properly.
  • [STABLE] Support saving custom information in the checkpoint file.
  • [STABLE] The Normal class adds a mean parameter.
  • [STABLE] Support exporting YOLOv3-DarkNet53 and YOLOv4 ONNX models.
  • [STABLE] Support exporting 40+ operators to ONNX models.
  • [STABLE] The Metric module supports set_indexes to select the inputs of update in the specified order.
  • [STABLE] Expose _Loss as the external API LossBase, the base class of losses.
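The set_indexes behavior of the Metric module means update() receives a chosen subset of the network outputs, in a chosen order. The toy class below is a hypothetical illustration of that selection step only. It is not mindspore.nn.Metric, and the chaining return value is an assumption made for convenience.

```python
class HypotheticalMetric:
    """Toy metric showing index-based input selection (not mindspore.nn.Metric)."""

    def __init__(self):
        self._indexes = None
        self.received = []

    def set_indexes(self, indexes):
        # Remember which positional inputs update() should use, and in what order.
        self._indexes = list(indexes)
        return self

    def update(self, *inputs):
        if self._indexes is not None:
            inputs = tuple(inputs[i] for i in self._indexes)
        self.received.append(inputs)


metric = HypotheticalMetric().set_indexes([2, 0])
metric.update("logits", "aux", "labels")
print(metric.received)  # [('labels', 'logits')]
```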

Auto Parallel

  • [STABLE] Add distributed operators: Select/GatherNd/ScatterUpdate/TopK.
  • [STABLE] Support basic pipeline parallelism.
  • [STABLE] Optimize sharding strategy setting of Gather.
  • [STABLE] Optimize mixed precision and shared parameter scenarios.
  • [STABLE] Optimize distributed prediction scenarios.

Executor

  • [STABLE] Support unified runtime in GPU and CPU backend.
  • [STABLE] MindSpore GPU supports CUDA 11 with cuDNN 8.
  • [STABLE] Optimize MindSpore GPU inference performance by integrating TensorRT.
  • [STABLE] MindSpore built on one Linux distribution can now be used on multiple Linux distributions with the same CPU architecture (e.g. EulerOS, Ubuntu, CentOS).
  • [STABLE] MindSpore now supports Ascend310 and Ascend910 environments with one single wheel package and provides an alternate binary package for Ascend310 specifically.
  • [STABLE] MindSpore Ascend supports group convolution.

DataSet

  • [STABLE] Support caching over MindRecord dataset.
  • [STABLE] Support new shuffle mode for MindRecord dataset.
  • [STABLE] Support a cropper tool for MindSpore Lite to allow the user to customize MindData binary file according to their script.
  • [STABLE] Support a shared memory mechanism to optimize the multi-processing efficiency of GeneratorDataset/Map/Batch.
  • [STABLE] Add features for the GNN dataset to support molecular dynamics simulation scenarios.

FederatedLearning

  • [STABLE] Support Cross-device federated learning framework.
  • [STABLE] Support FL-Server distributed networking including TCP and HTTP communication.
  • [STABLE] Support FL-Server distributed federated aggregation, with support for autoscaling and fault tolerance.
  • [STABLE] Develop FL-Client framework.
  • [STABLE] Support local differential privacy algorithms.
  • [STABLE] Support an MPC-based secure aggregation algorithm.
  • [STABLE] Interconnect MindSpore Lite device-side inference and training with FL-Client.
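The notes above do not state which aggregation rule FL-Server applies. As an assumption for illustration only, the sketch below shows FedAvg-style aggregation, the common baseline in which each client's weights count in proportion to its local sample count. It leaves out the distribution, autoscaling, differential-privacy and secure-aggregation parts listed above. The function name is hypothetical.

```python
def federated_average(client_updates):
    """Sample-weighted average of client weight vectors (FedAvg-style sketch).

    client_updates: list of (weights, num_samples) pairs, where every
    weights list has the same length.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0][0])
    aggregated = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send the same number of weights")
        for i, w in enumerate(weights):
            aggregated[i] += w * n / total  # weight each client by its data share
    return aggregated


print(federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)]))  # [2.5, 3.5]
```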

Running Data Recorder

  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information and graph execution order when a "Launch kernel failed" occurs. (CPU)

GraphKernel Fusion

  • [STABLE] Add options to control the optimization level.
  • [STABLE] Enhance the generalization ability on GPU. GraphKernel is enabled by default in 40+ networks covering NLP, CV, recommendation, NAS and audio. Their throughput is significantly improved, and we recommend enabling GraphKernel in your network.

Debug

  • [STABLE] Unified dump function.

API Change

Backwards Incompatible Change

Python API
mindspore.dataset.Dataset.device_que interface removes unused parameter prefetch_size(!18973:Delete unused param in device_que)

Previously, device_que had a parameter prefetch_size to define the number of records prefetched ahead of the user's request. However, this parameter was never used, so it had no effect. Therefore, we removed it in 1.3.0; users can set this configuration with mindspore.dataset.config.set_prefetch_size.

1.2.1:
device_que(prefetch_size=None, send_epoch_end=True, create_data_info_queue=False)

1.3.0:
device_que(send_epoch_end=True, create_data_info_queue=False)

The mindspore.nn.optim.THOR interface is renamed to lowercase thor and adds two parameters, enable_clip_grad and frequency (!17212:clearn codechekc for thor)

The parameter enable_clip_grad enables gradient clipping, and the parameter frequency controls the update interval of the second-order information matrix.

1.2.1:
THOR(net, learning_rate, damping, momentum, weight_decay=0.0, loss_scale=1.0, batch_size=32,
     use_nesterov=False, decay_filter=lambda x: x.name not in [], split_indices=None)

1.3.0:
thor(net, learning_rate, damping, momentum, weight_decay=0.0, loss_scale=1.0, batch_size=32,
     use_nesterov=False, decay_filter=lambda x: x.name not in [], split_indices=None, enable_clip_grad=False,
     frequency=100)
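The frequency parameter trades curvature freshness for speed: the expensive second-order matrix is rebuilt only periodically and reused in between. The sketch below shows that caching pattern. The function and its compute_matrix callback are hypothetical. It assumes a refresh at step 0 and then every frequency steps, which may not match the exact schedule inside thor.

```python
def train_with_periodic_refresh(total_steps, frequency, compute_matrix):
    """Rebuild an expensive second-order matrix only every `frequency` steps."""
    if frequency < 1:
        raise ValueError("frequency must be >= 1")
    matrix = None
    refresh_steps = []
    for step in range(total_steps):
        if step % frequency == 0:
            matrix = compute_matrix(step)  # expensive: recompute curvature info
            refresh_steps.append(step)
        # ...the cached `matrix` would precondition this step's gradient here...
    return matrix, refresh_steps


last, steps = train_with_periodic_refresh(250, 100, lambda s: "matrix@%d" % s)
print(last, steps)  # matrix@200 [0, 100, 200]
```

A larger frequency means fewer expensive refreshes (three instead of 250 in the example) at the cost of using older curvature information.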
Dump Config

Previously, tensor data could only be dumped for one step or for all steps. To make the dump feature easier to use, we changed the dump configuration format and the dump structure. See the New Dump Tutorial.

1.2.1:
  • iteration is an int.
  • op_debug_mode is in the async_dump_settings field.

1.3.0:
  • iteration is a string.
  • op_debug_mode is in the common_dump_settings field; async_dump_settings is removed.
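Because iteration is now a string, a single config entry can select several steps. The sketch below expands such a string into concrete step indices. It assumes the syntax from the dump tutorial: "all", or values separated by "|" where "a-b" is an inclusive range. The parser is hypothetical and not MindSpore's own.

```python
def parse_iteration_spec(spec):
    """Expand a dump "iteration" string into a set of step indices.

    Returns None for "all" (every iteration). Ranges such as "5-8" are
    treated as inclusive; only non-negative step numbers are handled.
    """
    spec = spec.strip()
    if spec == "all":
        return None
    steps = set()
    for part in spec.split("|"):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError("invalid range: %r" % part)
            steps.update(range(start, end + 1))
        else:
            steps.add(int(part))
    return steps


print(sorted(parse_iteration_spec("0|5-8|100-102")))  # [0, 5, 6, 7, 8, 100, 101, 102]
```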

Bug fixes

FrontEnd

Executor

Dataset

MindSpore Lite

Major Features and Improvements

Converter and runtime

  1. Support Caffe model running on Hi3516D.
  2. Support a delegate mechanism to run your models (in part or in whole) on a user-specified executor.
  3. Support control flow models.
  4. Support cross-compiling for iOS, so that models can run inference on iOS devices.

x86 backend optimization

  1. Optimize kernels for x86 using Advanced Vector Extensions (AVX).

ARM backend optimization

  1. Optimize fp16 kernels.
  2. Support arm32 fp16 instruction acceleration on ARMv8.2.

Cuda backend optimization

  1. Support an NVIDIA GPU backend based on the delegate mechanism (using TensorRT as the delegate).

OpenCL backend

  1. Optimize the strategy of workgroup and blocksize to improve performance.
  2. Support OpenCL dynamic infershape.
  3. Support INT32 type ops.

Post quantization

  1. Support converting fp32 training models to quantization training models.

Training on Device

  1. Support exporting an fp32 training model to a quantization model after the training process ends.
  2. Unify APIs and output package name of training and inference.
  3. Simplify implementation of Train Session.
  4. Optimize training and inference compilation, reducing the memory of libmindspore-lite-train.so.
  5. Training memory optimization: memory usage is reduced by 10-50% compared with r1.2.
  6. Training performance optimization: for the special 1*1 input shape, the Conv2DGradInput and SparseSoftmaxCrossEntropyWithLogits operators are optimized, improving performance by 10%-20%.
  7. Support more networks (transformer, albert).

Codegen

  1. Support deployment on HarmonyOS devices.

API Change

API Incompatible Change

C++ API
Unify LiteSession and TrainSession by merging TrainSession into LiteSession. (!17356:[MS][LITE] Merge TrainSession into LiteSession)

Previously, Training on Device used TrainSession while Inference on Device used LiteSession. To simplify the implementation, we moved the TrainSession functions into LiteSession as virtual functions and moved the APIs previously defined in train_session.h to lite_session.h.

class MS_API LiteSession {
  ...
  static LiteSession *CreateTrainSession(const std::string &filename, const lite::Context *context,
                                         bool train_mode = false, const lite::TrainCfg *cfg = nullptr);
  static LiteSession *CreateTransferSession(const std::string &filename_backbone, const std::string &filename_head,
                                            const lite::Context *context, bool train_mode = false,
                                            const lite::TrainCfg *cfg = nullptr);
  virtual int Train() { return mindspore::lite::RET_ERROR; }
  virtual int Eval() { return mindspore::lite::RET_OK; }
  virtual int SetupVirtualBatch(int virtual_batch_multiplier, float lr = -1.0f, float momentum = -1.0f) {
    return mindspore::lite::RET_ERROR;
  }
  virtual std::vector<tensor::MSTensor *> GetPredictions() const {
    std::vector<tensor::MSTensor *> outputs;
    return outputs;
  }
  ...
};
Add the Export API for Training on Device and deprecate the SaveToFile API. (!17356:[MS][LITE] Merge TrainSession into LiteSession)

Previously, Training on Device used the SaveToFile API to save the training model to a file. The Export API added in this release supports more formats, more model types (the train or inference part of the model), and saving weight-quantized training models.

virtual int Export(const std::string &file_name, lite::ModelType model_type = lite::MT_TRAIN,
                   lite::QuantizationType quant_type = lite::QT_DEFAULT, lite::FormatType = lite::FT_FLATBUFFERS) {
  return mindspore::lite::RET_ERROR;
}
Add GetFeatureMaps and UpdateFeatureMaps interfaces for Training on Device. (!18344:[MS][LITE] add mindfl lite train code)

When training on the device, we may need to get and update the model feature maps, particularly in MindSpore Federated scenarios.

virtual std::vector<tensor::MSTensor *> GetFeatureMaps() const {
  std::vector<tensor::MSTensor *> features;
  return features;
}
virtual int UpdateFeatureMaps(const std::vector<tensor::MSTensor *> &features) { return mindspore::lite::RET_ERROR; }

New features

Java API
new static method for creating LiteSession by MSConfig in LiteSession.class

Previously, creating a LiteSession object required calling two APIs:

MSConfig config = new MSConfig();
// config options ...
LiteSession liteSession = new LiteSession();
boolean ret = liteSession.init(config);
if (!ret) {
  // handle init LiteSession failed ...
}

Now a LiteSession object can be created with a single new API:

MSConfig config = new MSConfig();
// config options ...
LiteSession liteSession = LiteSession.createSession(config);
if (liteSession == null) {
  // handle create LiteSession failed ...
}
new static method for creating LiteSession by model buffer and MSConfig in LiteSession.class

Previously, running inference on a model required calling APIs like:

MSConfig config = new MSConfig();
// config options ...
LiteSession liteSession = new LiteSession();
boolean initSessionRet = liteSession.init(config);
if (!initSessionRet) {
  // handle init LiteSession failed and return ...
}
Model model = new Model();
boolean loadModelRet = model.loadModel(modelMappedByteBuffer);
if (!loadModelRet) {
  // handle load model failed and return ...
}
boolean compileModelRet = liteSession.compileGraph(model);
if (!compileModelRet) {
  // handle compile model failed and return ...
}
model.free();
// liteSession is ready to inference model, call runGraph in LiteSession.class ...

Now the new API can be used instead:

MSConfig config = new MSConfig();
// config options ...
LiteSession liteSession = LiteSession.createSession(modelMappedByteBuffer, config);
if (liteSession == null) {
  // handle init LiteSession failed and return ...
}
// liteSession is ready to inference model, call runGraph in LiteSession.class ...

The new createSession method integrates four old APIs: LiteSession.init, Model.loadModel, LiteSession.compileGraph and model.free. It is simpler and more efficient because it avoids one modelBuffer copy.

new methods getFeaturesMap and updateFeatures in LiteSession.class

A new C++ API was recently added to the LiteSession class; correspondingly, a new Java API is added to LiteSession.java.

public List<MSTensor> getFeaturesMap() {
    List<Long> ret = this.getFeaturesMap(this.sessionPtr);
    ArrayList<MSTensor> tensors = new ArrayList<MSTensor>();
    for (Long msTensorAddr : ret) {
        MSTensor msTensor = new MSTensor(msTensorAddr);
        tensors.add(msTensor);
    }
    return tensors;
}

public boolean updateFeatures(List<MSTensor> features) {
    long[] inputsArray = new long[features.size()];
    for (int i = 0; i < features.size(); i++) {
        inputsArray[i] = features.get(i).getMSTensorPtr();
    }
    return this.updateFeatures(this.sessionPtr, inputsArray);
}
new method export replaces the saveToFile API in LiteSession.class

A new C++ API was recently added to the LiteSession class; correspondingly, a new Java API is added to LiteSession.java.

public boolean export(String modelFileName, int modelType, int quantizationType) {
    return this.export(this.sessionPtr, modelFileName, modelType, quantizationType);
}
new train-related APIs moved to LiteSession.class from TrainSession.class

To align with the update of the C++ API in the LiteSession class, the corresponding Java APIs are added to LiteSession.java.

public class LiteSession {
...
public static LiteSession createTrainSession(String modelName, final MSConfig config, boolean trainMode){...}
public boolean train() {...}
public boolean eval() {...}
...

Bug fixes

  1. Fix a bug where the train session did not release memory because of a reference-count error.

Deprecations

Contributors

Thanks go to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, , Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, , Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiao Tianci, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, 
zhaozhenlong, zhengjun10, Zhenglong Li, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, wangfengwfwf, zymaa, gerayking.

Contributions of any kind are welcome!

2021-07-02 10:28
lilongfei

MindSpore 1.2.1

MindSpore 1.2.1 Release Notes

Major Features and Improvements

FrontEnd

  • [STABLE] Add MaskedSelect aicpu operation.(Ascend)

Auto Parallel

  • [STABLE] Support distributed checkpoint loading.(Ascend/GPU)
2021-04-17 16:04
lilongfei

MindSpore 1.2.0

MindSpore 1.2.0 Release Notes

Major Features and Improvements

NewModels

  • [STABLE] Add CV models on Ascend: 3D Unet, Unet++, SSD-Resnet50-fpn, SSD-VGG16, crnn_seq2seq_ocr for BSI, CTPN, resnet18, DPN
  • [STABLE] Add CV models on GPU: Faster-RCNN
  • [STABLE] Add NLP models on Ascend: NAML, Fasttext, GRU, LSTM
  • [BETA] Add TPRR (Thinking Path Re-Ranker), an original rank-based framework for multi-hop question answering, which won first place on the HotpotQA leaderboard. (Ascend)

FrontEnd

  • [STABLE] Support side-effect expressions to ensure that operators execute in the order implied by the user's semantics.(Ascend/GPU/CPU)
  • [STABLE] Support calculating gradients for networks that contain non-Tensor input parameters (int, float, bool, mstype.int, mstype.float, mstype.uint, mstype.bool_, tuple, list, dict).(Ascend/GPU/CPU)
  • [STABLE] Support the inverse of a bool Tensor.(Ascend/GPU/CPU)
  • [STABLE] Unify the isinstance interface.(Ascend/GPU/CPU)
  • [STABLE] Support negative indexes.(Ascend/GPU/CPU)
  • [STABLE] Support 110+ Numpy-like interfaces in mindspore.numpy.(Ascend/GPU/CPU)
  • [STABLE] Support export/load mindir model with a size greater than 2 GB.
  • [STABLE] The optimizer supports gradient centralization.(Ascend)
  • [STABLE] Support the AUC, ROC, BLEU score, confusion matrix, cosine similarity, Dice, Hausdorff distance, occlusion sensitivity, perplexity, mean surface distance and root mean surface distance metrics.
  • [STABLE] Support using EmbeddingLookup with a cache.(Ascend)

Auto Parallel

  • [STABLE] Support AllGather and ReduceScatter fusion.(Ascend)
  • [STABLE] Support gradient accumulation feature in auto parallel mode.(Ascend/GPU)
  • [STABLE] Support running parallel optimizer with gradient accumulation.(Ascend)
  • [STABLE] Add the configuration of communication operators' fusion.(Ascend)

Executor

  • [STABLE] Support inference on NVIDIA GPUs.
  • [STABLE] Support data parallelism in PyNative mode.(Ascend/GPU)
  • [STABLE] Optimize LSTM inference memory consumption in Graph mode with CPU.

Sponge

  • [STABLE] Add SPONGE modules for molecular dynamics simulation, including Bond, Angle, Dihedral, Non Bond 14, NeighborList, Particle Mesh Ewald, Langevin MD and LIUJIAN MD.(GPU)

DataSet

  • [STABLE] If the libnuma library is installed in the environment, you can run export DATASET_ENABLE_NUMA=True to configure NUMA binding. In multi-card training scenarios, the training data processing speed can be improved, thereby improving the network training efficiency.
  • [STABLE] Unify API Tensor structure of Training/Inference interfaces in C++ SDK.
  • [STABLE] Optimize duplicated Decode operations in data preprocessing by using a cache, improving preprocessing efficiency.
  • [STABLE] Support eager mode to run data augmentation in Python & C++.
  • [STABLE] Support more data augmentation operators(e.g. Affine, Perspective) in MindSpore-Lite.
  • [STABLE] Support light pipeline to process MindData in MindSpore-Lite training.
  • [STABLE] Support more data preprocessing operators based on the DVPP hardware module, which can be used on the Ascend310 platform.
  • [STABLE] Support copy-free property for data in Ascend310 inference process scenarios.

Running Data Recorder

  • [STABLE] Support running data recorder (RDR) for exception demarcation.
  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information, graph execution order, stream execution order and task debug information when a "run task error" or "distribute task failed" occurs. (Ascend)
  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information and graph execution order when a "SyncStream error" occurs. (GPU)

3D Feature

  • [STABLE] Support 3D ops: Conv3D, Conv3DBackpropInput, Conv3DBackpropFilter, Conv3DTranspose, BiasAdd, BiasAddGrad, PReLU, Transpose, Reshape, transdata, StrideSlice, MaxPool3D, MaxPool3DGrad, BinaryCrossEntropy, SigmoidCrossEntropyWithLogits, SigmoidCrossEntropyWithLogitsGrad, SoftmaxCrossEntropyWithLogits, BatchNorm3d, BatchNorm3dGrad, Dropout3d.
  • [STABLE] Support RMSELoss loss function, MAELoss loss function, FocalLoss loss function, DiceLoss binary loss function, and MultiClassDiceLoss multi-type loss function for 2D/3D network.
  • [STABLE] Add optimizer: AdamApplyOne(3D), ApplyMomentum(3D), SGD(3D).

API Change

Backwards Incompatible Change

Python API
mindspore.numpy.array(), mindspore.numpy.asarray(), mindspore.numpy.asfarray(), mindspore.numpy.copy() now support GRAPH mode, but cannot accept numpy.ndarray as input arguments anymore(!12726:Add March Numpy interfaces to mindspore)

Previously, these interfaces could accept numpy.ndarray as arguments and convert it to Tensor, but they could not be used in GRAPH mode.
However, the MindSpore parser currently cannot parse numpy.ndarray in a JIT graph. To support these interfaces in graph mode, we had to remove numpy.ndarray support. Users can still use Tensor to convert numpy.ndarray to tensors.

1.1.1:
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> nd_array = numpy.array([1,2,3])
>>> tensor = mnp.asarray(nd_array) # this line cannot be parsed in GRAPH mode

1.2.0:
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> tensor = mnp.asarray([1,2,3]) # this line can be parsed in GRAPH mode

mindspore.numpy interfaces remove support for keyword arguments out and where(!12726:Add March Numpy interfaces to mindspore)

Previously, mindspore.numpy interfaces had incomplete support for the keyword arguments out and where: the out argument only took effect when the where argument was also provided, and out could not be used to pass a reference as in NumPy functions. Therefore, we removed these two arguments to avoid confusion. Their original functionality can be reproduced with np.where.

1.1.1:
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b, out=out, where=where) # `out` cannot be used as a reference, therefore it is misleading

1.2.0:
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b)
>>> out = np.where(where, x=res, y=out) # instead of np.add(a, b, out=out, where=where)

Turn ops.MakeRefKey into an internal interface (!12010:Convert MakeRefKey to an internal interface)

Previously, MakeRefKey was an unused external interface; it is now an internal interface with the same usage. We do not recommend using this interface, and its introduction will be removed from the official website.

ops.ApplyFtrl, ops.ApplyMomentum, ops.ApplyRMSProp and ops.ApplyCenteredRMSProp now return a single output instead of multiple outputs on the Ascend backend. (!11895:unify mindir for different backend: the output num of optimizer ops, the backward of concat)

Previously, the number of outputs of these operators differed between backends. To unify their definitions, their output on the Ascend backend has been changed from multiple outputs to a single output.

P.FusedBatchNorm, P.FusedBatchNormEx deleted (!12115:IR operators of GPU and CPU are unified as batchnorm)

The FusedBatchNorm and FusedBatchNormEx interfaces have been deleted. Use the BatchNorm operator instead.

MetaTensor deleted (!10325:modify MetaTensor and Tensor)

The MetaTensor interface has been deleted. Its functionality has been integrated into Tensor.

ControlDepend is deleted, use Depend instead. The decorator @C.add_flags(has_effect=True) does not work. (!13793:remove control_depend from py file)

Previously, ControlDepend was used to control the execution order of multiple operators. In version 1.2.0, MindSpore introduces auto-monad side-effect expression, which ensures that operators execute in the order implied by the user's semantics. Therefore, ControlDepend has been deleted and Depend is recommended instead.

In most scenarios, operators with IO side effects (such as print) or memory side effects (such as assign) are executed according to the user's semantics. In some scenarios, two operators A and B have no order dependency, yet A must be executed before B; in that case, use Depend to specify their execution order. See the API documentation of the Depend operator for details.

1.1.1:
    In some side-effect scenarios, we need to ensure the execution order of operators.
    In order to ensure that operator A is executed before operator B, it is recommended
    to insert the Depend operator between operators A and B.

    Previously, the ControlDepend operator was used to control the execution order.
    Since the ControlDepend operator is deprecated from version 1.1, it is recommended
    to use the Depend operator instead. The replacement method is as follows::

        a = A(x)                --->        a = A(x)
        b = B(y)                --->        y = Depend(y, a)
        ControlDepend(a, b)     --->        b = B(y)

1.2.0:
    In most scenarios, if operators have IO side effects or memory side effects,
    they will be executed according to the user's semantics. In some scenarios,
    if the two operators A and B have no order dependency, and A must be executed
    before B, we recommend using Depend to specify their execution order. The
    usage method is as follows::

        a = A(x)                --->        a = A(x)
        b = B(y)                --->        y = Depend(y, a)
                                --->        b = B(y)
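The effect of y = Depend(y, a) can be illustrated with a toy dependency graph: routing B's input through a Depend node adds an edge from A, so any topological order runs A first. This is a hypothetical model written with Python's standard graphlib module; op and depend are illustrative helpers, not MindSpore's implementation.

```python
from graphlib import TopologicalSorter

# node name -> set of node names that must run before it
graph = {}

def op(name, *inputs):
    """Register an operator; its inputs become data-dependency edges."""
    graph[name] = set(inputs)
    return name

def depend(value, dep):
    """Toy stand-in for y = Depend(y, a): returns a node that represents
    `value`, but anything consuming that node must also wait for `dep`."""
    node = f"Depend({value}, {dep})"
    graph[node] = {value, dep}
    return node

x = op("x")
y = op("y")
a = op("A", x)    # a = A(x)
y = depend(y, a)  # y = Depend(y, a)
b = op("B", y)    # b = B(y): B now transitively waits for A

order = list(TopologicalSorter(graph).static_order())
print(order.index("A") < order.index("B"))  # True
```

Without the depend call, A and B would share no path in the graph and a scheduler could run them in either order.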

With the introduction of auto-monad side-effect expression, the decorator @C.add_flags(has_effect=True) no longer takes effect. If a script uses this decorator, modify it accordingly. Taking the overflow-detection operators (which have no side effects) as an example, the modification is as follows:

1.1.1:
@C.add_flags(has_effect=True)
def construct(self, *inputs):
    ...
    loss = self.network(*inputs)
    init = self.allo_status()
    self.clear_status(init)
    ...

1.2.0:
def construct(self, *inputs):
    ...
    loss = self.network(*inputs)
    init = self.allo_status()
    init = F.depend(init, loss)
    clear_status = self.clear_status(init)
    ...
C++ API
The C++ API now supports dual ABI (!12432:api support dual abi)

1.1.1 supports only the old ABI; 1.2.0 supports both the old and the new ABI.

1.1.1:
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)

1.2.0:
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)  # old ABI is supported
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=1)  # new ABI is supported, too
                                                   # or define nothing to use the new ABI by default
Context refactor (!13515:cpp api modify)

The Context class is refactored. For details, see the API docs.

1.1.1:
GlobalContext::SetGlobalDeviceTarget(kDeviceTypeAscend310);        // set device target to Ascend310
GlobalContext::SetGlobalDeviceID(0);                               // set device id to 0
auto model_context = std::make_shared<ModelContext>();             // create a model context
ModelContext::SetInsertOpConfigPath(model_context, "./aipp.cfg");  // set AIPP config file to ./aipp.cfg

1.2.0:
auto model_context = std::make_shared<Context>();                  // create a model context
auto ascend310_info = std::make_shared<Ascend310DeviceInfo>();
model_context->MutableDeviceInfo().push_back(ascend310_info);      // set device target to Ascend310
ascend310_info->SetDeviceID(0);                                    // set device id to 0
ascend310_info->SetInsertOpConfigPath("./aipp.cfg");               // set AIPP config file to ./aipp.cfg
LoadModel interface changes (!13515:cpp api modify)

LoadModel has been renamed Load. It no longer throws exceptions; check the returned status instead.

1.1.1:
try {
  auto graph = Serialization::LoadModel(model_file_path, kMindIR);
} catch (...) { ... }

1.2.0:
Graph graph;
auto ret = Serialization::Load(model_file_path, kMindIR, &graph);
if (ret != kSuccess) { ... }
Model constructor changes (!13515:cpp api modify)

Model now uses a parameterless constructor, and the arguments are passed in through Build.

1.1.1:
Model net(net_cell, model_context);
auto ret = net.Build();
if (ret != kSuccess) { ... }

1.2.0:
Model net;
auto ret = net.Build(net_cell, model_context);
if (ret != kSuccess) { ... }
MSTensor::CreateTensor now returns a raw pointer (!13515:cpp api modify)

MSTensor::CreateTensor and MSTensor::CreateRefTensor now return a raw pointer, which must be released with MSTensor::DestroyTensorPtr.

1.1.1:
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor.Name();

1.2.0:
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor->Name();
MSTensor::DestroyTensorPtr(tensor);

New features

Python API
  • Add SPONGE functions: mindspore.ops.operations.BondForceWithAtomEnergy, mindspore.ops.operations.AngleForceWithAtomEnergy, mindspore.ops.operations.DihedralForceWithAtomEnergy, mindspore.ops.operations.Dihedral14LJCFForceWithAtomEnergy, mindspore.ops.operations.LJForceWithPMEDirectForce, mindspore.ops.operations.PMEExcludedForce, mindspore.ops.operations.PMEReciprocalForce, mindspore.ops.operations.BondEnergy, mindspore.ops.operations.AngleEnergy, mindspore.ops.operations.DihedralEnergy, mindspore.ops.operations.Dihedral14LJEnergy, mindspore.ops.operations.Dihedral14CFEnergy, mindspore.ops.operations.LJEnergy, mindspore.ops.operations.PMEEnergy. All of these operators are supported on GPU.

Deprecations

Python API
nn.MatMul is now deprecated in favor of ops.matmul (!12817:numpy-native deprecate nn.MatMul)

ops.matmul follows the API of numpy.matmul as closely as possible. As a function interface, ops.matmul is applied without instantiation, as opposed to nn.MatMul, which should only be used as a class instance.

1.1.1:
>>> import numpy as np
>>> from mindspore import Tensor, nn
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> nn.MatMul()(x, y)

1.2.0:
>>> import numpy as np
>>> from mindspore import Tensor, ops
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> ops.matmul(x, y)
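For reference, the sketch below shows what the two snippets above compute under numpy.matmul semantics for 2-D inputs: a (2, 3) by (3, 4) product of ones yields a (2, 4) result in which every entry is 3.0. It is written in pure Python so it runs without MindSpore; the matmul helper is hypothetical and covers only the 2-D case, not batching or broadcasting.

```python
# Pure-Python sketch of 2-D matmul: (m, k) @ (k, n) -> (m, n).

def matmul(x, y):
    k = len(y)
    if any(len(row) != k for row in x):
        raise ValueError("inner dimensions must match")
    n = len(y[0])
    # Entry (r, j) is the dot product of row r of x with column j of y.
    return [[sum(row[i] * y[i][j] for i in range(k)) for j in range(n)]
            for row in x]

x = [[1.0] * 3 for _ in range(2)]  # like np.ones((2, 3))
y = [[1.0] * 4 for _ in range(3)]  # like np.ones((3, 4))
z = matmul(x, y)
print(len(z), len(z[0]))  # 2 4
print(z[0])               # [3.0, 3.0, 3.0, 3.0]
```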

Bug fixes

FrontEnd

Executor

Dataset

MindSpore Lite

Major Features and Improvements

Converter and runtime

  1. Support TensorFlow models in the Converter, except aware-training models.
  2. Add a fusion pattern for identical horizontal operators in the Converter.
  3. Provide a JAR package for x86_64 systems, making it easy to integrate MindSpore Lite into servers with a Java backend.
  4. Provide a unified runtime API so that developers can reuse their code between the cloud side and the device side.[BETA]
  5. Continue to improve control-flow capabilities: support GRU fusion in the Converter; support weight quantization for control-flow models; support control-flow model inference with half precision; support nested control-flow models.[BETA]

ARM backend optimization

  1. Add float16 operators used by NLP models (such as lstm) to enhance inference performance.
  2. Optimize operators: lstm, gru, depthwise.
  3. Add 6 NPU operators (such as FullConnection), and fix some buildIR failures.

OpenCL backend

  1. New ops: add 10+ ops, for a total of 72 ops.
  2. Performance optimization: through memory layout optimization and block tiling, performance improved by 30% compared to version 1.1 on Adreno GPU.
  3. Initialization time optimization: initialization time improved by 100% compared to MindSpore Lite 1.1 by storing the kernel cache as binary.
  4. Support Java calls on Mali or Adreno GPU.

Post quantization

  1. Support quantization of gather and lstm ops.
  2. Support quantizing TF Lite models with sub-graph nodes.
  3. Add a quantization strategy that decides whether to quantize each op, giving lower accuracy loss and a higher compression rate.

Training on Device

  1. Virtual batching: use mini-batches to emulate a large batch with low RAM consumption.
  2. Unified converter: the ToD and IoD converters no longer need to be compiled separately.
  3. Performance optimization of backward (BWD) ops.
  4. TrainLoop with off-the-shelf functionality blocks, such as LR scheduler, Loss Monitor, Ckpt Saver and Accuracy Monitor.
  5. Integration of code with MindData Lite.
  6. Support more networks (googlenet, densenet, shufflenetv2, nin, vgg) and operators.
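The idea behind virtual batching can be sketched as gradient accumulation: summing per-mini-batch gradients and normalizing once gives the same gradient as one large batch, while only a mini-batch is resident in memory at a time. The 1-D least-squares model and all names below are hypothetical; MindSpore Lite's implementation may differ, for example in how it normalizes or treats BatchNorm statistics.

```python
# Hypothetical model: loss = mean((w*x - y)^2) over the batch.

def grad_sum(w, xs, ys):
    """Sum over samples of d/dw (w*x - y)^2 = 2*x*(w*x - y)."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5
mini = 2  # mini-batch size assumed to fit in device memory

full_grad = grad_sum(w, xs, ys) / len(xs)  # one large batch

acc = 0.0
for i in range(0, len(xs), mini):          # stream small batches
    acc += grad_sum(w, xs[i:i + mini], ys[i:i + mini])
virtual_grad = acc / len(xs)               # normalize once at the end

print(full_grad, virtual_grad)  # -76.5 -76.5
```

The weight update is applied only after all mini-batches have been accumulated, which is what makes the result match the large-batch step.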

Codegen

  1. Support 79 ops for the ARM platform and all CMSIS ops for Arm Cortex-M Series.
  2. Multiplatform support, including Android and IoT devices.
  3. Support offline model weight preprocessing while compiling.
  4. Support offline memory reuse computing for minimum runtime buffer size.
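Offline memory reuse can be illustrated with a common heuristic: tensors whose lifetimes do not overlap may share the same bytes of one arena, so the runtime buffer is smaller than the sum of all tensor sizes. The greedy first-fit planner below is a hypothetical sketch of that general technique, not MindSpore Codegen's actual algorithm; lifetimes are given as inclusive (first_use, last_use) step indices.

```python
def plan(tensors):
    """tensors: list of (name, size, first_use, last_use).
    Returns ({name: offset}, total arena size)."""
    placed = []   # (offset, size, first, last) of already-placed tensors
    offsets = {}
    # Place large tensors first; this tends to reduce fragmentation.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Address ranges held by tensors alive at the same time as this one.
        busy = sorted((po, po + ps) for po, ps, pf, pl in placed
                      if not (pl < first or last < pf))
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break             # fits in the gap before this range
            offset = max(offset, end)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    total = max((po + ps for po, ps, _, _ in placed), default=0)
    return offsets, total

# Four activations of a small chain network (sizes in arbitrary units).
tensors = [("t0", 4, 0, 1), ("t1", 4, 1, 2), ("t2", 4, 2, 3), ("t3", 2, 3, 4)]
offsets, total = plan(tensors)
print(offsets, total)  # {'t0': 0, 't1': 4, 't2': 0, 't3': 4} 8
```

Here the arena needs 8 units instead of 14, because t2 reuses t0's space and t3 reuses t1's once their lifetimes have ended.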

API Change

API Incompatible Change

C++ API
Add a header file named lite_types.h for some common data structures. (!12262:[MS][LITE][Develop]remove cross dependency in inner headers)

Previously, some common data structures such as CpuBindMode and DeviceType were defined in context.h, which could cause cross-dependencies between headers. A new header named lite_types.h has therefore been created for common data structures, and CpuBindMode and DeviceType have been moved from context.h into lite_types.h.

lite_types.h
namespace mindspore::lite {
/// \brief CpuBindMode defined for holding bind cpu strategy argument.
typedef enum {
  NO_BIND,    /**< no bind */
  HIGHER_CPU, /**< bind higher cpu first */
  MID_CPU     /**< bind middle cpu first */
} CpuBindMode;

/// \brief DeviceType defined for holding user's preferred backend.
typedef enum {
  DT_CPU, /**< CPU device type */
  DT_GPU, /**< GPU device type */
  DT_NPU  /**< NPU device type */
} DeviceType;
}  // namespace mindspore::lite
Add some new interfaces in ms_tensor.h for the unified runtime API (!13515:cpp api modify)

Previously, users could not create or modify an MSTensor; all MSTensor objects were created and managed by the framework. However, users sometimes need to create or modify an MSTensor, for example when preprocessing input data. Two new interfaces are therefore provided in ms_tensor.h: CreateTensor, for creating an MSTensor, and set_shape, for modifying the shape of an MSTensor.

CreateTensor
/// \brief Create a MSTensor.
///
/// \return Pointer to an instance of MindSpore Lite MSTensor.
static MSTensor *CreateTensor(const std::string &name, TypeId type, const std::vector<int> &shape, const void *data,
                                size_t data_len);
set_shape
/// \brief Set the shape of MSTensor.
virtual void set_shape(const std::vector<int> &shape) = 0;

Previously, users accessed the data of an MSTensor through the interface named MutableData. However, MutableData not only returns the tensor's data but also allocates it if the data pointer is nullptr. A new interface named data is therefore provided in ms_tensor.h; it returns the tensor's data without allocating memory automatically.

data
/// \brief Get the pointer of data in MSTensor.
///
/// \note The data pointer can be used to both write and read data in MSTensor. No memory buffer will be
/// allocated.
///
/// \return the pointer points to data in MSTensor.
virtual void *data() = 0;
Delete DimensionSize() in ms_tensor.h (!13515:cpp api modify)

The interface named DimensionSize functionally overlaps with the interface named shape. To keep the interface simple, DimensionSize has been deleted; use shape instead.

DimensionSize()
/// \brief Get size of the dimension of the MindSpore Lite MSTensor index by the parameter index.
///
/// \param[in] index Define index of dimension returned.
///
/// \return Size of dimension of the MindSpore Lite MSTensor.
virtual int DimensionSize(size_t index) const = 0;
Move Allocator from namespace mindspore::lite to namespace mindspore for the unified runtime API (!13515:cpp api modify)

Previously, class Allocator was in namespace mindspore::lite. To provide a unified allocator interface for the unified runtime API, Allocator has been moved to namespace mindspore.

1.1.0:
namespace mindspore::lite {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}

1.2.0:
namespace mindspore {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}

Bug fixes

  1. Fix a bug where the array in the kernel registrar was not initialized.
  2. Fix a segmentation fault caused by mistakenly releasing the OpParameter in the Crop kernel.
  3. Fix a bug where a MINDIR aware-training model was ultimately interpreted as a weight-quant model.

Contributors

Thanks go to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, zymaa.

Contributions of any kind are welcome!

Last committed message: !15320 fix naml compile error
2021-04-14 18:15
lujiale

MindSpore 1.2.0-rc1

MindSpore 1.2.0 Release Notes

Major Features and Improvements

NewModels

  • [STABLE] Add CV models on Ascend: 3D Unet, Unet++, SSD-Resnet50-fpn, SSD-VGG16, crnn_seq2seq_ocr for BSI, CTPN, resnet18, DPN
  • [STABLE] Add CV models on GPU: Faster-RCNN
  • [STABLE] Add NLP models on Ascend: NAML, Fasttext, GRU, LSTM
  • [BETA] Add TPRR: Thinking Path Re-Ranker, an original ranking-based framework for Multi-Hop Question Answering, which won first place on the HotpotQA leaderboard.(Ascend)

FrontEnd

  • [STABLE] Support side-effect expression to ensure that operators execute in the order implied by the user's semantics.(Ascend/GPU/CPU)
  • [STABLE] Support calculating the gradient for networks that contain non-Tensor input parameters (int, float, bool, mstype.int, mstype.float, mstype.uint, mstype.bool_, tuple, list, dict).(Ascend/GPU/CPU)
  • [STABLE] Support the inverse of a bool Tensor.(Ascend/GPU/CPU)
  • [STABLE] Unify the isinstance interface.(Ascend/GPU/CPU)
  • [STABLE] Support negative indexes.(Ascend/GPU/CPU)
  • [STABLE] Support 110+ Numpy-like interfaces in mindspore.numpy.(Ascend/GPU/CPU)
  • [STABLE] Support export/load mindir model with a size greater than 2 GB.
  • [STABLE] The optimizer supports gradient centralization.(Ascend)
  • [STABLE] Support the AUC, ROC, BLEU score, confusion matrix, cosine similarity, Dice, Hausdorff distance, occlusion sensitivity, perplexity, mean surface distance and root mean surface distance metrics.
  • [STABLE] Support using EmbeddingLookup with a cache.(Ascend)
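Gradient centralization, listed above as an optimizer option, subtracts the mean from each output channel's weight gradient before the update. The pure-Python sketch below follows that published formulation for a 2-D gradient laid out as [out_channels][in_features]; the layout and the centralize helper are assumptions for illustration, and MindSpore's exact axis convention may differ.

```python
# Hypothetical sketch: centralize each output channel's gradient row.

def centralize(grad):
    """Shift every row of a 2-D gradient so that it has zero mean."""
    out = []
    for row in grad:
        mean = sum(row) / len(row)
        out.append([g - mean for g in row])
    return out

g = [[1.0, 2.0, 3.0], [4.0, 4.0, 10.0]]
gc = centralize(g)
print(gc)  # [[-1.0, 0.0, 1.0], [-2.0, -2.0, 4.0]]
```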

Auto Parallel

  • [STABLE] Support AllGather and ReduceScatter fusion.(Ascend)
  • [STABLE] Support gradient accumulation feature in auto parallel mode.(Ascend/GPU)
  • [STABLE] Support running parallel optimizer with gradient accumulation.(Ascend)
  • [STABLE] Add configuration for the fusion of communication operators.(Ascend)

Executor

  • [STABLE] Support inference with Nvidia GPU.
  • [STABLE] Support data parallelism in PyNative mode.(Ascend/GPU)
  • [STABLE] Optimize LSTM inference memory consumption in Graph mode with CPU.

Sponge

  • [STABLE] Add SPONGE modules for molecular dynamics simulation, including Bond, Angle, Dihedral, Non Bond 14, NeighborList, Particle Mesh Ewald, Langevin MD and LIUJIAN MD.(GPU)

DataSet

  • [STABLE] If the libnuma library is installed in the environment, you can run export DATASET_ENABLE_NUMA=True to configure NUMA binding. In multi-card training scenarios, the training data processing speed can be improved, thereby improving the network training efficiency.
  • [STABLE] Unify API Tensor structure of Training/Inference interfaces in C++ SDK.
  • [STABLE] Use a cache to optimize duplicated Decode operations in data preprocessing, improving preprocessing efficiency.
  • [STABLE] Support eager mode to run data augmentation in Python & C++.
  • [STABLE] Support more data augmentation operators (e.g. Affine, Perspective) in MindSpore-Lite.
  • [STABLE] Support light pipeline to process MindData in MindSpore-Lite training.
  • [STABLE] Support more data preprocessing operators based on the DVPP hardware module, which can be used on the Ascend310 platform.
  • [STABLE] Support copy-free property for data in Ascend310 inference process scenarios.

Running Data Recorder

  • [STABLE] Support running data recorder (RDR) for exception demarcation.
  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information, graph execution order, stream execution order and task debug information when a "run task error" or "distribute task failed" occurs. (Ascend)
  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information and graph execution order when a "SyncStream error" occurs. (GPU)

3D Feature

  • [STABLE] Support 3D ops: Conv3D, Conv3DBackpropInput, Conv3DBackpropFilter, Conv3DTranspose, BiasAdd, BiasAddGrad, PReLU, Transpose, Reshape, transdata, StridedSlice, MaxPool3D, MaxPool3DGrad, BinaryCrossEntropy, SigmoidCrossEntropyWithLogits, SigmoidCrossEntropyWithLogitsGrad, SoftmaxCrossEntropyWithLogits, BatchNorm3d, BatchNorm3dGrad, Dropout3d.
  • [STABLE] Support RMSELoss loss function, MAELoss loss function, FocalLoss loss function, DiceLoss binary loss function, and MultiClassDiceLoss multi-type loss function for 2D/3D network.
  • [STABLE] Add optimizer: AdamApplyOne(3D), ApplyMomentum(3D), SGD(3D).
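For orientation, the sketch below gives textbook definitions of three of the loss functions listed above, computed on flat Python lists. It is not MindSpore's implementation: the smoothing constant smooth=1e-5 and the mean reduction are assumptions, and MindSpore's DiceLoss, RMSELoss and MAELoss may use different defaults.

```python
import math

def mae(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def rmse(pred, target):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def dice_loss(pred, target, smooth=1e-5):
    """1 - Dice coefficient; `smooth` avoids division by zero (assumed value)."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target)
    return 1 - (2 * inter + smooth) / (union + smooth)

pred = [1.0, 0.0, 1.0, 1.0]
target = [1.0, 0.0, 0.0, 1.0]
print(mae(pred, target))        # 0.25
print(rmse(pred, target))       # 0.5
print(dice_loss(pred, target))  # approximately 0.2
```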

API Change

Backwards Incompatible Change

Python API
mindspore.numpy.array(), mindspore.numpy.asarray(), mindspore.numpy.asfarray() and mindspore.numpy.copy() now support GRAPH mode, but no longer accept numpy.ndarray as input arguments (!12726:Add March Numpy interfaces to mindspore)

Previously, these interfaces could accept numpy.ndarray as arguments and convert it to Tensor, but could not be used in GRAPH mode.
However, the MindSpore parser currently cannot parse numpy.ndarray in a JIT graph. To support these interfaces in GRAPH mode, numpy.ndarray support had to be removed. Users can still use Tensor to convert a numpy.ndarray into a tensor.

1.1.1:
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> nd_array = numpy.array([1,2,3])
>>> tensor = mnp.asarray(nd_array) # this line cannot be parsed in GRAPH mode

1.2.0:
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> tensor = mnp.asarray([1,2,3]) # this line can be parsed in GRAPH mode

mindspore.numpy interfaces remove support for the keyword arguments out and where (!12726:Add March Numpy interfaces to mindspore)

Previously, mindspore.numpy interfaces had only partial support for the keyword arguments out and where: out took effect only when where was also provided, and out could not be used to pass a reference the way it can in NumPy functions. To avoid confusion, both arguments have been removed. Their original functionality can be reproduced with np.where.

1.1.1:
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b, out=out, where=where) # `out` cannot be used as a reference, therefore it is misleading

1.2.0:
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b)
>>> out = np.where(where, x=res, y=out) # instead of np.add(a, b, out=out, where=where)
Turn ops.MakeRefKey into an internal interface (!12010:Convert MakeRefKey to an internal interface)

Previously, MakeRefKey was an external interface that was not used. It is now an internal interface with the same usage. Users are advised not to use this interface, and its description will be removed from the official website.

ops.ApplyFtrl, ops.ApplyMomentum, ops.ApplyRMSProp and ops.ApplyCenteredRMSProp now return a single output instead of multiple outputs on the Ascend backend. (!11895:unify mindir for different backend: the output num of optimizer ops, the backward of concat)

Previously, the number of outputs of these operators differed between backends. To unify their definitions, their output on the Ascend backend has been changed from multiple outputs to a single output.

P.FusedBatchNorm, P.FusedBatchNormEx deleted (!12115:IR operators of GPU and CPU are unified as batchnorm)

The FusedBatchNorm and FusedBatchNormEx interfaces have been deleted. Use the BatchNorm operator instead.

MetaTensor deleted (!10325:modify MetaTensor and Tensor)

The MetaTensor interface has been deleted. Its functionality has been integrated into Tensor.

ControlDepend is deleted, use Depend instead. The decorator @C.add_flags(has_effect=True) does not work. (!13793:remove control_depend from py file)

Previously, ControlDepend was used to control the execution order of multiple operators. In version 1.2.0, MindSpore introduces auto-monad side-effect expression, which ensures that operators execute in the order implied by the user's semantics. Therefore, ControlDepend has been deleted and Depend is recommended instead.

In most scenarios, operators with IO side effects (such as print) or memory side effects (such as assign) are executed according to the user's semantics. In some scenarios, two operators A and B have no order dependency, yet A must be executed before B; in that case, use Depend to specify their execution order. See the API documentation of the Depend operator for details.

1.1.1:
    In some side-effect scenarios, we need to ensure the execution order of operators.
    In order to ensure that operator A is executed before operator B, it is recommended
    to insert the Depend operator between operators A and B.

    Previously, the ControlDepend operator was used to control the execution order.
    Since the ControlDepend operator is deprecated from version 1.1, it is recommended
    to use the Depend operator instead. The replacement method is as follows::

        a = A(x)                --->        a = A(x)
        b = B(y)                --->        y = Depend(y, a)
        ControlDepend(a, b)     --->        b = B(y)

1.2.0:
    In most scenarios, if operators have IO side effects or memory side effects,
    they will be executed according to the user's semantics. In some scenarios,
    if the two operators A and B have no order dependency, and A must be executed
    before B, we recommend using Depend to specify their execution order. The
    usage method is as follows::

        a = A(x)                --->        a = A(x)
        b = B(y)                --->        y = Depend(y, a)
                                --->        b = B(y)

With the introduction of auto-monad side-effect expression, the decorator @C.add_flags(has_effect=True) no longer takes effect. If a script uses this decorator, modify it accordingly. Taking the overflow-detection operators (which have no side effects) as an example, the modification is as follows:

1.1.1:
@C.add_flags(has_effect=True)
def construct(self, *inputs):
    ...
    loss = self.network(*inputs)
    init = self.allo_status()
    self.clear_status(init)
    ...

1.2.0:
def construct(self, *inputs):
    ...
    loss = self.network(*inputs)
    init = self.allo_status()
    init = F.depend(init, loss)
    clear_status = self.clear_status(init)
    ...
C++ API
The C++ API now supports dual ABI (!12432:api support dual abi)

1.1.1 supports only the old ABI; 1.2.0 supports both the old and the new ABI.

1.1.1:
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)

1.2.0:
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)  # old ABI is supported
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=1)  # new ABI is supported, too
                                                   # or define nothing to use the new ABI by default
Context refactor (!13515:cpp api modify)

The Context class is refactored. For details, see the API docs.

1.1.1:
GlobalContext::SetGlobalDeviceTarget(kDeviceTypeAscend310);        // set device target to Ascend310
GlobalContext::SetGlobalDeviceID(0);                               // set device id to 0
auto model_context = std::make_shared<ModelContext>();             // create a model context
ModelContext::SetInsertOpConfigPath(model_context, "./aipp.cfg");  // set AIPP config file to ./aipp.cfg

1.2.0:
auto model_context = std::make_shared<Context>();                  // create a model context
auto ascend310_info = std::make_shared<Ascend310DeviceInfo>();
model_context->MutableDeviceInfo().push_back(ascend310_info);      // set device target to Ascend310
ascend310_info->SetDeviceID(0);                                    // set device id to 0
ascend310_info->SetInsertOpConfigPath("./aipp.cfg");               // set AIPP config file to ./aipp.cfg
LoadModel interface changes (!13515:cpp api modify)

LoadModel has been renamed Load. It no longer throws exceptions; check the returned status instead.

1.1.1:
try {
  auto graph = Serialization::LoadModel(model_file_path, kMindIR);
} catch (...) { ... }

1.2.0:
Graph graph;
auto ret = Serialization::Load(model_file_path, kMindIR, &graph);
if (ret != kSuccess) { ... }
Model constructor changes (!13515:cpp api modify)

Model now uses a parameterless constructor, and the arguments are passed in through Build.

1.1.1:
Model net(net_cell, model_context);
auto ret = net.Build();
if (ret != kSuccess) { ... }

1.2.0:
Model net;
auto ret = net.Build(net_cell, model_context);
if (ret != kSuccess) { ... }
MSTensor::CreateTensor now returns a raw pointer (!13515:cpp api modify)

MSTensor::CreateTensor and MSTensor::CreateRefTensor now return a raw pointer, which must be released with MSTensor::DestroyTensorPtr.

1.1.1:
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor.Name();

1.2.0:
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor->Name();
MSTensor::DestroyTensorPtr(tensor);

New features

Python API
  • Add SPONGE functions: mindspore.ops.operations.BondForceWithAtomEnergy, mindspore.ops.operations.AngleForceWithAtomEnergy, mindspore.ops.operations.DihedralForceWithAtomEnergy, mindspore.ops.operations.Dihedral14LJCFForceWithAtomEnergy, mindspore.ops.operations.LJForceWithPMEDirectForce, mindspore.ops.operations.PMEExcludedForce, mindspore.ops.operations.PMEReciprocalForce, mindspore.ops.operations.BondEnergy, mindspore.ops.operations.AngleEnergy, mindspore.ops.operations.DihedralEnergy, mindspore.ops.operations.Dihedral14LJEnergy, mindspore.ops.operations.Dihedral14CFEnergy, mindspore.ops.operations.LJEnergy, mindspore.ops.operations.PMEEnergy. All of these operators are supported on GPU.

Deprecations

Python API
nn.MatMul is now deprecated in favor of ops.matmul (!12817:numpy-native deprecate nn.MatMul)

ops.matmul follows the API of numpy.matmul as closely as possible. As a function interface, ops.matmul is applied without instantiation, as opposed to nn.MatMul, which should only be used as a class instance.

1.1.1:
>>> import numpy as np
>>> from mindspore import Tensor, nn
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> nn.MatMul()(x, y)

1.2.0:
>>> import numpy as np
>>> from mindspore import Tensor, ops
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> ops.matmul(x, y)

Bug fixes

FrontEnd

Executor

Dataset

MindSpore Lite

Major Features and Improvements

Converter and runtime

  1. Support TensorFlow models in the Converter, except aware-training models.
  2. Add a fusion pattern for identical horizontal operators in the Converter.
  3. Provide a JAR package for x86_64 systems, making it easy to integrate MindSpore Lite into servers with a Java backend.
  4. Provide a unified runtime API so that developers can reuse their code between the cloud side and the device side.[BETA]
  5. Continue to improve control-flow capabilities: support GRU fusion in the Converter; support weight quantization for control-flow models; support control-flow model inference with half precision; support nested control-flow models.[BETA]

ARM backend optimization

  1. Add float16 operators used by NLP models (such as lstm) to enhance inference performance.
  2. Optimize operators: lstm, gru, depthwise.
  3. Add 6 NPU operators (such as FullConnection), and fix some buildIR failures.

OpenCL backend

  1. New ops: add 10+ ops, for a total of 72 ops.
  2. Performance optimization: through memory layout optimization and block tiling, performance improved by 30% compared to version 1.1 on Adreno GPU.
  3. Initialization time optimization: initialization time improved by 100% compared to MindSpore Lite 1.1 by storing the kernel cache as binary.
  4. Support Java calls on Mali or Adreno GPU.

Post quantization

  1. Support quantization of gather and lstm ops.
  2. Support quantizing TF Lite models with sub-graph nodes.
  3. Add a quantization strategy that decides whether to quantize each op, giving lower accuracy loss and a higher compression rate.

Training on Device

  1. Virtual batching: use mini-batches to mimic a large batch in theory, with low RAM consumption.
  2. Unified converter: the ToD and IoD converters no longer need to be compiled separately.
  3. Performance optimization of backward (BWD) ops.
  4. TrainLoop with off-the-shelf functionality blocks, such as LR scheduler, Loss Monitor, Ckpt Saver and Accuracy Monitor.
  5. Integration of the code with MindData Lite.
  6. Support more networks (googlenet, densenet, shufflenetv2, nin, vgg) and operators.
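The idea behind virtual batching (item 1) can be illustrated with gradient accumulation. The following is a minimal pure-Python sketch, not MindSpore Lite's implementation: it assumes a hypothetical one-parameter linear model y_hat = w * x with mean-squared-error loss, and the helpers grad_mse, virtual_batch_grad and sgd_step are made up for this example. It shows that summing size-weighted mini-batch gradients yields the same gradient as one pass over the whole large batch, while only one mini-batch has to be processed at a time.

```python
from typing import List


def grad_mse(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def virtual_batch_grad(w: float, xs: List[float], ys: List[float],
                       mini_batch_size: int) -> float:
    """Gradient of the whole (virtual) batch, computed one mini-batch at a time.

    Each mini-batch gradient is weighted by its size, so a shorter last
    mini-batch is handled correctly.
    """
    accumulated = 0.0
    for start in range(0, len(xs), mini_batch_size):
        mx = xs[start:start + mini_batch_size]
        my = ys[start:start + mini_batch_size]
        accumulated += grad_mse(w, mx, my) * len(mx)
    return accumulated / len(xs)


def sgd_step(w: float, xs: List[float], ys: List[float],
             mini_batch_size: int, lr: float = 0.01) -> float:
    """One optimizer update that uses the accumulated (large-batch) gradient."""
    return w - lr * virtual_batch_grad(w, xs, ys, mini_batch_size)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9]
    w = 0.5
    # Full-batch and accumulated gradients agree.
    print(grad_mse(w, xs, ys), virtual_batch_grad(w, xs, ys, mini_batch_size=2))
    for _ in range(100):
        w = sgd_step(w, xs, ys, mini_batch_size=2)
    print(round(w, 3))  # approaches the least-squares slope, about 2.0
```

The weighting by len(mx) is the design choice that matters: averaging the mini-batch gradients without it would bias the result whenever the last mini-batch is smaller than the others.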

Codegen

  1. Support 79 ops for the ARM platform and all CMSIS ops for Arm Cortex-M Series.
  2. Multi-platform support, including Android and IoT devices.
  3. Support offline model weight preprocessing during compilation.
  4. Support offline computation of memory reuse to minimize the runtime buffer size.
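Offline memory reuse (item 4) can be planned ahead of time once each tensor's lifetime in the op sequence is known. The sketch below uses a common greedy-by-size offset assignment; this technique is an assumption for illustration and not necessarily the algorithm Codegen uses, and TensorLife, plan_offsets and arena_size are hypothetical names. Tensors whose lifetimes do not overlap may share the same bytes, so one runtime buffer can be much smaller than the sum of all tensor sizes.

```python
from typing import Dict, List, NamedTuple, Tuple


class TensorLife(NamedTuple):
    name: str
    size: int      # bytes
    first_op: int  # index of the op that produces the tensor
    last_op: int   # index of the last op that reads the tensor


def lifetimes_overlap(a: TensorLife, b: TensorLife) -> bool:
    return a.first_op <= b.last_op and b.first_op <= a.last_op


def plan_offsets(tensors: List[TensorLife]) -> Dict[str, int]:
    """Place the largest tensors first, each at the lowest offset that does not
    collide with an already-placed tensor whose lifetime overlaps its own."""
    placed: List[Tuple[int, TensorLife]] = []
    offsets: Dict[str, int] = {}
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        conflicts = sorted(
            ((off, other) for off, other in placed if lifetimes_overlap(t, other)),
            key=lambda pair: pair[0],
        )
        candidate = 0
        for off, other in conflicts:
            if candidate + t.size <= off:
                break  # t fits in the gap before this conflicting block
            candidate = max(candidate, off + other.size)
        offsets[t.name] = candidate
        placed.append((candidate, t))
    return offsets


def arena_size(tensors: List[TensorLife], offsets: Dict[str, int]) -> int:
    """Total runtime buffer size needed for the planned offsets."""
    return max(offsets[t.name] + t.size for t in tensors)


if __name__ == "__main__":
    # A simple op chain: each tensor lives from its producer to its consumer.
    chain = [
        TensorLife("t0", 100, 0, 1),
        TensorLife("t1", 50, 1, 2),
        TensorLife("t2", 100, 2, 3),
        TensorLife("t3", 50, 3, 4),
    ]
    offs = plan_offsets(chain)
    print(offs, arena_size(chain, offs), sum(t.size for t in chain))
```

For this chain the plan needs 150 bytes instead of 300, which equals the peak of simultaneously live tensors; greedy placement is not optimal in general, but it is cheap enough to run offline for every model.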

API Change

API Incompatible Change

C++ API
Add a header file named lite_types.h for some common data structures. (!12262:[MS][LITE][Develop]remove cross dependency in inner headers)

Previously, some common data structures such as CpuBindMode and DeviceType were in context.h, which could cause cross-dependency between headers. So we created a new header named lite_types.h for common data structures and moved CpuBindMode and DeviceType from context.h into it.

lite_types.h
namespace mindspore::lite {
/// \brief CpuBindMode defined for holding bind cpu strategy argument.
typedef enum {
  NO_BIND,    /**< no bind */
  HIGHER_CPU, /**< bind higher cpu first */
  MID_CPU     /**< bind middle cpu first */
} CpuBindMode;

/// \brief DeviceType defined for holding user's preferred backend.
typedef enum {
  DT_CPU, /**< CPU device type */
  DT_GPU, /**< GPU device type */
  DT_NPU  /**< NPU device type */
} DeviceType;
}  // namespace mindspore::lite
Add some new interfaces in ms_tensor.h for unified runtime API. (!13515:cpp api modify)

Previously, users could not create or modify an MSTensor; all MSTensor objects were created and managed by the framework. However, users sometimes need to create or modify an MSTensor, for example when pre-processing input data. So we provide two new interfaces in ms_tensor.h: `CreateTensor` for creating an `MSTensor` and `set_shape` for modifying the shape of an `MSTensor`.

CreateTensor
/// \brief Create a MSTensor.
///
/// \return Pointer to an instance of MindSpore Lite MSTensor.
static MSTensor *CreateTensor(const std::string &name, TypeId type, const std::vector<int> &shape, const void *data,
                                size_t data_len);
set_shape
/// \brief Set the shape of MSTensor.
virtual void set_shape(const std::vector<int> &shape) = 0;

Previously, users could access the data of an MSTensor through the interface named MutableData. However, MutableData not only returns the data of the tensor but also allocates memory for it if its data is nullptr. So we provide a new interface named data in ms_tensor.h, which returns the data of the tensor without allocating memory automatically.

data
/// \brief Get the pointer of data in MSTensor.
///
/// \note The data pointer can be used to both write and read data in MSTensor. No memory buffer will be
/// allocated.
///
/// \return the pointer points to data in MSTensor.
virtual void *data() = 0;
Delete DimensionSize() in ms_tensor.h. (!13515:cpp api modify)

The interface named DimensionSize functionally overlaps with the interface named shape. To keep the interface simple, we deleted DimensionSize and recommend using shape instead.

DimensionSize()
/// \brief Get size of the dimension of the MindSpore Lite MSTensor index by the parameter index.
///
/// \param[in] index Define index of dimension returned.
///
/// \return Size of dimension of the MindSpore Lite MSTensor.
virtual int DimensionSize(size_t index) const = 0;
Move Allocator from namespace mindspore::lite to namespace mindspore for unified runtime API. (!13515:cpp api modify)

Previously, class Allocator was in namespace mindspore::lite. To provide a unified allocator interface for the unified runtime API, we moved Allocator to namespace mindspore.

1.1.0:
namespace mindspore::lite {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}

1.2.0:
namespace mindspore {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}

Bug fixes

  1. Fix the bug that the array in kernel registrar is not initialized.
  2. Fix a segmentation fault caused by mistakenly releasing the OpParameter in the Crop kernel.
  3. Fix the bug that a MINDIR aware-training model was eventually interpreted as a weight-quant model.

Contributors

Thanks go to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, zymaa.

Contributions of any kind are welcome!

Last committed message: !14199 revert pr13787
2021-02-01 10:18
lujiale

MindSpore 1.1.1 Release Notes

MindSpore

Major Features and Improvements

NewModels

  • [STABLE] BGCF: a Bayesian Graph Collaborative Filtering (BGCF) framework used to model the uncertainty in the user-item interaction graph and thus recommend accurate and diverse items, on the Amazon recommendation dataset. (Ascend)
  • [STABLE] GRU: a recurrent neural network architecture similar to the LSTM (Long Short-Term Memory), on the Multi30K dataset. (Ascend)
  • [STABLE] FastText: a simple and efficient text classification algorithm, on AG's news topic classification dataset, DBPedia Ontology classification dataset and Yelp Review Polarity dataset. (Ascend)
  • [STABLE] LSTM: a recurrent neural network architecture used to learn word vectors for sentiment analysis, on the aclImdb_v1 dataset. (Ascend)
  • [STABLE] SimplePoseNet: a convolution-based neural network for the task of human pose estimation and tracking, on the COCO2017 dataset. (Ascend)

FrontEnd

  • [BETA] Support Tensor Fancy Index Getitem with tuple and list. (Ascend/GPU/CPU)

Graph Kernel Fusion

  • [STABLE] Optimize buffer fusion performance based on primary-op aggregation. (Ascend/GPU)
  • [STABLE] Support auto-expanding composite operators into primary-op sub-graphs for fusion. (Ascend/GPU)
  • [STABLE] Optimize the performance of reduce operators with atomic functions in AKG. (GPU)
  • [BETA] Support Parallel Fusion and Buffer Stitching. (GPU)
  • [STABLE] Improve the performance of BERT Base by enabling Graph Kernel Fusion. (GPU)

Backwards Incompatible Change

Python API

ops.AvgPool, ops.MaxPool, ops.MaxPoolWithArgmax change attr name from 'ksize', 'padding' to 'kernel_size', 'pad_mode' (!11350:update Pooling's attr kernel_size, pad_mode)

Previously, the kernel size and pad mode attrs of pooling ops were named "ksize" and "padding", which was somewhat confusing and inconsistent with convolution ops. So they have been renamed to "kernel_size" and "pad_mode".

1.1.0:
>>> import mindspore.ops as ops
>>>
>>> avg_pool = ops.AvgPool(ksize=2, padding='same')
>>> max_pool = ops.MaxPool(ksize=2, padding='same')
>>> max_pool_with_argmax = ops.MaxPoolWithArgmax(ksize=2, padding='same')

1.1.1:
>>> import mindspore.ops as ops
>>>
>>> avg_pool = ops.AvgPool(kernel_size=2, pad_mode='same')
>>> max_pool = ops.MaxPool(kernel_size=2, pad_mode='same')
>>> max_pool_with_argmax = ops.MaxPoolWithArgmax(kernel_size=2, pad_mode='same')
ops.TensorAdd, change API name to ops.Add (!11568:Change TensorAdd to Add)

The operator name TensorAdd is not standardized, so it has been changed to Add. The old interface can still be used but will be removed in a subsequent version; switching to the new interface is recommended.

1.1.0:
>>> import mindspore.ops as ops
>>>
>>> add = ops.TensorAdd()

1.1.1:
>>> import mindspore.ops as ops
>>>
>>> add = ops.Add()
ops.Gelu, ops.GeluGrad, ops.FastGelu, ops.FastGeluGrad, change API name to ops.GeLU, ops.GeLUGrad, ops.FastGeLU, ops.FastGeLUGrad (!11603:fix_gelu_name)

The names Gelu, GeluGrad, FastGelu and FastGeluGrad are unified under the ReLU naming rule: "lu" is changed to uppercase "LU". The old interfaces can still be used but will be removed in a subsequent version; switching to the new interfaces is recommended.

1.1.0:
>>> import mindspore.ops as ops
>>>
>>> gelu = ops.Gelu()
>>> gelu_grad = ops.GeluGrad()
>>> fast_gelu = ops.FastGelu()
>>> fast_gelu_grad = ops.FastGeluGrad()

1.1.1:
>>> import mindspore.ops as ops
>>>
>>> gelu = ops.GeLU()
>>> gelu_grad = ops.GeLUGrad()
>>> fast_gelu = ops.FastGeLU()
>>> fast_gelu_grad = ops.FastGeLUGrad()
ops.GatherV2, change API name to ops.Gather (!11713:Change GatherV2 to Gather)

GatherV2 is changed to Gather. The old interface can still be used but will be removed in a subsequent version; switching to the new interface is recommended.

1.1.0:
>>> import mindspore.ops as ops
>>>
>>> gather = ops.GatherV2()

1.1.1:
>>> import mindspore.ops as ops
>>>
>>> gather = ops.Gather()
ops.Pack, ops.Unpack, change API name to ops.Stack, ops.Unstack (!11828:fix_stack_name)

Pack is changed to Stack, and Unpack is changed to Unstack. The old interfaces can still be used but will be removed in a subsequent version; switching to the new interfaces is recommended.

1.1.0:
>>> import mindspore.ops as ops
>>>
>>> pack = ops.Pack()
>>> unpack = ops.Unpack()

1.1.1:
>>> import mindspore.ops as ops
>>>
>>> stack = ops.Stack()
>>> unstack = ops.Unstack()
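Because the old operator names listed above are scheduled for removal, existing scripts can be migrated mechanically. The following is a minimal sketch using Python's standard re module; RENAMED_OPS and migrate_op_names are hypothetical helpers, the mapping covers only the renames listed in these notes, and only references written as ops.<Name> are rewritten (aliases such as P.Gelu or from mindspore.ops import Gelu are not handled). The pooling attribute renames (ksize, padding) are deliberately left out, because padding is also a valid keyword of other layers such as nn.Conv2d.

```python
import re

# Old name -> new name, taken from the 1.1.1 rename list.
RENAMED_OPS = {
    "TensorAdd": "Add",
    "Gelu": "GeLU",
    "GeluGrad": "GeLUGrad",
    "FastGelu": "FastGeLU",
    "FastGeluGrad": "FastGeLUGrad",
    "GatherV2": "Gather",
    "Pack": "Stack",
    "Unpack": "Unstack",
}

# `\b` on both sides restricts matches to whole identifiers: `ops.Gelu` does not
# match inside `ops.GeluGrad`, and `myops.Pack` is left untouched.
_OLD_OP = re.compile(r"\bops\.(" + "|".join(map(re.escape, RENAMED_OPS)) + r")\b")


def migrate_op_names(source: str) -> str:
    """Rewrite `ops.<OldName>` references to the new 1.1.1 names."""
    return _OLD_OP.sub(lambda m: "ops." + RENAMED_OPS[m.group(1)], source)


if __name__ == "__main__":
    old = "add = ops.TensorAdd()\ngelu_grad = ops.GeluGrad()\nunpack = ops.Unpack()\n"
    print(migrate_op_names(old))
```

Running the function twice gives the same result as running it once, since none of the new names appear as keys in the mapping.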
ops.ControlDepend, add deprecated to ControlDepend (!11844:add deprecated to ControlDepend)

ControlDepend is deprecated and will be removed in a future version, use Depend instead.

1.1.0:
Note:
    This operation does not work in `PYNATIVE_MODE`.

1.1.1:
Note:
    This operation does not work in `PYNATIVE_MODE`.
    `ControlDepend` is deprecated from version 1.1 and will be removed in a future version, use `Depend` instead.
ops.Depend, add operator description and use case (!11815:Modify the description of the depend operator), (!11879:modify depend interface description)

Since the ControlDepend operator will be deprecated from version 1.2, it is recommended to use the Depend operator instead.

1.1.0:
Depend is used for processing side-effect operations.

Inputs:
    - **value** (Tensor) - the real value to return for depend operator.
    - **expr** (Expression) - the expression to execute with no outputs.

Outputs:
    Tensor, the value passed by last operator.

Supported Platforms:
    ``Ascend`` ``GPU`` ``CPU``

1.1.1:
Depend is used for processing dependency operations.

In some side-effect scenarios, we need to ensure the execution order of operators.
In order to ensure that operator A is executed before operator B, it is recommended
to insert the Depend operator between operators A and B.

Previously, the ControlDepend operator was used to control the execution order.
Since the ControlDepend operator will be deprecated from version 1.2, it is
recommended to use the Depend operator instead. The replacement method is as follows::

    a = A(x)                --->        a = A(x)
    b = B(y)                --->        y = Depend(y, a)
    ControlDepend(a, b)     --->        b = B(y)

Inputs:
    - **value** (Tensor) - the real value to return for depend operator.
    - **expr** (Expression) - the expression to execute with no outputs.

Outputs:
    Tensor, the value passed by last operator.

Supported Platforms:
    ``Ascend`` ``GPU`` ``CPU``

Examples:
    >>> import numpy as np
    >>> import mindspore
    >>> import mindspore.nn as nn
    >>> import mindspore.ops.operations as P
    >>> from mindspore import Tensor
    >>> class Net(nn.Cell):
    ...     def __init__(self):
    ...         super(Net, self).__init__()
    ...         self.softmax = P.Softmax()
    ...         self.depend = P.Depend()
    ...
    ...     def construct(self, x, y):
    ...         mul = x * y
    ...         y = self.depend(y, mul)
    ...         ret = self.softmax(y)
    ...         return ret
    ...
    >>> x = Tensor(np.ones([4, 5]), dtype=mindspore.float32)
    >>> y = Tensor(np.ones([4, 5]), dtype=mindspore.float32)
    >>> net = Net()
    >>> output = net(x, y)
    >>> print(output)
    [[0.2 0.2 0.2 0.2 0.2]
     [0.2 0.2 0.2 0.2 0.2]
     [0.2 0.2 0.2 0.2 0.2]
     [0.2 0.2 0.2 0.2 0.2]]

C++ API

change namespace from mindspore::api to mindspore (!11574:unifiled lite & cloud api )
1.1.0:
namespace ms = mindspore::api;

1.1.1:
namespace ms = mindspore;
Context (!11574:unifiled lite & cloud api )
1.1.0:
ms::Context::Instance().SetDeviceTarget(ms::kDeviceTypeAscend310).SetDeviceID(0);

1.1.1:
ms::GlobalContext::SetGlobalDeviceTarget(ms::kDeviceTypeAscend310);
ms::GlobalContext::SetGlobalDeviceID(0);
rename Tensor to MSTensor (!11574:unifiled lite & cloud api )
1.1.0:
ms::Tensor a;

1.1.1:
ms::MSTensor a;
Model: move the setting of model options from Build to the Model constructor (!11574:unifiled lite & cloud api )
1.1.0:
ms::Model model(graph_cell);
model.Build(model_options);

1.1.1:
ms::Model model(graph_cell, model_context);
model.Build();
Model: replace GetInputsInfo, GetOutputsInfo with GetInputs, GetOutputs (!11574:unifiled lite & cloud api )
1.1.0:
std::vector<std::string> names;
std::vector<ms::DataType> types;
std::vector<std::vector<int64_t>> shapes;
std::vector<size_t> mem_sizes;
model.GetInputsInfo(&names, &types, &shapes, &mem_sizes);
std::cout << "Input 0 name: " << names[0] << std::endl;

1.1.1:
auto inputs = model.GetInputs();
std::cout << "Input 0 name: " << inputs[0].Name() << std::endl;
Model: change the Predict parameter type from Buffer to MSTensor (!11574:unifiled lite & cloud api )
1.1.0:
std::vector<ms::Buffer> inputs;
std::vector<ms::Buffer> outputs;
model.Predict(inputs, &outputs);

1.1.1:
std::vector<ms::MSTensor> inputs;
std::vector<ms::MSTensor> outputs;
model.Predict(inputs, &outputs);

Deprecations

Python API

ops.SpaceToBatch, ops.BatchToSpace are deprecated in favor of ops.SpaceToBatchND, ops.BatchToSpaceND (!11527: unify SpaceToBatchND's attr)

ops.SpaceToBatchND and ops.BatchToSpaceND are more general, and they behave the same as ops.SpaceToBatch and ops.BatchToSpace when block_shape is an int.

ops.DepthwiseConv2dNative is deprecated in favor of nn.Conv2d (!11702:replace DepthWiseConv with nn.Conv2D)

ops.DepthwiseConv2dNative is only supported on Ascend, so using nn.Conv2d directly is recommended. When group equals both in_channels and out_channels, the 2D convolution layer is a 2D depthwise convolution layer.
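To see why group == in_channels == out_channels makes a 2D convolution depthwise, it helps to work out which input channels each output channel reads. The pure-Python sketch below does only that channel bookkeeping (no convolution arithmetic); conv2d_weight_shape and input_channels_for are hypothetical helpers, and the weight layout [out_channels, in_channels // group, kernel_size, kernel_size] is an assumption taken from the grouped-convolution layout MindSpore documents for Conv2d.

```python
from typing import List


def _check_groups(in_channels: int, out_channels: int, group: int) -> None:
    if in_channels % group or out_channels % group:
        raise ValueError("in_channels and out_channels must both be divisible by group")


def conv2d_weight_shape(in_channels: int, out_channels: int,
                        kernel_size: int, group: int = 1) -> List[int]:
    """Weight shape in the [out, in // group, k, k] layout."""
    _check_groups(in_channels, out_channels, group)
    return [out_channels, in_channels // group, kernel_size, kernel_size]


def input_channels_for(out_channel: int, in_channels: int,
                       out_channels: int, group: int = 1) -> range:
    """Input channels that a given output channel is connected to."""
    _check_groups(in_channels, out_channels, group)
    in_per_group = in_channels // group
    out_per_group = out_channels // group
    g = out_channel // out_per_group  # index of the group this output channel belongs to
    return range(g * in_per_group, (g + 1) * in_per_group)


if __name__ == "__main__":
    c = 32
    # Ordinary convolution: every output channel reads all 32 input channels.
    print(conv2d_weight_shape(c, c, 3), len(input_channels_for(5, c, c)))
    # group == in_channels == out_channels: each output channel reads exactly
    # one input channel (its own), which is a depthwise convolution.
    print(conv2d_weight_shape(c, c, 3, group=c), list(input_channels_for(5, c, c, group=c)))
```

In the depthwise case the second weight dimension collapses to 1, so each output channel owns a single k x k kernel applied to one input channel, which is exactly what a dedicated depthwise operator computes.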

Contributors

Thanks go to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, eric, Eric, fary86, fuzhiye, Gaoxiong, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Jesse, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, zymaa.

Contributions of any kind are welcome!

2020-12-31 17:47
6560119 panza 1584156773 zhunaipan

MindSpore 1.1.0 Release Notes

MindSpore

Major Features and Improvements

NewModels

  • [STABLE] GNMT v2: similar to the model described in Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, which is mainly used for corpus translation, on the WMT English-German dataset. (Ascend)
  • [STABLE] MaskRCNN: a conceptually simple, flexible, and general framework for object instance segmentation on COCO2017 dataset. (Ascend)
  • [STABLE] YOLOv4: a state-of-the-art detector which is faster and more accurate than all available alternative detectors on MS COCO dataset. (Ascend)
  • [STABLE] Openpose: proposes a bottom-up human pose estimation algorithm using Part Affinity Fields on COCO2017 dataset. (Ascend)
  • [STABLE] CNN-CTC: proposes three major contributions to address scene text recognition (STR) on MJSynth and SynthText dataset. (Ascend)
  • [STABLE] CenterFace: a practical anchor-free face detection and alignment method for edge devices on WiderFace dataset. (Ascend)
  • [STABLE] ShuffleNetV2: a much faster and more accurate network than the previous networks on ImageNet 2012 dataset. (GPU)
  • [STABLE] EfficientNet-B0: a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient on ImageNet 2012 dataset. (GPU)
  • [BETA] SSD-GhostNet: based on the Ghost module, which generates more features from cheap operations, on Oxford-IIIT Pet dataset. (Ascend)
  • [BETA] DS-CNN: Depthwise separable convolutional neural network on Speech commands dataset.(Ascend)
  • [BETA] DeepPotentialH2O: A neural network model for molecular dynamics simulations. (Ascend)
  • [BETA] GOMO: A classical numerical method called GOMO for ocean simulation. (GPU)

FrontEnd

  • [STABLE] Refactor MINDIR to support Ascend 310 inference. (Ascend)
  • [STABLE] The execution backend of sparse operations in optimizer can be set through 'target'. (Ascend/GPU/CPU)
  • [STABLE] Support saving a specified network to a checkpoint and filtering parameters by prefix when loading a checkpoint. (Ascend/GPU/CPU)
  • [STABLE] Allow users to choose whether to load parameters into the network strictly. (Ascend/GPU/CPU)
  • [STABLE] Before training in graph mode, broadcast the parameters on device 0 to the other devices so that all devices have the same network initialization parameter values. (Ascend/GPU)
  • [STABLE] Support if-by-if control flow subgraphs. (Ascend/GPU)
  • [STABLE] Support the judgment that whether a tensor is in a list. (Ascend/GPU/CPU)
  • [STABLE] Support to get a value by using the corresponding key in a dictionary in the network; Support to get keys and values of a dictionary in the network. (Ascend/GPU/CPU)
  • [STABLE] Support Tensor in enumerate. (Ascend/GPU/CPU)
  • [STABLE] Support multilevel index assignment. (Ascend/GPU/CPU)
  • [STABLE] Support the 'expand_as','view','abs','mean' method of Tensor. (Ascend/GPU/CPU)
  • [STABLE] Support ResizeBilinear operation transfer ratio. (Ascend)
  • [STABLE] nn.MatMul supports matrix-vector product and batched matrix multiplication. (Ascend/GPU)
  • [STABLE] nn.Dense supports input tensor whose dimension can be greater than 2. (Ascend/GPU)
  • [BETA] Support higher order differentiation for partial operators.(CPU/GPU/Ascend)
  • [STABLE] Support Tensor Augassign.(Ascend/GPU)
  • [BETA] Support 22 numpy native interfaces.

Auto Parallel

  • [STABLE] Support parallel optimizer with weight shard. (Ascend/GPU)
  • [STABLE] Support distributed operators: element-wise series, UnsortedSegmentSum, UnsortedSegmentMin, Split, BroadcastTo and Unique etc. (Ascend/GPU)
  • [STABLE] Support distributed model prediction. (Ascend/GPU)
  • [STABLE] Support auto mixed precision level "O2" in auto and semi auto parallel mode. (Ascend/GPU)
  • [STABLE] Add MultiFieldEmbeddingLookup high-level interface. (Ascend/GPU)

Executor

  • [STABLE] ResNet50 performance optimization. (GPU)
  • [STABLE] Support model zoo networks in PyNative mode (Ascend 29, GPU 23, CPU 2). (Ascend/GPU/CPU)
  • [STABLE] Support PyNative mode on CPU. (CPU)
  • [STABLE] Optimize performance in PyNative mode. (Ascend/GPU/CPU)
  • [STABLE] Support Safe Optimized Memory Allocation Solver (SOMAS) on Ascend to improve memory reuse; the batch size of the BERT-large model (sequence length 128) increases from 160 to 208. (Ascend)
  • [BETA] Support second-order differentiation in PyNative mode. (Ascend/GPU)
  • [DEMO] Add distributed training in PyNative mode. (Ascend/GPU)

MDP

  • [STABLE] Add new operators for Ascend and GPU: IGamma, LGamma, DiGamma;
  • [STABLE] Add new distributions for Ascend and GPU: LogNormal, and Logistic;
  • [BETA] Add new distributions for Ascend only: Gumbel, Cauchy, Gamma, Beta, and Poisson; Add Categorical distribution for GPU;
  • [STABLE] Add new bijectors for Ascend and GPU: GumbelCDF, Invert;
  • [STABLE] Add Bayesian layer realized by local reparameterization method for Ascend and GPU;
  • [STABLE] Add Anomaly Detection Toolbox based on VAE for Ascend and GPU.

DataSet

  • [STABLE] Support single node multi-p distributed cache data sharing
  • [STABLE] Support GPU profiling with data processing
  • [STABLE] Support YOLOV3 dynamic shape in sink mode with dataset
  • [STABLE] Support unique processing in the data processing pipeline
  • [STABLE] Unify the error messages of Python-layer parameter verification

API Change

Backwards Incompatible Change

Python API
Parts of Optimizer add target interface (!6760:sparse optimizer)

The usage of the sparse optimizer is changed.

The target interface is used to set the execution backend of the sparse operator.

The add_prim_attr interface is no longer allowed.

The following optimizers add the target interface: Adam, FTRL, LazyAdam, ProximalAdagrad

1.0.1:
>>> from mindspore.nn import Adam
>>>
>>> net = LeNet5()
>>> optimizer = Adam(filter(lambda x: x.requires_grad, net.get_parameters()))
>>> optimizer.sparse_opt.add_prim_attr("primitive_target", "CPU")

1.1.0:
>>> from mindspore.nn import Adam
>>>
>>> net = LeNet5()
>>> optimizer = Adam(filter(lambda x: x.requires_grad, net.get_parameters()))
>>> optimizer.target = 'CPU'
export: modify the input parameters and the exported file name (!7385:integrate_export_v2, !9057:modify export file name)

Export the MindSpore prediction model to a file in the specified format.

The parameters are: net, *inputs, file_name, file_format, **kwargs.

Input parameters can be passed according to the specific export requirements.

The file name extension is added automatically based on the format.

1.0.1:
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.train.quant import quant
>>>
>>> network = LeNetQuant()
>>> inputs = Tensor(np.ones([1, 1, 32, 32]), mindspore.float32)
>>> quant.export(network, inputs, file_name="lenet_quant.mindir", file_format='MINDIR')
lenet_quant.mindir

1.1.0:
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, export
>>>
>>> network = LeNetQuant()
>>> inputs = Tensor(np.ones([1, 1, 32, 32]), mindspore.float32)
>>> export(network, inputs, file_name="lenet_quant", file_format='MINDIR', quant_mode='AUTO')
lenet_quant.mindir
Dense, Conv2dBnAct, DenseBnAct, DenseQuant support setting the activation attribute as an instance of a class derived from nn.Cell or Primitive (!7581:Extension interface for dense)

activation (Union[str, Cell, Primitive]): activation function applied to the output of the fully connected layer

1.0.1:
>>> import mindspore.nn as nn
>>>
>>> dense = nn.Dense(1, 1, activation='relu')

1.1.0:
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>>
>>> dense = nn.Dense(1, 1, activation=nn.ReLU())
>>> dense = nn.Dense(1, 1, activation=ops.ReLU())
tensor.dim(), tensor.size() have been renamed to tensor.ndim, tensor.size (!10175:Add tensor.ndim and rename tensor.size() to tensor.size)

Previously, tensor.size() and tensor.dim() were used for checking the total number of elements/dimensions in the tensor.
However, from a user's perspective, tensor.size and tensor.ndim (methods -> properties) are better choices, since they follow the numpy naming convention.

1.0.1:
>>> from mindspore import Tensor
>>>
>>> Tensor((1,2,3)).size()
>>> Tensor((1,2,3)).dim()

1.1.0:
>>> from mindspore import Tensor
>>>
>>> Tensor((1,2,3)).size
>>> Tensor((1,2,3)).ndim
EmbeddingLookup: add a config to the interface: sparse (!8202:wide&deep backward unique)

sparse (bool): Using sparse mode. When 'target' is set to 'CPU', 'sparse' has to be true. Default: True.

1.0.1:
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.nn import EmbeddingLookup
>>>
>>> input_indices = Tensor(np.array([[1, 0], [3, 2]]), mindspore.int32)
>>> result = EmbeddingLookup(4, 2)(input_indices)
>>> print(result.shape)
(2, 2, 2)

1.1.0:
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.nn import EmbeddingLookup
>>>
>>> input_indices = Tensor(np.array([[1, 0], [3, 2]]), mindspore.int32)
>>> # sparse is a constructor argument; sparse=False requires a non-CPU target
>>> result = EmbeddingLookup(4, 2, target='DEVICE', sparse=False)(input_indices)
>>> print(result.shape)
(2, 2, 2)
nn.probability.bijector change types of attributes from (int, float) to (float, list, numpy.ndarray, Tensor) (!8191:redesigned bijector broadcast shape and bijector's dtype logic)

Attributes Type change: (int, float) -> (float, list, numpy.ndarray, Tensor).
Int type is not supported anymore. Parameters of all bijectors should be type float, list, numpy.ndarray or Tensor.

1.0.1:
>>> import mindspore.nn.probability.bijector as msb
>>>
>>> power = 2
>>> bijector = msb.PowerTransform(power=power)

1.1.0:
>>> import mindspore.nn.probability.bijector as msb
>>>
>>> power = 2.0
>>> bijector = msb.PowerTransform(power=power)
nn.probability.bijector.GumbelCDF remove an attribute in the interface: dtype (!8191:redesigned bijector broadcast shape and bijector's dtype logic)

dtype is removed from GumbelCDF and is no longer an argument of the class.

1.0.1:
>>> import mindspore.nn.probability.bijector as msb
>>> from mindspore import dtype as mstype
>>>
>>> bijector = msb.GumbelCDF(loc=0.0, scale=1.0, dtype=mstype.float32)

1.1.0:
>>> import mindspore.nn.probability.bijector as msb
>>>
>>> bijector = msb.GumbelCDF(loc=0.0, scale=1.0)
nn.layer.combined.Conv2dBnAct, nn.layer.combined.DenseBnAct move from nn.layer.quant to nn.layer.combined (!8187:move Conv2dBnAct,DenseBnAct to combined.py)

Previously, Conv2dBnAct and DenseBnAct were in nn.layer.quant. Since they are not quant cells, they have been moved to nn.layer.combined. If you import Conv2dBnAct and DenseBnAct from mindspore.nn, your code needs no change.

1.0.1:
>>> from mindspore.nn.layer.quant import Conv2dBnAct, DenseBnAct

1.1.0:
>>> from mindspore.nn import Conv2dBnAct, DenseBnAct
nn.layer.conv.Conv2d, nn.layer.quant.Conv2dBnFoldQuant, nn.layer.quant.Conv2dBnWithoutFoldQuant change weight shape when group > 1 on the Ascend platform (!9723:add ir passes to unify mindir)

On the Ascend platform, if group > 1, the weight shape of Conv2d changes from [in_channels//group, out_channels, kernel_size, kernel_size] to [out_channels, in_channels//group, kernel_size, kernel_size]. Checkpoints saved earlier from networks that use Conv2d with group > 1, such as MobileNet, can no longer be used directly; the first and second axes of the weight must be transposed first.
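The transposition described above only swaps the first two axes of each affected 4-D weight. The sketch below shows that step on plain nested Python lists so it runs without any framework; swap_first_two_axes and shape_of are hypothetical helpers, and reading and writing the checkpoint itself is not shown. With a NumPy array the same step is np.swapaxes(weight, 0, 1). For a MobileNet-style depthwise layer (group == in_channels == out_channels == C), the old shape [1, C, k, k] becomes [C, 1, k, k].

```python
from typing import List

Kernel = List[List[float]]     # [kernel_size][kernel_size]
Weight4D = List[List[Kernel]]  # [dim0][dim1][kernel_size][kernel_size]


def shape_of(weight: Weight4D) -> List[int]:
    return [len(weight), len(weight[0]), len(weight[0][0]), len(weight[0][0][0])]


def swap_first_two_axes(weight: Weight4D) -> Weight4D:
    """[in_channels // group, out_channels, k, k] -> [out_channels, in_channels // group, k, k].

    out[o][i] is a copy of weight[i][o]; the k x k kernels themselves are unchanged.
    """
    dim0, dim1 = len(weight), len(weight[0])
    return [[[row[:] for row in weight[i][o]] for i in range(dim0)]
            for o in range(dim1)]


if __name__ == "__main__":
    channels, k = 4, 3
    # Old depthwise layout [1, C, k, k]; each kernel is filled with its channel index.
    old = [[[[float(c)] * k for _ in range(k)] for c in range(channels)]]
    new = swap_first_two_axes(old)
    print(shape_of(old), "->", shape_of(new))
```

The kernel rows are copied rather than shared, so modifying the converted weight cannot silently change the original one.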

C++ API

Bug fixes

FrontEnd

  • [STABLE] Fix a problem with the CSE optimization in control-flow scenarios. (Ascend/GPU)

Auto Parallel

  • [STABLE] Resolve the restriction: input and output layouts of Reshape are restricted in tensor redistribution. (Ascend/GPU)
  • [STABLE] Resolve the restriction: output strategy should be data parallel in model evaluation. (Ascend/GPU)

Executor

  • [STABLE] Fix fusion operator compilation cache. (Ascend)
  • [STABLE] Fix compilation error of dynamic shape operator. (Ascend)
  • [STABLE] Fix a bug where PyNative mode could not insert TransData for a node output when the node needed to be split in the backend optimization. (Ascend)
  • [STABLE] Fix a bug where TensorMove and memcpy_async were merged into one after the backend CSE pass. (Ascend)

DataSet

  • [STABLE] Fix cache server hang on RequestFreeTag. (Ascend/GPU/CPU)
  • [STABLE] Fix a hang when using pyfunc multi-processing. (Ascend/GPU/CPU)
  • [STABLE] Fix a core dump caused by adding multiple parent nodes to a tree node. (Ascend/GPU/CPU)

MindSpore Lite

Converter and runtime

  1. Support dynamic shape in MindSpore Lite Converter.
  2. Optimize sub-graph mechanism by dynamically splitting the entire graph into multiple subgraphs based on the operator supported, backend hardware and user configuration.
  3. Support TensorList and TensorList operators such as TensorListFromTensor, TensorListGetItem and so on.
  4. Support BatchMatMul fusion and LSTM fusion in MindSpore Lite Converter.
  5. Support converting models and running inference on the Windows operating system.
  6. Support Model (.ms) visualization on Netron.
  7. Support TensorFlow models in MindSpore Lite Converter.
  8. Add 86 converter parsers.
  9. Convert aware training model without user’s awareness
  10. Support scalar tensor in MindSpore Lite Converter and Runtime
  11. Support NPU backend on HUAWEI Kirin SoC.[BETA]
  12. Merge timeprofiler into benchmark

ARM backend optimization:

  1. Add 50+ new operators, including new op types (such as Adder and GRU).
  2. Enhanced performance on platforms that support armv8.2, for example by using the sdot instruction more efficiently.
  3. Optimize all operators (fp32, fp16, int8) by applying multi-threading and SIMD techniques as much as possible. Model inference time can be reduced by at least 20% after these optimizations.
  4. Extending to support operators for x86_64 platform based on SSE/AVX instruction set.

OpenCL backend:

  1. Add new ops: 10+ new ops, for a total of 58 ops.
  2. Performance optimization: through memory layout optimization, an improved Winograd convolution selection strategy, SIMT local size optimization, and local cache optimization, GPU performance improves by up to 20+% compared with MSLITE version 1.0.
  3. Add online graph optimization: fusing Convolution/MatMul/FullConnection with add/mul/pad/reshape improves performance by up to 50+% for some networks.
  4. Add auto-tuning: online tuning during the graph compilation phase improves performance by up to 10%.
  5. Add weight quantization support.
  6. Add an OpenCL kernel binary cache to improve initialization time.
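The Winograd convolution mentioned in item 2 trades multiplications for cheap additions. The sketch below is the textbook 1D Winograd F(2,3) transform (two outputs of a 3-tap filter using 4 multiplications instead of 6), checked against a direct sliding-window computation. It only illustrates the algorithm; the OpenCL kernels and the selection strategy that decides when Winograd is used are not shown, and the function names are hypothetical.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap correlation with 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Element-wise products in the transformed domain: 4 multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: direct sliding-window correlation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
print(winograd_f23(d, g), direct_f23(d, g))
```

2D variants such as F(2x2, 3x3) nest the same input, filter, and output transforms along both spatial axes.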

Post quantization

MindSpore Lite supports both weight quantization and full quantization. Currently, weights can be quantized to 1 to 16 bits according to user configuration. In internal testing, quantization is well supported for classification, detection, segmentation, and Transformer networks. To ensure high accuracy of quantized models, MindSpore Lite uses a pipeline quantization method. In the first phase, weights and activation values are quantized using linear quantization methods such as MIN-MAX. In the second phase, the quantization error is analyzed, and statistical methods are used to compensate the quantized model for the accuracy loss caused by converting fp32 values to fixed point (such as int8). The main features of post-training quantization are:

  1. Per-channel asymmetric quantization for weights, such as MAX_MIN and KMEANS.
  2. Per-layer symmetric quantization for activations, such as KL and MAX_MIN.
  3. Per-layer asymmetric quantization for activations, such as RemoveOutlier.
  4. Accuracy loss compensation, such as BiasCorrection.
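Item 1 can be illustrated with a minimal sketch of per-channel asymmetric MIN-MAX quantization: each output channel gets its own scale and zero point, so a channel with a narrow range is not forced onto the coarse grid of a wide one. Widening each range to include 0.0 is a common convention assumed here; the function names are hypothetical, KMEANS is not shown, and this is not the converter's implementation.

```python
def quantize_per_channel(weights, num_bits=8):
    """Asymmetric MIN-MAX quantization with one (scale, zero_point) per output channel.

    `weights` is a list of rows, one row per output channel.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    q_rows, params = [], []
    for row in weights:
        # Widen the range to include 0.0 so that zero is exactly representable.
        lo, hi = min(min(row), 0.0), max(max(row), 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against all-zero channels
        zero_point = qmin + round(-lo / scale)
        q_rows.append([min(qmax, max(qmin, round(w / scale) + zero_point)) for w in row])
        params.append((scale, zero_point))
    return q_rows, params


def dequantize(q_rows, params):
    """Map integer codes back to floats: w ~= (q - zero_point) * scale."""
    return [[(q - zp) * s for q in row] for row, (s, zp) in zip(q_rows, params)]


weights = [[-1.0, 0.0, 0.5, 2.0],   # channel 0: wide range, coarse step
           [0.1, 0.2, 0.3, 0.4]]    # channel 1: narrow range, fine step
q, params = quantize_per_channel(weights)
print(q)
print(dequantize(q, params))
```

The reconstruction error of each weight stays within half of its own channel's scale, which is why per-channel quantization usually loses less accuracy than a single per-layer scale.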

mobilenet_v2 accuracy (ImageNet)

Quantization                     Accuracy
FP32                             71.56%
A8W8                             71.16%
A8W8 (without BiasCorrection)    70.74%
A8W7                             71.06%
A7W7                             70.78%

The above table uses the mobilenet_v2 model from the official TensorFlow website. With MindSpore Lite quantization, the accuracy loss of A8W8 (8-bit activation quantization and 8-bit weight quantization) drops from 0.82% to 0.40% after accuracy loss compensation; for 7-bit quantization, the accuracy loss is still no more than 1%.
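In the table, A8W8 loses 71.56% - 71.16% = 0.40 points against FP32, versus 71.56% - 70.74% = 0.82 points without BiasCorrection. The idea behind this kind of compensation can be sketched for one linear layer y = W x + b: quantizing W to W_q shifts the expected output by (W_q - W) E[x], so subtracting that shift (estimated on calibration inputs) from the bias restores the mean output. This is the classic empirical bias-correction idea, assumed here for illustration; MindSpore Lite's BiasCorrection may differ in detail, and all names and numbers below are hypothetical.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bias_correction(W, W_q, b, calib_inputs):
    """Fold the expected output shift caused by weight quantization into the bias."""
    mean_x = mean_vector(calib_inputs)
    delta = [[wq - w for wq, w in zip(row_q, row)] for row_q, row in zip(W_q, W)]
    shift = matvec(delta, mean_x)  # (W_q - W) E[x]
    return [bi - s for bi, s in zip(b, shift)]


def mean_output(W, b, inputs):
    return mean_vector([[y + bi for y, bi in zip(matvec(W, x), b)] for x in inputs])


W = [[0.31, -0.72], [1.05, 0.18]]
W_q = [[0.3, -0.7], [1.0, 0.2]]          # stand-in for quantized weights
b = [0.1, -0.2]
calib = [[1.0, 2.0], [3.0, 0.5], [2.0, 1.5]]

b_corr = bias_correction(W, W_q, b, calib)
print(mean_output(W, b, calib))          # FP32 reference
print(mean_output(W_q, b, calib))        # shifted by quantization
print(mean_output(W_q, b_corr, calib))   # matches the reference again
```

The correction only fixes the mean of the error; the per-sample noise introduced by quantization remains.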

Training on Device

In the MindSpore 1.1 release, MindSpore Lite provides the following Training-on-Device (ToD) capabilities:

  1. Learning from scratch and transfer learning strategies are supported.
  2. MindSpore-based models can be converted and used for training on the device. (Third-party models, such as TensorFlow and PyTorch models, cannot currently be imported directly into the framework.)
  3. Grad operations are supported for more than 30 operators, such as Dense layers, Convolutions, and Batch Normalizations. Momentum, SGD, and Adam optimizers are supported.
  4. Supports networks such as LeNet, AlexNet, ResNet, MobileNetV1/V2/V3, and EfficientNet, and provides complete model loading, conversion, and Python training scripts on the device side.
    The MindSpore Lite ToD framework is already in use in the newest Huawei Smart TV, providing a unique and personalized user experience as a family entertainment center.
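The Momentum optimizer listed in item 3 can be summarized by its update rule. The sketch below uses the common heavy-ball form (v <- momentum * v + grad; w <- w - lr * v) on a toy one-parameter loss; it is plain Python written for illustration, not the on-device kernel, and the learning rate and momentum values are arbitrary.

```python
def momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """One heavy-ball momentum update: v <- momentum * v + grad; w <- w - lr * v."""
    v = momentum * v + grad
    return w - lr * v, v


# Toy "training loop": minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = momentum_step(w, v, 2.0 * (w - 3.0))
print(round(w, 4))
```

Setting momentum to 0 reduces the same function to plain SGD.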

API Change

API Incompatible Change

C++ API
  • [Modify] Context now supports multi-context configuration (Context.h).
  • [Modify] Callback is moved from lite_session.h to ms_tensor.h.
  • [Modify] GetInputsByName in lite_session.h is renamed to GetInputsByTensorName.
  • [Add] Add static LiteSession *CreateSession(const char *model_buf, size_t size, const lite::Context *context) in lite_session.h.
  • [Add] Add the GetErrorInfo interface, which returns an error message, in errorcode.h.
  • [Delete] Remove model_generated.h, ops_generated.h, and the FlatBuffers library headers from the interfaces.
Java API
  • [Add] Implement the JNI layer and add a Java API for the CPU and GPU backends.

Deprecations

C++ API

Deprecate the GetOutputsByNodeName interface.

Bug fixes

  • [BUGFIX] Fix a bug in subgraph segmentation.
  • [BUGFIX] Fix a bug in Tensor getitem where an ellipsis matched the wrong dimension size.
  • [BUGFIX] Fix a bug where modifying the activation after defining a Dense layer did not take effect.

Contributors

zhouyifengCode, huqi, JulyAi, damon0626, chenbo116, rmdyh, davidmc, gray0v0, doitH, Gogery, zymaa, xinyunfan, Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, eric, Eric, fary86, fuzhiye, Gaoxiong, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Jesse, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, , Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, zymaa

Last committed message: !10877 fix Softmax problems
2020-11-03 16:23
liucunwei

Release 1.0.1

Major Features and Improvements

Bugfixes

Contributors

Thanks goes to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, danish, Danish, dayschan, eric, Eric, fary86, fuzhiye, Gaoxiong, gengdongjie, gongdaguo, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huzhifeng, hwjiaorui, Jesse, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, r1chardf1d0, riemann_penn, root, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, , Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhoufeng, zhousiyi, zhouyaqiang, Zichun, Zirui, Ziyan, zjun, ZPaC

Contributions of any kind are welcome!

Last committed message: !7919 update lstm README
2020-09-23 20:34
liucunwei

Release 1.0.0

Major Features and Improvements

MindSpore Training and Inference Framework

Ascend 910

  • New models
    • DenseNet121: a dense convolutional neural network, which connects each layer to every other layer in a feed-forward fashion, for object recognition on the ImageNet dataset.
    • UNet2D-Medical: a U-Net medical model for 2D image segmentation (Convolutional Networks for Biomedical Image Segmentation) on the ISBI Challenge database.
  • Frontend and user interface
    • Second-Order Optimization
      • Enable second-order optimization for Bert on Ascend 910, which achieves a masked LM accuracy of 71.3% in 800 seconds using 8 Ascend 910 devices (Bert-Large @MLPerf v0.7 dataset).
    • New GNN model BGCF
      • The Bayesian Graph Convolutional Filtering network naturally incorporates the uncertainty in the user-item interaction graph and shows excellent recommendation performance on the Amazon-Beauty dataset.
    • Add an append interface for SequentialCell.
    • Add an "auto" level for AMP.
  • Executor and performance optimization
    • Support quantized networks (ResNet50, YoloV3, and MobileNetV2).
    • Project usability optimization: shorter project compilation time, CMakeLists cleanup, and decoupling of cuDNN and CUDA so that compilation and installation are independent.
  • Data processing, augmentation, and save format
    • Support returning the string type from GeneratorDataset.

Other Hardware Support

  • GPU platform
    • Enable second-order optimization for ResNet50 on GPU, which achieves a 30% improvement in training time compared with SGD with Momentum (ResNet50 @ImageNet).

User interfaces change log

MindSpore Lite

  • Converter

    • Add 6 TFLite ops, 7 Caffe ops, and 1 ONNX op.
    • Add support for Windows.
    • Support parallel inference of multiple sessions to adapt to more scenarios.
    • Support 8-bit weight-only quantization; most mainstream models show a small accuracy loss (less than 0.5%) compared with the non-quantized fp32 model.
  • CPU & GPU

    • Add 20 CPU ops, including FP32, int8/uint8, FP16, and int32 ops.
    • Add FP16 support for GPU and 14 GPU ops, including FP32/FP16 ops.
    • Add a Buffer/Image2D transform op for GPU.
    • Optimize CPU op performance, focusing on ARM32.
    • Optimize GPU convolution performance using Winograd.
  • Tool & example

    • Add object detection Android Demo.

Bugfixes

Contributors

Thanks goes to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, danish, Danish, dayschan, eric, Eric, fary86, fuzhiye, Gaoxiong, gengdongjie, gongdaguo, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huzhifeng, hwjiaorui, Jesse, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, r1chardf1d0, riemann_penn, root, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, , Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhoufeng, zhousiyi, zhouyaqiang, Zichun, Zirui, Ziyan, zjun, ZPaC

Contributions of any kind are welcome!

2020-08-31 17:24
6560119 panza 1584156773 zhunaipan

Release 0.7.0-beta

Major Features and Improvements

MindSpore Training and Inference Framework

Ascend 910

  • New models
    • TinyBert: a smaller and faster version of BERT using transformer distillation for natural language understanding on GLUE benchmark.
    • SE-ResNet50: adds Squeeze-and-Excitation blocks (SE-Blocks) to the ResNet50 network to improve channel interdependencies for image classification on the ImageNet 2012 dataset.
    • Inception V3: the third version of Inception convolutional architectures for image classification on ImageNet 2012 dataset.
  • Frontend and user interface
    • High-level packaging of the Embedding operator to support field-based segmentation for Wide&Deep.
    • Load multi-node checkpoints into a single process to support host-device hybrid inference.
    • Support Concat/Tile/Strideslice distributed operators.
    • Support gradient accumulation and batch training split.
    • Support variable parameter input for Cell object.
    • Parameter mixed calculation optimization for pynative mode.
    • Deep Probabilistic Programming
      • Support statistical distribution classes used to generate stochastic tensors.
      • Support probabilistic inference algorithms.
      • Support BNN layers used to construct BNN in Graph mode.
      • Support interfaces for the transformation between BNN and DNN in Graph mode.
      • Support uncertainty estimation to estimate epistemic uncertainty and aleatoric uncertainty.
    • User interfaces change log
  • Executor and performance optimization
    • MindSpore graph compilation performance improved by 20%.
    • Decouple C++ and Python modules to enable separate compilation of core modules.
  • Data processing, augmentation, and save format
    • Support automatic data augmentation
    • Support GNN distributed cache in single node
    • Support ConcatDataset using distributed sampler
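Gradient accumulation (the cumulative-gradient support listed above) lets a large batch be processed as several micro-batches whose gradients are averaged before a single update. The sketch assumes a mean loss and equal-sized micro-batches, in which case the averaged gradient equals the full-batch gradient exactly; the toy model y ~ w * x and all names are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x over a batch of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batches):
    """Average the gradients of equal-sized micro-batches (one update per full batch)."""
    if len(batch) % micro_batches:
        raise ValueError("batch size must be divisible by the number of micro-batches")
    size = len(batch) // micro_batches
    total = 0.0
    for i in range(micro_batches):
        total += grad_mse(w, batch[i * size:(i + 1) * size])
    return total / micro_batches


batch = [(float(x), 3.0 * x + 0.5) for x in range(8)]
w = 1.0
print(grad_mse(w, batch), accumulated_grad(w, batch, micro_batches=4))
```

Only one micro-batch's activations need to be kept in memory at a time, which is the point of accumulating.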

Other Hardware Support

  • GPU platform
    • New model supported: VGG16, ResNet101, DeepFM.
    • Support some distributed operators in ResNet50 and Wide&Deep.
    • Support automatic parallel for Wide&Deep.
    • Support function funcsi (such as switch-case).
    • Support distributed training with parameter server.
    • Support GPU operator profiling.
    • Performance optimization of the distributed training with allreduce.
    • Performance optimization of the mixed precision training.
    • Performance optimization of the pynative mode.
    • Performance optimization of the convolution operator, batch normalization operator.
  • CPU platform
    • Support MobileNetV2 Re-Training: re-train the network with a different number of classes.
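Distributed training with AllReduce follows the data-parallel pattern: every worker holds a full model replica, computes gradients on its own data shard, and an AllReduce leaves every worker with the same averaged gradient, so the replicas stay identical. The in-process simulation below shows only these semantics; real AllReduce runs over a communication library such as NCCL with ring or tree algorithms, and the function names and toy linear model are hypothetical.

```python
def all_reduce_mean(local_grads):
    """Simulated AllReduce: every worker ends up with the mean of all workers' gradients."""
    n = len(local_grads)
    mean = [sum(vals) / n for vals in zip(*local_grads)]
    return [list(mean) for _ in range(n)]  # one identical copy per worker


def local_grad(w, shard):
    """Gradient of the mean squared error for y ~ w[0] * x + w[1] on one shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = w[0] * x + w[1] - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return [gw / len(shard), gb / len(shard)]


data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
shards = [data[0:4], data[4:8]]        # 2 workers with equal-sized shards
replicas = [[0.0, 0.0], [0.0, 0.0]]    # identical initial parameters
lr = 0.01
for step in range(3):
    grads = [local_grad(w, s) for w, s in zip(replicas, shards)]
    synced = all_reduce_mean(grads)
    replicas = [[p - lr * g for p, g in zip(w, gs)] for w, gs in zip(replicas, synced)]
print(replicas)
```

With equal shard sizes, the averaged gradient equals the gradient over the whole dataset, so data parallelism changes throughput but not the optimization step.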

MindSpore Lite

  • Converter
    • Support third-party models, including TFLite/Caffe/ONNX.
    • Add 93 TFLite ops.
    • Add 24 Caffe ops.
    • Add 62 ONNX ops.
    • Add 11 optimization passes, including fusion and constant folding.
    • Support quantization-aware training and post-training quantization.
  • CPU
    • Add 100+ ops, supporting fp32, int8/uint8, and FP16 ops.
    • Support fast convolution algorithms: Sliding Window, Img2col + Gemm, Strassen, Winograd.
    • Support assembly/NEON instructions.
    • Support CPU fp16 and sdot on ARM v8.2+.
  • GPU
    • Add 20+ ops for OpenCL.
    • Support image2D/buffer format.
    • Optimize online initialization time.
    • Add optimized convolution 1x1/3x3/depthwise/convolution_transposed kernels for OpenCL.
  • Tool & example
    • Add benchmark and TimeProfile tools.
    • Add image classification Android Demo.
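The Img2col + Gemm algorithm listed under CPU turns convolution into one matrix multiplication: every input patch is unfolded into a row, every kernel is flattened into a row, and a single GEMM produces all output channels at once. The sketch below (single input channel, stride 1, no padding, hypothetical names) checks that result against a direct sliding-window convolution; production kernels add tiling, packing, and SIMD, which are not shown.

```python
def im2col(image, kh, kw):
    """Unfold every kh x kw patch (stride 1, no padding) of a 2D image into one row."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def conv2d_im2col(image, kernels):
    """Multi-kernel 2D convolution (cross-correlation) as im2col followed by one GEMM."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_w = len(image[0]) - kw + 1
    cols = im2col(image, kh, kw)                                # (patches, kh*kw)
    flat_k = [[v for row in k for v in row] for k in kernels]   # (out_channels, kh*kw)
    # GEMM: (out_channels, kh*kw) x (kh*kw, patches) -> (out_channels, patches)
    gemm = [[sum(a * b for a, b in zip(fk, col)) for col in cols] for fk in flat_k]
    # Reshape each output channel back to (out_h, out_w).
    return [[ch[r:r + out_w] for r in range(0, len(ch), out_w)] for ch in gemm]


def conv2d_direct(image, kernel):
    """Reference sliding-window implementation for a single kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
kernels = [[[1, 0, -1], [1, 0, -1], [1, 0, -1]],   # horizontal gradient
           [[0, 1, 0], [1, -4, 1], [0, 1, 0]]]      # Laplacian
print(conv2d_im2col(image, kernels))
```

The trade-off is memory: the unfolded matrix repeats each input pixel up to kh * kw times, in exchange for handing the work to a highly optimized GEMM.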

Bugfixes

Contributors

Thanks goes to these wonderful people:

Adel, Alexey, andy, andy_wangrui, anthonyaje, anzhengqi, askmiao, avakh, baihuawei, bingyaweng, BowenK, buxue, caifubi, CaoJian, caozhou, Cathy, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chentingting, chenzomi, chenzupeng, chujinjin, cjh9368, Corleone, cristoval, danish, dengyutao, eric, Eric, ervinzhang, etone-chan, fangzehua, fary86, fuzhiye, gengdongjie, genglishuai, Giancarlo, gongdaguo, gukecai, guohongzilong, GuoMengHao, hangq, hanhaocheng, hanhuifeng2020, hanjun996, Harshvardhan, He, heleiwang, hesham, hexia, Hoai, hongxing, huangdongrun, huanghui, huangxinjing, islam_amin, Jesse, jianghui58, jiangzhiwen, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, kai00, kingfo, kpy, kswang, laiyongqiang, leilei_snow, leopz, Li, liangzelang, lianliguang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, lingyunli63, linqingke, lirongzhen1, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuzhongkai, Lixia, lixian, liyong, lizhenyu, looop5, luoyang, lvchangquan, lvliang, lvwenyuan, lyvette, mahdi, Mahdi, mamba_ni, maning202007, Margaret_wangrui, mayang, meixiaowei, meng_chunyang, ms_yan, nhussain, panbingao, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, pengyongrong, Pengyongrong, qianlong, qujianwei, root, shenwei41, shibeiji, simson, songhonglei413, Su, sunsuodong, suteng, tao_yunhao, TFbunny, tinazhang, tom__chen, tony_liu2, tronzhang, VectorSL, wandongdong, wangdongxu, wanghua, wangmin, wangshaocong, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, wuyongkang, xiefangqi, xuanyue, Xun, xutianchun, xuyongfei, yanghaitao, yangjie159, YangLuo, yangruoqi713, yangyongjie, yangzhenzhang, yankai, yao_yf, yelihua, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zhangxuetong, zhaizhiqiang, Zhang, zhangxinfeng3, zhangxuetong, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaoting, zhaozhenlong, zhengjun10, zhongligeng, zhoufeng, zhousiyi, zhouyaqiang, zhouyuanshen, Zichun, Zirui, zjun, zongha, ZPaC, lijiaqi, liangchenghui, wangminggui

Contributions of any kind are welcome!

Last committed message: !5561 Fix C++ coding standard problem
2020-07-31 19:01
lujiale

Release 0.6.0-beta

Major Features and Improvements

Ascend 910 Training and Inference Framework

  • New models
    • The modelzoo is divided into three parts: official, research, and community.
      • Official models are maintained by the MindSpore team with the newest APIs; MaskRCNN has been added.
      • Research models are uploaded by researchers for official review, and their APIs may not be updated promptly.
      • Community lists links to partners' research results.
    • A hub has been added at the same level as modelzoo to store the materials needed for the official hub web pages, which will be launched soon.
    • Support pre-trained models: a few lines of code can download and load pre-trained models for inference or transfer learning.
  • Frontend and user interface
  • Executor and performance optimization
    • Decouple C++ and Python to make the architecture more extensible.
    • Support Parameter Server for distributed deep learning.
    • Serving: a flexible service deployment framework for deep learning models.
    • Memory reuse is enhanced, and the batch size of Bert large model is increased from 96 to 160 on a single server.
  • Data processing, augmentation, and save format
    • Support the MindRecord save operator after data processing.
    • Support automatic operator fusion, such as decode/resize/crop.
    • Support CSV dataset loading

Other Hardware Support

  • GPU platform
    • New model supported: ResNext50, WarpCTC and GoogLeNet.
    • Support hyperparameter search and data-augmentation AutoML on GPU.
    • Support automatic parallelism for ResNet50 on the GPU backend.

Bugfixes

Contributors

Thanks goes to these wonderful people:

Alexey Shevlyakov, avakh, baihuawei, BowenK, buxue, caifubi, caojian05, Cathy Wong, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chentingting, chenzomi, chujinjin, Danish Farid, dayschan, dengwentao, dinghao, etone-chan, fangzehua, fary86, geekun, Giancarlo Colmenares, gong chen, gukecai, guohongzilong, hangangqiang, heleiwang, hesham, He Wei, hexia, hongxing, huangdongrun, huanghui, islam_amin, Jamie Nisbet, Jesse Lee, jiangjinsheng, jiangzhiwen, jinyaohui, jjfeing, jojobugfree, Jonathan Yan, jonyguo, Junhan Hu, Kang, kingfo, kouzhenzhong, kpy, kswang, laiyongqiang, leopz, liangzelang, lichenever, lihongkang, Li Hongzhang, lilei, limingqi107, lirongzhen1, liubuyu, liuchongming74, liuwenhao4, liuxiao, Lixia Chen, liyanliu, liyong, lizhenyu, lvliang, Mahdi, Margaret_wangrui, meixiaowei, ms_yan, nhussain, ougongchang, panfengfeng, panyifeng, peilinwang, Peilin Wang, pkuliuliu, qianlong, rick_sanchez, shibeiji, Shida He, shijianning, simson, sunsuodong, suteng, Tinazhang, Tron Zhang, unknown, VectorSL, wandongdong, wangcong, wangdongxu, wangdongxu6, wanghua, wangnan39, Wei Luning, wenchunjiang, wenkai, wilfChen, WilliamLian, wukesong, Xian Weizhao, Xiaoda Zhang, xiefangqi, xulei2020, xunxue, xutianchun, Yang, yanghaitao, yanghaitao1, yanghaoran, yangjie, yangjie159, YangLuo, Yanjun Peng, yankai, yanzhenxiang2020, yao_yf, Yi Huaijie, yoonlee666, yuchaojie, yujianfeng, zhangzhongpeng, zhangdengcheng, Zhang Qinghua, zhangyinxia, zhangz0911gm, zhaojichen, zhaoting, zhaozhenlong, zhoufeng, zhouneng, zhousiyi, Zirui Wu, Ziyan, zjun, ZPaC, lihongzhang, wangdongxu

Contributions of any kind are welcome!

2020-06-30 19:26
liucunwei

Release 0.5.0-beta

Major Features and Improvements

Ascend 910 Training and Inference Framework

  • New models
    • ResNext50: a simple, highly modularized network architecture using aggregated residual transformations for image classification on the ImageNet 2012 dataset.
    • MASS: a pre-training method for sequence to sequence based language generation tasks on Text Summarization and Conversational Response Generation using News Crawls 2007-2017 dataset, Gigaword corpus and Cornell movie dialog corpus.
    • Transformer: a neural network architecture for language understanding on WMT 2014 English-German dataset.
    • GCN: Graph Convolutional Networks for node classification in a graph on the Cora and Citeseer datasets.
    • GAT: an attention-based graph neural network for node classification on the Cora and CiteSeer datasets.
  • Frontend and user interface
  • Executor and performance optimization
    • Support heterogeneous execution on CPU and Ascend devices, verified on the Wide&Deep model.
    • Support quantization training of MobileNetV2, LeNet, and ResNet50 on Ascend 910.
    • Support new fusion architecture, which can do fusion optimization across graphs and kernels to improve execution speed.
  • Data processing, augmentation, and save format
    • Support data processing pipeline performance profiling.
    • Support public dataset loading, such as CLUE and COCO.
    • Support more text processing, such as more tokenizers and vocab data.
    • Support MindRecord padded data.

Other Hardware Support

  • GPU platform
    • New model supported: Bert / Wide&Deep.
    • Support setting max device memory.
  • CPU platform
    • New model supported: LSTM.

Bugfixes

Contributors

Thanks goes to these wonderful people:

Alexey Shevlyakov, avakh, baihuawei, BowenK, buxue, caifubi, caojian05, Cathy Wong, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chentingting, chenzomi, chujinjin, Danish Farid, dayschan, dengwentao, dinghao, etone-chan, fangzehua, fary86, geekun, Giancarlo Colmenares, gong chen, gukecai, guohongzilong, hangangqiang, heleiwang, hesham, He Wei, hexia, hongxing, huangdongrun, huanghui, islam_amin, Jamie Nisbet, Jesse Lee, jiangjinsheng, jiangzhiwen, jinyaohui, jjfeing, jojobugfree, Jonathan Yan, jonyguo, Junhan Hu, Kang, kingfo, kouzhenzhong, kpy, kswang, laiyongqiang, leopz, liangzelang, lichenever, lihongkang, Li Hongzhang, lilei, limingqi107, lirongzhen1, liubuyu, liuchongming74, liuwenhao4, liuxiao, Lixia Chen, liyanliu, liyong, lizhenyu, lvliang, Mahdi, Margaret_wangrui, meixiaowei, ms_yan, nhussain, ougongchang, panfengfeng, panyifeng, peilinwang, Peilin Wang, pkuliuliu, qianlong, rick_sanchez, shibeiji, Shida He, shijianning, simson, sunsuodong, suteng, Tinazhang, Tron Zhang, unknown, VectorSL, wandongdong, wangcong, wangdongxu, wangdongxu6, wanghua, wangnan39, Wei Luning, wenchunjiang, wenkai, wilfChen, WilliamLian, wukesong, Xian Weizhao, Xiaoda Zhang, xiefangqi, xulei2020, xunxue, xutianchun, Yang, yanghaitao, yanghaitao1, yanghaoran, yangjie, yangjie159, YangLuo, Yanjun Peng, yankai, yanzhenxiang2020, yao_yf, Yi Huaijie, yoonlee666, yuchaojie, yujianfeng, zhangzhongpeng, zhangdengcheng, Zhang Qinghua, zhangyinxia, zhangz0911gm, zhaojichen, zhaoting, zhaozhenlong, zhoufeng, zhouneng, zhousiyi, Zirui Wu, Ziyan, zjun, ZPaC, lihongzhang, wangdongxu

Contributions of any kind are welcome!

2020-07-09 09:37
6560119 panza 1584156773 zhunaipan

Release 0.3.1-alpha

Major Features and Improvements

Ascend 910 Training and Inference Framework

  • Frontend and User Interface
    • Independent model init interface.
  • Data processing, augmentation, and save format
    • Support sample padding for minddataset.

Bugfixes

Last committed message: !2606 Update version to 0.3.1
2020-05-31 11:24
6560119 panza 1584156773 zhunaipan

Release 0.3.0-alpha

Major Features and Improvements

Ascend 910 Training and Inference Framework

  • New models
    • DeepFM: a factorization-machine based neural network for CTR prediction on Criteo dataset.
    • DeepLabV3: significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains performance comparable with other state-of-the-art models on the PASCAL VOC 2007 semantic image segmentation benchmark.
    • Faster-RCNN: towards real-time object detection with region proposal networks on COCO 2017 dataset.
    • SSD: a single-stage object detection method on the COCO 2017 dataset.
    • GoogLeNet: a deep convolutional neural network architecture codenamed Inception V1 for classification and detection on CIFAR-10 dataset.
    • Wide&Deep: jointly trained wide linear models and deep neural networks for recommender systems on Criteo dataset.
  • Frontend and User Interface
  • Executor and Performance Optimization
    • Support evaluation during training, so that training accuracy can be easily obtained.
    • Enable second-order optimization for resnet50, which can achieve 75.9% accuracy in 45 epochs (Resnet50 @ImageNet).
    • Optimize the PyNative implementation and improve its execution performance.
    • Optimize summary record implementation and improve its performance.
  • Data processing, augmentation, and save format
    • Support simple text processing, such as tokenizer/buildvocab/lookup.
    • Support padding batch.
    • Support split or concat dataset.
    • Support MindDataset reading from file list.

Other Hardware Support

  • GPU platform
    • New models supported: MobileNetV2, MobileNetV3.
    • Support mixed precision training.
    • Support device memory swapping.

Bugfixes

Contributors

Thanks goes to these wonderful people:

Alexey Shevlyakov, Amir Lashkari, anthony, baihuawei, biffex, buxue, caifubi, candanzg, caojian05, Cathy Wong, changzherui, chenfei, chengxianbin, chenhaozhe, chenzomi, chujinjin, cristoval, dengwentao, eric, etone-chan, fary86, gaojing, gengdongjie, gongchen, guohongzilong, guozhijian, heleiwang, hesham, He Wei, Hoai Linh Tran h00472437, hongxing, huangdongrun, huanghui, Jamie Nisbet, Jesse Lee, jiangjinsheng, jiangzhiwen, jinyaohui, jjfeing, jonwe, jonyguo, Junhan Hu, Kang, kingfo, kswang, laiyongqiang, leopz, lichenever, lihongkang, limingqi107, liubuyu, liuliyan2, liuwenhao4, liuxiao, liuxiao, liyong, lizhenyu, lvliang, Margaret_wangrui, meixiaowei, ms_yan, Nat Sutyanyong, ougongchang, panfengfeng, panyifeng, Peilin Wang, peixu_ren, qianlong, rick_sanchez, seatea, sheng, shijianning, simson, sunsuodong, Tinazhang, VectorSL, wandongdong, wangcong, wanghua, wangnan39, Wei Luning, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuxuejian, Xiaoda Zhang, xiefangqi, xulei2020, Yang, yangjie159, yangruoqi713, yangyongjie, yangzhenzhang, Yanjun Peng, yanzhenxiang2020, yao_yf, Yi Huaijie, yoonlee666, yujianfeng, YuJianfeng, yvetteliu, z00478463, zhangdengcheng, Zhang Qinghua, zhangz0911gm, zhaojichen, zhaoting, zhaozhenlong, zhoufeng, zhouneng, zhousiyi, zhouyuanshen, Zirui Wu, Ziyan, zjun, ZPaC, lihongzhang

Contributions of any kind are welcome!

Last committed message: !1733 change some settings in SSD
2020-04-30 15:27
6560244 majorzhang 1584157623 zhangzhenghai

Release 0.2.0-alpha

Major Features and Improvements

Ascend 910 Training and Inference Framework

Other Hardware Support

  • GPU platform
    • Use dynamic memory pool by default on GPU.
    • Support parallel execution of computation and communication.
    • Support continuous address allocation by memory pool.
  • CPU platform
    • Support Windows 10.
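One way to picture a dynamic memory pool with continuous address allocation: reserve one device buffer up front, hand out aligned offsets first-fit, merge freed neighbors, and allow a group of tensors to be allocated back to back so that, for example, a fused communication op can treat them as one buffer. Everything here (class name, 512-byte alignment, first-fit policy) is an assumption for illustration; the class only tracks offsets, not real device memory, and is not MindSpore's allocator.

```python
class MemoryPool:
    """Toy first-fit pool over one pre-reserved buffer (tracks offsets only)."""

    def __init__(self, capacity, alignment=512):
        self.alignment = alignment
        self.free = [(0, capacity)]  # sorted list of (offset, size) free blocks

    def aligned(self, size):
        return (size + self.alignment - 1) // self.alignment * self.alignment

    def alloc(self, size):
        size = self.aligned(size)
        for i, (off, blk) in enumerate(self.free):
            if blk >= size:  # first fit
                if blk > size:
                    self.free[i] = (off + size, blk - size)
                else:
                    del self.free[i]
                return off
        raise MemoryError("pool exhausted")

    def alloc_contiguous(self, sizes):
        """Place several tensors back to back inside one allocated block."""
        base = self.alloc(sum(self.aligned(s) for s in sizes))
        offsets, cursor = [], base
        for s in sizes:
            offsets.append(cursor)
            cursor += self.aligned(s)
        return offsets

    def free_block(self, offset, size):
        """Return a block to the pool and merge it with adjacent free blocks."""
        self.free.append((offset, self.aligned(size)))
        self.free.sort()
        merged = []
        for off, blk in self.free:
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + blk)
            else:
                merged.append((off, blk))
        self.free = merged


pool = MemoryPool(capacity=4096)
sizes = [100, 600, 30]                   # three tensors to be fused
offsets = pool.alloc_contiguous(sizes)
print(offsets, pool.free)
pool.free_block(offsets[0], sum(pool.aligned(s) for s in sizes))
print(pool.free)
```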

Bugfixes

Contributors

Thanks goes to these wonderful people:

Alexey_Shevlyakov, Cathy, Chong, Hoai, Jonathan, Junhan, JunhanHu, Peilin, SanjayChan, StrawNoBerry, VectorSL, Wei, WeibiaoYu, Xiaoda, Yanjun, YuJianfeng, ZPaC, Zhang, ZhangQinghua, ZiruiWu, amongo, anthonyaje, anzhengqi, biffex, caifubi, candanzg, caojian05, casgj, cathwong, ch-l, chang, changzherui, chenfei, chengang, chenhaozhe, chenjianping, chentingting, chenzomi, chujinjin, dengwentao, dinghao, fanglei, fary86, flywind, gaojing, geekun, gengdongjie, ghzl, gong, gongchen, gukecai, guohongzilong, guozhijian, gziyan, h.farahat, hesham, huangdongrun, huanghui, jiangzhiwen, jinyaohui, jjfeing, jojobugfree, jonathan_yan, jonyguo, jzw, kingfo, kisnwang, laiyongqiang, leonwanghui, lianliguang, lichen, lichenever, limingqi107, liubuyu, liuxiao, liyong, liyong126, lizhenyu, lupengcheng, lvliang, maoweiyong, ms_yan, mxm, ougongchang, panfengfeng, panyifeng, pengyanjun, penn, qianlong, seatea, simson, suteng, thlinh, vlne-v1, wangchengke, wanghua, wangnan39, wangqiuliang, wenchunjiang, wenkai, wukesong, xiefangqi, xulei, yanghaitao, yanghaoran, yangjie159, yangzhenzhang, yankai10, yanzhenxiang2020, yao_yf, yoonlee666, zhangbuxue, zhangz0911gm, zhangzheng, zhaojichen, zhaoting, zhaozhenlong, zhongligeng, zhoufeng, zhousiyi, zjun, zyli2020, yuhuijun, limingqi107, lizhenyu, chenweifeng.

Contributions of any kind are welcome!

2020-03-27 21:13
6560119 panza 1584156773 zhunaipan

Release 0.1.0-alpha

Main Features

Ascend 910 Training and Inference Framework

  • Recommended OS: Ubuntu 16.04 (or later) or EulerOS 2.5 or EulerOS 2.8
  • Python version: 3.7.5
  • Preset models
    • ResNet-50: residual structure-based convolutional neural network (CNN) for image classification, which is widely used.
    • AlexNet: classic CNN for image classification, achieving historical results in ImageNet LSVRC-2012.
    • LeNet: classic CNN for image classification, which was proposed by Yann LeCun.
    • VGG16: classic CNN for image classification, which was proposed by Oxford Visual Geometry Group.
    • YoloV3: real-time object detection network.
    • NEZHA: BERT-based Chinese pre-training network produced by Huawei Noah's Ark Laboratory.
  • Execution modes
    • Graph mode: provides graph optimization methods such as memory overcommitment, IR fusion, and buffer fusion to achieve optimal execution performance.
    • PyNative mode: single-step execution mode, facilitating process debugging.
  • Debugging capability and methods
    • Save CheckPoints and Summary data during training.
    • Support asynchronous printing.
    • Dump the computing data.
    • Support profiling analysis of the execution process performance.
  • Distributed execution
    • Support AllReduce, AllGather, and Broadcast collective communication operations.
    • AllReduce data parallel: each device processes a different portion of the training data and gradients are synchronized with AllReduce, which accelerates the overall training process.
    • Collective communication-based layerwise parallel: models are partitioned layer by layer across devices, which works around insufficient device memory for large models and improves training speed.
    • Automatic parallel mode: a cost model predicts a better combination of data and model parallelism. This mode is recommended for ResNet-series networks.
  • Automatic differentiation
    • Automatic differentiation is implemented through source-to-source transformation.
    • Support distributed scenarios, with automatic insertion of communication operators in the backward pass.
  • Data processing, augmentation, and save format
    • Load common datasets such as ImageNet, MNIST, CIFAR-10, and CIFAR-100.
    • Support common data loading pipeline operations, such as shuffle, repeat, batch, map, and sampler.
    • Provide basic operator libraries covering common CV scenarios.
    • Allow users to define custom Python data augmentation operators through the Pyfunc mechanism.
    • Support loading user-defined datasets through the GeneratorDataset mechanism.
    • Provide the MindSpore data format, supporting data aggregation and storage, random access to samples, data partitioning, efficient parallel reads, user-defined indexes, and dataset search.
    • Convert user datasets to the MindSpore data format.
    • Deliver processed and augmented data to training in either feed mode or graph mode.
  • FP32/FP16 mixed-precision computation, supporting automatic and manual configuration.
  • Provide common operators (nn, math, array, and others) and support custom operators.
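The shuffle, map, batch, and repeat pipeline operations listed above can be illustrated with a minimal pure-Python sketch of how they compose. This is not MindSpore's dataset API: the function names (`shuffle`, `map_op`, `batch`, `repeat`, `one_epoch`), the buffer-based shuffle strategy, and the `drop_remainder` flag are hypothetical choices made only for illustration.

```python
import random
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Hypothetical stand-ins for dataset pipeline operations; NOT the MindSpore API.

def shuffle(source: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Buffer-based shuffle: once the buffer is full, emit one random buffered item per new input."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in source:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a randomly chosen item to the end so pop() removes it cheaply.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def map_op(source: Iterable[T], fn: Callable[[T], T]) -> Iterator[T]:
    """Apply a per-sample transform, e.g. a Python augmentation function."""
    for item in source:
        yield fn(item)


def batch(source: Iterable[T], batch_size: int, drop_remainder: bool = False) -> Iterator[List[T]]:
    """Group consecutive samples into lists of batch_size; optionally drop a short final batch."""
    current: List[T] = []
    for item in source:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current and not drop_remainder:
        yield current


def repeat(make_epoch: Callable[[], Iterable[T]], count: int) -> Iterator[T]:
    """Run the whole pipeline `count` times; a factory is needed because generators are single-use."""
    for _ in range(count):
        yield from make_epoch()


def one_epoch() -> Iterator[List[int]]:
    # shuffle -> map -> batch, rebuilt for every epoch
    return batch(map_op(shuffle(range(10), buffer_size=4), lambda x: x * 2), batch_size=4)


if __name__ == "__main__":
    for b in repeat(one_epoch, count=2):
        print(b)
```

With 10 samples and a batch size of 4, each epoch yields batches of 4, 4, and 2 samples, so two repeats produce six batches. The order of operations matters: shuffling before batching mixes individual samples, whereas batching first would only reorder whole batches.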

Inference Deployment

  • Deploy models in MindSpore format on the Ascend 310 platform for inference.
  • Save models in ONNX format.
  • Support saving models in LITE format and running them with the lightweight inference framework.
    • Recommended OS: Android 4.3 or later
    • Supported network type: LeNet
    • Provide general-purpose operators generated by TVM, as well as operators tuned for specific networks.

Other Hardware Support

  • GPU platform training
    • Recommended OS: Ubuntu 16.04
    • CUDA version: 9.2 or 10.1
    • cuDNN version: 7.6 or later
    • Python version: 3.7.5
    • NCCL version: 2.4.8-1
    • OpenMPI version: 3.1.5
    • Supported models: AlexNet, LeNet, and LSTM
    • Supported datasets: MNIST and CIFAR-10
    • Support data parallelism.
  • CPU platform training
    • Recommended OS: Ubuntu 16.04
    • Python version: 3.7.5
    • Supported model: LeNet
    • Supported dataset: MNIST
    • Only stand-alone (single-machine) execution is supported.
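AllReduce data parallelism, described under Main Features and supported for GPU training, can be sketched without any framework. In this hypothetical simulation, each "device" computes the gradient of a mean-squared-error loss for a 1-D linear model on its own data shard, `all_reduce_mean` stands in for the collective communication step by averaging those gradients, and every replica then applies the same update. The model, loss, and function names are illustrative assumptions, not MindSpore or NCCL APIs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair

# Hypothetical simulation of AllReduce data parallelism; NOT a MindSpore or NCCL API.

def compute_gradient(weight: float, shard: List[Sample]) -> float:
    """d/dw of the mean squared error for the model y_hat = weight * x on one shard."""
    return sum(2.0 * (weight * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(local_grads: List[float]) -> List[float]:
    """Simulated AllReduce(mean): every device receives the average of all local gradients."""
    avg = sum(local_grads) / len(local_grads)
    return [avg] * len(local_grads)


def data_parallel_step(weights: List[float], shards: List[List[Sample]], lr: float) -> List[float]:
    """One synchronous step: local gradients -> AllReduce -> identical update on every replica."""
    local_grads = [compute_gradient(w, s) for w, s in zip(weights, shards)]
    synced = all_reduce_mean(local_grads)
    return [w - lr * g for w, g in zip(weights, synced)]


if __name__ == "__main__":
    # Two simulated devices, each holding a different shard of data drawn from y = 3x.
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
    weights = [0.0, 0.0]  # replicas start from the same initial weight
    for _ in range(200):
        weights = data_parallel_step(weights, shards, lr=0.01)
    print(weights)  # both replicas converge to approximately 3.0
```

Because the shards are equal in size, the averaged gradient equals the full-batch gradient over all four samples, so the replicas stay identical and follow the same trajectory as single-device training.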

Peripherals and Tools
