MindSpore / mindspore

2021-04-17 16:04
lilongfei

MindSpore 1.2.0

MindSpore 1.2.0 Release Notes

Major Features and Improvements

New Models

  • [STABLE] Add CV models on Ascend: 3D Unet, Unet++, SSD-Resnet50-fpn, SSD-VGG16, crnn_seq2seq_ocr for BSI, CTPN, resnet18, DPN
  • [STABLE] Add CV models on GPU: Faster-RCNN
  • [STABLE] Add NLP models on Ascend: NAML, Fasttext, GRU, LSTM
  • [BETA] Add TPRR: Thinking Path Re-Ranker, an original ranking-based framework for Multi-Hop Question Answering, which won first place on the HotpotQA leaderboard. (Ascend)

FrontEnd

  • [STABLE] Support side-effect expressions to ensure that operators execute in the order implied by the user's semantics. (Ascend/GPU/CPU)
  • [STABLE] Support calculating the gradient for networks that contain non-Tensor input parameters (int, float, bool, mstype.int, mstype.float, mstype.uint, mstype.bool_, tuple, list, dict). (Ascend/GPU/CPU)
  • [STABLE] Support the inverse of a bool Tensor. (Ascend/GPU/CPU)
  • [STABLE] Unify the isinstance interface. (Ascend/GPU/CPU)
  • [STABLE] Support negative indexes.(Ascend/GPU/CPU)
  • [STABLE] Support 110+ Numpy-like interfaces in mindspore.numpy.(Ascend/GPU/CPU)
  • [STABLE] Support export/load mindir model with a size greater than 2 GB.
  • [STABLE] The optimizer supports gradient centralization.(Ascend)
  • [STABLE] Support the following metrics: AUC, ROC, BLEU score, confusion matrix, cosine similarity, Dice, Hausdorff distance, occlusion sensitivity, perplexity, mean surface distance, and root mean surface distance.
  • [STABLE] Support using EmbeddingLookup with cache. (Ascend)
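
Gradient centralization, listed among the optimizer features above, is commonly described as subtracting the mean from each slice of a weight gradient so that every slice sums to zero. The following is a minimal pure-Python sketch of that idea for a 2-D gradient, assuming each row corresponds to one output unit. The helper name centralize_gradient is hypothetical and is not a MindSpore API; the optimizer's actual implementation may differ.

```python
def centralize_gradient(grad):
    """Subtract each row's mean from a 2-D gradient given as a list of lists."""
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


grad = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
centered = centralize_gradient(grad)
print(centered)  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

After centralization every row has zero mean, which is the property the technique relies on.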

Auto Parallel

  • [STABLE] Support AllGather and ReduceScatter fusion.(Ascend)
  • [STABLE] Support gradient accumulation feature in auto parallel mode.(Ascend/GPU)
  • [STABLE] Support running parallel optimizer with gradient accumulation.(Ascend)
  • [STABLE] Add the configuration of communication operators' fusion.(Ascend)
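
Gradient accumulation, mentioned in the list above, splits a large batch into micro-batches and combines their gradients before a single parameter update. The sketch below is framework-independent and uses a toy model y = w * x with mean squared error; the function names are hypothetical and it does not use MindSpore's auto-parallel API. It only illustrates that averaging equally sized micro-batch gradients reproduces the full-batch gradient.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    micro_batches = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    total = 0.0
    for micro_batch in micro_batches:
        total += grad_mse(w, micro_batch)  # accumulate instead of updating w
    return total / len(micro_batches)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch_size=2)
print(full, accum)  # -22.5 -22.5
```

Because only one micro-batch is processed at a time, peak memory is governed by the micro-batch size rather than by the full batch size.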

Executor

  • [STABLE] Support inference with Nvidia GPU.
  • [STABLE] Support data parallelism in PyNative mode.(Ascend/GPU)
  • [STABLE] Optimize LSTM inference memory consumption in Graph mode with CPU.

Sponge

  • [STABLE] Add SPONGE modules for molecular dynamics simulation, including Bond, Angle, Dihedral, Non Bond 14, NeighborList, Particle Mesh Ewald, Langevin MD and LIUJIAN MD.(GPU)

DataSet

  • [STABLE] If the libnuma library is installed in the environment, you can run export DATASET_ENABLE_NUMA=True to configure NUMA binding. In multi-card training scenarios, the training data processing speed can be improved, thereby improving the network training efficiency.
  • [STABLE] Unify API Tensor structure of Training/Inference interfaces in C++ SDK.
  • [STABLE] Use caching to avoid duplicated Decode operations in data preprocessing, improving preprocessing efficiency.
  • [STABLE] Support eager mode to run data augmentation in Python & C++.
  • [STABLE] Support more data augmentation operators (e.g. Affine, Perspective) in MindSpore-Lite.
  • [STABLE] Support light pipeline to process MindData in MindSpore-Lite training.
  • [STABLE] Support more data preprocessing operators based on the DVPP hardware module, which can be used on the Ascend310 platform.
  • [STABLE] Support copy-free property for data in Ascend310 inference process scenarios.

Running Data Recorder

  • [STABLE] Support running data recorder (RDR) for exception demarcation.
  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information, graph execution order, stream execution order and task debug information when a "run task error" or "distribute task failed" occurs. (Ascend)
  • [STABLE] Provide records of multi-stage computational graphs, memory allocation information and graph execution order when a "SyncStream error" occurs. (GPU)

3D Feature

  • [STABLE] Support 3D ops: Conv3D, Conv3DBackpropInput, Conv3DBackpropFilter, Conv3DTranspose, BiasAdd, BiasAddGrad, PReLU, Transpose, Reshape, transdata, StrideSlice, MaxPool3D, MaxPool3DGrad, BinaryCrossEntropy, SigmoidCrossEntropyWithLogits, SigmoidCrossEntropyWithLogitsGrad, SoftmaxCrossEntropyWithLogits, BatchNorm3d, BatchNorm3dGrad, Dropout3d.
  • [STABLE] Support the RMSELoss, MAELoss and FocalLoss loss functions, the DiceLoss binary loss function, and the MultiClassDiceLoss multi-class loss function for 2D/3D networks.
  • [STABLE] Add optimizer: AdamApplyOne(3D), ApplyMomentum(3D), SGD(3D).
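
For readers unfamiliar with the Dice loss named in the list above, the sketch below shows one common formulation for binary masks: one minus twice the overlap divided by the total mass of prediction and target. The function name dice_loss, the smoothing term, and the use of plain sums are assumptions made for illustration; MindSpore's DiceLoss may define the denominator differently.

```python
def dice_loss(pred, target, smooth=1e-5):
    """Binary Dice loss over flat lists of values in [0, 1] (illustrative form)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (total + smooth)
    return 1.0 - dice


pred = [1.0, 1.0, 0.0, 0.0]
target = [1.0, 0.0, 0.0, 0.0]
print(dice_loss(pred, target))    # close to 1/3: overlap 1, total mass 3
print(dice_loss(target, target))  # close to 0: perfect overlap
```

The smoothing term keeps the ratio defined when both masks are empty.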

API Change

Backwards Incompatible Change

Python API
mindspore.numpy.array(), mindspore.numpy.asarray(), mindspore.numpy.asfarray(), mindspore.numpy.copy() now support GRAPH mode, but no longer accept numpy.ndarray as input arguments (!12726:Add March Numpy interfaces to mindspore)

Previously, these interfaces could accept numpy.ndarray as arguments and convert numpy.ndarray to Tensor, but could not be used in GRAPH mode.
However, the MindSpore Parser currently cannot parse numpy.ndarray in a JIT graph. To support these interfaces in GRAPH mode, we had to remove numpy.ndarray support. Users can still use Tensor to convert numpy.ndarray to tensors.

1.1.1
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> nd_array = numpy.array([1,2,3])
>>> tensor = mnp.asarray(nd_array) # this line cannot be parsed in GRAPH mode

1.2.0
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> tensor = mnp.asarray([1,2,3]) # this line can be parsed in GRAPH mode

mindspore.numpy interfaces remove support for keyword arguments out and where (!12726:Add March Numpy interfaces to mindspore)

Previously, mindspore.numpy interfaces had incomplete support for the keyword arguments out and where: the out argument was only functional when the where argument was also provided, and out could not be used to pass a reference to numpy functions. Therefore, we have removed these two arguments to avoid confusion. Their original functionality can be found in np.where.

1.1.1
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b, out=out, where=where) # `out` cannot be used as a reference, therefore it is misleading

1.2.0
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b)
>>> out = np.where(where, x=res, y=out) # instead of np.add(a, b, out=out, where=where)
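
The replacement pattern relies on np.where choosing element-wise between the freshly computed result and the previous contents of out. The following pure-Python sketch spells out that selection for the same 3x3 inputs; the helper name select is hypothetical and stands in for np.where only for illustration.

```python
def select(mask, x, y):
    """Pick x where mask is True and y elsewhere, element by element."""
    return [
        [xv if m else yv for m, xv, yv in zip(mask_row, x_row, y_row)]
        for mask_row, x_row, y_row in zip(mask, x, y)
    ]


res = [[2.0] * 3 for _ in range(3)]   # stands in for np.add(a, b) on all-ones inputs
out = [[0.0] * 3 for _ in range(3)]   # previous contents of `out`
where = [[True, False, True], [False, False, True], [True, True, True]]

out = select(where, res, out)
print(out)  # [[2.0, 0.0, 2.0], [0.0, 0.0, 2.0], [2.0, 2.0, 2.0]]
```

Positions where the mask is False keep the old value of out, which matches what the removed out/where keyword pair was meant to express.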

Turn ops.MakeRefKey into an internal interface (!12010:Convert MakeRefKey to an internal interface)

Previously, MakeRefKey was an external interface that was not used. It is now an internal interface with the same usage. We do not recommend using this interface, and we will remove its introduction from the official website.

ops.ApplyFtrl, ops.ApplyMomentum, ops.ApplyRMSProp, ops.ApplyCenteredRMSProp change their output on the Ascend backend from multiple outputs to a single output. (!11895:unify mindir for different backend: the output num of optimizer ops, the backward of concat)

Previously, the number of outputs of these operators differed across backends. To unify their definitions, we changed their output on the Ascend backend from multiple outputs to a single output.

P.FusedBatchNorm, P.FusedBatchNormEx deleted (!12115:IR operators of GPU and CPU are unified as batchnorm)

The FusedBatchNorm and FusedBatchNormEx interfaces have been deleted. Please use the BatchNorm operator instead.

MetaTensor deleted (!10325:modify MetaTensor and Tensor)

The MetaTensor interface has been deleted. The functionality of MetaTensor has been integrated into Tensor.

ControlDepend is deleted, use Depend instead. The decorator @C.add_flags(has_effect=True) does not work. (!13793:remove control_depend from py file)

Previously, we used ControlDepend to control the execution order of multiple operators. In version 1.2.0, MindSpore introduces the auto-monad side-effect expression to ensure that operators execute in the order implied by the user's semantics. Therefore, ControlDepend is deleted and Depend is recommended.

In most scenarios, if operators have IO side effects (such as print) or memory side effects (such as assign), they will be executed according to the user's semantics. In some scenarios, if the two operators A and B have no order dependency, and A must be executed before B, we recommend using Depend to specify their execution order. See the API documentation of the Depend operator for specific usage.

1.1.1
    In some side-effect scenarios, we need to ensure the execution order of operators.
    In order to ensure that operator A is executed before operator B, it is recommended
    to insert the Depend operator between operators A and B.

    Previously, the ControlDepend operator was used to control the execution order.
    Since the ControlDepend operator is deprecated from version 1.1, it is recommended
    to use the Depend operator instead. The replacement method is as follows::

        a = A(x)                --->        a = A(x)
        b = B(y)                --->        y = Depend(y, a)
        ControlDepend(a, b)     --->        b = B(y)

1.2.0
    In most scenarios, if operators have IO side effects or memory side effects,
    they will be executed according to the user's semantics. In some scenarios,
    if the two operators A and B have no order dependency, and A must be executed
    before B, we recommend using Depend to specify their execution order. The
    usage method is as follows::

        a = A(x)                --->        a = A(x)
        b = B(y)                --->        y = Depend(y, a)
                                --->        b = B(y)
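
As a rough mental model (not a description of MindSpore's internals), Depend(y, a) can be pictured as returning y unchanged while adding a data edge from a, so any executor that respects data dependencies must run A before B. The toy graph evaluator below illustrates this in plain Python; Node, run and depend are hypothetical names invented for this sketch.

```python
class Node:
    """A toy graph node: a name, a function, and the nodes it depends on."""

    def __init__(self, name, fn, *inputs):
        self.name, self.fn, self.inputs = name, fn, inputs


def run(node, order, cache):
    """Evaluate a node after all of its inputs, recording the execution order."""
    if node.name in cache:
        return cache[node.name]
    args = [run(inp, order, cache) for inp in node.inputs]
    order.append(node.name)
    cache[node.name] = node.fn(*args)
    return cache[node.name]


def depend(value, expr):
    """Toy stand-in for Depend: yields `value`, but only after `expr` has run."""
    return Node("depend", lambda v, _e: v, value, expr)


x = Node("x", lambda: 1)
y = Node("y", lambda: 2)
a = Node("A", lambda v: v + 10, x)            # a = A(x)
b = Node("B", lambda v: v * 3, depend(y, a))  # y = Depend(y, a); b = B(y)

order = []
result = run(b, order, {})
print(order)   # ['y', 'x', 'A', 'depend', 'B']
print(result)  # 6
```

B still receives the original value of y, yet A is guaranteed to have executed first because B's input now depends on it.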

After the introduction of the auto-monad side-effect expression feature, the decorator @C.add_flags(has_effect=True) no longer works. If your script uses this decorator, modify it as follows, taking the overflow identification operator (without side effects) as an example:

1.1.1
@C.add_flags(has_effect=True)
def construct(self, *inputs):
    ...
    loss = self.network(*inputs)
    init = self.allo_status()
    self.clear_status(init)
    ...

1.2.0
def construct(self, *inputs):
    ...
    loss = self.network(*inputs)
    init = self.allo_status()
    init = F.depend(init, loss)
    clear_status = self.clear_status(init)
    ...

C++ API
C++ API supports dual ABI now. (!12432:api support dual abi)

1.1.1 supports only the old ABI. 1.2.0 supports both the old and the new ABI.

1.1.1
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)

1.2.0
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)  # the old ABI is supported
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=1)  # the new ABI is supported, too
                                                   # write nothing to use the new ABI by default

Context refactor. (!13515:cpp api modify)

The Context class is refactored. For details, see the API docs.

1.1.1
GlobalContext::SetGlobalDeviceTarget(kDeviceTypeAscend310);       // set device target to ascend310
GlobalContext::SetGlobalDeviceID(0);                              // set device id to 0
auto model_context = std::make_shared<ModelContext>();            // create a model context
ModelContext::SetInsertOpConfigPath(model_context, "./aipp.cfg"); // set aipp config file to ./aipp.cfg

1.2.0
auto model_context = std::make_shared<Context>();                 // create a model context
auto ascend310_info = std::make_shared<Ascend310DeviceInfo>();
model_context->MutableDeviceInfo().push_back(ascend310_info);     // set device target to ascend310
ascend310_info->SetDeviceID(0);                                   // set device id to 0
ascend310_info->SetInsertOpConfigPath("./aipp.cfg");              // set aipp config file to ./aipp.cfg

LoadModel interface changes. (!13515:cpp api modify)

LoadModel is renamed to Load. No exception is thrown now, but the return status should be checked.

1.1.1
try {
  auto graph = Serialization::LoadModel(model_file_path, kMindIR);
} catch (...) { ... }

1.2.0
Graph graph;
auto ret = Serialization::Load(model_file_path, kMindIR, &graph);
if (ret != kSuccess) { ... }

Model ctor changes. (!13515:cpp api modify)

Model now uses a parameterless ctor, and arguments are passed in through Build.

1.1.1
Model net(net_cell, model_context);
auto ret = net.Build();
if (ret != kSuccess) { ... }

1.2.0
Model net;
auto ret = net.Build(net_cell, model_context);
if (ret != kSuccess) { ... }

MSTensor::CreateTensor returns a native pointer now. (!13515:cpp api modify)

MSTensor::CreateTensor and MSTensor::CreateRefTensor now return a native pointer, which must be destroyed with DestroyTensorPtr.

1.1.1
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor.Name();

1.2.0
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor->Name();
MSTensor::DestroyTensorPtr(tensor);

New features

Python API
  • Add SPONGE functions: mindspore.ops.operations.BondForceWithAtomEnergy, mindspore.ops.operations.AngleForceWithAtomEnergy, mindspore.ops.operations.DihedralForceWithAtomEnergy, mindspore.ops.operations.Dihedral14LJCFForceWithAtomEnergy, mindspore.ops.operations.LJForceWithPMEDirectForce, mindspore.ops.operations.PMEExcludedForce, mindspore.ops.operations.PMEReciprocalForce, mindspore.ops.operations.BondEnergy, mindspore.ops.operations.AngleEnergy, mindspore.ops.operations.DihedralEnergy, mindspore.ops.operations.Dihedral14LJEnergy, mindspore.ops.operations.Dihedral14CFEnergy, mindspore.ops.operations.LJEnergy, mindspore.ops.operations.PMEEnergy. All operators are supported on GPU.

Deprecations

Python API
nn.MatMul is now deprecated in favor of ops.matmul (!12817:numpy-native deprecate nn.MatMul)

ops.matmul follows the API of numpy.matmul as closely as possible. As a function interface, ops.matmul is applied without instantiation, as opposed to nn.MatMul, which should only be used as a class instance.

1.1.1
>>> import numpy as np
>>> from mindspore import Tensor, nn
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> nn.MatMul()(x, y)

1.2.0
>>> import numpy as np
>>> from mindspore import Tensor, ops
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> ops.matmul(x, y)

Bug fixes

FrontEnd

Executor

Dataset

MindSpore Lite

Major Features and Improvements

Converter and runtime

  1. Support TensorFlow models in Converter, except aware-training models.
  2. Add a fusion pattern for identical horizontal operators in Converter.
  3. Support a Jar package on x86_64 systems for convenient integration into servers with a Java backend.
  4. Provide a unified runtime API so that developers can reuse their code between the cloud side and the end side. [BETA]
  5. Continue to improve control-flow capabilities: support GRU fusion in Converter; support weight-quant for control-flow models; support control-flow model inference with half precision; support nested control-flow models. [BETA]

ARM backend optimization

  1. Add float16 operators that NLP models depend on (like lstm) to enhance inference performance.
  2. Optimize operators: lstm, gru, depthwise.
  3. Add 6 NPU operators (like FullConnection), and fix some bugs that caused buildIR failures.

OpenCL backend

  1. Add new ops: 10+ ops added, 72 ops in total.
  2. Performance optimization: through memory layout optimization and block tiling, performance improved by 30% compared to version 1.1 on Adreno GPU.
  3. Initialization time optimization: initialization time improved by 100% compared to MindSpore Lite version 1.1 by storing the kernel cache as binary.
  4. Support Java calls on Mali or Adreno GPU.

Post quantization

  1. Support quantization of gather and lstm ops.
  2. Support quantizing TF Lite models with sub-graph nodes.
  3. Add a quantization strategy that decides whether to quantize each op, giving less accuracy loss and a higher compression rate.

Training on Device

  1. Virtual batching: use mini-batches to mimic, in theory, a large batch with low RAM consumption.
  2. Unified converter: the tod and iod converters are no longer compiled separately.
  3. Performance optimization to BWD ops.
  4. TrainLoop with Off-The-Shelf Functionality blocks, like LR scheduler, Loss Monitor, Ckpt Saver, Accuracy Monitor.
  5. Integration of code with Minddata lite.
  6. Support more networks (googlenet, densenet, shufflenetv2, nin, vgg) and operators.

Codegen

  1. Support 79 ops for the ARM platform and all CMSIS ops for Arm Cortex-M Series.
  2. Multiplatform support, including Android, IoT Devices.
  3. Support offline model weight preprocessing while compiling.
  4. Support offline memory reuse computing for minimum runtime buffer size.

API Change

API Incompatible Change

C++ API
Add a header file named lite_types.h for some common data structs. (!12262:[MS][LITE][Develop]remove cross dependency in inner headers)

Previously, some common data structs such as CpuBindMode and DeviceType were defined in context.h, which could cause cross-dependency between headers. So we created a new header named lite_types.h for common data structs and moved CpuBindMode and DeviceType from context.h into lite_types.h.

lite_types.h
namespace mindspore::lite {
/// \brief CpuBindMode defined for holding bind cpu strategy argument.
typedef enum {
  NO_BIND,    /**< no bind */
  HIGHER_CPU, /**< bind higher cpu first */
  MID_CPU     /**< bind middle cpu first */
} CpuBindMode;

/// \brief DeviceType defined for holding user's preferred backend.
typedef enum {
  DT_CPU, /**< CPU device type */
  DT_GPU, /**< GPU device type */
  DT_NPU  /**< NPU device type */
} DeviceType;
}  // namespace mindspore::lite

Add some new interfaces in ms_tensor.h for unified runtime API. (!13515:cpp api modify)

Previously, users could not create or modify MSTensor; all MSTensor instances were created and managed by the framework. However, users sometimes need to create or modify MSTensor, for example when pre-processing input data. So we provide two new interfaces in ms_tensor.h: the CreateTensor interface for creating an MSTensor and the set_shape interface for modifying the shape of an MSTensor.

CreateTensor
/// \brief Create a MSTensor.
///
/// \return Pointer to an instance of MindSpore Lite MSTensor.
static MSTensor *CreateTensor(const std::string &name, TypeId type, const std::vector<int> &shape, const void *data,
                                size_t data_len);
set_shape
/// \brief Set the shape of MSTensor.
virtual void set_shape(const std::vector<int> &shape) = 0;

Previously, users could access the data of an MSTensor through the interface named MutableData. However, MutableData not only returns the data of the tensor but also allocates data for the tensor if its data is nullptr. So we provide a new interface in ms_tensor.h named data, which returns the data of the tensor without allocating automatically.

data
/// \brief Get the pointer of data in MSTensor.
///
/// \note The data pointer can be used to both write and read data in MSTensor. No memory buffer will be
/// allocated.
///
/// \return the pointer points to data in MSTensor.
virtual void *data() = 0;

Delete DimensionSize() in ms_tensor.h. (!13515:cpp api modify)

The interface named DimensionSize functionally overlaps with the interface named shape. To keep the interface simple, we deleted DimensionSize and recommend using the interface named shape instead.

DimensionSize()
/// \brief Get size of the dimension of the MindSpore Lite MSTensor index by the parameter index.
///
/// \param[in] index Define index of dimension returned.
///
/// \return Size of dimension of the MindSpore Lite MSTensor.
virtual int DimensionSize(size_t index) const = 0;

Move allocator from namespace mindspore::lite to namespace mindspore for unified runtime API. (!13515:cpp api modify)

Previously, class Allocator was in namespace mindspore::lite. To provide a unified allocator interface for the unified runtime API, we moved Allocator to namespace mindspore.

1.1.0
namespace mindspore::lite {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}

1.2.0
namespace mindspore {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}

Bug fixes

  1. Fix the bug that the array in kernel registrar is not initialized.
  2. Fix the segmentation fault caused by mistakenly releasing OpParameter in the Crop kernel.
  3. Fix the bug that the MINDIR aware-training model is finally interpreted as weight-quant model.

Contributors

Thanks go to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, , Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, , Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, 
zhengjun10, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, zymaa.

Contributions of any kind are welcome!
