SpatialOps issues
https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues
2015-07-02T21:16:29Z

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/1
Clean up & properly document device index arguments
2015-07-02T21:16:29Z
James Sutherland
Device indices are passed as integers, but we should have a more robust way to specify them if possible.
The same goes for ExprLib interfaces (see James_Research_Group/ExprLib#4).
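One "more robust" option is a small strongly typed wrapper, so that a bare integer cannot be passed where a device is expected. The sketch below is only an illustration of that idea: `DeviceIndex`, `cpu()`, `gpu()`, `gpu_ordinal()`, and `describe()` are hypothetical names, not part of the SpatialOps or ExprLib API, and the internal `-1` sentinel is an arbitrary choice.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical strongly typed device index (not SpatialOps API).
// It cannot be built implicitly from an int, so every call site must
// state whether it means the CPU or a particular GPU.
class DeviceIndex {
public:
  static DeviceIndex cpu() { return DeviceIndex(-1); }

  static DeviceIndex gpu(const int ordinal) {
    if (ordinal < 0) throw std::invalid_argument("GPU ordinal must be non-negative");
    return DeviceIndex(ordinal);
  }

  bool is_cpu() const { return value_ < 0; }
  bool is_gpu() const { return value_ >= 0; }

  // GPU ordinal; throws if this index refers to the CPU.
  int gpu_ordinal() const {
    if (is_cpu()) throw std::logic_error("CPU has no GPU ordinal");
    return value_;
  }

  bool operator==(const DeviceIndex& other) const { return value_ == other.value_; }
  bool operator!=(const DeviceIndex& other) const { return value_ != other.value_; }

private:
  explicit DeviceIndex(const int v) : value_(v) {}
  int value_;  // -1 is an arbitrary internal sentinel for the CPU in this sketch
};

// Example consumer: an interface that takes DeviceIndex instead of int.
inline std::string describe(const DeviceIndex& d) {
  return d.is_cpu() ? std::string("CPU") : "GPU " + std::to_string(d.gpu_ordinal());
}
```

With such a type, `describe( DeviceIndex::gpu(1) )` compiles while `describe( 1 )` does not, which moves mistakes in device arguments from run time to compile time.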

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/2
Invalidating ghost cells in Nebo
2018-02-25T20:43:49Z
James Sutherland
We planned and designed Nebo to invalidate ghost cells that it cannot populate with valid results because of stencil operations. However, invalidating these ghost cells breaks nearly every regression test that uses stencils.
Thus Nebo currently does NOT invalidate ghost cells.
We need to change this and update all tests that fail as a result.
This is currently implemented on the `invalid-ghost` branch, but needs cleanup & merge.
See also #7, which is closely related to this issue.

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/3
Fix bug in using threads and GPU Nebo backends the same time
2018-02-25T20:43:48Z
James Sutherland
First reported by Chris Earl in May 2014.
This bug only appears on certain systems (prism and a few laptops). To reproduce the bug, set `ENABLE_THREADS=ON` and `ENABLE_CUDA=ON` during configuration.
Example errors:
```
../libspatialops-structured.a(spatialops-structured_generated_CudaMemoryAllocator.cu.o): In function `_GLOBAL__sub_I_tmpxft_000016bb_00000000_3_CudaMemoryAllocator.cudafe1.cpp':
tmpxft_000016bb_00000000-3_CudaMemoryAllocator.cudafe1.cpp:(.text.startup+0x6b): undefined reference to `boost::system::generic_category()'
tmpxft_000016bb_00000000-3_CudaMemoryAllocator.cudafe1.cpp:(.text.startup+0x77): undefined reference to `boost::system::generic_category()'
tmpxft_000016bb_00000000-3_CudaMemoryAllocator.cudafe1.cpp:(.text.startup+0x83): undefined reference to `boost::system::system_category()'
collect2: error: ld returned 1 exit status
```
These errors imply there is a problem with how Boost and CudaMemoryAllocator.cu interact. Undefined references to `boost::system::generic_category()` typically mean the Boost.System library is missing from the link line.

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/4
Do not allow field access outside memory window
2015-07-02T21:21:18Z
James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/5
remove = assignment operator on SpatialField
2015-07-02T21:21:48Z
James Sutherland
remove = assignment operator on SpatialField

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/6
Particle interpolant operators in Nebo
2015-07-02T21:22:45Z
James Sutherland
We need Nebo support for particle interpolant operators:
- particle -> cell interpolation
- cell -> particle interpolation
There is an existing implementation at `spatialops/particles/ParticleOperators.h`.
Note that this is currently used in ODT and Wasatch.

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/7
Varying Number of Ghost cells
2017-07-18T15:27:53Z
James Sutherland
This task is "in progress"; a few things remain to be done before it is complete:
- [x] Regression testing to ensure that this is functioning as expected in SpatialOps.
- [ ] Wasatch must handle extra cells vs. ghost cells properly. Extra cells are always whatever the component sets.
Ghost cells can be variable by task.
- [ ] LBMS pack and unpack functions need ghost refactor attention

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/8
Field reductions are slow on GPU
2018-02-25T20:43:48Z
James Sutherland
Nathan indicates that GPU performance of field reductions is very poor (possibly slower than a transfer to CPU and back):
It is giving the correct answer. However, it was slower than copying to the cpu and then doing the reduction there. We should merge it for testing and verification, but it isn't ready for practical applications yet.
Here is an online tutorial on some reduction techniques: [reduction.pdf](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/f2c02ef24e5e5f2d072531f9f817ee06/reduction.pdf)
[And another link here](http://devblogs.nvidia.com/parallelforall/faster-parallel-reductions-kepler)
Note that Hao implemented some of this on the gpu-reductions branch, but this involved some additional syntax. He never saw this through...

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/9
Finish up stencil convention changes
2018-02-25T20:43:49Z
James Sutherland
Chris had been working on this prior to his departure.
It changes the convention for specifying extents/offsets when creating stencils.
There is a branch `new-stencil-convention` that implements this, but it needs to be tested against all downstream apps prior to merging.
Two things to be done here:
- [ ] Document the changes in Doxygen
- [ ] Test downstream apps (coordinate with app owner listed below)
  - [ ] ExprLib (James)
  - [ ] PoKiTT (Nathan)
  - [ ] ODT (James or Josh)
  - [ ] LBMS (James or Derek)
  - [ ] Wasatch (Tony)
Basic workflow (apply for each downstream project):
1. Build new project that uses SpatialOps with master branch. Run tests - everything should pass.
1. Build new project with new-stencil-convention branch of SpatialOps. Run tests - not everything will pass.
1. Discuss failing tests with developers of that project. Help them fix failing tests.
1. Repeat with a new project.

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/10
Tiling for improved performance?
2016-04-29T21:09:41Z
James Sutherland
Rather than our current memory decomposition, we could interleave thread access to memory. This may result in reduced memory contention for reads from main memory when performing stencil operations.
Tiling may also improve serial performance.
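To make the idea concrete, here is a minimal serial sketch of loop tiling for a 5-point stencil on a row-major 2D grid. It is an illustration only: `stencil_point`, `apply_untiled`, `apply_tiled`, and the `tile` parameter are hypothetical names, plain `std::vector` storage stands in for SpatialOps fields, and nothing here uses MemoryWindow. Both versions visit the same interior points and differ only in traversal order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 5-point average at interior point (i,j) of a row-major grid (x fastest).
inline void stencil_point(const std::vector<double>& in, std::vector<double>& out,
                          const std::size_t nx, const std::size_t i, const std::size_t j) {
  const std::size_t c = j * nx + i;
  out[c] = 0.2 * (in[c] + in[c - 1] + in[c + 1] + in[c - nx] + in[c + nx]);
}

// Reference traversal: sweep every interior row from start to end.
inline void apply_untiled(const std::vector<double>& in, std::vector<double>& out,
                          const std::size_t nx, const std::size_t ny) {
  if (nx < 3 || ny < 3) return;  // no interior points
  for (std::size_t j = 1; j < ny - 1; ++j)
    for (std::size_t i = 1; i < nx - 1; ++i)
      stencil_point(in, out, nx, i, j);
}

// Tiled traversal: process the interior in tile-by-tile blocks so that
// neighboring rows of a block are reused while they are still in cache.
inline void apply_tiled(const std::vector<double>& in, std::vector<double>& out,
                        const std::size_t nx, const std::size_t ny, const std::size_t tile) {
  if (nx < 3 || ny < 3 || tile == 0) return;
  for (std::size_t jt = 1; jt < ny - 1; jt += tile) {
    const std::size_t jend = std::min(jt + tile, ny - 1);
    for (std::size_t it = 1; it < nx - 1; it += tile) {
      const std::size_t iend = std::min(it + tile, nx - 1);
      for (std::size_t j = jt; j < jend; ++j)
        for (std::size_t i = it; i < iend; ++i)
          stencil_point(in, out, nx, i, j);
    }
  }
}
```

Whether tiling helps, and which tile size is best, depends on the cache hierarchy and grid size, so it would need to be measured rather than assumed.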
We may be able to accomplish this through clever use of MemoryWindows.

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/11
Nebo Marks: slicing arrays
2018-02-25T20:43:48Z
James Sutherland
Chris implemented most of the internals in the master branch.
Needs more testing and hardening, along with API implementation
There are really two parts to this:
1. "Marks" [PlannedNeboFeatures-Marks.pdf](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/d6456b4c7b4400ae8a2068946a6d90f2/PlannedNeboFeatures-Marks.pdf)
1. "Slices" [PlannedNeboFeatures-slices.pdf](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/05e4ee6ec3bcdcc5b432fbce05e0ad10/PlannedNeboFeatures-slices.pdf)
James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/12
Support dense linear algebra
2018-02-25T20:43:48Z
James Sutherland
# Overall Goal
We need to support something like this:
```cpp
Matrix<FieldT> mat;
// matrix assembly:
for( size_t irow=0; irow<nrow; ++irow ){
  for( size_t icol=0; icol<ncol; ++icol ){
    mat[irow][icol] <<= ...
  }
}
// solve pointwise Ax=b problem
// and store the result in a field:
solution <<= mat.solve( rhs );
// alternatively: mat.solve( rhs, solution );
// eigenvalue decomposition
vector<FieldT*> eigVals;
mat.eigenvalues( eigVals );
```
This should dispatch to GPU or CPU as appropriate (similar to what nebo currently does for field operations).
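For reference, the pointwise solve that `mat.solve( rhs )` is meant to express can be sketched on the CPU as below. This is a minimal sketch under stated assumptions, not the planned implementation: `Field`, `FieldMatrix`, and `solve_pointwise` are hypothetical names, `std::vector<double>` stands in for `FieldT`, and Gaussian elimination with partial pivoting is simply one reasonable choice of algorithm.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

typedef std::vector<double> Field;                     // stand-in for FieldT: one value per grid point
typedef std::vector<std::vector<Field> > FieldMatrix;  // mat[irow][icol] is a field

// Solve A x = b independently at every grid point, where each matrix
// entry and each right-hand-side entry is a field.
inline std::vector<Field> solve_pointwise(const FieldMatrix& mat, const std::vector<Field>& rhs) {
  const std::size_t n = rhs.size();
  if (n == 0 || mat.size() != n) throw std::invalid_argument("matrix/rhs size mismatch");
  for (std::size_t i = 0; i < n; ++i)
    if (mat[i].size() != n) throw std::invalid_argument("matrix must be square");

  const std::size_t npts = rhs[0].size();
  std::vector<Field> sol(n, Field(npts, 0.0));
  std::vector<std::vector<double> > a(n, std::vector<double>(n + 1));  // local augmented system

  for (std::size_t p = 0; p < npts; ++p) {
    // Gather the n-by-n system for this grid point.
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < n; ++j) a[i][j] = mat[i][j][p];
      a[i][n] = rhs[i][p];
    }
    // Forward elimination with partial pivoting.
    for (std::size_t k = 0; k < n; ++k) {
      std::size_t piv = k;
      for (std::size_t i = k + 1; i < n; ++i)
        if (std::fabs(a[i][k]) > std::fabs(a[piv][k])) piv = i;
      if (a[piv][k] == 0.0) throw std::runtime_error("singular matrix at a grid point");
      std::swap(a[k], a[piv]);
      for (std::size_t i = k + 1; i < n; ++i) {
        const double f = a[i][k] / a[k][k];
        for (std::size_t j = k; j <= n; ++j) a[i][j] -= f * a[k][j];
      }
    }
    // Back substitution, writing the solution into the result fields.
    for (std::size_t k = n; k-- > 0;) {
      double s = a[k][n];
      for (std::size_t j = k + 1; j < n; ++j) s -= a[k][j] * sol[j][p];
      sol[k][p] = s / a[k][k];
    }
  }
  return sol;
}
```

Because every grid point is independent, the outer loop over `p` is the natural place to parallelize on CPU threads or map to GPU threads.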
-------
# Milestones/SubTasks
- [ ] regression testing
- [ ] verify consistency of row and column indexes in all class members
- [ ] add support for eigenvalues
- [ ] use an actively developed library; uBlas was easy to add, but is 7 years old
- [ ] improve performance by removing unnecessary data transfers and function calls
- [ ] support parallel CPU execution
- [ ] support GPU execution
James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/13
Fix thread bug in Wasatch (happens when SpatialOps is compiled with ENABLE_THREADS=ON)
2018-02-25T20:43:49Z
James Sutherland
When SpatialOps is compiled with `ENABLE_THREADS=ON`, Wasatch local regression tests intermittently fail and/or crash. I have observed three types of failures:
Exception is thrown, claiming some block of memory has been freed twice (double free). This exception has appeared in the following tests (not an exhaustive list):
- turb-lid-driven-cavity-3D-WALE
- turb-lid-driven-cavity-3D-SMAGPRINSKY
- turb-lid-driven-cavity-3D-VREMAN
- turb-lid-driven-cavity-3D-scalar
- coal-boiler-mini
- intrusion_flow_past_cylinder_xz
- intrusion_flow_past_cylinder_xy
- turbulent-inlet-test-xminus
- intrusion_flow_past_objects_xy
- intrusion_flow_past_oscillating_cylinder_xy
- intrusion_flow_past_cylinder_yz
- channel-flow-xy-xplus-pressure-outlet
- intrusion_flow_over_icse
- turbulent-flow-over-cavity
- channel-flow-zy-yplus-pressure-outlet
- channel-flow-yz-yminus-pressure-outlet
- lid-driven-cavity-3D-Re1000
- channel-flow-xy-xminus-pressure-outlet
- lid-driven-cavity-3D-Re1000-rk2
- channel-flow-zx-zplus-pressure-outlet
- channel-flow-symmetry-bc
- liddrivencavity3DRe1000rk3 (sic)
- lid-driven-cavity-xy-Re1000
- lid-driven-cavity-yz-Re1000
- hydrostatic-pressure-test
- lid-driven-cavity-xz-Re1000
- channel-flow-xz-zminus-pressure-outlet
- reduction-test
- lid-drive-cavity-xy-Re1000-adaptive (sic)
- convection-test-svol-ydir-bc
- convection-test-svol-zdir-bc
- bc-parabolic-inlet-channel-flow-test
- bc-linear-inlet-channel-flow-test
- bc-test-svol-zdir
Test hangs (a test that usually takes <3 seconds takes longer than a minute). This behavior has appeared in the following tests:
- varden-projection-mms
- varden-projection-xdir
- varden-projection-ydir
- varden-projection-zdir
- varden-projection-xdir-analytic-dens
- qmom-aggregation-test
Test fails within testing framework with error code 3384. I do not know what this error code means. This behavior has appeared in the following tests:
- bc-test-svol-xdir
- bc-test-svol-ydir
- convection-test-svol-xdir-bc
Do not take these lists as exhaustive. Since these behaviors generally seem intermittent (I think the test hanging was consistent, but I do not remember at the moment), it is hard to tell exactly what is going on. Also, once a test failed in any way, I removed it from the list of tests I was running. In theory, a test could fail in multiple ways, but I have not seen that behavior.

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/14
Document mask conversion
2015-07-07T22:14:58Z
James Sutherland
Masks are used when applying boundary conditions (BCs). We have the ability to create a mask for one field type and convert it for usage with another field type.
Mask creation and conversion need to be documented in our Doxygen docs.

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/15
Consider using boost::atomic or boost::lockfree for multithreaded atomic operations
2018-02-25T20:43:48Z
James Sutherland
C++11 provides language-level support for atomic operations, but [boost::atomic](http://www.boost.org/doc/libs/1_58_0/doc/html/atomic.html) provides a portable alternative. Using it could reduce our usage of mutex in a few places (memory pool, for example).
Specifically, look at [spinlock](http://www.boost.org/doc/libs/1_58_0/doc/html/atomic/usage_examples.html#boost_atomic.usage_examples.example_spinlock), which should be a simple replacement for mutex.
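As a rough illustration of what such a replacement looks like, here is a minimal spinlock sketch. Note the substitution: it uses C++11 `std::atomic_flag` instead of `boost::atomic`, so it is not the linked Boost example. `SpinLock` and `GuardedCounter` are hypothetical names; the counter only stands in for a shared resource such as a memory pool.

```cpp
#include <atomic>
#include <mutex>  // std::lock_guard

// Minimal spinlock on top of std::atomic_flag. It provides lock(),
// try_lock(), and unlock(), so it works with std::lock_guard.
class SpinLock {
public:
  void lock() {
    // Spin until the flag was previously clear, i.e. we acquired it.
    while (flag_.test_and_set(std::memory_order_acquire)) { /* busy-wait */ }
  }
  bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }
  void unlock() { flag_.clear(std::memory_order_release); }

private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical shared resource protected by the spinlock.
struct GuardedCounter {
  SpinLock lock;
  long value = 0;

  void increment() {
    std::lock_guard<SpinLock> guard(lock);  // released when guard goes out of scope
    ++value;
  }
};
```

A spinlock only pays off when the critical section is very short and contention is low; under heavy contention or oversubscription a mutex that sleeps may still be the better choice.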
Also look at [boost::lockfree](http://www.boost.org/doc/libs/1_58_0/doc/html/lockfree.html). This could be useful for memory pools as well, since it implements a [lock-free queue](http://www.boost.org/doc/libs/1_58_0/doc/html/boost/lockfree/queue.html) and [lock-free stack](http://www.boost.org/doc/libs/1_58_0/doc/html/boost/lockfree/stack.html).

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/16
Introspect core count in SpatialOps
2018-02-25T20:43:48Z
James Sutherland
# Compile-time introspection:
CMake provides a way to [determine processor counts](http://www.cmake.org/cmake/help/v3.0/module/ProcessorCount.html). See also [this blog post](http://www.kitware.com/blog/home/post/63).
We could leverage this to help auto-populate the number of threads for SpatialOps. This could, in turn, be used in ExprLib.
# Runtime introspection
Several approaches are given [here](http://stackoverflow.com/questions/150355/programmatically-find-the-number-of-cores-on-a-machine).
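One of the simplest runtime options in standard C++11 is `std::thread::hardware_concurrency()`. The sketch below wraps it with a fallback; `detect_core_count` and its `fallback` parameter are hypothetical names. Two caveats apply: the standard allows the call to return 0 when the value is not computable, and the result is a count of hardware threads, which can exceed the physical core count on machines with simultaneous multithreading.

```cpp
#include <thread>

// Number of concurrent hardware threads reported by the runtime, or
// `fallback` when the implementation cannot determine it (returns 0).
inline unsigned detect_core_count(const unsigned fallback = 1) {
  const unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? fallback : n;
}
```

A default thread count could then be seeded from `detect_core_count()` and overridden by the user when the physical core count differs.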
# Other considerations
Once the threadcommunicator branch is merged, we have a few things to note:
- The thread counts in ExprLib and SpatialOps are multiplicative, and their product should never exceed the physical core count on the machine.
- The core count per socket should be divisible by the SpatialOps thread count.
- Thread count should generally not exceed the number of cores per socket if ExprLib is built on top of SpatialOps.
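These guidelines could be checked at startup. The sketch below is one possible reading, with hypothetical names (`ThreadConfig`, `check_thread_config`); in particular, it interprets the last guideline as applying to the SpatialOps thread count, which is an assumption.

```cpp
#include <string>

// Hypothetical description of a thread layout on one machine.
struct ThreadConfig {
  int exprLibThreads;     // threads used by ExprLib
  int spatialOpsThreads;  // threads used by SpatialOps
  int coresPerSocket;     // physical cores per socket
  int socketCount;        // number of sockets
};

// Returns an empty string when the layout satisfies the guidelines,
// otherwise a description of the first guideline it violates.
inline std::string check_thread_config(const ThreadConfig& c) {
  if (c.exprLibThreads < 1 || c.spatialOpsThreads < 1 ||
      c.coresPerSocket < 1 || c.socketCount < 1)
    return "all counts must be positive";

  const long physicalCores = static_cast<long>(c.coresPerSocket) * c.socketCount;
  const long totalThreads = static_cast<long>(c.exprLibThreads) * c.spatialOpsThreads;

  // ExprLib and SpatialOps thread counts multiply.
  if (totalThreads > physicalCores)
    return "ExprLib threads x SpatialOps threads exceeds the physical core count";

  // Assumed interpretation: the SpatialOps count stays within one socket.
  if (c.spatialOpsThreads > c.coresPerSocket)
    return "SpatialOps thread count exceeds the cores per socket";

  if (c.coresPerSocket % c.spatialOpsThreads != 0)
    return "cores per socket is not divisible by the SpatialOps thread count";

  return std::string();
}
```

For example, 2 ExprLib threads and 4 SpatialOps threads on a machine with 2 sockets of 8 cores pass all three checks, while 3 SpatialOps threads fail the divisibility check.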
*Note also that execution will halt in the threadcommunicator branch if the number of threads exceeds the number of cores. This could be fixed if we can guarantee that the threadpool is not sized to exceed the physical core count.*

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/17
Enable position-independent code flag to be set in SpatialOps
2015-07-21T22:17:03Z
James Sutherland
CMake has a [portable way to set this](http://www.cmake.org/cmake/help/v3.0/prop_tgt/POSITION_INDEPENDENT_CODE.html#prop_tgt:POSITION_INDEPENDENT_CODE).
We should do this for all of the Wasatch3P libraries. It may be as simple as:
```sh
-DCMAKE_POSITION_INDEPENDENT_CODE=ON
```
which could be set in the Uintah build script.

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/21
std::isnan is problematic for some NVCC versions
2018-02-25T20:43:49Z
James Sutherland
## Problem description
In [FieldComparisons.h](spatialops/structured/FieldComparisons.h), we check for NaN in equality comparison. Nathan wanted this for better behavior.
However, it appears that some versions of NVCC do not support this. Notably, prism fails to compile ExprLib when CUDA builds are active.
| Machine | nvcc Version | Comments |
| :-----: | :----------: | :------: |
| prism | 6.0.1 | fails to compile std::isnan |
| aurora | 6.5.12 | compiles without problem |
## Sample compiler error
Here is a sample compiler error (from building ExprLib on prism):
```
/scratch/local/prism_fast/jcs/ExprLib/buildCuda/so/include/spatialops/structured/FieldComparisons.h(156): error: expected an identifier
detected during instantiation of "__nv_bool SpatialOps::field_equal(const FieldT &, const FieldT &, double) [with FieldT=FieldT]"
/scratch/local/prism_fast/jcs/ExprLib/buildCuda/test/FieldMgr/main.cpp.cu(40): here
```
## Possible workaround
One possible solution is to detect the NVCC compiler version when compiling SpatialOps and then only perform the `isnan` checks if the compiler version is high enough.

James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/24
Support for GPU particle interpolants
2018-02-25T20:43:49Z
Tony Saad
Attached is a CUDA file that Sahana developed to implement the apply_to_field particle interpolants on the GPU. However, Sahana was unable to incorporate this into the SpatialOps build system.
[ParticleOperators_gpu.cu](https://software.crsim.utah.edu:8443/James_Research_Group/SpatialOps/uploads/8c1ba768524db9a71690e5bbdb94cfcb/ParticleOperators_gpu.cu)
James Sutherland

https://gitlab.multiscale.utah.edu/common/SpatialOps/-/issues/31
add_consumer on BitField and SpatialMask should be replaced with add_device to be consistent with SpatialField
2016-04-29T21:09:40Z
James Sutherland

James Sutherland